# First prompt

`dlm prompt` runs inference against the current adapter using the base
model. It's the fastest way to check "did the training actually stick?"
without involving Ollama or GGUF conversion.

## The happy path

```sh
$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function as input…
```

Behind the scenes:

1. `dlm prompt` parses the `.dlm`, resolves the base model, and
   checks the hardware doctor's capability report.
2. It loads the base model plus the LoRA weights pointed to by
   `adapter/current.txt`, via PEFT (sketched below).
3. It calls `generate()` with your prompt and, by default,
   `--max-tokens 256` and `--temp 0.7`.
4. The response is streamed to stdout; the Rich reporter writes
   progress / plan info to stderr so you can pipe stdout cleanly.
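
Steps 2 and 3 are ordinary Transformers + PEFT plumbing. Here is a
minimal Python sketch of that load-and-generate path; the base model ID
and adapter directory are placeholders, since dlm resolves the real ones
from the `.dlm` file and `adapter/current.txt`:

```python
# Sketch only: load a base model, layer the current LoRA adapter on top,
# and generate with the documented defaults (256 new tokens, temp 0.7).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "your-org/your-base-model"              # placeholder, not what dlm picks
adapter_dir = "/path/to/adapter/versions/v0001"   # placeholder for the current.txt target

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)  # LoRA weights attached via PEFT

inputs = tokenizer("What is a Python decorator?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```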

## Deterministic generation

For reproducible output (useful for comparing adapters), pin
temperature to 0:

```sh
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi"
```

Greedy decoding is deterministic when the weights are byte-identical —
which is the whole point of the [determinism contract](../determinism.md).
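
If you want to check the "byte-identical" precondition yourself before
comparing two adapters, hashing each adapter directory is enough. A
small sketch (the store paths here are illustrative placeholders):

```python
# Hash every file under an adapter version directory so two versions
# can be compared byte-for-byte before comparing their greedy outputs.
import hashlib
from pathlib import Path

def adapter_digest(adapter_dir: str) -> str:
    digest = hashlib.sha256()
    for path in sorted(Path(adapter_dir).rglob("*")):
        if path.is_file():
            digest.update(path.read_bytes())
    return digest.hexdigest()

v1 = adapter_digest("store/<model-id>/adapter/versions/v0001")
v2 = adapter_digest("store/<model-id>/adapter/versions/v0002")
print("byte-identical" if v1 == v2 else "weights differ")
```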

## Verbose plan

Pass `--verbose` to surface the inference plan before generation:

```sh
$ uv run dlm prompt tutor.dlm --verbose "Hello"
plan: {'device': 'mps', 'dtype': 'fp16', 'adapter_path': '...', 'quantization': 'none'}
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
Hello! How can I help you today?
```

The `plan` dict is the same object written into `manifest.json` during
training, so you can cross-reference the device, dtype, and quantization
the model used the last time it trained.
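
That makes the cross-check scriptable. A sketch of comparing the two
plans, assuming the manifest sits in the store directory and exposes the
plan under a `"plan"` key (both are assumptions about the store layout,
not a documented API):

```python
# Compare the plan recorded at training time against the one dlm prompt
# just printed. Paths and keys are assumed for illustration.
import json
from pathlib import Path

manifest = json.loads(Path("store/<model-id>/manifest.json").read_text())
trained_plan = manifest["plan"]

current_plan = {"device": "mps", "dtype": "fp16", "quantization": "none"}
for key, value in current_plan.items():
    if trained_plan.get(key) != value:
        print(f"{key} changed since training: {trained_plan.get(key)!r} -> {value!r}")
```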

## Piping and stdin

Prompt via stdin for long inputs:

```sh
$ cat long-prompt.txt | uv run dlm prompt tutor.dlm
```

If stdin is empty and no query argument is given, the command exits with
a non-zero code and a clear error rather than hanging.
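
The resolution order is easy to mirror in your own wrappers: an explicit
query argument wins, otherwise piped stdin is read, otherwise the command
errors out. An illustrative sketch (not dlm's actual CLI code):

```python
# Resolve a prompt from an argument or piped stdin; exit non-zero if
# neither is present instead of blocking on an interactive terminal.
import sys

def resolve_prompt(query_arg: str | None) -> str:
    if query_arg:
        return query_arg
    if not sys.stdin.isatty():        # something was piped in
        piped = sys.stdin.read().strip()
        if piped:
            return piped
    sys.exit("error: no prompt given (pass a query or pipe text on stdin)")
```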

## Next

Happy with inference? [Export to Ollama](first-export.md) for a real
standalone model.