# First prompt
`dlm prompt` runs inference against the current adapter using the base
model. It's the fastest way to check "did the training actually stick?"
without involving Ollama or GGUF conversion.
## The happy path
$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function as input…
Behind the scenes:
1. `dlm prompt` parses the `.dlm`, resolves the base model, and
   checks the hardware doctor's capability report.
2. It loads the base model + `adapter/current.txt`-pointed LoRA
   weights via PEFT.
3. It calls `generate()` with your prompt, `--max-tokens 256`,
   `--temp 0.7` by default.
4. The response is streamed to stdout; the Rich reporter writes
   progress / plan info to stderr so you can pipe stdout cleanly
   (see the sketch below).
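Because progress goes to stderr, stdout carries only the model's
answer. A minimal sketch of a clean capture (file names are
illustrative):

```sh
# Model output lands in answer.txt; progress/plan chatter goes to progress.log
$ uv run dlm prompt tutor.dlm "What is a Python decorator?" \
    2>progress.log >answer.txt
$ cat answer.txt
A decorator is a function that takes another function as input…
```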
## Deterministic generation
For reproducible output (useful for comparing adapters), pin temperature to 0:
```sh
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi"
```
Greedy decoding is deterministic when the weights are byte-identical —
which is the whole point of the [determinism contract](../determinism.md).
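
In practice that makes greedy runs diff-able. A before/after check
while iterating on an adapter might look like this (file names are
illustrative; the retraining step depends on your workflow):

```sh
# Record greedy output, retrain the adapter, then compare
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi" > before.txt
# ... retrain or swap the adapter here ...
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi" > after.txt
$ diff before.txt after.txt   # no output means the generations are identical
```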
## Verbose plan
Pass `--verbose` to surface the inference plan before generation:
```sh
$ uv run dlm prompt tutor.dlm --verbose "Hello"
plan: {'device': 'mps', 'dtype': 'fp16', 'adapter_path': '...', 'quantization': 'none'}
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
Hello! How can I help you today?
```
The `plan` dict is the same object written into `manifest.json` on
training, so you can cross-reference what the model was doing the
last time it trained.
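
One way to eyeball that cross-reference (the store path is
illustrative — use the one `--verbose` prints, and the manifest
location may differ in your layout):

```sh
# Plan as reported at inference time (the Rich reporter writes to stderr)
$ uv run dlm prompt tutor.dlm --verbose "Hello" 2>&1 | grep '^plan:'
# Plan as recorded at training time, pretty-printed
$ python -m json.tool ~/.dlm/store/<store-id>/manifest.json
```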
## Piping and stdin
Prompt via stdin for long inputs:
```sh
$ cat long-prompt.txt | uv run dlm prompt tutor.dlm
```
An empty stdin (no query argument either) exits with a non-zero code and a clear error, rather than hanging.
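
A heredoc works just as well when the prompt is multi-line but short
enough to keep inline (prompt text is illustrative):

```sh
$ uv run dlm prompt tutor.dlm <<'EOF'
Explain @staticmethod vs @classmethod,
with a one-line example of each.
EOF
```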
## Next
Happy with inference? [Export to Ollama](first-export.md) for a real
standalone model.