# Interactive sessions

`dlm repl <path>` gives you a conversational prompt against a trained
`.dlm` without reloading the model between turns. It's the
human-facing counterpart to `dlm prompt`: same backend plumbing,
multi-turn context, readline-style editing, history that persists
across sessions.
## When to use it

- You're iterating on how the adapter responds to a series of
  related prompts.
- You want to tune generation knobs (`temperature`, `top_p`) on the
  fly without restarting.
- You're demoing the trained document to someone who expects a
  chat-style interface.

Single-shot `dlm prompt <path> "…"` is still the right call for
scripts, one-liners, or piped input.
## Basic usage

```bash
$ dlm repl mydoc.dlm
dlm repl — /help for commands, /exit to quit (history: ~/.dlm/history)
> Hello.
Hi! How can I help?
[1] > What does the document cover?
…
```

The prompt `[N] > ` reports how many turn pairs have already
happened. The full chat template is applied every turn, so the model
sees the whole conversation in context.
## Slash commands

| Command | Effect |
|---|---|
| `/help` | Print the command list. |
| `/exit` or `/quit` | End the session (Ctrl-D does the same). |
| `/clear` | Reset conversation history; model stays loaded. |
| `/save <path>` | Write history as JSON for later review or replay. |
| `/history` | Print the current conversation. |
| `/adapter <name>` | Switch active adapter (multi-adapter docs only). |
| `/params key=value` | Update a generation knob in place. |
| `/params` | Print current generation knobs. |
| `/model` | Print the active backend + adapter. |
### `/params`

Accepts these keys: `temperature`, `top_p`, `top_k`, `max_new_tokens`,
`repetition_penalty`. Multiple updates in one line are allowed:

```
[2] > /params temperature=0.3 top_p=0.9
temperature=0.3 top_p=0.9 top_k=None max_new_tokens=256 repetition_penalty=None
```

Invalid values are rejected without any partial update; your
previous knobs stay intact.
## Ctrl-C semantics

- **During input**: cancels the line you're editing. The REPL
  redraws the prompt; the session keeps running. (Use `/exit` or
  Ctrl-D to leave.)
- **During generation**: stops the model mid-stream. Tokens
  emitted so far stay on screen, and the partial response is
  appended to history with a `[cancelled]` marker so the model
  sees it in future turns.
## History persistence

Readline history (your past prompts) is stored at
`~/.dlm/history`. Arrow-up / Ctrl-R work across sessions. This is
*not* the conversation transcript — use `/save <path>` to export
that.
## Non-interactive output

`dlm repl mydoc.dlm > transcript.txt` detects the non-TTY stdout
and disables streaming; each response is printed once, complete.
Useful for capturing a scripted session (feed prompts on stdin).
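As a sketch, a scripted session can be driven from a prompt file on stdin. The filenames here are illustrative, and the snippet only uses slash commands documented above; the `dlm` call is guarded so the sketch is copy-paste safe on machines without `dlm` installed.

```shell
# Write the prompts up front; /exit ends the session cleanly.
cat > prompts.txt <<'EOF'
What does the document cover?
/params temperature=0.3
Summarize it in one sentence.
/exit
EOF

# stdout is redirected, so dlm detects the non-TTY and prints each
# response once, complete, into the transcript.
if command -v dlm >/dev/null 2>&1; then
  dlm repl mydoc.dlm < prompts.txt > transcript.txt
fi
```

Because the full chat template is reapplied each turn, the second question still sees the first answer in context, exactly as it would interactively.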
## Backend selection

`--backend auto` (default) picks MLX on Apple Silicon when the
`mlx` extra is installed, PyTorch otherwise. Force either with
`--backend pytorch|mlx`. MLX drops `top_p`/`top_k`/
`repetition_penalty` today — see the [CLI reference](../cli/reference.md)
for the full matrix.
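A minimal sketch of forcing a backend for a scripted run (filenames are illustrative; the `dlm` call is guarded and reads stdin from a file so it cannot hang where `dlm` is absent):

```shell
# Two-line scripted session; /exit ends it cleanly.
printf '%s\n' 'Hello.' '/exit' > backend_prompts.txt

# Force the PyTorch backend regardless of what --backend auto would pick.
if command -v dlm >/dev/null 2>&1; then
  dlm repl mydoc.dlm --backend pytorch < backend_prompts.txt > out.txt
fi
```

Swapping in `--backend mlx` works the same way, but remember that `top_p`, `top_k`, and `repetition_penalty` are ignored on MLX today.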