# Interactive sessions

`dlm repl <doc.dlm>` gives you a conversational prompt against a trained `.dlm` without reloading the model between turns. It's the human-facing counterpart to `dlm prompt`: same backend plumbing, plus multi-turn context, readline-style editing, and history that persists across sessions.

## When to use it

- You're iterating on how the adapter responds to a series of related prompts.
- You want to tune generation knobs (`temperature`, `top_p`) on the fly without restarting.
- You're demoing the trained document to someone who expects a chat-style interface.

Single-shot `dlm prompt "…"` is still the right call for scripts, one-liners, or piped input.

## Basic usage

```bash
$ dlm repl mydoc.dlm
dlm repl — /help for commands, /exit to quit (history: ~/.dlm/history)
> Hello.
Hi! How can I help?
[1] > What does the document cover?
…
```

The prompt `[N] > ` reports how many turn pairs have already happened. The full chat template is applied every turn, so the model sees the whole conversation in context.

## Slash commands

| Command | Effect |
|---|---|
| `/help` | Print the command list. |
| `/exit` or `/quit` | End the session (Ctrl-D does the same). |
| `/clear` | Reset conversation history; the model stays loaded. |
| `/save <path>` | Write history as JSON for later review or replay. |
| `/history` | Print the current conversation. |
| `/adapter <name>` | Switch the active adapter (multi-adapter docs only). |
| `/params key=value` | Update a generation knob in place. |
| `/params` | Print the current generation knobs. |
| `/model` | Print the active backend + adapter. |

### `/params`

Accepts these keys: `temperature`, `top_p`, `top_k`, `max_new_tokens`, `repetition_penalty`. Multiple updates in one line are allowed:

```
[2] > /params temperature=0.3 top_p=0.9
temperature=0.3 top_p=0.9 top_k=None max_new_tokens=256 repetition_penalty=None
```

Invalid values are rejected without partial updates — your previous knobs stay intact.

## Ctrl-C semantics

- **During input**: cancels the line you're editing. The REPL redraws the prompt and the session keeps running. (Use `/exit` or Ctrl-D to leave.)
- **During generation**: stops the model mid-stream. Tokens emitted so far stay on screen, and the partial response is appended to history with a `[cancelled]` marker so the model sees it in future turns.

## History persistence

Readline history (your past prompts) is stored at `~/.dlm/history`. Arrow-up and Ctrl-R work across sessions. This is *not* the conversation transcript — use `/save <path>` to export that.

## Non-interactive output

`dlm repl mydoc.dlm > transcript.txt` detects the non-TTY stdout and disables streaming; each response is printed once, complete. Useful for capturing a scripted session (feed prompts on stdin; see the sketch at the end of this page).

## Backend selection

`--backend auto` (the default) picks MLX on Apple Silicon when the `mlx` extra is installed, and PyTorch otherwise. Force either with `--backend pytorch|mlx`. MLX currently drops `top_p`/`top_k`/`repetition_penalty` — see the [CLI reference](../cli/reference.md) for the full matrix.
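To confirm which backend `auto` resolved to, you can pipe the documented `/model` command through a throwaway session. A minimal sketch, assuming a doc named `mydoc.dlm` (the exact output format of `/model` is not specified here):

```bash
# Ask the REPL which backend and adapter are active, then quit.
# This just captures whatever /model prints; no output format is assumed.
printf '/model\n/exit\n' | dlm repl mydoc.dlm
```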
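Putting the pieces above together, here is a minimal sketch of a fully scripted run. Every flag and slash command used is documented on this page; `mydoc.dlm`, `session.json`, and `transcript.txt` are placeholder names.

```bash
# Sketch of a scripted session: prompts and slash commands arrive on
# stdin, stdout is a file, so streaming is disabled and each response
# is printed once, complete.
printf '%s\n' \
  '/params temperature=0.3 top_p=0.9' \
  'What does the document cover?' \
  '/save session.json' \
  '/exit' \
| dlm repl mydoc.dlm --backend pytorch > transcript.txt
```

The `--backend pytorch` flag is forced here only because the scripted knobs include `top_p`, which MLX currently drops; the conversation itself also lands in `session.json` via `/save` for later review.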