# Interactive sessions

`dlm repl <path>` gives you a conversational prompt against a trained
`.dlm` document without reloading the model between turns. It's the
human-facing counterpart to `dlm prompt`: the same backend plumbing,
plus multi-turn context, readline-style editing, and history that
persists across sessions.

## When to use it

- You're iterating on how the adapter responds to a series of
  related prompts.
- You want to tune generation knobs (`temperature`, `top_p`) on the
  fly without restarting.
- You're demoing the trained document to someone who expects a
  chat-style interface.

Single-shot `dlm prompt <path> "…"` is still the right call for
scripts, one-liners, or piped input.
## Basic usage

```bash
$ dlm repl mydoc.dlm
dlm repl — /help for commands, /exit to quit (history: ~/.dlm/history)
> Hello.
Hi! How can I help?
[1] > What does the document cover?
…
```

The prompt `[N] > ` shows how many prompt/response pairs have
completed so far. The full chat template is re-applied every turn,
so the model sees the whole conversation in context.
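A minimal sketch of that loop's bookkeeping (the `history` structure, function names, and message format here are illustrative assumptions, not the actual `dlm` internals):

```python
def render_prompt(history):
    """Return the REPL prompt: '> ' on the first turn,
    '[N] > ' once N prompt/response pairs exist."""
    n = len(history)
    return "> " if n == 0 else f"[{n}] > "

def build_context(history, user_msg):
    """Re-apply the full chat template every turn so the model
    sees the whole conversation, not just the latest message."""
    msgs = []
    for user, assistant in history:
        msgs.append({"role": "user", "content": user})
        msgs.append({"role": "assistant", "content": assistant})
    msgs.append({"role": "user", "content": user_msg})
    return msgs
```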

## Slash commands

| Command | Effect |
|---|---|
| `/help` | Print the command list. |
| `/exit` or `/quit` | End the session (Ctrl-D does the same). |
| `/clear` | Reset conversation history; model stays loaded. |
| `/save <path>` | Write history as JSON for later review or replay. |
| `/history` | Print the current conversation. |
| `/adapter <name>` | Switch active adapter (multi-adapter docs only). |
| `/params key=value` | Update a generation knob in place. |
| `/params` | Print current generation knobs. |
| `/model` | Print the active backend + adapter. |
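The general shape of a dispatcher for such commands can be sketched as follows; this covers only a few of the commands above, and the `state` dict is a hypothetical stand-in for the real session object:

```python
def dispatch(line, state):
    """Route a '/command arg…' line. Returns True if the line was
    handled as a slash command, False if it is a normal prompt."""
    if not line.startswith("/"):
        return False
    cmd, _, arg = line[1:].partition(" ")
    if cmd in ("exit", "quit"):
        state["running"] = False
    elif cmd == "clear":
        state["history"].clear()  # forget the conversation, keep the model
    elif cmd == "history":
        for i, (user, assistant) in enumerate(state["history"], 1):
            print(f"[{i}] > {user}\n{assistant}")
    else:
        print(f"unknown command: /{cmd}")
    return True
```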

### `/params`

Accepts these keys: `temperature`, `top_p`, `top_k`, `max_new_tokens`,
`repetition_penalty`. Multiple updates in one line are allowed:

```
[2] > /params temperature=0.3 top_p=0.9
temperature=0.3 top_p=0.9 top_k=None max_new_tokens=256 repetition_penalty=None
```

Invalid values are rejected without any partial update, so your
previous knobs stay intact.
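The all-or-nothing behavior amounts to staging every parsed value before committing any of them. A sketch under that assumption (the key list matches the docs; the function itself is hypothetical):

```python
ALLOWED = {"temperature": float, "top_p": float, "top_k": int,
           "max_new_tokens": int, "repetition_penalty": float}

def update_params(params, pairs):
    """Parse 'key=value' tokens and apply them atomically: a bad
    key or value raises before anything is written to params."""
    staged = {}
    for pair in pairs:
        key, _, raw = pair.partition("=")
        if key not in ALLOWED:
            raise ValueError(f"unknown key: {key}")
        try:
            staged[key] = ALLOWED[key](raw)
        except ValueError:
            raise ValueError(f"bad value for {key}: {raw}")
    params.update(staged)  # commit only after everything parsed
    return params
```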

## Ctrl-C semantics

- **During input**: cancels the line you're editing. The REPL
  redraws the prompt; the session keeps running. (Use `/exit` or
  Ctrl-D to leave.)
- **During generation**: stops the model mid-stream. Tokens
  emitted so far stay on screen, and the partial response is
  appended to history with a `[cancelled]` marker so the model
  sees it in future turns.
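The generation-side behavior can be modeled as catching `KeyboardInterrupt` around the token stream; the function and marker handling below are a sketch of that idea, not `dlm`'s actual code:

```python
def generate_with_cancel(stream, history, prompt):
    """Stream tokens to the screen; on Ctrl-C, keep what was
    emitted and record it in history with a [cancelled] marker."""
    chunks = []
    try:
        for token in stream:
            chunks.append(token)
            print(token, end="", flush=True)
    except KeyboardInterrupt:
        chunks.append(" [cancelled]")
    response = "".join(chunks)
    history.append((prompt, response))  # future turns see the partial text
    return response
```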

## History persistence

Readline history (your past prompts) is stored at
`~/.dlm/history`. Arrow-up / Ctrl-R work across sessions. This is
*not* the conversation transcript — use `/save <path>` to export
that.
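With Python's standard `readline` module, persistence of that kind typically looks like the sketch below (the setup function is illustrative; only the `~/.dlm/history` path comes from the docs):

```python
import atexit
import os
import readline

def setup_history(path="~/.dlm/history"):
    """Load past prompts at startup and write them back at exit,
    so arrow-up / Ctrl-R recall works across sessions."""
    path = os.path.expanduser(path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    try:
        readline.read_history_file(path)
    except FileNotFoundError:
        pass  # first run: no history yet
    atexit.register(readline.write_history_file, path)
```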

## Non-interactive output

`dlm repl mydoc.dlm > transcript.txt` detects that stdout is not a
TTY and disables streaming; each response is printed once, in full.
Useful for capturing a scripted session (feed prompts on stdin).
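The detection itself is a one-liner in Python; a sketch of the check (the function name is hypothetical):

```python
import sys

def output_mode():
    """Stream token-by-token on a terminal; print complete
    responses when stdout is redirected to a file or pipe."""
    return "stream" if sys.stdout.isatty() else "batch"
```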

## Backend selection

`--backend auto` (the default) picks MLX on Apple Silicon when the
`mlx` extra is installed, and PyTorch otherwise. Force either with
`--backend pytorch|mlx`. MLX currently ignores `top_p`, `top_k`, and
`repetition_penalty` — see the [CLI reference](../cli/reference.md)
for the full matrix.
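The auto-selection rule described above can be sketched as a platform check plus an importability probe; this is an assumption about the logic, not the actual resolver:

```python
import importlib.util
import platform

def pick_backend(requested="auto"):
    """Resolve '--backend auto': prefer MLX on Apple Silicon when
    the mlx package is importable, else fall back to PyTorch."""
    if requested != "auto":
        return requested  # explicit --backend pytorch|mlx wins
    on_apple_silicon = (platform.system() == "Darwin"
                        and platform.machine() == "arm64")
    if on_apple_silicon and importlib.util.find_spec("mlx") is not None:
        return "mlx"
    return "pytorch"
```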