# Synthesize training data

`dlm synth instructions` turns prose-heavy `.dlm` files into usable
`::instruction::` sections.
This is the shortest path from "I have notes" to "I have supervised training pairs" when the document already contains domain prose but not enough authored Q/A.
## What it does
The synth loop:

1. Finds non-empty prose sections in the document.
2. Prompts a teacher model to generate question/answer pairs about that
   prose.
3. Deduplicates the generated pairs.
4. Optionally filters them through the `sway` judge.
5. Either stages the accepted `auto_synth` sections for inspection or
   writes them straight back into the `.dlm`.
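The dedup step (3) can be pictured with a minimal sketch. This is not dlm's actual implementation (which may use embeddings or fuzzier matching); the pair shape and function names are invented for illustration:

```python
import re

def normalize(question: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", question.lower())).strip()

def dedup_pairs(pairs: list[dict]) -> list[dict]:
    """Keep only the first pair for each normalized question."""
    seen: set[str] = set()
    kept = []
    for pair in pairs:
        key = normalize(pair["question"])
        if key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept

pairs = [
    {"question": "What does DGEMM compute?", "answer": "A dense matrix product."},
    {"question": "what does DGEMM compute", "answer": "A matrix multiply."},
    {"question": "Can DGEMM accumulate into C?", "answer": "Yes, optionally."},
]
print(len(dedup_pairs(pairs)))  # 2 -- the near-duplicate question is dropped
```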
The generated sections are still normal `::instruction::` sections.
They just carry provenance metadata so DLM can tell synthesized pairs
from hand-authored ones.
## Choose a teacher
The teacher decides who writes the candidate Q/A pairs:
- `self`: use the current local adapter for this document
- `hf:<model>`: use a HuggingFace text model
- `openai:<model>`: use the OpenAI API
- `anthropic:<model>`: use the Anthropic API
- `vllm-server:<url>`: use an OpenAI-compatible local server
The current default is `self`, but that only makes sense once the
document already has a trained adapter. For a cold start, either:

- train once first, then synth with `self`, or
- use `hf:` / `openai:` / `anthropic:` / `vllm-server:` as the teacher
## Minimal example
Start with a prose-heavy document:
```dlm
---
dlm_id: 01K...
dlm_version: 15
base_model: smollm2-135m
---

DGEMM multiplies two dense matrices and can optionally accumulate the
result into an existing output matrix.
```
Generate one extraction-style pair per prose section with an HF teacher:
```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --strategy extraction
```
That prints two summaries:

- the raw synth plan
- the filter report (`generated`, `dedup`, `judge passed`, `threshold`)
By default, accepted sections are staged under the store so you can inspect them:
```sh
uv run dlm synth list notes.dlm
```
If you want the accepted pairs written straight back into the document,
use `--apply`:
```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --strategy extraction \
  --apply
```
## Strategy choices

The `--strategy` flag controls what kind of questions the teacher is
asked to produce:

- `extraction`: questions answered directly by the prose
- `expansion`: questions a curious reader might ask beyond the exact
  wording of the prose
- `both`: split the per-section budget across both prompt styles
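Under `both`, the per-section budget has to be divided between the two prompt styles. The actual allocation policy is not documented here; one plausible sketch, giving `extraction` the larger half of an odd budget:

```python
def split_budget(per_section: int) -> dict[str, int]:
    """Split a per-section budget across the two prompt styles."""
    extraction = (per_section + 1) // 2  # extraction gets the larger half
    return {"extraction": extraction, "expansion": per_section - extraction}

print(split_budget(3))  # {'extraction': 2, 'expansion': 1}
```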
Start with `extraction` when you care about faithfulness. Reach for
`expansion` once the document already has a stable domain voice and you
want broader instructional coverage.
## Filter choices

The `--filter` flag controls post-generation cleanup:

- `sway`: dedup plus judge filtering against an empty baseline
- `dedup-only`: keep only near-duplicate suppression
- `none`: accept everything that parses as a valid pair
`sway` is the safest default and is what most users should keep. It is
especially helpful when using creative teachers or `--strategy both`.
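The empty-baseline comparison can be pictured with a toy sketch: a pair is accepted when its judged score beats the score of an empty answer by at least the threshold margin. The scorer, margin semantics, and names below are all assumptions, not dlm's actual judge:

```python
def judge_filter(pairs, score, threshold=0.2):
    """Keep pairs whose score beats an empty-answer baseline by `threshold`."""
    accepted = []
    for pair in pairs:
        margin = score(pair["answer"]) - score("")
        if margin >= threshold:
            accepted.append(pair)
    return accepted

# Toy stand-in scorer: longer answers score higher, capped at 1.0.
score = lambda answer: min(len(answer) / 40, 1.0)
pairs = [
    {"answer": "DGEMM computes C = alpha*A@B + beta*C."},
    {"answer": "Yes."},  # too close to the empty baseline, rejected
]
print(len(judge_filter(pairs, score)))  # 1
```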
If you are debugging prompt quality, use `--filter none` once and look
at the raw plan before deciding whether the issue is generation or
filtering.
## Useful knobs

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 3 \
  --strategy both \
  --filter sway \
  --threshold 0.2 \
  --max-pairs 8 \
  --max-new-tokens 512 \
  --temp 0.2 \
  --top-p 0.95 \
  --seed 7
```
The most useful flags in practice are:

- `--per-section`: generate more than one candidate pair per prose block
- `--max-pairs`: cap document churn on large files
- `--threshold`: tighten or loosen `sway` acceptance
- `--temp` and `--top-p`: increase diversity when the teacher is too
  repetitive
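For intuition on `--top-p`: nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches `p`, then renormalizes before sampling. This is an illustrative sketch of the general mechanism, not dlm's actual sampler:

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Restrict a token distribution to its top-p nucleus and renormalize."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in items:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break  # the nucleus now covers at least p of the mass
    return {token: prob / total for token, prob in kept}

print(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, p=0.7))
# {'a': 0.625, 'b': 0.375} -- 'c' falls outside the nucleus
```

A lower `p` trims more of the tail (safer, more repetitive); a higher `p`, like the `0.95` above, keeps almost the whole distribution.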
## Training after synth

Once the document has accepted `auto_synth` instruction sections, the
next normal train run consumes them like any other instruction pair:
```sh
uv run dlm train notes.dlm
```
No special train flag is needed. Synthesized instruction sections flow through the same SFT path as hand-authored sections.
## Revert and inspection
List applied auto-synth sections:
```sh
uv run dlm synth list notes.dlm
```
Strip every synthesized instruction section from the document:
```sh
uv run dlm synth revert notes.dlm
```
This only removes `auto_synth: true` instruction sections. Hand-authored
instruction blocks stay untouched.
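Conceptually, the revert is a provenance-based filter: drop sections flagged `auto_synth` and keep everything else. The section representation below is invented for illustration:

```python
def revert_synth(sections: list[dict]) -> list[dict]:
    """Keep only sections that are not marked auto_synth."""
    return [s for s in sections if not s.get("auto_synth", False)]

sections = [
    {"kind": "instruction", "auto_synth": True},  # synthesized, removed
    {"kind": "instruction"},                      # hand-authored, stays
    {"kind": "prose"},                            # untouched
]
print(len(revert_synth(sections)))  # 2
```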
## Common failure modes

### The self teacher is weak

If `--teacher self` produces junk, the adapter probably is not ready
yet. Train once more first, or use a stronger external teacher for the
first synth pass.
### Everything gets filtered out

That usually means one of three things:

- the teacher produced near-duplicates
- the generated answers were worse than the empty-baseline comparison in
  `sway`
- the threshold is too strict
Lower `--threshold`, or temporarily switch to `--filter dedup-only` to
see whether the judge is the main bottleneck.
### The document churns too much

Use `--max-pairs` aggressively at first. A small accepted batch is much
easier to reason about than dumping dozens of synthetic sections into a
single file.
## See also

- [Instruction section reference](../format/instruction-section.md)
- [Bootstrap self-improving](bootstrap-self-improving.md)
- [Self-improving loop](self-improving-loop.md)
- [CLI reference](../cli/reference.md)