Document synth workflows

Authored by mfwolffe <wolffemf@dukes.jmu.edu>

- SHA: 1556d3896fba7763c0533997f5209bf4da28e6c6
- Parents: b41f337
- Tree: 10c8c71

| Status | File | + | - |
|---|---|---|---|
| A | docs/cookbook/bootstrap-self-improving.md | 193 | 0 |
| A | docs/cookbook/synthesize-training-data.md | 202 | 0 |
| A | docs/format/instruction-section.md | 119 | 0 |
| M | docs/format/sections.md | 7 | 0 |
| M | docs/index.md | 3 | 0 |
| M | mkdocs.yml | 3 | 0 |
docs/cookbook/bootstrap-self-improving.md (added) @@ -0,0 +1,193 @@
| 1 | +# Bootstrap self-improving | |
| 2 | + | |
| 3 | +The self-teacher loop is the most interesting version of Sprint 43: | |
| 4 | +your current adapter writes new `::instruction::` sections for its own | |
| 5 | +document, then the next train run folds them back in. | |
| 6 | + | |
| 7 | +This is not magic. It works because DLM already has: | |
| 8 | + | |
| 9 | +- replay-backed retraining | |
| 10 | +- synthesized instruction provenance (`auto_synth`) | |
| 11 | +- a local `sway` judge for filtering weak candidates | |
| 12 | + | |
| 13 | +Used carefully, it turns one trained document into a steadily better | |
| 14 | +instruction corpus. | |
| 15 | + | |
| 16 | +## The honest starting point | |
| 17 | + | |
| 18 | +`--teacher self` uses the current adapter for that `.dlm`. That means | |
| 19 | +the loop starts **after** there is already a trainable local adapter. | |
| 20 | + | |
| 21 | +A good bootstrap pattern is: | |
| 22 | + | |
| 23 | +1. Start with prose plus at least some useful seed supervision, or do an | |
| 24 | + initial train from prose and existing sections. | |
| 25 | +2. Run `dlm synth instructions --teacher self`. | |
| 26 | +3. Retrain on the accepted synth sections. | |
| 27 | +4. Repeat in small batches. | |
| 28 | + | |
| 29 | +If the adapter still cannot answer basic questions about the document, | |
| 30 | +synthetic instruction generation will mostly amplify noise. | |
| 31 | + | |
| 32 | +## Minimal loop | |
| 33 | + | |
| 34 | +Train once: | |
| 35 | + | |
| 36 | +```sh | |
| 37 | +uv run dlm train notes.dlm | |
| 38 | +``` | |
| 39 | + | |
| 40 | +Generate a small accepted batch from the current adapter and write it | |
| 41 | +back immediately: | |
| 42 | + | |
| 43 | +```sh | |
| 44 | +uv run dlm synth instructions notes.dlm \ | |
| 45 | + --teacher self \ | |
| 46 | + --per-section 1 \ | |
| 47 | + --strategy extraction \ | |
| 48 | + --max-pairs 4 \ | |
| 49 | + --apply | |
| 50 | +``` | |
| 51 | + | |
| 52 | +Retrain on the expanded instruction set: | |
| 53 | + | |
| 54 | +```sh | |
| 55 | +uv run dlm train notes.dlm | |
| 56 | +``` | |
| 57 | + | |
| 58 | +Then inspect real output quality: | |
| 59 | + | |
| 60 | +```sh | |
| 61 | +uv run dlm prompt notes.dlm "What does DGEMM do?" | |
| 62 | +``` | |
| 63 | + | |
| 64 | +That is the basic self-improving loop. | |
| 65 | + | |
| 66 | +## Safer staged version | |
| 67 | + | |
| 68 | +If you want to inspect before writing: | |
| 69 | + | |
| 70 | +```sh | |
| 71 | +uv run dlm synth instructions notes.dlm \ | |
| 72 | + --teacher self \ | |
| 73 | + --per-section 1 \ | |
| 74 | + --strategy extraction | |
| 75 | + | |
| 76 | +uv run dlm synth list notes.dlm | |
| 77 | +``` | |
| 78 | + | |
| 79 | +The current implementation stages accepted synth sections for | |
| 80 | +inspection, but it does not yet have a separate `dlm synth apply` | |
| 81 | +subcommand. Use `--apply` on the synth run when you want the sections | |
| 82 | +written straight into the document. | |
| 83 | + | |
| 84 | +## Why `sway` stays the default | |
| 85 | + | |
| 86 | +The self-teacher path is the place where the default `--filter sway` | |
| 87 | +matters most. | |
| 88 | + | |
| 89 | +Without filtering, a weak adapter can happily generate: | |
| 90 | + | |
| 91 | +- duplicates | |
| 92 | +- overly generic answers | |
| 93 | +- plausible but wrong extrapolations | |
| 94 | + | |
| 95 | +The current synth filter stack is: | |
| 96 | + | |
| 97 | +1. dedup | |
| 98 | +2. optional judge pass | |
| 99 | +3. optional threshold cut | |
| 100 | + | |
| 101 | +The CLI prints those counts so you can tell whether the loop is getting | |
| 102 | +better or just louder. | |
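For intuition, the three stages can be sketched in Python. The function names and the scoring convention below are illustrative assumptions, not DLM's actual internals:

```python
# Hypothetical sketch of the synth filter stack: dedup, then an
# optional judge pass, then a score-threshold cut. Names and the
# judge's scoring convention are assumptions, not DLM's real code.

def dedup(pairs):
    """Drop duplicate questions (case-insensitive), keeping first occurrence."""
    seen, kept = set(), []
    for question, answer in pairs:
        key = question.strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append((question, answer))
    return kept


def filter_stack(pairs, judge=None, threshold=0.0):
    """Return (accepted_pairs, counts) mirroring the CLI's printed report."""
    counts = {"generated": len(pairs)}
    pairs = dedup(pairs)
    counts["after_dedup"] = len(pairs)
    if judge is None:
        return pairs, counts
    # Judge pass: score each pair; non-positive scores are rejected.
    scored = [(q, a, judge(q, a)) for q, a in pairs]
    passed = [(q, a, s) for q, a, s in scored if s > 0.0]
    counts["judge_passed"] = len(passed)
    # Threshold cut: tighten acceptance beyond the judge's pass/fail.
    kept = [(q, a) for q, a, s in passed if s >= threshold]
    counts["after_threshold"] = len(kept)
    return kept, counts
```

Reading the counts the same way you would read the CLI report: a sharp drop at dedup means the teacher is repeating itself, while a sharp drop at the judge pass means the judge is the bottleneck.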
| 103 | + | |
| 104 | +## A conservative rhythm | |
| 105 | + | |
| 106 | +This is a healthy local rhythm for a real project: | |
| 107 | + | |
| 108 | +```sh | |
| 109 | +uv run dlm train notes.dlm | |
| 110 | +uv run dlm synth instructions notes.dlm \ | |
| 111 | + --teacher self \ | |
| 112 | + --per-section 1 \ | |
| 113 | + --max-pairs 4 \ | |
| 114 | + --apply | |
| 115 | +uv run dlm train notes.dlm | |
| 116 | +uv run dlm prompt notes.dlm "Explain the core idea." | |
| 117 | +``` | |
| 118 | + | |
| 119 | +Keep the accepted batch small at first. The point is to improve the | |
| 120 | +document's instruction surface, not flood it with speculative rows. | |
| 121 | + | |
| 122 | +## When to switch away from `self` | |
| 123 | + | |
| 124 | +The self-teacher is convenient, but not always the right teacher. | |
| 125 | + | |
| 126 | +Prefer an external teacher when: | |
| 127 | + | |
| 128 | +- the local adapter is still very early and weak | |
| 129 | +- you need broader general knowledge than the current adapter can supply | |
| 130 | +- you want to compare local-vs-external synth quality on the same prose | |
| 131 | + | |
| 132 | +That usually looks like: | |
| 133 | + | |
| 134 | +```sh | |
| 135 | +uv run dlm synth instructions notes.dlm \ | |
| 136 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | |
| 137 | + --per-section 1 \ | |
| 138 | + --apply | |
| 139 | +``` | |
| 140 | + | |
| 141 | +and then later moving back to `--teacher self` once the adapter has real | |
| 142 | +domain traction. | |
| 143 | + | |
| 144 | +## Pairing Sprint 43 with Sprint 42 | |
| 145 | + | |
| 146 | +Instruction synthesis and preference mining are complementary: | |
| 147 | + | |
| 148 | +- `dlm synth instructions` grows the SFT side of the document | |
| 149 | +- `dlm synth preferences` / `dlm preference mine` sharpens ranking and | |
| 150 | + behavior once the adapter can already produce multiple plausible | |
| 151 | + answers | |
| 152 | + | |
| 153 | +A practical sequence is: | |
| 154 | + | |
| 155 | +1. train | |
| 156 | +2. synth instructions | |
| 157 | +3. train | |
| 158 | +4. mine preferences | |
| 159 | +5. train preference phase | |
| 160 | + | |
| 161 | +That is the closest current DLM path to a fully local self-improving | |
| 162 | +document loop. | |
| 163 | + | |
| 164 | +## Failure modes to watch | |
| 165 | + | |
| 166 | +### The second pass is not better | |
| 167 | + | |
| 168 | +That usually means one of: | |
| 169 | + | |
| 170 | +- the first synth batch was too weak | |
| 171 | +- the document still lacks enough domain prose | |
| 172 | +- the adapter is too small for the domain | |
| 173 | + | |
| 174 | +Do not assume "more synthetic rows" automatically means "better model." | |
| 175 | + | |
| 176 | +### Expansion mode gets weird | |
| 177 | + | |
| 178 | +`--strategy expansion` is useful, but it is also the fastest route to | |
| 179 | +polished nonsense. Prefer `extraction` for early loops and only widen to | |
| 180 | +`both` or `expansion` once the adapter is already grounded. | |
| 181 | + | |
| 182 | +### Prompt quality improves but factuality does not | |
| 183 | + | |
| 184 | +That is a signal to go back to better prose or hand-authored | |
| 185 | +instructional supervision. Self-improvement cannot invent missing source | |
| 186 | +knowledge. | |
| 187 | + | |
| 188 | +## See also | |
| 189 | + | |
| 190 | +- [Synthesize training data](synthesize-training-data.md) | |
| 191 | +- [Instruction section reference](../format/instruction-section.md) | |
| 192 | +- [Self-improving loop](self-improving-loop.md) | |
| 193 | +- [Reward-model integration](reward-model-integration.md) | |
docs/cookbook/synthesize-training-data.md (added) @@ -0,0 +1,202 @@
| 1 | +# Synthesize training data | |
| 2 | + | |
| 3 | +`dlm synth instructions` turns prose-heavy `.dlm` files into usable | |
| 4 | +`::instruction::` sections. | |
| 5 | + | |
| 6 | +This is the shortest path from "I have notes" to "I have supervised | |
| 7 | +training pairs" when the document already contains domain prose but not | |
| 8 | +enough authored Q/A. | |
| 9 | + | |
| 10 | +## What it does | |
| 11 | + | |
| 12 | +The synth loop: | |
| 13 | + | |
| 14 | +1. Finds non-empty prose sections in the document. | |
| 15 | +2. Prompts a teacher model to generate question/answer pairs about that | |
| 16 | + prose. | |
| 17 | +3. Deduplicates the generated pairs. | |
| 18 | +4. Optionally filters them through the `sway` judge. | |
| 19 | +5. Either stages the accepted `auto_synth` sections for inspection or | |
| 20 | + writes them straight back into the `.dlm`. | |
| 21 | + | |
| 22 | +The generated sections are still normal `::instruction::` sections. | |
| 23 | +They just carry provenance metadata so DLM can tell synthesized pairs | |
| 24 | +from hand-authored ones. | |
| 25 | + | |
| 26 | +## Choose a teacher | |
| 27 | + | |
| 28 | +The teacher decides who writes the candidate Q/A pairs: | |
| 29 | + | |
| 30 | +- `self`: use the current local adapter for this document | |
| 31 | +- `hf:<model>`: use a HuggingFace text model | |
| 32 | +- `openai:<model>`: use the OpenAI API | |
| 33 | +- `anthropic:<model>`: use the Anthropic API | |
| 34 | +- `vllm-server:<url>`: use an OpenAI-compatible local server | |
| 35 | + | |
| 36 | +The current default is `self`, but that only makes sense once the | |
| 37 | +document already has a trained adapter. For a cold start, either: | |
| 38 | + | |
| 39 | +- train once first, then synth with `self`, or | |
| 40 | +- use `hf:` / `openai:` / `anthropic:` / `vllm-server:` as the teacher | |
| 41 | + | |
| 42 | +## Minimal example | |
| 43 | + | |
| 44 | +Start with a prose-heavy document: | |
| 45 | + | |
| 46 | +```dlm | |
| 47 | +--- | |
| 48 | +dlm_id: 01K... | |
| 49 | +dlm_version: 15 | |
| 50 | +base_model: smollm2-135m | |
| 51 | +--- | |
| 52 | + | |
| 53 | +DGEMM multiplies two dense matrices and can optionally accumulate the | |
| 54 | +result into an existing output matrix. | |
| 55 | +``` | |
| 56 | + | |
| 57 | +Generate one extraction-style pair per prose section with an HF teacher: | |
| 58 | + | |
| 59 | +```sh | |
| 60 | +uv run dlm synth instructions notes.dlm \ | |
| 61 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | |
| 62 | + --per-section 1 \ | |
| 63 | + --strategy extraction | |
| 64 | +``` | |
| 65 | + | |
| 66 | +That prints two summaries: | |
| 67 | + | |
| 68 | +- the raw synth plan | |
| 69 | +- the filter report (`generated`, `dedup`, `judge passed`, `threshold`) | |
| 70 | + | |
| 71 | +By default, accepted sections are staged under the store so you can | |
| 72 | +inspect them: | |
| 73 | + | |
| 74 | +```sh | |
| 75 | +uv run dlm synth list notes.dlm | |
| 76 | +``` | |
| 77 | + | |
| 78 | +If you want the accepted pairs written straight back into the document, | |
| 79 | +use `--apply`: | |
| 80 | + | |
| 81 | +```sh | |
| 82 | +uv run dlm synth instructions notes.dlm \ | |
| 83 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | |
| 84 | + --per-section 1 \ | |
| 85 | + --strategy extraction \ | |
| 86 | + --apply | |
| 87 | +``` | |
| 88 | + | |
| 89 | +## Strategy choices | |
| 90 | + | |
| 91 | +The `--strategy` flag controls what kind of questions the teacher is | |
| 92 | +asked to produce: | |
| 93 | + | |
| 94 | +- `extraction`: questions answered directly by the prose | |
| 95 | +- `expansion`: questions a curious reader might ask beyond the exact | |
| 96 | + wording of the prose | |
| 97 | +- `both`: split the per-section budget across both prompt styles | |
| 98 | + | |
| 99 | +Start with `extraction` when you care about faithfulness. Reach for | |
| 100 | +`expansion` once the document already has a stable domain voice and you | |
| 101 | +want broader instructional coverage. | |
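As a rough sketch, the strategies differ mainly in the prompt the teacher receives, and `both` splits the per-section budget between the two styles. The template wording below is an assumption for illustration; DLM's actual teacher prompts are not documented here:

```python
# Illustrative prompt templates for the two synthesis strategies.
# The exact wording DLM sends to the teacher is an assumption here.

EXTRACTION_TEMPLATE = (
    "Read the passage below and write {n} question/answer pair(s) "
    "whose answers are stated directly in the passage.\n\n{prose}"
)

EXPANSION_TEMPLATE = (
    "Read the passage below and write {n} question/answer pair(s) "
    "a curious reader might ask beyond its exact wording.\n\n{prose}"
)


def build_prompts(prose, per_section, strategy):
    """Return one teacher prompt per style; 'both' splits the budget."""
    if strategy == "extraction":
        return [EXTRACTION_TEMPLATE.format(n=per_section, prose=prose)]
    if strategy == "expansion":
        return [EXPANSION_TEMPLATE.format(n=per_section, prose=prose)]
    # strategy == "both": split the per-section budget across styles.
    half = max(per_section // 2, 1)
    return [
        EXTRACTION_TEMPLATE.format(n=half, prose=prose),
        EXPANSION_TEMPLATE.format(n=max(per_section - half, 1), prose=prose),
    ]
```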
| 102 | + | |
| 103 | +## Filter choices | |
| 104 | + | |
| 105 | +The `--filter` flag controls post-generation cleanup: | |
| 106 | + | |
| 107 | +- `sway`: dedup plus judge filtering against an empty baseline | |
| 108 | +- `dedup-only`: keep only near-duplicate suppression | |
| 109 | +- `none`: accept everything that parses as a valid pair | |
| 110 | + | |
| 111 | +`sway` is the safest default and is what most users should keep. It is | |
| 112 | +especially helpful when using creative teachers or `--strategy both`. | |
| 113 | + | |
| 114 | +If you are debugging prompt quality, use `--filter none` once and look | |
| 115 | +at the raw plan before deciding whether the issue is generation or | |
| 116 | +filtering. | |
| 117 | + | |
| 118 | +## Useful knobs | |
| 119 | + | |
| 120 | +```sh | |
| 121 | +uv run dlm synth instructions notes.dlm \ | |
| 122 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | |
| 123 | + --per-section 3 \ | |
| 124 | + --strategy both \ | |
| 125 | + --filter sway \ | |
| 126 | + --threshold 0.2 \ | |
| 127 | + --max-pairs 8 \ | |
| 128 | + --max-new-tokens 512 \ | |
| 129 | + --temp 0.2 \ | |
| 130 | + --top-p 0.95 \ | |
| 131 | + --seed 7 | |
| 132 | +``` | |
| 133 | + | |
| 134 | +The most useful flags in practice are: | |
| 135 | + | |
| 136 | +- `--per-section`: generate more than one candidate pair per prose block | |
| 137 | +- `--max-pairs`: cap document churn on large files | |
| 138 | +- `--threshold`: tighten or loosen `sway` acceptance | |
| 139 | +- `--temp` and `--top-p`: increase diversity when the teacher is too | |
| 140 | + repetitive | |
| 141 | + | |
| 142 | +## Training after synth | |
| 143 | + | |
| 144 | +Once the document has accepted `auto_synth` instruction sections, the | |
| 145 | +next normal train run consumes them like any other instruction pair: | |
| 146 | + | |
| 147 | +```sh | |
| 148 | +uv run dlm train notes.dlm | |
| 149 | +``` | |
| 150 | + | |
| 151 | +No special train flag is needed. Synthesized instruction sections flow | |
| 152 | +through the same SFT path as hand-authored sections. | |
| 153 | + | |
| 154 | +## Revert and inspection | |
| 155 | + | |
| 156 | +List applied auto-synth sections: | |
| 157 | + | |
| 158 | +```sh | |
| 159 | +uv run dlm synth list notes.dlm | |
| 160 | +``` | |
| 161 | + | |
| 162 | +Strip every synthesized instruction section from the document: | |
| 163 | + | |
| 164 | +```sh | |
| 165 | +uv run dlm synth revert notes.dlm | |
| 166 | +``` | |
| 167 | + | |
| 168 | +This only removes `auto_synth: true` instruction sections. Hand-authored | |
| 169 | +instruction blocks stay untouched. | |
| 170 | + | |
| 171 | +## Common failure modes | |
| 172 | + | |
| 173 | +### The self teacher is weak | |
| 174 | + | |
| 175 | +If `--teacher self` produces junk, the adapter probably is not ready | |
| 176 | +yet. Train once more first, or use a stronger external teacher for the | |
| 177 | +first synth pass. | |
| 178 | + | |
| 179 | +### Everything gets filtered out | |
| 180 | + | |
| 181 | +That usually means one of three things: | |
| 182 | + | |
| 183 | +- the teacher produced near-duplicates | |
| 184 | +- the generated answers were worse than the empty-baseline comparison in | |
| 185 | + `sway` | |
| 186 | +- the threshold is too strict | |
| 187 | + | |
| 188 | +Lower `--threshold`, or temporarily switch to `--filter dedup-only` to | |
| 189 | +see whether the judge is the main bottleneck. | |
| 190 | + | |
| 191 | +### The document churns too much | |
| 192 | + | |
| 193 | +Use `--max-pairs` aggressively at first. A small accepted batch is much | |
| 194 | +easier to reason about than dumping dozens of synthetic sections into a | |
| 195 | +single file. | |
| 196 | + | |
| 197 | +## See also | |
| 198 | + | |
| 199 | +- [Instruction section reference](../format/instruction-section.md) | |
| 200 | +- [Bootstrap self-improving](bootstrap-self-improving.md) | |
| 201 | +- [Self-improving loop](self-improving-loop.md) | |
| 202 | +- [CLI reference](../cli/reference.md) | |
docs/format/instruction-section.md (added) @@ -0,0 +1,119 @@
| 1 | +# Instruction section reference | |
| 2 | + | |
| 3 | +`::instruction::` sections are the supervised fine-tuning format DLM | |
| 4 | +uses for prompt/answer training data. | |
| 5 | + | |
| 6 | +They are valid in hand-authored `.dlm` files and in synthetic output | |
| 7 | +written by `dlm synth instructions --apply`. | |
| 8 | + | |
| 9 | +## Basic shape | |
| 10 | + | |
| 11 | +Each instruction section contains one or more `Q` / `A` pairs: | |
| 12 | + | |
| 13 | +```dlm | |
| 14 | +::instruction:: | |
| 15 | +### Q | |
| 16 | +What is a decorator? | |
| 17 | + | |
| 18 | +### A | |
| 19 | +A function that takes a function and returns a wrapped function. | |
| 20 | + | |
| 21 | +### Q | |
| 22 | +When should I use `functools.wraps`? | |
| 23 | + | |
| 24 | +### A | |
| 25 | +Whenever a decorator returns another callable and you want to preserve | |
| 26 | +the wrapped function's metadata. | |
| 27 | +``` | |
| 28 | + | |
| 29 | +DLM splits those into individual supervised rows at parse time. | |
| 30 | + | |
| 31 | +## Semantics | |
| 32 | + | |
| 33 | +- `Q` is the prompt shown to the model. | |
| 34 | +- `A` is the target response. | |
| 35 | + | |
| 36 | +At train time, DLM uses the question as context and the answer as the | |
| 37 | +supervised target. This is the section type that most directly shapes | |
| 38 | +assistant behavior. | |
| 39 | + | |
| 40 | +## Auto-synth instruction sections | |
| 41 | + | |
| 42 | +When `dlm synth instructions` writes sections back into a document, it | |
| 43 | +adds an HTML marker immediately after the section fence: | |
| 44 | + | |
| 45 | +```dlm | |
| 46 | +::instruction:: | |
| 47 | +<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" --> | |
| 48 | +### Q | |
| 49 | +What does DGEMM do? | |
| 50 | + | |
| 51 | +### A | |
| 52 | +It multiplies dense matrices and can optionally accumulate the result. | |
| 53 | +``` | |
| 54 | + | |
| 55 | +That marker corresponds to these parsed fields on the section: | |
| 56 | + | |
| 57 | +- `auto_synth: true` | |
| 58 | +- `synth_teacher` | |
| 59 | +- `synth_strategy` | |
| 60 | +- `synth_at` | |
| 61 | +- `source_section_id` | |
| 62 | + | |
| 63 | +Hand-authored instruction sections omit the marker and keep | |
| 64 | +`auto_synth=false`. | |
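As a sketch, that marker can be recovered with a small regex. The function below is illustrative only, though it mirrors the rule that the metadata fields must all appear together:

```python
import re

# Sketch of parsing the auto-synth provenance marker into the parsed
# fields listed above. Regex and function names are assumptions, not
# DLM's real implementation.

MARKER_RE = re.compile(r"<!--\s*dlm-auto-synth:\s*(?P<attrs>.*?)\s*-->")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
REQUIRED = {"synth_teacher", "synth_strategy", "synth_at", "source_section_id"}


def parse_auto_synth(line: str):
    """Return the marker's fields, or None for hand-authored sections."""
    m = MARKER_RE.search(line)
    if m is None:
        return None  # no marker: auto_synth stays false
    fields = dict(ATTR_RE.findall(m.group("attrs")))
    # All metadata fields must be provided together.
    if set(fields) != REQUIRED:
        raise ValueError(f"auto-synth marker fields must be: {sorted(REQUIRED)}")
    # synth_teacher and synth_strategy must be non-empty strings.
    if not fields["synth_teacher"] or not fields["synth_strategy"]:
        raise ValueError("synth_teacher and synth_strategy must be non-empty")
    return {"auto_synth": True, **fields}
```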
| 65 | + | |
| 66 | +## Validation rules | |
| 67 | + | |
| 68 | +- The auto-synth marker is only valid on `::instruction::` sections. | |
| 69 | +- Auto-synth sections must provide all metadata fields together. | |
| 70 | +- `synth_teacher` and `synth_strategy` must be non-empty strings. | |
| 71 | +- `source_section_id` must be a valid referenced section ID. | |
| 72 | +- Section identity ignores the synth metadata, so the same logical | |
| 73 | + question/answer pair keeps the same content identity whether it was | |
| 74 | + written by hand or synthesized automatically. | |
| 75 | + | |
| 76 | +## Interaction with training | |
| 77 | + | |
| 78 | +- `dlm train` includes synthesized instruction sections by default. | |
| 79 | +- There is currently no separate "ignore auto-synth instructions" train | |
| 80 | + flag; they flow through the normal SFT path once they are present in | |
| 81 | + the document. | |
| 82 | +- `dlm synth revert` strips every `auto_synth: true` instruction section | |
| 83 | + from the file without touching hand-authored rows. | |
| 84 | + | |
| 85 | +## Interaction with `dlm synth` | |
| 86 | + | |
| 87 | +Relevant commands: | |
| 88 | + | |
| 89 | +- `dlm synth instructions <path>` | |
| 90 | +- `dlm synth list <path>` | |
| 91 | +- `dlm synth revert <path>` | |
| 92 | + | |
| 93 | +The current `instructions` command can: | |
| 94 | + | |
| 95 | +- stage accepted synth sections for inspection | |
| 96 | +- write accepted synth sections directly with `--apply` | |
| 97 | +- preview only with `--dry-run` | |
| 98 | + | |
| 99 | +## Choosing a good instruction section | |
| 100 | + | |
| 101 | +Hand-authored or synthesized, good instruction sections tend to have: | |
| 102 | + | |
| 103 | +- a clear prompt with one task | |
| 104 | +- an answer that matches the tone you want the adapter to learn | |
| 105 | +- enough domain specificity that the pair teaches something real | |
| 106 | + | |
| 107 | +Weak instruction sections tend to be: | |
| 108 | + | |
| 109 | +- generic | |
| 110 | +- repetitive | |
| 111 | +- too broad to answer well | |
| 112 | +- stylistically inconsistent with the rest of the document | |
| 113 | + | |
| 114 | +## See also | |
| 115 | + | |
| 116 | +- [Section grammar](sections.md) | |
| 117 | +- [Synthesize training data](../cookbook/synthesize-training-data.md) | |
| 118 | +- [Bootstrap self-improving](../cookbook/bootstrap-self-improving.md) | |
| 119 | +- [CLI reference](../cli/reference.md) | |
docs/format/sections.md (modified) @@ -47,6 +47,12 @@ Trains via **supervised fine-tuning (SFT)**: the model sees `Q` text
| 47 | 47 | as the prompt, `A` text as the target. This is the pattern that |
| 48 | 48 | produces "helpful assistant" behavior. |
| 49 | 49 | |
| 50 | +`dlm synth instructions` can also write synthesized instruction | |
| 51 | +sections back into the document. Those keep the same basic body grammar | |
| 52 | +but add an HTML provenance marker immediately after the fence. See the | |
| 53 | +[instruction section reference](instruction-section.md) for the full | |
| 54 | +marker shape and validation rules. | |
| 55 | + | |
| 50 | 56 | ### Preference (`::preference::`) |
| 51 | 57 | |
| 52 | 58 | Open with `::preference::`. Each record has three blocks: |
@@ -159,6 +165,7 @@ being picked up as new?", the ID in `dlm show --json` is the answer.
| 159 | 165 | |
| 160 | 166 | ## See also |
| 161 | 167 | |
| 168 | +- [Instruction section reference](instruction-section.md) | |
| 162 | 169 | - [Preference section reference](preference-section.md) |
| 163 | 170 | - [First train walkthrough](../getting-started/first-train.md) |
| 164 | 171 | - [Cookbook: coding tutor](../cookbook/coding-tutor.md) — full |
docs/index.md (modified) @@ -23,6 +23,8 @@ as Ollama, `llama-server`, `vllm`, and `mlx-serve`.
| 23 | 23 | vision-language, and audio-language rows |
| 24 | 24 | - **Replay-backed retraining** so edits accumulate instead of silently wiping |
| 25 | 25 | prior state |
| 26 | +- **Synthetic data loops** through `dlm synth instructions` and | |
| 27 | + `dlm synth preferences` | |
| 26 | 28 | - **Multi-adapter docs + learned gating** for separating knowledge, tone, or |
| 27 | 29 | persona lanes inside one project |
| 28 | 30 | - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`, |
@@ -49,6 +51,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
| 49 | 51 | | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) | |
| 50 | 52 | | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) | |
| 51 | 53 | | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) | |
| 54 | +| Turn prose into instruction data | [Synthesize training data](cookbook/synthesize-training-data.md) and [Bootstrap self-improving](cookbook/bootstrap-self-improving.md) | | |
| 52 | 55 | | Mine preference pairs from a live adapter | [Self-improving loop](cookbook/self-improving-loop.md) and [Reward-model integration](cookbook/reward-model-integration.md) | |
| 53 | 56 | | Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) | |
| 54 | 57 | | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) | |
mkdocs.yml (modified) @@ -58,6 +58,7 @@ nav:
| 58 | 58 | - The .dlm format: |
| 59 | 59 | - Frontmatter: format/frontmatter.md |
| 60 | 60 | - Sections: format/sections.md |
| 61 | + - Instruction sections: format/instruction-section.md | |
| 61 | 62 | - Preference sections: format/preference-section.md |
| 62 | 63 | - Export manifest: format/export-manifest.md |
| 63 | 64 | - .dlm/training.yaml: format/dlm-training-yaml.md |
@@ -72,6 +73,8 @@ nav:
| 72 | 73 | - Sharing with pack: cookbook/sharing-with-pack.md |
| 73 | 74 | - Quantization tradeoffs: cookbook/quantization-tradeoffs.md |
| 74 | 75 | - Preference (DPO vs ORPO): cookbook/preference-dpo-vs-orpo.md |
| 76 | + - Synthesize training data: cookbook/synthesize-training-data.md | |
| 77 | + - Bootstrap self-improving: cookbook/bootstrap-self-improving.md | |
| 75 | 78 | - Self-improving loop: cookbook/self-improving-loop.md |
| 76 | 79 | - Reward-model integration: cookbook/reward-model-integration.md |
| 77 | 80 | - Multi-adapter composition: cookbook/multi-adapter.md |