Document synth workflows

Authored by mfwolffe <wolffemf@dukes.jmu.edu>

- SHA: 1556d3896fba7763c0533997f5209bf4da28e6c6
- Parents: b41f337
- Tree: 10c8c71

| Status | File | + | - |
|---|---|---|---|
| A | docs/cookbook/bootstrap-self-improving.md | 193 | 0 |
| A | docs/cookbook/synthesize-training-data.md | 202 | 0 |
| A | docs/format/instruction-section.md | 119 | 0 |
| M | docs/format/sections.md | 7 | 0 |
| M | docs/index.md | 3 | 0 |
| M | mkdocs.yml | 3 | 0 |
docs/cookbook/bootstrap-self-improving.md (added) @@ -0,0 +1,193 @@
| 1 | +# Bootstrap self-improving | ||
| 2 | + | ||
| 3 | +The self-teacher loop is the most interesting version of Sprint 43: | ||
| 4 | +your current adapter writes new `::instruction::` sections for its own | ||
| 5 | +document, then the next train run folds them back in. | ||
| 6 | + | ||
| 7 | +This is not magic. It works because DLM already has: | ||
| 8 | + | ||
| 9 | +- replay-backed retraining | ||
| 10 | +- synthesized instruction provenance (`auto_synth`) | ||
| 11 | +- a local `sway` judge for filtering weak candidates | ||
| 12 | + | ||
| 13 | +Used carefully, it turns one trained document into a steadily better | ||
| 14 | +instruction corpus. | ||
| 15 | + | ||
| 16 | +## The honest starting point | ||
| 17 | + | ||
| 18 | +`--teacher self` uses the current adapter for that `.dlm`. That means | ||
| 19 | +the loop starts **after** there is already a trained local adapter. | ||
| 20 | + | ||
| 21 | +A good bootstrap pattern is: | ||
| 22 | + | ||
| 23 | +1. Start with prose plus at least some useful seed supervision, or do an | ||
| 24 | + initial train from prose and existing sections. | ||
| 25 | +2. Run `dlm synth instructions --teacher self`. | ||
| 26 | +3. Retrain on the accepted synth sections. | ||
| 27 | +4. Repeat in small batches. | ||
| 28 | + | ||
| 29 | +If the adapter still cannot answer basic questions about the document, | ||
| 30 | +synthetic instruction generation will mostly amplify noise. | ||
| 31 | + | ||
| 32 | +## Minimal loop | ||
| 33 | + | ||
| 34 | +Train once: | ||
| 35 | + | ||
| 36 | +```sh | ||
| 37 | +uv run dlm train notes.dlm | ||
| 38 | +``` | ||
| 39 | + | ||
| 40 | +Generate a small accepted batch from the current adapter and write it | ||
| 41 | +back immediately: | ||
| 42 | + | ||
| 43 | +```sh | ||
| 44 | +uv run dlm synth instructions notes.dlm \ | ||
| 45 | + --teacher self \ | ||
| 46 | + --per-section 1 \ | ||
| 47 | + --strategy extraction \ | ||
| 48 | + --max-pairs 4 \ | ||
| 49 | + --apply | ||
| 50 | +``` | ||
| 51 | + | ||
| 52 | +Retrain on the expanded instruction set: | ||
| 53 | + | ||
| 54 | +```sh | ||
| 55 | +uv run dlm train notes.dlm | ||
| 56 | +``` | ||
| 57 | + | ||
| 58 | +Then inspect real output quality: | ||
| 59 | + | ||
| 60 | +```sh | ||
| 61 | +uv run dlm prompt notes.dlm "What does DGEMM do?" | ||
| 62 | +``` | ||
| 63 | + | ||
| 64 | +That is the basic self-improving loop. | ||
| 65 | + | ||
| 66 | +## Safer staged version | ||
| 67 | + | ||
| 68 | +If you want to inspect before writing: | ||
| 69 | + | ||
| 70 | +```sh | ||
| 71 | +uv run dlm synth instructions notes.dlm \ | ||
| 72 | + --teacher self \ | ||
| 73 | + --per-section 1 \ | ||
| 74 | + --strategy extraction | ||
| 75 | + | ||
| 76 | +uv run dlm synth list notes.dlm | ||
| 77 | +``` | ||
| 78 | + | ||
| 79 | +The current implementation stages accepted synth sections for | ||
| 80 | +inspection, but it does not yet have a separate `dlm synth apply` | ||
| 81 | +subcommand. Use `--apply` on the synth run when you want the sections | ||
| 82 | +written straight into the document. | ||
| 83 | + | ||
| 84 | +## Why `sway` stays the default | ||
| 85 | + | ||
| 86 | +The self-teacher path is the place where the default `--filter sway` | ||
| 87 | +matters most. | ||
| 88 | + | ||
| 89 | +Without filtering, a weak adapter can happily generate: | ||
| 90 | + | ||
| 91 | +- duplicates | ||
| 92 | +- overly generic answers | ||
| 93 | +- plausible but wrong extrapolations | ||
| 94 | + | ||
| 95 | +The current synth filter stack is: | ||
| 96 | + | ||
| 97 | +1. dedup | ||
| 98 | +2. optional judge pass | ||
| 99 | +3. optional threshold cut | ||
| 100 | + | ||
| 101 | +The CLI prints those counts so you can tell whether the loop is getting | ||
| 102 | +better or just louder. | ||
| 103 | + | ||
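The three stages above can be written down as a plain filter pipeline. This is an illustrative Python sketch under stated assumptions, not DLM's actual implementation: the `judge` callable, the exact dedup key, and the report field names are all hypothetical.

```python
def filter_candidates(pairs, judge=None, threshold=None):
    """Toy three-stage synth filter: dedup -> judge -> threshold.

    `pairs` is a list of (question, answer) tuples. `judge` is any
    callable returning a score in [0, 1]; both names are assumptions.
    """
    report = {"generated": len(pairs)}

    # 1. dedup: drop near-identical repeats, keeping the first occurrence
    seen, deduped = set(), []
    for q, a in pairs:
        key = (q.strip().lower(), a.strip().lower())
        if key not in seen:
            seen.add(key)
            deduped.append((q, a))
    report["dedup"] = len(deduped)

    # 2. optional judge pass: score each surviving pair
    scored = [(judge(q, a) if judge else 1.0, q, a) for q, a in deduped]

    # 3. optional threshold cut
    kept = [(q, a) for s, q, a in scored
            if threshold is None or s >= threshold]
    report["accepted"] = len(kept)
    return kept, report
```

Printing `report` at each stage is exactly the kind of count summary that tells you whether a loop is improving or just generating more of the same.
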
| 104 | +## A conservative rhythm | ||
| 105 | + | ||
| 106 | +This is a healthy local rhythm for a real project: | ||
| 107 | + | ||
| 108 | +```sh | ||
| 109 | +uv run dlm train notes.dlm | ||
| 110 | +uv run dlm synth instructions notes.dlm \ | ||
| 111 | + --teacher self \ | ||
| 112 | + --per-section 1 \ | ||
| 113 | + --max-pairs 4 \ | ||
| 114 | + --apply | ||
| 115 | +uv run dlm train notes.dlm | ||
| 116 | +uv run dlm prompt notes.dlm "Explain the core idea." | ||
| 117 | +``` | ||
| 118 | + | ||
| 119 | +Keep the accepted batch small at first. The point is to improve the | ||
| 120 | +document's instruction surface, not flood it with speculative rows. | ||
| 121 | + | ||
| 122 | +## When to switch away from `self` | ||
| 123 | + | ||
| 124 | +The self-teacher is convenient, but not always the right teacher. | ||
| 125 | + | ||
| 126 | +Prefer an external teacher when: | ||
| 127 | + | ||
| 128 | +- the local adapter is still very early and weak | ||
| 129 | +- you need broader general knowledge than the current adapter can supply | ||
| 130 | +- you want to compare local-vs-external synth quality on the same prose | ||
| 131 | + | ||
| 132 | +That usually looks like: | ||
| 133 | + | ||
| 134 | +```sh | ||
| 135 | +uv run dlm synth instructions notes.dlm \ | ||
| 136 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 137 | + --per-section 1 \ | ||
| 138 | + --apply | ||
| 139 | +``` | ||
| 140 | + | ||
| 141 | +and then later moving back to `--teacher self` once the adapter has real | ||
| 142 | +domain traction. | ||
| 143 | + | ||
| 144 | +## Pairing Sprint 43 with Sprint 42 | ||
| 145 | + | ||
| 146 | +Instruction synthesis and preference mining are complementary: | ||
| 147 | + | ||
| 148 | +- `dlm synth instructions` grows the SFT side of the document | ||
| 149 | +- `dlm synth preferences` / `dlm preference mine` sharpens ranking and | ||
| 150 | + behavior once the adapter can already produce multiple plausible | ||
| 151 | + answers | ||
| 152 | + | ||
| 153 | +A practical sequence is: | ||
| 154 | + | ||
| 155 | +1. train | ||
| 156 | +2. synth instructions | ||
| 157 | +3. train | ||
| 158 | +4. mine preferences | ||
| 159 | +5. train preference phase | ||
| 160 | + | ||
| 161 | +That is the closest current DLM path to a fully local self-improving | ||
| 162 | +document loop. | ||
| 163 | + | ||
| 164 | +## Failure modes to watch | ||
| 165 | + | ||
| 166 | +### The second pass is not better | ||
| 167 | + | ||
| 168 | +That usually means one of: | ||
| 169 | + | ||
| 170 | +- the first synth batch was too weak | ||
| 171 | +- the document still lacks enough domain prose | ||
| 172 | +- the adapter is too small for the domain | ||
| 173 | + | ||
| 174 | +Do not assume "more synthetic rows" automatically means "better model." | ||
| 175 | + | ||
| 176 | +### Expansion mode gets weird | ||
| 177 | + | ||
| 178 | +`--strategy expansion` is useful, but it is also the fastest route to | ||
| 179 | +polished nonsense. Prefer `extraction` for early loops and only widen to | ||
| 180 | +`both` or `expansion` once the adapter is already grounded. | ||
| 181 | + | ||
| 182 | +### Prompt quality improves but factuality does not | ||
| 183 | + | ||
| 184 | +That is a signal to go back to better prose or hand-authored | ||
| 185 | +instructional supervision. Self-improvement cannot invent missing source | ||
| 186 | +knowledge. | ||
| 187 | + | ||
| 188 | +## See also | ||
| 189 | + | ||
| 190 | +- [Synthesize training data](synthesize-training-data.md) | ||
| 191 | +- [Instruction section reference](../format/instruction-section.md) | ||
| 192 | +- [Self-improving loop](self-improving-loop.md) | ||
| 193 | +- [Reward-model integration](reward-model-integration.md) | ||
docs/cookbook/synthesize-training-data.md (added) @@ -0,0 +1,202 @@
| 1 | +# Synthesize training data | ||
| 2 | + | ||
| 3 | +`dlm synth instructions` turns prose-heavy `.dlm` files into usable | ||
| 4 | +`::instruction::` sections. | ||
| 5 | + | ||
| 6 | +This is the shortest path from "I have notes" to "I have supervised | ||
| 7 | +training pairs" when the document already contains domain prose but not | ||
| 8 | +enough authored Q/A. | ||
| 9 | + | ||
| 10 | +## What it does | ||
| 11 | + | ||
| 12 | +The synth loop: | ||
| 13 | + | ||
| 14 | +1. Finds non-empty prose sections in the document. | ||
| 15 | +2. Prompts a teacher model to generate question/answer pairs about that | ||
| 16 | + prose. | ||
| 17 | +3. Deduplicates the generated pairs. | ||
| 18 | +4. Optionally filters them through the `sway` judge. | ||
| 19 | +5. Either stages the accepted `auto_synth` sections for inspection or | ||
| 20 | + writes them straight back into the `.dlm`. | ||
| 21 | + | ||
| 22 | +The generated sections are still normal `::instruction::` sections. | ||
| 23 | +They just carry provenance metadata so DLM can tell synthesized pairs | ||
| 24 | +from hand-authored ones. | ||
| 25 | + | ||
| 26 | +## Choose a teacher | ||
| 27 | + | ||
| 28 | +The teacher decides who writes the candidate Q/A pairs: | ||
| 29 | + | ||
| 30 | +- `self`: use the current local adapter for this document | ||
| 31 | +- `hf:<model>`: use a HuggingFace text model | ||
| 32 | +- `openai:<model>`: use the OpenAI API | ||
| 33 | +- `anthropic:<model>`: use the Anthropic API | ||
| 34 | +- `vllm-server:<url>`: use an OpenAI-compatible local server | ||
| 35 | + | ||
| 36 | +The current default is `self`, but that only makes sense once the | ||
| 37 | +document already has a trained adapter. For a cold start, either: | ||
| 38 | + | ||
| 39 | +- train once first, then synth with `self`, or | ||
| 40 | +- use `hf:` / `openai:` / `anthropic:` / `vllm-server:` as the teacher | ||
| 41 | + | ||
| 42 | +## Minimal example | ||
| 43 | + | ||
| 44 | +Start with a prose-heavy document: | ||
| 45 | + | ||
| 46 | +```dlm | ||
| 47 | +--- | ||
| 48 | +dlm_id: 01K... | ||
| 49 | +dlm_version: 15 | ||
| 50 | +base_model: smollm2-135m | ||
| 51 | +--- | ||
| 52 | + | ||
| 53 | +DGEMM multiplies two dense matrices and can optionally accumulate the | ||
| 54 | +result into an existing output matrix. | ||
| 55 | +``` | ||
| 56 | + | ||
| 57 | +Generate one extraction-style pair per prose section with an HF teacher: | ||
| 58 | + | ||
| 59 | +```sh | ||
| 60 | +uv run dlm synth instructions notes.dlm \ | ||
| 61 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 62 | + --per-section 1 \ | ||
| 63 | + --strategy extraction | ||
| 64 | +``` | ||
| 65 | + | ||
| 66 | +That prints two summaries: | ||
| 67 | + | ||
| 68 | +- the raw synth plan | ||
| 69 | +- the filter report (`generated`, `dedup`, `judge passed`, `threshold`) | ||
| 70 | + | ||
| 71 | +By default, accepted sections are staged under the store so you can | ||
| 72 | +inspect them: | ||
| 73 | + | ||
| 74 | +```sh | ||
| 75 | +uv run dlm synth list notes.dlm | ||
| 76 | +``` | ||
| 77 | + | ||
| 78 | +If you want the accepted pairs written straight back into the document, | ||
| 79 | +use `--apply`: | ||
| 80 | + | ||
| 81 | +```sh | ||
| 82 | +uv run dlm synth instructions notes.dlm \ | ||
| 83 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 84 | + --per-section 1 \ | ||
| 85 | + --strategy extraction \ | ||
| 86 | + --apply | ||
| 87 | +``` | ||
| 88 | + | ||
| 89 | +## Strategy choices | ||
| 90 | + | ||
| 91 | +The `--strategy` flag controls what kind of questions the teacher is | ||
| 92 | +asked to produce: | ||
| 93 | + | ||
| 94 | +- `extraction`: questions answered directly by the prose | ||
| 95 | +- `expansion`: questions a curious reader might ask beyond the exact | ||
| 96 | + wording of the prose | ||
| 97 | +- `both`: split the per-section budget across both prompt styles | ||
| 98 | + | ||
| 99 | +Start with `extraction` when you care about faithfulness. Reach for | ||
| 100 | +`expansion` once the document already has a stable domain voice and you | ||
| 101 | +want broader instructional coverage. | ||
| 102 | + | ||
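For intuition, a `both` run has to divide the `--per-section` budget between the two prompt styles. The even split below is a hypothetical sketch; the actual split rule is not documented here.

```python
def split_budget(per_section):
    """Hypothetical even split of the per-section budget for `both`.

    Gives extraction the extra pair on odd budgets, on the assumption
    that faithfulness-oriented questions deserve priority.
    """
    expansion = per_section // 2
    extraction = per_section - expansion
    return extraction, expansion
```
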
| 103 | +## Filter choices | ||
| 104 | + | ||
| 105 | +The `--filter` flag controls post-generation cleanup: | ||
| 106 | + | ||
| 107 | +- `sway`: dedup plus judge filtering against an empty baseline | ||
| 108 | +- `dedup-only`: keep only near-duplicate suppression | ||
| 109 | +- `none`: accept everything that parses as a valid pair | ||
| 110 | + | ||
| 111 | +`sway` is the safest default and is what most users should keep. It is | ||
| 112 | +especially helpful when using creative teachers or `--strategy both`. | ||
| 113 | + | ||
| 114 | +If you are debugging prompt quality, use `--filter none` once and look | ||
| 115 | +at the raw plan before deciding whether the issue is generation or | ||
| 116 | +filtering. | ||
| 117 | + | ||
| 118 | +## Useful knobs | ||
| 119 | + | ||
| 120 | +```sh | ||
| 121 | +uv run dlm synth instructions notes.dlm \ | ||
| 122 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 123 | + --per-section 3 \ | ||
| 124 | + --strategy both \ | ||
| 125 | + --filter sway \ | ||
| 126 | + --threshold 0.2 \ | ||
| 127 | + --max-pairs 8 \ | ||
| 128 | + --max-new-tokens 512 \ | ||
| 129 | + --temp 0.2 \ | ||
| 130 | + --top-p 0.95 \ | ||
| 131 | + --seed 7 | ||
| 132 | +``` | ||
| 133 | + | ||
| 134 | +The most useful flags in practice are: | ||
| 135 | + | ||
| 136 | +- `--per-section`: generate more than one candidate pair per prose block | ||
| 137 | +- `--max-pairs`: cap document churn on large files | ||
| 138 | +- `--threshold`: tighten or loosen `sway` acceptance | ||
| 139 | +- `--temp` and `--top-p`: increase diversity when the teacher is too | ||
| 140 | + repetitive | ||
| 141 | + | ||
| 142 | +## Training after synth | ||
| 143 | + | ||
| 144 | +Once the document has accepted `auto_synth` instruction sections, the | ||
| 145 | +next normal train run consumes them like any other instruction pair: | ||
| 146 | + | ||
| 147 | +```sh | ||
| 148 | +uv run dlm train notes.dlm | ||
| 149 | +``` | ||
| 150 | + | ||
| 151 | +No special train flag is needed. Synthesized instruction sections flow | ||
| 152 | +through the same SFT path as hand-authored sections. | ||
| 153 | + | ||
| 154 | +## Revert and inspection | ||
| 155 | + | ||
| 156 | +List applied auto-synth sections: | ||
| 157 | + | ||
| 158 | +```sh | ||
| 159 | +uv run dlm synth list notes.dlm | ||
| 160 | +``` | ||
| 161 | + | ||
| 162 | +Strip every synthesized instruction section from the document: | ||
| 163 | + | ||
| 164 | +```sh | ||
| 165 | +uv run dlm synth revert notes.dlm | ||
| 166 | +``` | ||
| 167 | + | ||
| 168 | +This only removes `auto_synth: true` instruction sections. Hand-authored | ||
| 169 | +instruction blocks stay untouched. | ||
| 170 | + | ||
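The revert rule above amounts to a single filter over parsed sections. This sketch assumes a list-of-dicts view of the document; the real parser's data model may differ.

```python
def revert_auto_synth(sections):
    """Keep every section except auto-synthesized instruction sections.

    `sections` is a hypothetical parsed view of a .dlm file where each
    section carries a `type` and an `auto_synth` flag.
    """
    return [
        s for s in sections
        if not (s.get("type") == "instruction" and s.get("auto_synth"))
    ]
```

Note that prose sections and hand-authored instruction sections pass through untouched; only the `auto_synth: true` instruction rows are dropped.
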
| 171 | +## Common failure modes | ||
| 172 | + | ||
| 173 | +### The self teacher is weak | ||
| 174 | + | ||
| 175 | +If `--teacher self` produces junk, the adapter probably is not ready | ||
| 176 | +yet. Train once more first, or use a stronger external teacher for the | ||
| 177 | +first synth pass. | ||
| 178 | + | ||
| 179 | +### Everything gets filtered out | ||
| 180 | + | ||
| 181 | +That usually means one of three things: | ||
| 182 | + | ||
| 183 | +- the teacher produced near-duplicates | ||
| 184 | +- the generated answers were worse than the empty-baseline comparison in | ||
| 185 | + `sway` | ||
| 186 | +- the threshold is too strict | ||
| 187 | + | ||
| 188 | +Lower `--threshold`, or temporarily switch to `--filter dedup-only` to | ||
| 189 | +see whether the judge is the main bottleneck. | ||
| 190 | + | ||
| 191 | +### The document churns too much | ||
| 192 | + | ||
| 193 | +Use `--max-pairs` aggressively at first. A small accepted batch is much | ||
| 194 | +easier to reason about than dumping dozens of synthetic sections into a | ||
| 195 | +single file. | ||
| 196 | + | ||
| 197 | +## See also | ||
| 198 | + | ||
| 199 | +- [Instruction section reference](../format/instruction-section.md) | ||
| 200 | +- [Bootstrap self-improving](bootstrap-self-improving.md) | ||
| 201 | +- [Self-improving loop](self-improving-loop.md) | ||
| 202 | +- [CLI reference](../cli/reference.md) | ||
docs/format/instruction-section.md (added) @@ -0,0 +1,119 @@
| 1 | +# Instruction section reference | ||
| 2 | + | ||
| 3 | +`::instruction::` sections are the supervised fine-tuning format DLM | ||
| 4 | +uses for prompt/answer training data. | ||
| 5 | + | ||
| 6 | +They are valid in hand-authored `.dlm` files and in synthetic output | ||
| 7 | +written by `dlm synth instructions --apply`. | ||
| 8 | + | ||
| 9 | +## Basic shape | ||
| 10 | + | ||
| 11 | +Each instruction section contains one or more `Q` / `A` pairs: | ||
| 12 | + | ||
| 13 | +```dlm | ||
| 14 | +::instruction:: | ||
| 15 | +### Q | ||
| 16 | +What is a decorator? | ||
| 17 | + | ||
| 18 | +### A | ||
| 19 | +A function that takes a function and returns a wrapped function. | ||
| 20 | + | ||
| 21 | +### Q | ||
| 22 | +When should I use `functools.wraps`? | ||
| 23 | + | ||
| 24 | +### A | ||
| 25 | +Whenever a decorator returns another callable and you want to preserve | ||
| 26 | +the wrapped function's metadata. | ||
| 27 | +``` | ||
| 28 | + | ||
| 29 | +DLM splits those into individual supervised rows at parse time. | ||
| 30 | + | ||
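That split can be sketched in a few lines. This is a minimal illustration, stricter than a real parser needs to be: it assumes well-formed alternating `### Q` / `### A` headings and ignores metadata markers.

```python
import re

def split_pairs(body):
    """Split an instruction section body into (question, answer) rows."""
    # After splitting on heading lines, chunks alternate Q text, A text.
    chunks = re.split(r"^### [QA]\s*$", body, flags=re.MULTILINE)
    chunks = [c.strip() for c in chunks if c.strip()]
    return list(zip(chunks[0::2], chunks[1::2]))
```
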
| 31 | +## Semantics | ||
| 32 | + | ||
| 33 | +- `Q` is the prompt shown to the model. | ||
| 34 | +- `A` is the target response. | ||
| 35 | + | ||
| 36 | +At train time, DLM uses the question as context and the answer as the | ||
| 37 | +supervised target. This is the section type that most directly shapes | ||
| 38 | +assistant behavior. | ||
| 39 | + | ||
| 40 | +## Auto-synth instruction sections | ||
| 41 | + | ||
| 42 | +When `dlm synth instructions` writes sections back into a document, it | ||
| 43 | +adds an HTML marker immediately after the section fence: | ||
| 44 | + | ||
| 45 | +```dlm | ||
| 46 | +::instruction:: | ||
| 47 | +<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" --> | ||
| 48 | +### Q | ||
| 49 | +What does DGEMM do? | ||
| 50 | + | ||
| 51 | +### A | ||
| 52 | +It multiplies dense matrices and can optionally accumulate the result. | ||
| 53 | +``` | ||
| 54 | + | ||
| 55 | +That marker corresponds to these parsed fields on the section: | ||
| 56 | + | ||
| 57 | +- `auto_synth: true` | ||
| 58 | +- `synth_teacher` | ||
| 59 | +- `synth_strategy` | ||
| 60 | +- `synth_at` | ||
| 61 | +- `source_section_id` | ||
| 62 | + | ||
| 63 | +Hand-authored instruction sections omit the marker and keep | ||
| 64 | +`auto_synth=false`. | ||
| 65 | + | ||
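Reading the marker back into those fields can be done with a single pattern. This sketch assumes the fields appear in the exact order shown in the example above; the real parser may be more permissive, and the regex requiring all four fields together mirrors the all-or-nothing validation rule below.

```python
import re

MARKER = re.compile(
    r'<!--\s*dlm-auto-synth:'
    r'\s*synth_teacher="(?P<synth_teacher>[^"]+)"'
    r'\s*synth_strategy="(?P<synth_strategy>[^"]+)"'
    r'\s*synth_at="(?P<synth_at>[^"]+)"'
    r'\s*source_section_id="(?P<source_section_id>[^"]+)"'
    r'\s*-->'
)

def parse_marker(line):
    """Return the four provenance fields, or None for hand-authored rows."""
    m = MARKER.search(line)
    return m.groupdict() if m else None
```
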
| 66 | +## Validation rules | ||
| 67 | + | ||
| 68 | +- The auto-synth marker is only valid on `::instruction::` sections. | ||
| 69 | +- Auto-synth sections must provide all metadata fields together. | ||
| 70 | +- `synth_teacher` and `synth_strategy` must be non-empty strings. | ||
| 71 | +- `source_section_id` must be a valid referenced section ID. | ||
| 72 | +- Section identity ignores the synth metadata, so the same logical | ||
| 73 | + question/answer pair keeps the same content identity whether it was | ||
| 74 | + written by hand or synthesized automatically. | ||
| 75 | + | ||
| 76 | +## Interaction with training | ||
| 77 | + | ||
| 78 | +- `dlm train` includes synthesized instruction sections by default. | ||
| 79 | +- There is currently no separate "ignore auto-synth instructions" train | ||
| 80 | + flag; they flow through the normal SFT path once they are present in | ||
| 81 | + the document. | ||
| 82 | +- `dlm synth revert` strips every `auto_synth: true` instruction section | ||
| 83 | + from the file without touching hand-authored rows. | ||
| 84 | + | ||
| 85 | +## Interaction with `dlm synth` | ||
| 86 | + | ||
| 87 | +Relevant commands: | ||
| 88 | + | ||
| 89 | +- `dlm synth instructions <path>` | ||
| 90 | +- `dlm synth list <path>` | ||
| 91 | +- `dlm synth revert <path>` | ||
| 92 | + | ||
| 93 | +The current `instructions` command can: | ||
| 94 | + | ||
| 95 | +- stage accepted synth sections for inspection | ||
| 96 | +- write accepted synth sections directly with `--apply` | ||
| 97 | +- preview only with `--dry-run` | ||
| 98 | + | ||
| 99 | +## Choosing a good instruction section | ||
| 100 | + | ||
| 101 | +Hand-authored or synthesized, good instruction sections tend to have: | ||
| 102 | + | ||
| 103 | +- a clear prompt with one task | ||
| 104 | +- an answer that matches the tone you want the adapter to learn | ||
| 105 | +- enough domain specificity that the pair teaches something real | ||
| 106 | + | ||
| 107 | +Weak instruction sections tend to be: | ||
| 108 | + | ||
| 109 | +- generic | ||
| 110 | +- repetitive | ||
| 111 | +- too broad to answer well | ||
| 112 | +- stylistically inconsistent with the rest of the document | ||
| 113 | + | ||
| 114 | +## See also | ||
| 115 | + | ||
| 116 | +- [Section grammar](sections.md) | ||
| 117 | +- [Synthesize training data](../cookbook/synthesize-training-data.md) | ||
| 118 | +- [Bootstrap self-improving](../cookbook/bootstrap-self-improving.md) | ||
| 119 | +- [CLI reference](../cli/reference.md) | ||
docs/format/sections.md (modified)

@@ -47,6 +47,12 @@ Trains via **supervised fine-tuning (SFT)**: the model sees `Q` text
 as the prompt, `A` text as the target. This is the pattern that
 produces "helpful assistant" behavior.
 
+`dlm synth instructions` can also write synthesized instruction
+sections back into the document. Those keep the same basic body grammar
+but add an HTML provenance marker immediately after the fence. See the
+[instruction section reference](instruction-section.md) for the full
+marker shape and validation rules.
+
 ### Preference (`::preference::`)
 
 Open with `::preference::`. Each record has three blocks:
@@ -159,6 +165,7 @@ being picked up as new?", the ID in `dlm show --json` is the answer.
 
 ## See also
 
+- [Instruction section reference](instruction-section.md)
 - [Preference section reference](preference-section.md)
 - [First train walkthrough](../getting-started/first-train.md)
 - [Cookbook: coding tutor](../cookbook/coding-tutor.md) — full
docs/index.md (modified)

@@ -23,6 +23,8 @@ as Ollama, `llama-server`, `vllm`, and `mlx-serve`.
   vision-language, and audio-language rows
 - **Replay-backed retraining** so edits accumulate instead of silently wiping
   prior state
+- **Synthetic data loops** through `dlm synth instructions` and
+  `dlm synth preferences`
 - **Multi-adapter docs + learned gating** for separating knowledge, tone, or
   persona lanes inside one project
 - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
@@ -49,6 +51,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
 | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
 | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
 | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
+| Turn prose into instruction data | [Synthesize training data](cookbook/synthesize-training-data.md) and [Bootstrap self-improving](cookbook/bootstrap-self-improving.md) |
 | Mine preference pairs from a live adapter | [Self-improving loop](cookbook/self-improving-loop.md) and [Reward-model integration](cookbook/reward-model-integration.md) |
 | Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) |
 | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |
mkdocs.yml (modified)

@@ -58,6 +58,7 @@ nav:
   - The .dlm format:
     - Frontmatter: format/frontmatter.md
     - Sections: format/sections.md
+    - Instruction sections: format/instruction-section.md
     - Preference sections: format/preference-section.md
     - Export manifest: format/export-manifest.md
     - .dlm/training.yaml: format/dlm-training-yaml.md
@@ -72,6 +73,8 @@ nav:
     - Sharing with pack: cookbook/sharing-with-pack.md
     - Quantization tradeoffs: cookbook/quantization-tradeoffs.md
     - Preference (DPO vs ORPO): cookbook/preference-dpo-vs-orpo.md
+    - Synthesize training data: cookbook/synthesize-training-data.md
+    - Bootstrap self-improving: cookbook/bootstrap-self-improving.md
     - Self-improving loop: cookbook/self-improving-loop.md
     - Reward-model integration: cookbook/reward-model-integration.md
     - Multi-adapter composition: cookbook/multi-adapter.md
| 77 | - Multi-adapter composition: cookbook/multi-adapter.md | 80 | - Multi-adapter composition: cookbook/multi-adapter.md |