tenseleyflow/documentlanguagemodel / 1556d38

Document synth workflows

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1556d3896fba7763c0533997f5209bf4da28e6c6
Parents: b41f337
Tree: 10c8c71

6 changed files

Status | File | + | -
A | docs/cookbook/bootstrap-self-improving.md | 193 | 0
A | docs/cookbook/synthesize-training-data.md | 202 | 0
A | docs/format/instruction-section.md | 119 | 0
M | docs/format/sections.md | 7 | 0
M | docs/index.md | 3 | 0
M | mkdocs.yml | 3 | 0

docs/cookbook/bootstrap-self-improving.md (added)
@@ -0,0 +1,193 @@

# Bootstrap self-improving

The self-teacher loop is the most interesting version of Sprint 43:
your current adapter writes new `::instruction::` sections for its own
document, then the next train run folds them back in.

This is not magic. It works because DLM already has:

- replay-backed retraining
- synthesized instruction provenance (`auto_synth`)
- a local `sway` judge for filtering weak candidates

Used carefully, it turns one trained document into a steadily better
instruction corpus.

## The honest starting point

`--teacher self` uses the current adapter for that `.dlm`. That means
the loop starts **after** there is already a trainable local adapter.

A good bootstrap pattern is:

1. Start with prose plus at least some useful seed supervision, or do an
   initial train from prose and existing sections.
2. Run `dlm synth instructions --teacher self`.
3. Retrain on the accepted synth sections.
4. Repeat in small batches.

If the adapter still cannot answer basic questions about the document,
synthetic instruction generation will mostly amplify noise.

## Minimal loop

Train once:

```sh
uv run dlm train notes.dlm
```

Generate a small accepted batch from the current adapter and write it
back immediately:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --strategy extraction \
  --max-pairs 4 \
  --apply
```

Retrain on the expanded instruction set:

```sh
uv run dlm train notes.dlm
```

Then inspect real output quality:

```sh
uv run dlm prompt notes.dlm "What does DGEMM do?"
```

That is the basic self-improving loop.

## Safer staged version

If you want to inspect before writing:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --strategy extraction

uv run dlm synth list notes.dlm
```

The current implementation stages accepted synth sections for
inspection, but it does not yet have a separate `dlm synth apply`
subcommand. Use `--apply` on the synth run when you want the sections
written straight into the document.

## Why `sway` stays the default

The self-teacher path is the place where the default `--filter sway`
matters most.

Without filtering, a weak adapter can happily generate:

- duplicates
- overly generic answers
- plausible but wrong extrapolations

The current synth filter stack is:

1. dedup
2. optional judge pass
3. optional threshold cut

The CLI prints those counts so you can tell whether the loop is getting
better or just louder.
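
The dedup and threshold stages can be pictured with a few lines of
Python. This is an illustrative model only, not DLM's real filter code:
it assumes exact-match dedup on normalized question text and a single
numeric judge score per pair, and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    question: str
    answer: str
    score: float  # judge score for this pair; higher is better


def filter_pairs(pairs, threshold=None):
    """Dedup first, then (optionally) cut pairs below a score threshold."""
    seen, deduped = set(), []
    for pair in pairs:
        key = pair.question.strip().lower()  # crude normalization
        if key not in seen:
            seen.add(key)
            deduped.append(pair)
    if threshold is None:
        return deduped
    return [pair for pair in deduped if pair.score >= threshold]
```

Each stage only ever shrinks the batch, which is why comparing the
printed counts run over run is a quick health check on the loop.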

## A conservative rhythm

This is a healthy local rhythm for a real project:

```sh
uv run dlm train notes.dlm
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --max-pairs 4 \
  --apply
uv run dlm train notes.dlm
uv run dlm prompt notes.dlm "Explain the core idea."
```

Keep the accepted batch small at first. The point is to improve the
document's instruction surface, not flood it with speculative rows.

## When to switch away from `self`

The self-teacher is convenient, but not always the right teacher.

Prefer an external teacher when:

- the local adapter is still very early and weak
- you need broader general knowledge than the current adapter can supply
- you want to compare local-vs-external synth quality on the same prose

That usually looks like:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --apply
```

and then later moving back to `--teacher self` once the adapter has real
domain traction.

## Pairing Sprint 43 with Sprint 42

Instruction synthesis and preference mining are complementary:

- `dlm synth instructions` grows the SFT side of the document
- `dlm synth preferences` / `dlm preference mine` sharpens ranking and
  behavior once the adapter can already produce multiple plausible
  answers

A practical sequence is:

1. train
2. synth instructions
3. train
4. mine preferences
5. train preference phase

That is the closest current DLM path to a fully local self-improving
document loop.

## Failure modes to watch

### The second pass is not better

That usually means one of:

- the first synth batch was too weak
- the document still lacks enough domain prose
- the adapter is too small for the domain

Do not assume "more synthetic rows" automatically means "better model."

### Expansion mode gets weird

`--strategy expansion` is useful, but it is also the fastest route to
polished nonsense. Prefer `extraction` for early loops and only widen to
`both` or `expansion` once the adapter is already grounded.

### Prompt quality improves but factuality does not

That is a signal to go back to better prose or hand-authored
instructional supervision. Self-improvement cannot invent missing source
knowledge.

## See also

- [Synthesize training data](synthesize-training-data.md)
- [Instruction section reference](../format/instruction-section.md)
- [Self-improving loop](self-improving-loop.md)
- [Reward-model integration](reward-model-integration.md)

docs/cookbook/synthesize-training-data.md (added)
@@ -0,0 +1,202 @@

# Synthesize training data

`dlm synth instructions` turns prose-heavy `.dlm` files into usable
`::instruction::` sections.

This is the shortest path from "I have notes" to "I have supervised
training pairs" when the document already contains domain prose but not
enough authored Q/A.

## What it does

The synth loop:

1. Finds non-empty prose sections in the document.
2. Prompts a teacher model to generate question/answer pairs about that
   prose.
3. Deduplicates the generated pairs.
4. Optionally filters them through the `sway` judge.
5. Either stages the accepted `auto_synth` sections for inspection or
   writes them straight back into the `.dlm`.

The generated sections are still normal `::instruction::` sections.
They just carry provenance metadata so DLM can tell synthesized pairs
from hand-authored ones.
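
As a mental model, a synthesized section is just an ordinary
instruction record plus a few provenance fields. The sketch below is
illustrative, not DLM's actual data model; only the provenance field
names are taken from this page, everything else is an assumption.

```python
from datetime import datetime, timezone


def synth_section(question, answer, teacher, strategy, source_section_id):
    """Shape of a synthesized ::instruction:: record (illustrative)."""
    return {
        "kind": "instruction",
        "pairs": [{"q": question, "a": answer}],
        # provenance fields that distinguish synthesized from hand-authored
        "auto_synth": True,
        "synth_teacher": teacher,
        "synth_strategy": strategy,
        "synth_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source_section_id": source_section_id,
    }
```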

## Choose a teacher

The teacher decides who writes the candidate Q/A pairs:

- `self`: use the current local adapter for this document
- `hf:<model>`: use a HuggingFace text model
- `openai:<model>`: use the OpenAI API
- `anthropic:<model>`: use the Anthropic API
- `vllm-server:<url>`: use an OpenAI-compatible local server

The current default is `self`, but that only makes sense once the
document already has a trained adapter. For a cold start, either:

- train once first, then synth with `self`, or
- use `hf:` / `openai:` / `anthropic:` / `vllm-server:` as the teacher

## Minimal example

Start with a prose-heavy document:

```dlm
---
dlm_id: 01K...
dlm_version: 15
base_model: smollm2-135m
---

DGEMM multiplies two dense matrices and can optionally accumulate the
result into an existing output matrix.
```

Generate one extraction-style pair per prose section with an HF teacher:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --strategy extraction
```

That prints two summaries:

- the raw synth plan
- the filter report (`generated`, `dedup`, `judge passed`, `threshold`)

By default, accepted sections are staged under the store so you can
inspect them:

```sh
uv run dlm synth list notes.dlm
```

If you want the accepted pairs written straight back into the document,
use `--apply`:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --strategy extraction \
  --apply
```

## Strategy choices

The `--strategy` flag controls what kind of questions the teacher is
asked to produce:

- `extraction`: questions answered directly by the prose
- `expansion`: questions a curious reader might ask beyond the exact
  wording of the prose
- `both`: split the per-section budget across both prompt styles

Start with `extraction` when you care about faithfulness. Reach for
`expansion` once the document already has a stable domain voice and you
want broader instructional coverage.
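
For `both`, one plausible way the per-section budget could be divided
is an even split with any remainder going to `extraction`. That split
rule is a hypothetical sketch, not the CLI's documented behavior:

```python
def split_budget(per_section, strategy):
    """Divide a per-section candidate budget across prompt styles.

    Returns (extraction_count, expansion_count). The even-split rule
    is an assumption for illustration, not DLM's documented behavior.
    """
    if strategy == "extraction":
        return per_section, 0
    if strategy == "expansion":
        return 0, per_section
    extraction = (per_section + 1) // 2  # remainder favors extraction
    return extraction, per_section - extraction
```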

## Filter choices

The `--filter` flag controls post-generation cleanup:

- `sway`: dedup plus judge filtering against an empty baseline
- `dedup-only`: keep only near-duplicate suppression
- `none`: accept everything that parses as a valid pair

`sway` is the safest default and is what most users should keep. It is
especially helpful when using creative teachers or `--strategy both`.

If you are debugging prompt quality, use `--filter none` once and look
at the raw plan before deciding whether the issue is generation or
filtering.

## Useful knobs

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 3 \
  --strategy both \
  --filter sway \
  --threshold 0.2 \
  --max-pairs 8 \
  --max-new-tokens 512 \
  --temp 0.2 \
  --top-p 0.95 \
  --seed 7
```

The most useful flags in practice are:

- `--per-section`: generate more than one candidate pair per prose block
- `--max-pairs`: cap document churn on large files
- `--threshold`: tighten or loosen `sway` acceptance
- `--temp` and `--top-p`: increase diversity when the teacher is too
  repetitive

## Training after synth

Once the document has accepted `auto_synth` instruction sections, the
next normal train run consumes them like any other instruction pair:

```sh
uv run dlm train notes.dlm
```

No special train flag is needed. Synthesized instruction sections flow
through the same SFT path as hand-authored sections.

## Revert and inspection

List applied auto-synth sections:

```sh
uv run dlm synth list notes.dlm
```

Strip every synthesized instruction section from the document:

```sh
uv run dlm synth revert notes.dlm
```

This only removes `auto_synth: true` instruction sections. Hand-authored
instruction blocks stay untouched.

## Common failure modes

### The self teacher is weak

If `--teacher self` produces junk, the adapter probably is not ready
yet. Train once more first, or use a stronger external teacher for the
first synth pass.

### Everything gets filtered out

That usually means one of three things:

- the teacher produced near-duplicates
- the generated answers were worse than the empty-baseline comparison in
  `sway`
- the threshold is too strict

Lower `--threshold`, or temporarily switch to `--filter dedup-only` to
see whether the judge is the main bottleneck.

### The document churns too much

Use `--max-pairs` aggressively at first. A small accepted batch is much
easier to reason about than dumping dozens of synthetic sections into a
single file.

## See also

- [Instruction section reference](../format/instruction-section.md)
- [Bootstrap self-improving](bootstrap-self-improving.md)
- [Self-improving loop](self-improving-loop.md)
- [CLI reference](../cli/reference.md)

docs/format/instruction-section.md (added)
@@ -0,0 +1,119 @@

# Instruction section reference

`::instruction::` sections are the supervised fine-tuning format DLM
uses for prompt/answer training data.

They are valid in hand-authored `.dlm` files and in synthetic output
written by `dlm synth instructions --apply`.

## Basic shape

Each instruction section contains one or more `Q` / `A` pairs:

```dlm
::instruction::
### Q
What is a decorator?

### A
A function that takes a function and returns a wrapped function.

### Q
When should I use `functools.wraps`?

### A
Whenever a decorator returns another callable and you want to preserve
the wrapped function's metadata.
```

DLM splits those into individual supervised rows at parse time.
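
That split can be pictured with a small parser. The sketch below is not
DLM's real parser: it assumes strictly alternating `### Q` / `### A`
headings, exactly as in the example above.

```python
import re


def parse_instruction_body(body):
    """Split an ::instruction:: body into (question, answer) rows."""
    # Splitting on the headings keeps the captured 'Q'/'A' letters, so
    # parts alternates: [prefix, 'Q', q_text, 'A', a_text, ...]
    parts = re.split(r"^### ([QA])\s*$", body, flags=re.M)
    rows, question = [], None
    for label, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if label == "Q":
            question = text
        elif question is not None:
            rows.append((question, text))
            question = None
    return rows
```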

## Semantics

- `Q` is the prompt shown to the model.
- `A` is the target response.

At train time, DLM uses the question as context and the answer as the
supervised target. This is the section type that most directly shapes
assistant behavior.

## Auto-synth instruction sections

When `dlm synth instructions` writes sections back into a document, it
adds an HTML marker immediately after the section fence:

```dlm
::instruction::
<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" -->
### Q
What does DGEMM do?

### A
It multiplies dense matrices and can optionally accumulate the result.
```

That marker corresponds to these parsed fields on the section:

- `auto_synth: true`
- `synth_teacher`
- `synth_strategy`
- `synth_at`
- `source_section_id`

Hand-authored instruction sections omit the marker and keep
`auto_synth=false`.
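
If you need to read the marker yourself, say in a lint script, a regex
is enough. This sketch assumes the four fields always appear in the
order shown above; the real writer or parser may be more flexible.

```python
import re

MARKER = re.compile(
    r"<!--\s*dlm-auto-synth:"
    r'\s*synth_teacher="(?P<synth_teacher>[^"]*)"'
    r'\s*synth_strategy="(?P<synth_strategy>[^"]*)"'
    r'\s*synth_at="(?P<synth_at>[^"]*)"'
    r'\s*source_section_id="(?P<source_section_id>[^"]*)"'
    r"\s*-->"
)


def parse_marker(line):
    """Return the provenance fields, or None if no marker is present."""
    match = MARKER.search(line)
    if match is None:
        return None
    return {"auto_synth": True, **match.groupdict()}
```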

## Validation rules

- The auto-synth marker is only valid on `::instruction::` sections.
- Auto-synth sections must provide all metadata fields together.
- `synth_teacher` and `synth_strategy` must be non-empty strings.
- `source_section_id` must be a valid referenced section ID.
- Section identity ignores the synth metadata, so the same logical
  question/answer pair keeps the same content identity whether it was
  written by hand or synthesized automatically.
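
The all-or-nothing metadata rule is easy to express as a check. This is
a sketch over a dict-shaped section using the field names from the list
above; it is illustrative, not DLM's actual validator.

```python
SYNTH_FIELDS = ("synth_teacher", "synth_strategy", "synth_at", "source_section_id")


def synth_metadata_ok(section):
    """True if the section satisfies the all-or-nothing metadata rule."""
    present = [f for f in SYNTH_FIELDS if section.get(f)]  # non-empty only
    if not section.get("auto_synth"):
        # hand-authored sections must carry no synth metadata
        return not present
    return len(present) == len(SYNTH_FIELDS)
```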

## Interaction with training

- `dlm train` includes synthesized instruction sections by default.
- There is currently no separate "ignore auto-synth instructions" train
  flag; they flow through the normal SFT path once they are present in
  the document.
- `dlm synth revert` strips every `auto_synth: true` instruction section
  from the file without touching hand-authored rows.

## Interaction with `dlm synth`

Relevant commands:

- `dlm synth instructions <path>`
- `dlm synth list <path>`
- `dlm synth revert <path>`

The current `instructions` command can:

- stage accepted synth sections for inspection
- write accepted synth sections directly with `--apply`
- preview only with `--dry-run`

## Choosing a good instruction section

Hand-authored or synthesized, good instruction sections tend to have:

- a clear prompt with one task
- an answer that matches the tone you want the adapter to learn
- enough domain specificity that the pair teaches something real

Weak instruction sections tend to be:

- generic
- repetitive
- too broad to answer well
- stylistically inconsistent with the rest of the document

## See also

- [Section grammar](sections.md)
- [Synthesize training data](../cookbook/synthesize-training-data.md)
- [Bootstrap self-improving](../cookbook/bootstrap-self-improving.md)
- [CLI reference](../cli/reference.md)

docs/format/sections.md (modified)
@@ -47,6 +47,12 @@ Trains via **supervised fine-tuning (SFT)**: the model sees `Q` text
 as the prompt, `A` text as the target. This is the pattern that
 produces "helpful assistant" behavior.
 
+`dlm synth instructions` can also write synthesized instruction
+sections back into the document. Those keep the same basic body grammar
+but add an HTML provenance marker immediately after the fence. See the
+[instruction section reference](instruction-section.md) for the full
+marker shape and validation rules.
+
 ### Preference (`::preference::`)
 
 Open with `::preference::`. Each record has three blocks:
@@ -159,6 +165,7 @@ being picked up as new?", the ID in `dlm show --json` is the answer.
 
 ## See also
 
+- [Instruction section reference](instruction-section.md)
 - [Preference section reference](preference-section.md)
 - [First train walkthrough](../getting-started/first-train.md)
 - [Cookbook: coding tutor](../cookbook/coding-tutor.md) — full

docs/index.md (modified)
@@ -23,6 +23,8 @@ as Ollama, `llama-server`, `vllm`, and `mlx-serve`.
   vision-language, and audio-language rows
 - **Replay-backed retraining** so edits accumulate instead of silently wiping
   prior state
+- **Synthetic data loops** through `dlm synth instructions` and
+  `dlm synth preferences`
 - **Multi-adapter docs + learned gating** for separating knowledge, tone, or
   persona lanes inside one project
 - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
@@ -49,6 +51,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
 | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
 | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
 | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
+| Turn prose into instruction data | [Synthesize training data](cookbook/synthesize-training-data.md) and [Bootstrap self-improving](cookbook/bootstrap-self-improving.md) |
 | Mine preference pairs from a live adapter | [Self-improving loop](cookbook/self-improving-loop.md) and [Reward-model integration](cookbook/reward-model-integration.md) |
 | Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) |
 | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |

mkdocs.yml (modified)
@@ -58,6 +58,7 @@ nav:
   - The .dlm format:
       - Frontmatter: format/frontmatter.md
       - Sections: format/sections.md
+      - Instruction sections: format/instruction-section.md
       - Preference sections: format/preference-section.md
       - Export manifest: format/export-manifest.md
       - .dlm/training.yaml: format/dlm-training-yaml.md
@@ -72,6 +73,8 @@ nav:
       - Sharing with pack: cookbook/sharing-with-pack.md
       - Quantization tradeoffs: cookbook/quantization-tradeoffs.md
       - Preference (DPO vs ORPO): cookbook/preference-dpo-vs-orpo.md
+      - Synthesize training data: cookbook/synthesize-training-data.md
+      - Bootstrap self-improving: cookbook/bootstrap-self-improving.md
       - Self-improving loop: cookbook/self-improving-loop.md
       - Reward-model integration: cookbook/reward-model-integration.md
       - Multi-adapter composition: cookbook/multi-adapter.md