Document synth workflows

Authored by mfwolffe <wolffemf@dukes.jmu.edu>

- SHA: 1556d3896fba7763c0533997f5209bf4da28e6c6
- Parents: b41f337
- Tree: 10c8c71

| Status | File | + | - |
|---|---|---|---|
| A | docs/cookbook/bootstrap-self-improving.md | 193 | 0 |
| A | docs/cookbook/synthesize-training-data.md | 202 | 0 |
| A | docs/format/instruction-section.md | 119 | 0 |
| M | docs/format/sections.md | 7 | 0 |
| M | docs/index.md | 3 | 0 |
| M | mkdocs.yml | 3 | 0 |
docs/cookbook/bootstrap-self-improving.md (added) @@ -0,0 +1,193 @@
| 1 | +# Bootstrap self-improving | ||
| 2 | + | ||
| 3 | +The self-teacher loop is the most interesting version of Sprint 43: | ||
| 4 | +your current adapter writes new `::instruction::` sections for its own | ||
| 5 | +document, then the next train run folds them back in. | ||
| 6 | + | ||
| 7 | +This is not magic. It works because DLM already has: | ||
| 8 | + | ||
| 9 | +- replay-backed retraining | ||
| 10 | +- synthesized instruction provenance (`auto_synth`) | ||
| 11 | +- a local `sway` judge for filtering weak candidates | ||
| 12 | + | ||
| 13 | +Used carefully, it turns one trained document into a steadily better | ||
| 14 | +instruction corpus. | ||
| 15 | + | ||
| 16 | +## The honest starting point | ||
| 17 | + | ||
| 18 | +`--teacher self` uses the current adapter for that `.dlm`. That means | ||
| 19 | +the loop starts **after** there is already a trained local adapter. | ||
| 20 | + | ||
| 21 | +A good bootstrap pattern is: | ||
| 22 | + | ||
| 23 | +1. Start with prose plus at least some useful seed supervision, or do an | ||
| 24 | + initial train from prose and existing sections. | ||
| 25 | +2. Run `dlm synth instructions --teacher self`. | ||
| 26 | +3. Retrain on the accepted synth sections. | ||
| 27 | +4. Repeat in small batches. | ||
| 28 | + | ||
| 29 | +If the adapter still cannot answer basic questions about the document, | ||
| 30 | +synthetic instruction generation will mostly amplify noise. | ||
| 31 | + | ||
| 32 | +## Minimal loop | ||
| 33 | + | ||
| 34 | +Train once: | ||
| 35 | + | ||
| 36 | +```sh | ||
| 37 | +uv run dlm train notes.dlm | ||
| 38 | +``` | ||
| 39 | + | ||
| 40 | +Generate a small accepted batch from the current adapter and write it | ||
| 41 | +back immediately: | ||
| 42 | + | ||
| 43 | +```sh | ||
| 44 | +uv run dlm synth instructions notes.dlm \ | ||
| 45 | + --teacher self \ | ||
| 46 | + --per-section 1 \ | ||
| 47 | + --strategy extraction \ | ||
| 48 | + --max-pairs 4 \ | ||
| 49 | + --apply | ||
| 50 | +``` | ||
| 51 | + | ||
| 52 | +Retrain on the expanded instruction set: | ||
| 53 | + | ||
| 54 | +```sh | ||
| 55 | +uv run dlm train notes.dlm | ||
| 56 | +``` | ||
| 57 | + | ||
| 58 | +Then inspect real output quality: | ||
| 59 | + | ||
| 60 | +```sh | ||
| 61 | +uv run dlm prompt notes.dlm "What does DGEMM do?" | ||
| 62 | +``` | ||
| 63 | + | ||
| 64 | +That is the basic self-improving loop. | ||
| 65 | + | ||
| 66 | +## Safer staged version | ||
| 67 | + | ||
| 68 | +If you want to inspect before writing: | ||
| 69 | + | ||
| 70 | +```sh | ||
| 71 | +uv run dlm synth instructions notes.dlm \ | ||
| 72 | + --teacher self \ | ||
| 73 | + --per-section 1 \ | ||
| 74 | + --strategy extraction | ||
| 75 | + | ||
| 76 | +uv run dlm synth list notes.dlm | ||
| 77 | +``` | ||
| 78 | + | ||
| 79 | +The current implementation stages accepted synth sections for | ||
| 80 | +inspection, but it does not yet have a separate `dlm synth apply` | ||
| 81 | +subcommand. Use `--apply` on the synth run when you want the sections | ||
| 82 | +written straight into the document. | ||
| 83 | + | ||
| 84 | +## Why `sway` stays the default | ||
| 85 | + | ||
| 86 | +The self-teacher path is the place where the default `--filter sway` | ||
| 87 | +matters most. | ||
| 88 | + | ||
| 89 | +Without filtering, a weak adapter can happily generate: | ||
| 90 | + | ||
| 91 | +- duplicates | ||
| 92 | +- overly generic answers | ||
| 93 | +- plausible but wrong extrapolations | ||
| 94 | + | ||
| 95 | +The current synth filter stack is: | ||
| 96 | + | ||
| 97 | +1. dedup | ||
| 98 | +2. optional judge pass | ||
| 99 | +3. optional threshold cut | ||
| 100 | + | ||
| 101 | +The CLI prints those counts so you can tell whether the loop is getting | ||
| 102 | +better or just louder. | ||
| 103 | + | ||
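The three stages above can be written down as a plain filter pipeline. This is an illustrative Python sketch under stated assumptions, not DLM's actual implementation: the `judge` callable, the exact dedup key, and the report field names are all hypothetical.

```python
def filter_candidates(pairs, judge=None, threshold=None):
    """Toy three-stage synth filter: dedup -> judge -> threshold.

    `pairs` is a list of (question, answer) tuples. `judge` is any
    callable returning a score in [0, 1]; both names are assumptions.
    """
    report = {"generated": len(pairs)}

    # 1. dedup: drop near-identical repeats, keeping the first occurrence
    seen, deduped = set(), []
    for q, a in pairs:
        key = (q.strip().lower(), a.strip().lower())
        if key not in seen:
            seen.add(key)
            deduped.append((q, a))
    report["dedup"] = len(deduped)

    # 2. optional judge pass: score each surviving pair
    scored = [(judge(q, a) if judge else 1.0, q, a) for q, a in deduped]

    # 3. optional threshold cut
    kept = [(q, a) for s, q, a in scored
            if threshold is None or s >= threshold]
    report["accepted"] = len(kept)
    return kept, report
```

Printing `report` at each stage is exactly the kind of count summary that tells you whether a loop is improving or just generating more of the same.
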
| 104 | +## A conservative rhythm | ||
| 105 | + | ||
| 106 | +This is a healthy local rhythm for a real project: | ||
| 107 | + | ||
| 108 | +```sh | ||
| 109 | +uv run dlm train notes.dlm | ||
| 110 | +uv run dlm synth instructions notes.dlm \ | ||
| 111 | + --teacher self \ | ||
| 112 | + --per-section 1 \ | ||
| 113 | + --max-pairs 4 \ | ||
| 114 | + --apply | ||
| 115 | +uv run dlm train notes.dlm | ||
| 116 | +uv run dlm prompt notes.dlm "Explain the core idea." | ||
| 117 | +``` | ||
| 118 | + | ||
| 119 | +Keep the accepted batch small at first. The point is to improve the | ||
| 120 | +document's instruction surface, not flood it with speculative rows. | ||
| 121 | + | ||
| 122 | +## When to switch away from `self` | ||
| 123 | + | ||
| 124 | +The self-teacher is convenient, but not always the right teacher. | ||
| 125 | + | ||
| 126 | +Prefer an external teacher when: | ||
| 127 | + | ||
| 128 | +- the local adapter is still very early and weak | ||
| 129 | +- you need broader general knowledge than the current adapter can supply | ||
| 130 | +- you want to compare local-vs-external synth quality on the same prose | ||
| 131 | + | ||
| 132 | +That usually looks like: | ||
| 133 | + | ||
| 134 | +```sh | ||
| 135 | +uv run dlm synth instructions notes.dlm \ | ||
| 136 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 137 | + --per-section 1 \ | ||
| 138 | + --apply | ||
| 139 | +``` | ||
| 140 | + | ||
| 141 | +and then later moving back to `--teacher self` once the adapter has real | ||
| 142 | +domain traction. | ||
| 143 | + | ||
| 144 | +## Pairing Sprint 43 with Sprint 42 | ||
| 145 | + | ||
| 146 | +Instruction synthesis and preference mining are complementary: | ||
| 147 | + | ||
| 148 | +- `dlm synth instructions` grows the SFT side of the document | ||
| 149 | +- `dlm synth preferences` / `dlm preference mine` sharpens ranking and | ||
| 150 | + behavior once the adapter can already produce multiple plausible | ||
| 151 | + answers | ||
| 152 | + | ||
| 153 | +A practical sequence is: | ||
| 154 | + | ||
| 155 | +1. train | ||
| 156 | +2. synth instructions | ||
| 157 | +3. train | ||
| 158 | +4. mine preferences | ||
| 159 | +5. train preference phase | ||
| 160 | + | ||
| 161 | +That is the closest current DLM path to a fully local self-improving | ||
| 162 | +document loop. | ||
| 163 | + | ||
| 164 | +## Failure modes to watch | ||
| 165 | + | ||
| 166 | +### The second pass is not better | ||
| 167 | + | ||
| 168 | +That usually means one of: | ||
| 169 | + | ||
| 170 | +- the first synth batch was too weak | ||
| 171 | +- the document still lacks enough domain prose | ||
| 172 | +- the adapter is too small for the domain | ||
| 173 | + | ||
| 174 | +Do not assume "more synthetic rows" automatically means "better model." | ||
| 175 | + | ||
| 176 | +### Expansion mode gets weird | ||
| 177 | + | ||
| 178 | +`--strategy expansion` is useful, but it is also the fastest route to | ||
| 179 | +polished nonsense. Prefer `extraction` for early loops and only widen to | ||
| 180 | +`both` or `expansion` once the adapter is already grounded. | ||
| 181 | + | ||
| 182 | +### Prompt quality improves but factuality does not | ||
| 183 | + | ||
| 184 | +That is a signal to go back to better prose or hand-authored | ||
| 185 | +instructional supervision. Self-improvement cannot invent missing source | ||
| 186 | +knowledge. | ||
| 187 | + | ||
| 188 | +## See also | ||
| 189 | + | ||
| 190 | +- [Synthesize training data](synthesize-training-data.md) | ||
| 191 | +- [Instruction section reference](../format/instruction-section.md) | ||
| 192 | +- [Self-improving loop](self-improving-loop.md) | ||
| 193 | +- [Reward-model integration](reward-model-integration.md) | ||
docs/cookbook/synthesize-training-data.md (added) @@ -0,0 +1,202 @@
| 1 | +# Synthesize training data | ||
| 2 | + | ||
| 3 | +`dlm synth instructions` turns prose-heavy `.dlm` files into usable | ||
| 4 | +`::instruction::` sections. | ||
| 5 | + | ||
| 6 | +This is the shortest path from "I have notes" to "I have supervised | ||
| 7 | +training pairs" when the document already contains domain prose but not | ||
| 8 | +enough authored Q/A. | ||
| 9 | + | ||
| 10 | +## What it does | ||
| 11 | + | ||
| 12 | +The synth loop: | ||
| 13 | + | ||
| 14 | +1. Finds non-empty prose sections in the document. | ||
| 15 | +2. Prompts a teacher model to generate question/answer pairs about that | ||
| 16 | + prose. | ||
| 17 | +3. Deduplicates the generated pairs. | ||
| 18 | +4. Optionally filters them through the `sway` judge. | ||
| 19 | +5. Either stages the accepted `auto_synth` sections for inspection or | ||
| 20 | + writes them straight back into the `.dlm`. | ||
| 21 | + | ||
| 22 | +The generated sections are still normal `::instruction::` sections. | ||
| 23 | +They just carry provenance metadata so DLM can tell synthesized pairs | ||
| 24 | +from hand-authored ones. | ||
| 25 | + | ||
| 26 | +## Choose a teacher | ||
| 27 | + | ||
| 28 | +The teacher decides who writes the candidate Q/A pairs: | ||
| 29 | + | ||
| 30 | +- `self`: use the current local adapter for this document | ||
| 31 | +- `hf:<model>`: use a HuggingFace text model | ||
| 32 | +- `openai:<model>`: use the OpenAI API | ||
| 33 | +- `anthropic:<model>`: use the Anthropic API | ||
| 34 | +- `vllm-server:<url>`: use an OpenAI-compatible local server | ||
| 35 | + | ||
| 36 | +The current default is `self`, but that only makes sense once the | ||
| 37 | +document already has a trained adapter. For a cold start, either: | ||
| 38 | + | ||
| 39 | +- train once first, then synth with `self`, or | ||
| 40 | +- use `hf:` / `openai:` / `anthropic:` / `vllm-server:` as the teacher | ||
| 41 | + | ||
| 42 | +## Minimal example | ||
| 43 | + | ||
| 44 | +Start with a prose-heavy document: | ||
| 45 | + | ||
| 46 | +```dlm | ||
| 47 | +--- | ||
| 48 | +dlm_id: 01K... | ||
| 49 | +dlm_version: 15 | ||
| 50 | +base_model: smollm2-135m | ||
| 51 | +--- | ||
| 52 | + | ||
| 53 | +DGEMM multiplies two dense matrices and can optionally accumulate the | ||
| 54 | +result into an existing output matrix. | ||
| 55 | +``` | ||
| 56 | + | ||
| 57 | +Generate one extraction-style pair per prose section with an HF teacher: | ||
| 58 | + | ||
| 59 | +```sh | ||
| 60 | +uv run dlm synth instructions notes.dlm \ | ||
| 61 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 62 | + --per-section 1 \ | ||
| 63 | + --strategy extraction | ||
| 64 | +``` | ||
| 65 | + | ||
| 66 | +That prints two summaries: | ||
| 67 | + | ||
| 68 | +- the raw synth plan | ||
| 69 | +- the filter report (`generated`, `dedup`, `judge passed`, `threshold`) | ||
| 70 | + | ||
| 71 | +By default, accepted sections are staged under the store so you can | ||
| 72 | +inspect them: | ||
| 73 | + | ||
| 74 | +```sh | ||
| 75 | +uv run dlm synth list notes.dlm | ||
| 76 | +``` | ||
| 77 | + | ||
| 78 | +If you want the accepted pairs written straight back into the document, | ||
| 79 | +use `--apply`: | ||
| 80 | + | ||
| 81 | +```sh | ||
| 82 | +uv run dlm synth instructions notes.dlm \ | ||
| 83 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 84 | + --per-section 1 \ | ||
| 85 | + --strategy extraction \ | ||
| 86 | + --apply | ||
| 87 | +``` | ||
| 88 | + | ||
| 89 | +## Strategy choices | ||
| 90 | + | ||
| 91 | +The `--strategy` flag controls what kind of questions the teacher is | ||
| 92 | +asked to produce: | ||
| 93 | + | ||
| 94 | +- `extraction`: questions answered directly by the prose | ||
| 95 | +- `expansion`: questions a curious reader might ask beyond the exact | ||
| 96 | + wording of the prose | ||
| 97 | +- `both`: split the per-section budget across both prompt styles | ||
| 98 | + | ||
| 99 | +Start with `extraction` when you care about faithfulness. Reach for | ||
| 100 | +`expansion` once the document already has a stable domain voice and you | ||
| 101 | +want broader instructional coverage. | ||
| 102 | + | ||
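For intuition, a `both` run has to divide the `--per-section` budget between the two prompt styles. The even split below is a hypothetical sketch; the actual split rule is not documented here.

```python
def split_budget(per_section):
    """Hypothetical even split of the per-section budget for `both`.

    Gives extraction the extra pair on odd budgets, on the assumption
    that faithfulness-oriented questions deserve priority.
    """
    expansion = per_section // 2
    extraction = per_section - expansion
    return extraction, expansion
```
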
| 103 | +## Filter choices | ||
| 104 | + | ||
| 105 | +The `--filter` flag controls post-generation cleanup: | ||
| 106 | + | ||
| 107 | +- `sway`: dedup plus judge filtering against an empty baseline | ||
| 108 | +- `dedup-only`: keep only near-duplicate suppression | ||
| 109 | +- `none`: accept everything that parses as a valid pair | ||
| 110 | + | ||
| 111 | +`sway` is the safest default and is what most users should keep. It is | ||
| 112 | +especially helpful when using creative teachers or `--strategy both`. | ||
| 113 | + | ||
| 114 | +If you are debugging prompt quality, use `--filter none` once and look | ||
| 115 | +at the raw plan before deciding whether the issue is generation or | ||
| 116 | +filtering. | ||
| 117 | + | ||
| 118 | +## Useful knobs | ||
| 119 | + | ||
| 120 | +```sh | ||
| 121 | +uv run dlm synth instructions notes.dlm \ | ||
| 122 | + --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \ | ||
| 123 | + --per-section 3 \ | ||
| 124 | + --strategy both \ | ||
| 125 | + --filter sway \ | ||
| 126 | + --threshold 0.2 \ | ||
| 127 | + --max-pairs 8 \ | ||
| 128 | + --max-new-tokens 512 \ | ||
| 129 | + --temp 0.2 \ | ||
| 130 | + --top-p 0.95 \ | ||
| 131 | + --seed 7 | ||
| 132 | +``` | ||
| 133 | + | ||
| 134 | +The most useful flags in practice are: | ||
| 135 | + | ||
| 136 | +- `--per-section`: generate more than one candidate pair per prose block | ||
| 137 | +- `--max-pairs`: cap document churn on large files | ||
| 138 | +- `--threshold`: tighten or loosen `sway` acceptance | ||
| 139 | +- `--temp` and `--top-p`: increase diversity when the teacher is too | ||
| 140 | + repetitive | ||
| 141 | + | ||
| 142 | +## Training after synth | ||
| 143 | + | ||
| 144 | +Once the document has accepted `auto_synth` instruction sections, the | ||
| 145 | +next normal train run consumes them like any other instruction pair: | ||
| 146 | + | ||
| 147 | +```sh | ||
| 148 | +uv run dlm train notes.dlm | ||
| 149 | +``` | ||
| 150 | + | ||
| 151 | +No special train flag is needed. Synthesized instruction sections flow | ||
| 152 | +through the same SFT path as hand-authored sections. | ||
| 153 | + | ||
| 154 | +## Revert and inspection | ||
| 155 | + | ||
| 156 | +List applied auto-synth sections: | ||
| 157 | + | ||
| 158 | +```sh | ||
| 159 | +uv run dlm synth list notes.dlm | ||
| 160 | +``` | ||
| 161 | + | ||
| 162 | +Strip every synthesized instruction section from the document: | ||
| 163 | + | ||
| 164 | +```sh | ||
| 165 | +uv run dlm synth revert notes.dlm | ||
| 166 | +``` | ||
| 167 | + | ||
| 168 | +This only removes `auto_synth: true` instruction sections. Hand-authored | ||
| 169 | +instruction blocks stay untouched. | ||
| 170 | + | ||
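The revert rule above amounts to a single filter over parsed sections. This sketch assumes a list-of-dicts view of the document; the real parser's data model may differ.

```python
def revert_auto_synth(sections):
    """Keep every section except auto-synthesized instruction sections.

    `sections` is a hypothetical parsed view of a .dlm file where each
    section carries a `type` and an `auto_synth` flag.
    """
    return [
        s for s in sections
        if not (s.get("type") == "instruction" and s.get("auto_synth"))
    ]
```

Note that prose sections and hand-authored instruction sections pass through untouched; only the `auto_synth: true` instruction rows are dropped.
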
| 171 | +## Common failure modes | ||
| 172 | + | ||
| 173 | +### The self teacher is weak | ||
| 174 | + | ||
| 175 | +If `--teacher self` produces junk, the adapter probably is not ready | ||
| 176 | +yet. Train once more first, or use a stronger external teacher for the | ||
| 177 | +first synth pass. | ||
| 178 | + | ||
| 179 | +### Everything gets filtered out | ||
| 180 | + | ||
| 181 | +That usually means one of three things: | ||
| 182 | + | ||
| 183 | +- the teacher produced near-duplicates | ||
| 184 | +- the generated answers were worse than the empty-baseline comparison in | ||
| 185 | + `sway` | ||
| 186 | +- the threshold is too strict | ||
| 187 | + | ||
| 188 | +Lower `--threshold`, or temporarily switch to `--filter dedup-only` to | ||
| 189 | +see whether the judge is the main bottleneck. | ||
| 190 | + | ||
| 191 | +### The document churns too much | ||
| 192 | + | ||
| 193 | +Use `--max-pairs` aggressively at first. A small accepted batch is much | ||
| 194 | +easier to reason about than dumping dozens of synthetic sections into a | ||
| 195 | +single file. | ||
| 196 | + | ||
| 197 | +## See also | ||
| 198 | + | ||
| 199 | +- [Instruction section reference](../format/instruction-section.md) | ||
| 200 | +- [Bootstrap self-improving](bootstrap-self-improving.md) | ||
| 201 | +- [Self-improving loop](self-improving-loop.md) | ||
| 202 | +- [CLI reference](../cli/reference.md) | ||
docs/format/instruction-section.md (added) @@ -0,0 +1,119 @@
| 1 | +# Instruction section reference | ||
| 2 | + | ||
| 3 | +`::instruction::` sections are the supervised fine-tuning format DLM | ||
| 4 | +uses for prompt/answer training data. | ||
| 5 | + | ||
| 6 | +They are valid in hand-authored `.dlm` files and in synthetic output | ||
| 7 | +written by `dlm synth instructions --apply`. | ||
| 8 | + | ||
| 9 | +## Basic shape | ||
| 10 | + | ||
| 11 | +Each instruction section contains one or more `Q` / `A` pairs: | ||
| 12 | + | ||
| 13 | +```dlm | ||
| 14 | +::instruction:: | ||
| 15 | +### Q | ||
| 16 | +What is a decorator? | ||
| 17 | + | ||
| 18 | +### A | ||
| 19 | +A function that takes a function and returns a wrapped function. | ||
| 20 | + | ||
| 21 | +### Q | ||
| 22 | +When should I use `functools.wraps`? | ||
| 23 | + | ||
| 24 | +### A | ||
| 25 | +Whenever a decorator returns another callable and you want to preserve | ||
| 26 | +the wrapped function's metadata. | ||
| 27 | +``` | ||
| 28 | + | ||
| 29 | +DLM splits those into individual supervised rows at parse time. | ||
| 30 | + | ||
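That split can be sketched in a few lines. This is a minimal illustration, stricter than a real parser needs to be: it assumes well-formed alternating `### Q` / `### A` headings and ignores metadata markers.

```python
import re

def split_pairs(body):
    """Split an instruction section body into (question, answer) rows."""
    # After splitting on heading lines, chunks alternate Q text, A text.
    chunks = re.split(r"^### [QA]\s*$", body, flags=re.MULTILINE)
    chunks = [c.strip() for c in chunks if c.strip()]
    return list(zip(chunks[0::2], chunks[1::2]))
```
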
| 31 | +## Semantics | ||
| 32 | + | ||
| 33 | +- `Q` is the prompt shown to the model. | ||
| 34 | +- `A` is the target response. | ||
| 35 | + | ||
| 36 | +At train time, DLM uses the question as context and the answer as the | ||
| 37 | +supervised target. This is the section type that most directly shapes | ||
| 38 | +assistant behavior. | ||
| 39 | + | ||
| 40 | +## Auto-synth instruction sections | ||
| 41 | + | ||
| 42 | +When `dlm synth instructions` writes sections back into a document, it | ||
| 43 | +adds an HTML marker immediately after the section fence: | ||
| 44 | + | ||
| 45 | +```dlm | ||
| 46 | +::instruction:: | ||
| 47 | +<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" --> | ||
| 48 | +### Q | ||
| 49 | +What does DGEMM do? | ||
| 50 | + | ||
| 51 | +### A | ||
| 52 | +It multiplies dense matrices and can optionally accumulate the result. | ||
| 53 | +``` | ||
| 54 | + | ||
| 55 | +That marker corresponds to these parsed fields on the section: | ||
| 56 | + | ||
| 57 | +- `auto_synth: true` | ||
| 58 | +- `synth_teacher` | ||
| 59 | +- `synth_strategy` | ||
| 60 | +- `synth_at` | ||
| 61 | +- `source_section_id` | ||
| 62 | + | ||
| 63 | +Hand-authored instruction sections omit the marker and keep | ||
| 64 | +`auto_synth=false`. | ||
| 65 | + | ||
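Reading the marker back into those fields can be done with a single pattern. This sketch assumes the fields appear in the exact order shown in the example above; the real parser may be more permissive, and the regex requiring all four fields together mirrors the all-or-nothing validation rule below.

```python
import re

MARKER = re.compile(
    r'<!--\s*dlm-auto-synth:'
    r'\s*synth_teacher="(?P<synth_teacher>[^"]+)"'
    r'\s*synth_strategy="(?P<synth_strategy>[^"]+)"'
    r'\s*synth_at="(?P<synth_at>[^"]+)"'
    r'\s*source_section_id="(?P<source_section_id>[^"]+)"'
    r'\s*-->'
)

def parse_marker(line):
    """Return the four provenance fields, or None for hand-authored rows."""
    m = MARKER.search(line)
    return m.groupdict() if m else None
```
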
| 66 | +## Validation rules | ||
| 67 | + | ||
| 68 | +- The auto-synth marker is only valid on `::instruction::` sections. | ||
| 69 | +- Auto-synth sections must provide all metadata fields together. | ||
| 70 | +- `synth_teacher` and `synth_strategy` must be non-empty strings. | ||
| 71 | +- `source_section_id` must be a valid referenced section ID. | ||
| 72 | +- Section identity ignores the synth metadata, so the same logical | ||
| 73 | + question/answer pair keeps the same content identity whether it was | ||
| 74 | + written by hand or synthesized automatically. | ||
| 75 | + | ||
| 76 | +## Interaction with training | ||
| 77 | + | ||
| 78 | +- `dlm train` includes synthesized instruction sections by default. | ||
| 79 | +- There is currently no separate "ignore auto-synth instructions" train | ||
| 80 | + flag; they flow through the normal SFT path once they are present in | ||
| 81 | + the document. | ||
| 82 | +- `dlm synth revert` strips every `auto_synth: true` instruction section | ||
| 83 | + from the file without touching hand-authored rows. | ||
| 84 | + | ||
| 85 | +## Interaction with `dlm synth` | ||
| 86 | + | ||
| 87 | +Relevant commands: | ||
| 88 | + | ||
| 89 | +- `dlm synth instructions <path>` | ||
| 90 | +- `dlm synth list <path>` | ||
| 91 | +- `dlm synth revert <path>` | ||
| 92 | + | ||
| 93 | +The current `instructions` command can: | ||
| 94 | + | ||
| 95 | +- stage accepted synth sections for inspection | ||
| 96 | +- write accepted synth sections directly with `--apply` | ||
| 97 | +- preview only with `--dry-run` | ||
| 98 | + | ||
| 99 | +## Choosing a good instruction section | ||
| 100 | + | ||
| 101 | +Hand-authored or synthesized, good instruction sections tend to have: | ||
| 102 | + | ||
| 103 | +- a clear prompt with one task | ||
| 104 | +- an answer that matches the tone you want the adapter to learn | ||
| 105 | +- enough domain specificity that the pair teaches something real | ||
| 106 | + | ||
| 107 | +Weak instruction sections tend to be: | ||
| 108 | + | ||
| 109 | +- generic | ||
| 110 | +- repetitive | ||
| 111 | +- too broad to answer well | ||
| 112 | +- stylistically inconsistent with the rest of the document | ||
| 113 | + | ||
| 114 | +## See also | ||
| 115 | + | ||
| 116 | +- [Section grammar](sections.md) | ||
| 117 | +- [Synthesize training data](../cookbook/synthesize-training-data.md) | ||
| 118 | +- [Bootstrap self-improving](../cookbook/bootstrap-self-improving.md) | ||
| 119 | +- [CLI reference](../cli/reference.md) | ||
docs/format/sections.md (modified)

@@ -47,6 +47,12 @@ Trains via **supervised fine-tuning (SFT)**: the model sees `Q` text
 as the prompt, `A` text as the target. This is the pattern that
 produces "helpful assistant" behavior.
 
+`dlm synth instructions` can also write synthesized instruction
+sections back into the document. Those keep the same basic body grammar
+but add an HTML provenance marker immediately after the fence. See the
+[instruction section reference](instruction-section.md) for the full
+marker shape and validation rules.
+
 ### Preference (`::preference::`)
 
 Open with `::preference::`. Each record has three blocks:
@@ -159,6 +165,7 @@ being picked up as new?", the ID in `dlm show --json` is the answer.
 
 ## See also
 
+- [Instruction section reference](instruction-section.md)
 - [Preference section reference](preference-section.md)
 - [First train walkthrough](../getting-started/first-train.md)
 - [Cookbook: coding tutor](../cookbook/coding-tutor.md) — full
docs/index.md (modified)

@@ -23,6 +23,8 @@ as Ollama, `llama-server`, `vllm`, and `mlx-serve`.
   vision-language, and audio-language rows
 - **Replay-backed retraining** so edits accumulate instead of silently wiping
   prior state
+- **Synthetic data loops** through `dlm synth instructions` and
+  `dlm synth preferences`
 - **Multi-adapter docs + learned gating** for separating knowledge, tone, or
   persona lanes inside one project
 - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
@@ -49,6 +51,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
 | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
 | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
 | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
+| Turn prose into instruction data | [Synthesize training data](cookbook/synthesize-training-data.md) and [Bootstrap self-improving](cookbook/bootstrap-self-improving.md) |
 | Mine preference pairs from a live adapter | [Self-improving loop](cookbook/self-improving-loop.md) and [Reward-model integration](cookbook/reward-model-integration.md) |
 | Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) |
 | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |
mkdocs.yml (modified)

@@ -58,6 +58,7 @@ nav:
   - The .dlm format:
     - Frontmatter: format/frontmatter.md
     - Sections: format/sections.md
+    - Instruction sections: format/instruction-section.md
     - Preference sections: format/preference-section.md
     - Export manifest: format/export-manifest.md
     - .dlm/training.yaml: format/dlm-training-yaml.md
@@ -72,6 +73,8 @@ nav:
     - Sharing with pack: cookbook/sharing-with-pack.md
     - Quantization tradeoffs: cookbook/quantization-tradeoffs.md
     - Preference (DPO vs ORPO): cookbook/preference-dpo-vs-orpo.md
+    - Synthesize training data: cookbook/synthesize-training-data.md
+    - Bootstrap self-improving: cookbook/bootstrap-self-improving.md
     - Self-improving loop: cookbook/self-improving-loop.md
     - Reward-model integration: cookbook/reward-model-integration.md
     - Multi-adapter composition: cookbook/multi-adapter.md
| 77 | - Multi-adapter composition: cookbook/multi-adapter.md | 80 | - Multi-adapter composition: cookbook/multi-adapter.md |