tenseleyflow/documentlanguagemodel / 1556d38

Document synth workflows

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1556d3896fba7763c0533997f5209bf4da28e6c6
Parents: b41f337
Tree: 10c8c71

6 changed files

Status | File | + | -
A | docs/cookbook/bootstrap-self-improving.md | 193 | 0
A | docs/cookbook/synthesize-training-data.md | 202 | 0
A | docs/format/instruction-section.md | 119 | 0
M | docs/format/sections.md | 7 | 0
M | docs/index.md | 3 | 0
M | mkdocs.yml | 3 | 0

docs/cookbook/bootstrap-self-improving.md (added)
@@ -0,0 +1,193 @@

# Bootstrap self-improving

The self-teacher loop is the most interesting version of Sprint 43:
your current adapter writes new `::instruction::` sections for its own
document, then the next train run folds them back in.

This is not magic. It works because DLM already has:

- replay-backed retraining
- synthesized instruction provenance (`auto_synth`)
- a local `sway` judge for filtering weak candidates

Used carefully, it turns one trained document into a steadily better
instruction corpus.

## The honest starting point

`--teacher self` uses the current adapter for that `.dlm`. That means
the loop starts **after** there is already a trainable local adapter.

A good bootstrap pattern is:

1. Start with prose plus at least some useful seed supervision, or do an
   initial train from prose and existing sections.
2. Run `dlm synth instructions --teacher self`.
3. Retrain on the accepted synth sections.
4. Repeat in small batches.

If the adapter still cannot answer basic questions about the document,
synthetic instruction generation will mostly amplify noise.

## Minimal loop

Train once:

```sh
uv run dlm train notes.dlm
```

Generate a small accepted batch from the current adapter and write it
back immediately:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --strategy extraction \
  --max-pairs 4 \
  --apply
```

Retrain on the expanded instruction set:

```sh
uv run dlm train notes.dlm
```

Then inspect real output quality:

```sh
uv run dlm prompt notes.dlm "What does DGEMM do?"
```

That is the basic self-improving loop.

## Safer staged version

If you want to inspect before writing:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --strategy extraction

uv run dlm synth list notes.dlm
```

The current implementation stages accepted synth sections for
inspection, but it does not yet have a separate `dlm synth apply`
subcommand. Use `--apply` on the synth run when you want the sections
written straight into the document.

## Why `sway` stays the default

The self-teacher path is the place where the default `--filter sway`
matters most.

Without filtering, a weak adapter can happily generate:

- duplicates
- overly generic answers
- plausible but wrong extrapolations

The current synth filter stack is:

1. dedup
2. optional judge pass
3. optional threshold cut

The CLI prints those counts so you can tell whether the loop is getting
better or just louder.
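
The dedup and threshold stages can be pictured with a few lines of
Python. This is an illustrative model only, not DLM's real filter code:
it assumes exact-match dedup on normalized question text and a single
numeric judge score per pair, and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    question: str
    answer: str
    score: float  # judge score for this pair; higher is better


def filter_pairs(pairs, threshold=None):
    """Dedup first, then (optionally) cut pairs below a score threshold."""
    seen, deduped = set(), []
    for pair in pairs:
        key = pair.question.strip().lower()  # crude normalization
        if key not in seen:
            seen.add(key)
            deduped.append(pair)
    if threshold is None:
        return deduped
    return [pair for pair in deduped if pair.score >= threshold]
```

Each stage only ever shrinks the batch, which is why comparing the
printed counts run over run is a quick health check on the loop.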

## A conservative rhythm

This is a healthy local rhythm for a real project:

```sh
uv run dlm train notes.dlm
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --max-pairs 4 \
  --apply
uv run dlm train notes.dlm
uv run dlm prompt notes.dlm "Explain the core idea."
```

Keep the accepted batch small at first. The point is to improve the
document's instruction surface, not flood it with speculative rows.

## When to switch away from `self`

The self-teacher is convenient, but not always the right teacher.

Prefer an external teacher when:

- the local adapter is still very early and weak
- you need broader general knowledge than the current adapter can supply
- you want to compare local-vs-external synth quality on the same prose

That usually looks like:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --apply
```

and then later moving back to `--teacher self` once the adapter has real
domain traction.

## Pairing Sprint 43 with Sprint 42

Instruction synthesis and preference mining are complementary:

- `dlm synth instructions` grows the SFT side of the document
- `dlm synth preferences` / `dlm preference mine` sharpens ranking and
  behavior once the adapter can already produce multiple plausible
  answers

A practical sequence is:

1. train
2. synth instructions
3. train
4. mine preferences
5. train preference phase

That is the closest current DLM path to a fully local self-improving
document loop.

## Failure modes to watch

### The second pass is not better

That usually means one of:

- the first synth batch was too weak
- the document still lacks enough domain prose
- the adapter is too small for the domain

Do not assume "more synthetic rows" automatically means "better model."

### Expansion mode gets weird

`--strategy expansion` is useful, but it is also the fastest route to
polished nonsense. Prefer `extraction` for early loops and only widen to
`both` or `expansion` once the adapter is already grounded.

### Prompt quality improves but factuality does not

That is a signal to go back to better prose or hand-authored
instructional supervision. Self-improvement cannot invent missing source
knowledge.

## See also

- [Synthesize training data](synthesize-training-data.md)
- [Instruction section reference](../format/instruction-section.md)
- [Self-improving loop](self-improving-loop.md)
- [Reward-model integration](reward-model-integration.md)

docs/cookbook/synthesize-training-data.md (added)
@@ -0,0 +1,202 @@

# Synthesize training data

`dlm synth instructions` turns prose-heavy `.dlm` files into usable
`::instruction::` sections.

This is the shortest path from "I have notes" to "I have supervised
training pairs" when the document already contains domain prose but not
enough authored Q/A.

## What it does

The synth loop:

1. Finds non-empty prose sections in the document.
2. Prompts a teacher model to generate question/answer pairs about that
   prose.
3. Deduplicates the generated pairs.
4. Optionally filters them through the `sway` judge.
5. Either stages the accepted `auto_synth` sections for inspection or
   writes them straight back into the `.dlm`.

The generated sections are still normal `::instruction::` sections.
They just carry provenance metadata so DLM can tell synthesized pairs
from hand-authored ones.
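
As a mental model, a synthesized section is just an ordinary
instruction record plus a few provenance fields. The sketch below is
illustrative, not DLM's actual data model; only the provenance field
names are taken from this page, everything else is an assumption.

```python
from datetime import datetime, timezone


def synth_section(question, answer, teacher, strategy, source_section_id):
    """Shape of a synthesized ::instruction:: record (illustrative)."""
    return {
        "kind": "instruction",
        "pairs": [{"q": question, "a": answer}],
        # provenance fields that distinguish synthesized from hand-authored
        "auto_synth": True,
        "synth_teacher": teacher,
        "synth_strategy": strategy,
        "synth_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source_section_id": source_section_id,
    }
```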

## Choose a teacher

The teacher decides who writes the candidate Q/A pairs:

- `self`: use the current local adapter for this document
- `hf:<model>`: use a HuggingFace text model
- `openai:<model>`: use the OpenAI API
- `anthropic:<model>`: use the Anthropic API
- `vllm-server:<url>`: use an OpenAI-compatible local server

The current default is `self`, but that only makes sense once the
document already has a trained adapter. For a cold start, either:

- train once first, then synth with `self`, or
- use `hf:` / `openai:` / `anthropic:` / `vllm-server:` as the teacher

## Minimal example

Start with a prose-heavy document:

```dlm
---
dlm_id: 01K...
dlm_version: 15
base_model: smollm2-135m
---

DGEMM multiplies two dense matrices and can optionally accumulate the
result into an existing output matrix.
```

Generate one extraction-style pair per prose section with an HF teacher:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --strategy extraction
```

That prints two summaries:

- the raw synth plan
- the filter report (`generated`, `dedup`, `judge passed`, `threshold`)

By default, accepted sections are staged under the store so you can
inspect them:

```sh
uv run dlm synth list notes.dlm
```

If you want the accepted pairs written straight back into the document,
use `--apply`:

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --strategy extraction \
  --apply
```

## Strategy choices

The `--strategy` flag controls what kind of questions the teacher is
asked to produce:

- `extraction`: questions answered directly by the prose
- `expansion`: questions a curious reader might ask beyond the exact
  wording of the prose
- `both`: split the per-section budget across both prompt styles

Start with `extraction` when you care about faithfulness. Reach for
`expansion` once the document already has a stable domain voice and you
want broader instructional coverage.
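
For `both`, one plausible way the per-section budget could be divided
is an even split with any remainder going to `extraction`. That split
rule is a hypothetical sketch, not the CLI's documented behavior:

```python
def split_budget(per_section, strategy):
    """Divide a per-section candidate budget across prompt styles.

    Returns (extraction_count, expansion_count). The even-split rule
    is an assumption for illustration, not DLM's documented behavior.
    """
    if strategy == "extraction":
        return per_section, 0
    if strategy == "expansion":
        return 0, per_section
    extraction = (per_section + 1) // 2  # remainder favors extraction
    return extraction, per_section - extraction
```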

## Filter choices

The `--filter` flag controls post-generation cleanup:

- `sway`: dedup plus judge filtering against an empty baseline
- `dedup-only`: keep only near-duplicate suppression
- `none`: accept everything that parses as a valid pair

`sway` is the safest default and is what most users should keep. It is
especially helpful when using creative teachers or `--strategy both`.

If you are debugging prompt quality, use `--filter none` once and look
at the raw plan before deciding whether the issue is generation or
filtering.

## Useful knobs

```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 3 \
  --strategy both \
  --filter sway \
  --threshold 0.2 \
  --max-pairs 8 \
  --max-new-tokens 512 \
  --temp 0.2 \
  --top-p 0.95 \
  --seed 7
```

The most useful flags in practice are:

- `--per-section`: generate more than one candidate pair per prose block
- `--max-pairs`: cap document churn on large files
- `--threshold`: tighten or loosen `sway` acceptance
- `--temp` and `--top-p`: increase diversity when the teacher is too
  repetitive

## Training after synth

Once the document has accepted `auto_synth` instruction sections, the
next normal train run consumes them like any other instruction pair:

```sh
uv run dlm train notes.dlm
```

No special train flag is needed. Synthesized instruction sections flow
through the same SFT path as hand-authored sections.

## Revert and inspection

List applied auto-synth sections:

```sh
uv run dlm synth list notes.dlm
```

Strip every synthesized instruction section from the document:

```sh
uv run dlm synth revert notes.dlm
```

This only removes `auto_synth: true` instruction sections. Hand-authored
instruction blocks stay untouched.

## Common failure modes

### The self teacher is weak

If `--teacher self` produces junk, the adapter probably is not ready
yet. Train once more first, or use a stronger external teacher for the
first synth pass.

### Everything gets filtered out

That usually means one of three things:

- the teacher produced near-duplicates
- the generated answers were worse than the empty-baseline comparison in
  `sway`
- the threshold is too strict

Lower `--threshold`, or temporarily switch to `--filter dedup-only` to
see whether the judge is the main bottleneck.

### The document churns too much

Use `--max-pairs` aggressively at first. A small accepted batch is much
easier to reason about than dumping dozens of synthetic sections into a
single file.

## See also

- [Instruction section reference](../format/instruction-section.md)
- [Bootstrap self-improving](bootstrap-self-improving.md)
- [Self-improving loop](self-improving-loop.md)
- [CLI reference](../cli/reference.md)

docs/format/instruction-section.md (added)
@@ -0,0 +1,119 @@

# Instruction section reference

`::instruction::` sections are the supervised fine-tuning format DLM
uses for prompt/answer training data.

They are valid in hand-authored `.dlm` files and in synthetic output
written by `dlm synth instructions --apply`.

## Basic shape

Each instruction section contains one or more `Q` / `A` pairs:

```dlm
::instruction::
### Q
What is a decorator?

### A
A function that takes a function and returns a wrapped function.

### Q
When should I use `functools.wraps`?

### A
Whenever a decorator returns another callable and you want to preserve
the wrapped function's metadata.
```

DLM splits those into individual supervised rows at parse time.
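
That split can be pictured with a small parser. The sketch below is not
DLM's real parser: it assumes strictly alternating `### Q` / `### A`
headings, exactly as in the example above.

```python
import re


def parse_instruction_body(body):
    """Split an ::instruction:: body into (question, answer) rows."""
    # Splitting on the headings keeps the captured 'Q'/'A' letters, so
    # parts alternates: [prefix, 'Q', q_text, 'A', a_text, ...]
    parts = re.split(r"^### ([QA])\s*$", body, flags=re.M)
    rows, question = [], None
    for label, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if label == "Q":
            question = text
        elif question is not None:
            rows.append((question, text))
            question = None
    return rows
```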

## Semantics

- `Q` is the prompt shown to the model.
- `A` is the target response.

At train time, DLM uses the question as context and the answer as the
supervised target. This is the section type that most directly shapes
assistant behavior.

## Auto-synth instruction sections

When `dlm synth instructions` writes sections back into a document, it
adds an HTML marker immediately after the section fence:

```dlm
::instruction::
<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" -->
### Q
What does DGEMM do?

### A
It multiplies dense matrices and can optionally accumulate the result.
```

That marker corresponds to these parsed fields on the section:

- `auto_synth: true`
- `synth_teacher`
- `synth_strategy`
- `synth_at`
- `source_section_id`

Hand-authored instruction sections omit the marker and keep
`auto_synth=false`.
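
If you need to read the marker yourself, say in a lint script, a regex
is enough. This sketch assumes the four fields always appear in the
order shown above; the real writer or parser may be more flexible.

```python
import re

MARKER = re.compile(
    r"<!--\s*dlm-auto-synth:"
    r'\s*synth_teacher="(?P<synth_teacher>[^"]*)"'
    r'\s*synth_strategy="(?P<synth_strategy>[^"]*)"'
    r'\s*synth_at="(?P<synth_at>[^"]*)"'
    r'\s*source_section_id="(?P<source_section_id>[^"]*)"'
    r"\s*-->"
)


def parse_marker(line):
    """Return the provenance fields, or None if no marker is present."""
    match = MARKER.search(line)
    if match is None:
        return None
    return {"auto_synth": True, **match.groupdict()}
```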

## Validation rules

- The auto-synth marker is only valid on `::instruction::` sections.
- Auto-synth sections must provide all metadata fields together.
- `synth_teacher` and `synth_strategy` must be non-empty strings.
- `source_section_id` must be a valid referenced section ID.
- Section identity ignores the synth metadata, so the same logical
  question/answer pair keeps the same content identity whether it was
  written by hand or synthesized automatically.
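
The all-or-nothing metadata rule is easy to express as a check. This is
a sketch over a dict-shaped section using the field names from the list
above; it is illustrative, not DLM's actual validator.

```python
SYNTH_FIELDS = ("synth_teacher", "synth_strategy", "synth_at", "source_section_id")


def synth_metadata_ok(section):
    """True if the section satisfies the all-or-nothing metadata rule."""
    present = [f for f in SYNTH_FIELDS if section.get(f)]  # non-empty only
    if not section.get("auto_synth"):
        # hand-authored sections must carry no synth metadata
        return not present
    return len(present) == len(SYNTH_FIELDS)
```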

## Interaction with training

- `dlm train` includes synthesized instruction sections by default.
- There is currently no separate "ignore auto-synth instructions" train
  flag; they flow through the normal SFT path once they are present in
  the document.
- `dlm synth revert` strips every `auto_synth: true` instruction section
  from the file without touching hand-authored rows.

## Interaction with `dlm synth`

Relevant commands:

- `dlm synth instructions <path>`
- `dlm synth list <path>`
- `dlm synth revert <path>`

The current `instructions` command can:

- stage accepted synth sections for inspection
- write accepted synth sections directly with `--apply`
- preview only with `--dry-run`

## Choosing a good instruction section

Hand-authored or synthesized, good instruction sections tend to have:

- a clear prompt with one task
- an answer that matches the tone you want the adapter to learn
- enough domain specificity that the pair teaches something real

Weak instruction sections tend to be:

- generic
- repetitive
- too broad to answer well
- stylistically inconsistent with the rest of the document

## See also

- [Section grammar](sections.md)
- [Synthesize training data](../cookbook/synthesize-training-data.md)
- [Bootstrap self-improving](../cookbook/bootstrap-self-improving.md)
- [CLI reference](../cli/reference.md)

docs/format/sections.md (modified)
@@ -47,6 +47,12 @@ Trains via **supervised fine-tuning (SFT)**: the model sees `Q` text
 as the prompt, `A` text as the target. This is the pattern that
 produces "helpful assistant" behavior.
 
+`dlm synth instructions` can also write synthesized instruction
+sections back into the document. Those keep the same basic body grammar
+but add an HTML provenance marker immediately after the fence. See the
+[instruction section reference](instruction-section.md) for the full
+marker shape and validation rules.
+
 ### Preference (`::preference::`)
 
 Open with `::preference::`. Each record has three blocks:
@@ -159,6 +165,7 @@ being picked up as new?", the ID in `dlm show --json` is the answer.
 
 ## See also
 
+- [Instruction section reference](instruction-section.md)
 - [Preference section reference](preference-section.md)
 - [First train walkthrough](../getting-started/first-train.md)
 - [Cookbook: coding tutor](../cookbook/coding-tutor.md) — full

docs/index.md (modified)
@@ -23,6 +23,8 @@ as Ollama, `llama-server`, `vllm`, and `mlx-serve`.
   vision-language, and audio-language rows
 - **Replay-backed retraining** so edits accumulate instead of silently wiping
   prior state
+- **Synthetic data loops** through `dlm synth instructions` and
+  `dlm synth preferences`
 - **Multi-adapter docs + learned gating** for separating knowledge, tone, or
   persona lanes inside one project
 - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
@@ -49,6 +51,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
 | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
 | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
 | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
+| Turn prose into instruction data | [Synthesize training data](cookbook/synthesize-training-data.md) and [Bootstrap self-improving](cookbook/bootstrap-self-improving.md) |
 | Mine preference pairs from a live adapter | [Self-improving loop](cookbook/self-improving-loop.md) and [Reward-model integration](cookbook/reward-model-integration.md) |
 | Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) |
 | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |

mkdocs.yml (modified)
@@ -58,6 +58,7 @@ nav:
   - The .dlm format:
       - Frontmatter: format/frontmatter.md
       - Sections: format/sections.md
+      - Instruction sections: format/instruction-section.md
       - Preference sections: format/preference-section.md
       - Export manifest: format/export-manifest.md
       - .dlm/training.yaml: format/dlm-training-yaml.md
@@ -72,6 +73,8 @@ nav:
       - Sharing with pack: cookbook/sharing-with-pack.md
       - Quantization tradeoffs: cookbook/quantization-tradeoffs.md
       - Preference (DPO vs ORPO): cookbook/preference-dpo-vs-orpo.md
+      - Synthesize training data: cookbook/synthesize-training-data.md
+      - Bootstrap self-improving: cookbook/bootstrap-self-improving.md
       - Self-improving loop: cookbook/self-improving-loop.md
       - Reward-model integration: cookbook/reward-model-integration.md
       - Multi-adapter composition: cookbook/multi-adapter.md