
# Changelog

All notable changes to DocumentLanguageModel are recorded here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); the project targets [Semantic Versioning](https://semver.org/).

## [Unreleased]

### Fixed

- **`gguf_arch` preflight probe was silently false-negative on every HF off-registry base.** Three compounding bugs surfaced while trying to train against `hf:Qwen/Qwen3-1.7B` (a minimal sketch of the fixed extraction follows this list):
  1. The probe's regex matched `@Model.register(...)`, but upstream llama.cpp renamed the decorator to `@ModelBase.register(...)` mid-2024; the regex now accepts both forms.
  2. The regex captured only the *first* quoted arg, silently missing multi-arg decorators like `@ModelBase.register("Qwen3ForCausalLM", "Qwen3Model")`; the probe now extracts every quoted string inside the decorator's arg list.
  3. The probe compared `spec.gguf_arch` (a short label like `"qwen3"`) against the decorator's arguments, but llama.cpp registers HF class names (`"Qwen3ForCausalLM"`) — different namespaces that will never match. The comparison now uses `spec.architecture`. The bug was invisible because registered models bypass the probe entirely; it only bit `hf:` paths.
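
A minimal sketch of the fixed extraction logic, using illustrative helper names rather than dlm's actual internals:

```python
import re

# Accept both decorator spellings and capture the full argument list.
DECORATOR_RE = re.compile(r"@(?:Model|ModelBase)\.register\(([^)]*)\)")
QUOTED_RE = re.compile(r"""["']([^"']+)["']""")

def registered_names(convert_source: str) -> set[str]:
    """Every quoted string inside every register() decorator."""
    names: set[str] = set()
    for m in DECORATOR_RE.finditer(convert_source):
        names.update(QUOTED_RE.findall(m.group(1)))
    return names

# The namespace fix: compare the HF class name, not the short GGUF label.
def probe_ok(spec_architecture: str, convert_source: str) -> bool:
    return spec_architecture in registered_names(convert_source)
```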

### Added

- **`dlm export --emit-sway-json`** writes a ready-to-run `<export-dir>/sway.yaml` alongside the GGUF/Modelfile, eliminating the previous two-step `dlm export` → `sway autogen` ritual users had to do before evaluating an adapter via [sway](https://github.com/tenseleyFlow/sway). Calls into `dlm_sway.integrations.dlm.autogen.build_spec_dict` via a new `dlm.export.sway_json.write_sway_json` helper (a sketch of the error-wrapping shape follows this list). Closes the X1 half of sway's Sprint 26 cross-repo integration; X3 (sway-side `sway pack` / `sway unpack`) ships in sway proper.
  - New `[sway]` optional extra (`pip install 'dlm[sway]'`) pulls `dlm-sway>=0.1.0`. Deliberately pulls plain `dlm-sway`, NOT `dlm-sway[dlm]`, because the round-trip extra would create a pip-resolver cycle (sway's `[dlm]` extra already pulls dlm).
  - Failures route through a new typed `SwayJsonExportError` (subclass of `ExportError`) so the CLI's existing exception handler renders them cleanly. The most common failure — the user didn't install the `[sway]` extra — gets a message that names the install command verbatim.
  - 5 unit tests in `tests/unit/cli/test_export_sway_json.py` cover the helper round-trip, missing-extra error, autogen failure wrapping, and CLI flag wiring.
- **`dlm train --skip-export-probes`** mirrors the flag on `dlm init` (it was missing from the train CLI; a user could `dlm init --skip-export-probes` a fresh `.dlm`, then have `dlm train` re-run the probes and fail). The flag threads into `resolve_base_model` identically on both paths; help text matches verbatim.
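
How the helper's error wrapping might look — a hedged sketch in which `build_spec_dict`'s call shape, the message text, and the YAML dump are assumptions, not the actual implementation:

```python
from pathlib import Path

import yaml

class ExportError(Exception): ...
class SwayJsonExportError(ExportError): ...

def write_sway_json(export_dir: Path, manifest: dict) -> Path:
    try:
        from dlm_sway.integrations.dlm.autogen import build_spec_dict
    except ImportError as exc:  # the most common failure: missing extra
        raise SwayJsonExportError(
            "dlm-sway is not installed; run: pip install 'dlm[sway]'"
        ) from exc
    try:
        spec = build_spec_dict(manifest)  # assumed call shape
    except Exception as exc:
        raise SwayJsonExportError(f"sway autogen failed: {exc}") from exc
    out = export_dir / "sway.yaml"
    out.write_text(yaml.safe_dump(spec))
    return out
```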

## [0.10.0] — 2026-04-21

Four phases of work in a single release: advanced training, expanded hardware coverage, the UX layer, and the ecosystem layer. 265 commits since v0.9.0, five additive schema migrations (v1 → v6), six brutal audits (audit-04 through audit-09) with remediations landed inline.

Still below 1.0 on purpose — the milestone for the semantic bump remains the same as stated at v0.9.0: a human has to train + export + ollama run a real document end-to-end and walk away satisfied. This release is a broad feature expansion, not a stability claim.

### Breaking changes

None at the data level — schema migrations v1 → v2 → v3 → v4 → v5 → v6 are all additive and run automatically via `dlm migrate`. Existing `.dlm` files parse without modification.

One subtle CLI contract change: `dlm serve` now refuses an untrained `.dlm` with an actionable error instead of a low-level `ManifestCorruptError`. Scripts that relied on the previous behavior still get the same exit code (1) but must match the new message text.

### Advanced training

Preference tuning (Sprint 17) and its orchestration (Sprint 18):

- `::preference::` section fences with `### Prompt` / `### Chosen` / `### Rejected` grammar.
- DPO via TRL's `DPOTrainer`, ORPO via `trl.experimental.orpo` (see the sketch after this list).
- `training.preference` frontmatter block (method / β / reference mode / loss type / max lengths).
- Phase orchestrator runs SFT → DPO/ORPO in sequence when preference content is present; `--phase sft|preference|all` overrides.
- Replay corpus gains `sample_preference_rows` — preference sections sample with the same recency-weighted reservoir as CPT rows.
- Doctor halves the micro-batch estimate and scales VRAM estimates when a DPO phase is active.
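
A minimal DPO sketch with TRL, assuming a pair dataset parsed out of `::preference::` sections; the model choice, hyperparameters, and recent-TRL `processing_class` argument are illustrative assumptions, not dlm's defaults:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

pairs = Dataset.from_list([{
    "prompt": "Explain LoRA in one sentence.",
    "chosen": "LoRA fine-tunes small low-rank matrices injected into ...",
    "rejected": "LoRA is a long-range radio protocol ...",
}])

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL builds a frozen reference copy when omitted
    args=DPOConfig(output_dir="out", beta=0.1, max_length=1024),
    train_dataset=pairs,
    processing_class=tokenizer,
)
trainer.train()
```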

Continued pretraining refinements for DAPT (Sprint 19):

- `training.cpt` schema block — `CosineWithFloor` LR schedule (sketched below), embed-layer warm-up, mixed-mode loss split reporting, vocab-gap diagnostics.
- Embed-layer freeze/unfreeze context manager wrapping the first N steps so vocab extensions settle before the backbone moves.
- Training summary adds per-mode loss fields so DAPT runs report SFT vs CPT loss separately.
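
What a cosine-with-floor schedule plausibly looks like — a sketch inferred from the name alone; dlm's actual `CosineWithFloor` parameters may differ:

```python
import math

from torch.optim.lr_scheduler import LambdaLR

def cosine_with_floor(optimizer, total_steps: int, floor_frac: float = 0.1):
    """Cosine decay that bottoms out at floor_frac of the base LR."""
    def factor(step: int) -> float:
        progress = min(step / max(total_steps, 1), 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return floor_frac + (1.0 - floor_frac) * cosine  # never below the floor
    return LambdaLR(optimizer, lr_lambda=factor)
```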

Multi-adapter (Sprint 20a-c):

- `training.adapters: [name, config]` with mutual exclusion against the flat LoRA knobs.
- `dlm train --adapter <name>` / `dlm prompt --adapter <name>` / `dlm export --adapter <name>`.
- `dlm export --adapter-mix a:0.5,b:0.5` — weighted merge via `PEFT.add_weighted_adapter`, with QLoRA safety gate (see the sketch after this list).
- Per-adapter store layout: `adapter/{name}/versions/vNNNN/`.
- Finite-weight and finite-eval gates — a training run that produces NaN weights or loss is rejected (renamed `-rejected`) instead of committed.
- `training.precision` override (schema v5) lets a document override the doctor's precision pick; MPS fp16 warns and pins to fp32 after a real NaN reproduction.
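
A hedged sketch of an `a:0.5,b:0.5` merge plus a finite-weight check, assuming two plain-LoRA adapters already on disk (the QLoRA safety gate described above would refuse quantized bases); paths and names are illustrative:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "adapter/a/versions/v0001")  # loads as "default"
model.load_adapter("adapter/b/versions/v0001", adapter_name="b")

model.add_weighted_adapter(
    adapters=["default", "b"],
    weights=[0.5, 0.5],
    adapter_name="mix",
    combination_type="linear",
)
model.set_adapter("mix")

# Finite-weight gate: refuse to commit a merge with non-finite tensors.
if not all(torch.isfinite(p).all() for p in model.parameters()):
    raise RuntimeError("non-finite weights after merge; rejecting")
```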

### Hardware

MLX inference backend (Sprint 21):

- PEFT safetensors → MLX `.npz` converter preserving adapter config (see the sketch after this list).
- `MlxBackend` implementing the `InferenceBackend` protocol.
- `--backend mlx` flag on `dlm prompt`; doctor reports MLX availability.
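
The core of the tensor conversion, as a hedged sketch — dlm's real converter also carries the adapter config and key renames across:

```python
import numpy as np
from safetensors.torch import load_file

tensors = load_file("adapter_model.safetensors")  # PEFT's adapter weights
np.savez(
    "adapters.npz",
    **{name: t.cpu().float().numpy() for name, t in tensors.items()},
)
```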

ROCm training (Sprint 22):

- Tier-2 AMD GPU support via ROCm's HIP.
- bf16 + FlashAttention probes adapted for AMD.
- Custom llama.cpp ROCm build script.
- QLoRA-on-ROCm refusal with a precise error message.

Multi-GPU training (Sprint 23):

- `dlm train --gpus all|N|0,1` dispatches to `accelerate launch`.
- `rank_io.master_only` gates all trainer I/O so ranks don't duplicate writes (a sketch follows this list).
- `DlmLock` gains `world_size` + `accelerate_version` fields for reproducibility.
- Doctor's effective-batch-size math respects the selected rank count.
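
The shape of a master-only I/O gate, as a hedged sketch; dlm's `rank_io.master_only` likely consults accelerate's process state rather than the raw environment variable:

```python
import functools
import os

def master_only(fn):
    """Run fn only on rank 0; other ranks silently skip the write."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if int(os.environ.get("RANK", "0")) == 0:
            return fn(*args, **kwargs)
        return None
    return wrapper

@master_only
def write_summary(path: str, payload: str) -> None:
    with open(path, "w") as f:
        f.write(payload)
```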

### UX

Interactive REPL (Sprint 24):

- `dlm repl <path>` — a `prompt_toolkit` loop against the trained adapter.
- Slash-command parser: `/seed`, `/temp`, `/top_p`, `/max_tokens`, `/system`, `/reload`, `/quit` (parsing sketched below).
- Persistent per-store history file.
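
The slash-command parse step is small enough to sketch; the command names come from the list above, but the function shape is an assumption:

```python
COMMANDS = {"seed", "temp", "top_p", "max_tokens", "system", "reload", "quit"}

def parse_slash(line: str) -> tuple[str, str] | None:
    """Return (command, argument) for slash input, None for plain prompts."""
    if not line.startswith("/"):
        return None
    name, _, arg = line[1:].partition(" ")
    if name not in COMMANDS:
        raise ValueError(f"unknown command: /{name}")
    return name, arg.strip()

assert parse_slash("/temp 0.8") == ("temp", "0.8")
assert parse_slash("tell me about LoRA") is None
```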

Save-to-train watch mode (Sprint 25):

- `dlm train --watch` — a `watchfiles` wrapper with debounced retrain on settled saves (see the sketch after this list).
- Rich live status line (step, loss, elapsed, files watched).
- Ctrl-C exits cleanly between cycles.
- `--watch --repl` bridge is honestly deferred (marked `[~]` pending a CI-capable test harness).
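
A hedged sketch of the debounced loop; `watchfiles.watch` takes a millisecond `debounce`, and `retrain` is a hypothetical stand-in for dlm's trainer entry point:

```python
from watchfiles import watch

def watch_and_retrain(doc_dir: str) -> None:
    # Each iteration yields a settled batch of file changes.
    for changes in watch(doc_dir, debounce=1600):
        print(f"{len(changes)} change(s) settled; retraining...")
        retrain(doc_dir)  # hypothetical hook into the trainer
```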

Observability (Sprint 26):

- Per-store SQLite metrics database at `~/.dlm/store/<id>/metrics.db`.
- Typed event dataclasses: `RunStart`, `Step`, `Eval`, `RunEnd`, `TokenizationEvent` (a sketch of the event → SQLite shape follows this list).
- `dlm metrics [--json|--csv]` — runs summary with filters.
- `dlm metrics watch <path>` — live tail of steps + evals.
- Optional sinks: TensorBoard (`[tb]` extra), W&B (`[wandb]` extra).
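
A minimal sketch of the typed-event → SQLite shape; the table schema and field set here are assumptions, with only the `Step` event name taken from the list above:

```python
import sqlite3
import time
from dataclasses import asdict, dataclass

@dataclass
class Step:
    run_id: str
    step: int
    loss: float
    ts: float

def record(db: sqlite3.Connection, ev: Step) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS steps (run_id TEXT, step INT, loss REAL, ts REAL)"
    )
    db.execute("INSERT INTO steps VALUES (:run_id, :step, :loss, :ts)", asdict(ev))
    db.commit()

record(sqlite3.connect("metrics.db"), Step("run-01", 1, 2.31, time.time()))
```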

### Ecosystem

Template gallery (Sprint 27):

- `dlm templates list` — eight bundled templates (coding tutor, domain KB, writing partner, personal assistant, meeting notes, regex buddy, shell one-liner, study guide).
- `meta.yaml` sidecars per template (title, summary, recommended base, tags, license).
- `dlm init --template <name>` — fresh ULID, adopts the template's recommended base, persists license acceptance for gated bases.
- Offline-first registry; `--refresh` reserved for a future upstream gallery.

Share protocol (Sprint 28):

- `dlm push --to hf:org/name | https://... | peer://host:port`.
- `dlm pull <source>` with signature verification on peer and URL pulls.
- `dlm serve <path>` — LAN-local peer endpoint with HMAC bearer tokens, per-token rate limit, explicit public-bind gate (token check sketched after this list).
- Optional minisign signing — sidecar `.minisig` next to the pack.
- HuggingFace Hub sink auto-generates a README from the manifest.
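
The constant-time token check at the heart of an HMAC bearer scheme, as a hedged sketch; the token layout is an assumption:

```python
import hashlib
import hmac

def verify_bearer(secret: bytes, token_id: str, presented_sig: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(secret, token_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, presented_sig)
```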

Source directives (Sprint 29):

- `training.sources: [...]` — declare file or directory sources in frontmatter; the trainer descends the tree and ingests raw text through the existing CPT path. `include` / `exclude` glob filters, per-file and per-source size/count caps.
- `sources_policy: permissive | strict` — strict confines paths to descendants of the `.dlm`'s directory with a symlink-escape check (sketched after this list).
- Deterministic lexicographic enumeration; UTF-8 hygiene; binary detection via NUL sniff.
- Per-directive provenance in `TrainingRunSummary.source_directives` (file count, byte total, skip reasons).
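
The strict-policy confinement check in miniature — resolve symlinks first, then require the result to stay under the `.dlm`'s own directory; a sketch, not dlm's actual code:

```python
from pathlib import Path

def confine(dlm_path: Path, source: Path) -> Path:
    root = dlm_path.parent.resolve()
    resolved = source.resolve()  # collapses any symlink escape attempt
    if not resolved.is_relative_to(root):
        raise PermissionError(f"{source} escapes {root} under strict policy")
    return resolved
```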

`.dlm/` descent + auto-scaffold (Sprint 30):

- Per-codebase `.dlm/training.yaml` + `.dlm/ignore` discovered on a directory walk; nearest-ancestor resolution with gitignore-subset last-match-wins semantics (`!` negation, anchored `/`, trailing `/`, globstar `**`) — see the simplified matcher after this list.
- Default-exclude set for VCS, caches, lockfiles, binaries.
- `Section.tags` flow from config metadata onto synthesized sections (loss weighting deferred to a future release).
- `dlm train <dir>` auto-scaffolds `<dir>/.dlm/corpus.dlm` on first run: ULID minted, `--base` + `--include` + `--exclude` + `--policy` baked in. Second invocation reuses the anchor.
- `--rescaffold` rewrites the scaffolded `.dlm` in place while preserving `dlm_id`.
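
Last-match-wins with `!` negation, reduced to its skeleton; this is a simplification (`fnmatch`'s `*` crosses path separators, and anchoring and trailing-`/` handling are omitted), not dlm's matcher:

```python
from fnmatch import fnmatch

def ignored(path: str, patterns: list[str]) -> bool:
    decision = False
    for pat in patterns:
        negated = pat.startswith("!")
        if negated:
            pat = pat[1:]
        if fnmatch(path, pat) or fnmatch(path, f"*/{pat}"):
            decision = not negated  # the last matching pattern wins
    return decision

assert ignored("build/cache.bin", ["build/*", "!build/keep.txt"])
assert not ignored("build/keep.txt", ["build/*", "!build/keep.txt"])
```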

Tokenized-section cache (Sprint 31):

- Per-store cache at `~/.dlm/store/<id>/tokenized-cache/`, keyed by `(section_id, tokenizer_sha256, sequence_len)`.
- Atomic tmp+rename writes (sketched after this list), LRU eviction with current-run protection, tokenizer-version invalidation on SHA bump.
- `dlm cache show | prune | clear` CLI.
- **Deferred:** trainer-side wiring into the SFTTrainer tokenization path requires pre-tokenization plus a custom collator (label-shift preservation is subtle). Module is shipped and unit-tested; the consumer lands in a future release. See `src/dlm/directives/cache.py` module docstring.
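
The tmp+rename idiom the cache relies on, as a sketch; the key format and file layout are assumptions:

```python
import os
import tempfile
from pathlib import Path

def cache_put(cache_dir: Path, section_id: str, tokenizer_sha256: str,
              sequence_len: int, payload: bytes) -> Path:
    key = f"{section_id}-{tokenizer_sha256[:12]}-{sequence_len}"
    final = cache_dir / f"{key}.bin"
    fd, tmp = tempfile.mkstemp(dir=cache_dir)  # same filesystem as the target
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)
    os.replace(tmp, final)  # atomic: readers never observe a partial file
    return final
```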

### Audits + remediations

Six brutal audits ran during this window, each producing a findings doc under .docs/audits/ and remediation commits referencing the finding IDs:

- Audit 04: replay-store integration, version-drift detection, tokenizer probe rename.
- Audit 05: pyproject runtime deps, license-acceptance record persistence, lock policy rules.
- Audit 06: 16 findings across GGUF parser hardening, ollama smoke tests, timezone-aware timestamps, pack hash determinism, vendor path resolution.
- Audit 07: forward-date schema rejection, ruff src-side cleanup.
- Audit 08: multi-GPU world_size plumbing, MLX adapter config fidelity, llama-cpp build env honoring, CLI reference drift.
- Audit 09 (Phase 7 brutal): `dlm train <dir>` end-to-end crash (B1 + B2), test-masks-bugs pattern (B3), orphan tokenized-cache (M1+M2 documented deferral), `dlm serve` guard on untrained `.dlm` (M3), task-tracker drift (M4), seven minors + two polish. Empirical differential evidence in `09-sway-appendix.md` — 359σ delta_kl vs null-adapter baseline on Fortran-idiomatic prompts.

### Schema migrations

All additive. Identity migrators; no data loss.

| From | To | Added |
|------|----|-------|
| v1 | v2 | `training.preference` (DPO/ORPO) rename |
| v2 | v3 | `training.cpt` block (schedule + warm-up) |
| v3 | v4 | `training.adapters` (named multi-adapter) |
| v4 | v5 | `training.precision` override |
| v5 | v6 | `training.sources` + `sources_policy` |

### New CLI surface

```
dlm templates list
dlm init --template <name>
dlm push --to <hf:...|https:...|peer://...> [--sign]
dlm pull <source>
dlm serve <path> [--public --i-know-this-is-public]
dlm repl <path>
dlm train --watch
dlm metrics [--json|--csv]
dlm metrics watch <path>
dlm train <dir> --base <key> --include <glob>
dlm cache show | prune | clear
```

### Test matrix

- 2,211 unit tests pass (≥95 % coverage on touched packages).
- ruff clean; mypy `--strict` clean across 215 source files.
- Slow integration matrix: two-adapter training, preference round trip, MLX adapter conversion, ROCm smoke, multi-GPU smoke, end-to-end auto-scaffold cycle, tokenized-cache unit suite, peer round-trip, directive fixture tree → finite adapter.

### Thanks

Five phases' worth of work. Six audits caught real bugs, and the sway submodule's differential tests established the empirical floor that the engine is behaviorally sound.

## [0.9.0] — target

First tagged release. Ships via the [tenseleyFlow/homebrew-tap](https://github.com/tenseleyFlow/homebrew-tap) (`brew tap tenseleyFlow/tap && brew install dlm`). Below v1.0 on purpose — a human still needs to train + export + `ollama run` a real document end-to-end before we claim the stable number.

### Highlights

- CLI: `init`, `train`, `prompt`, `export`, `pack`, `unpack`, `doctor`, `show`, `migrate`.
- Content-addressed store at `~/.dlm/store/<dlm_id>/` with atomic manifest updates and exclusive locking.
- Hardware-aware training plan (`dlm doctor`) across CUDA / MPS / ROCm / CPU tiers, with a refusal matrix that fails loudly on unsupported combinations.
- Curated base-model registry (10 entries) plus `hf:org/name` escape hatch with compatibility probes.
- LoRA + QLoRA training, replay-corpus retraining that retains prior sections, two-phase atomic version commits.
- Eval harness: val-loss, perplexity, early-stop.
- GGUF export with imatrix-calibrated quantization, explicit Go chat template (no fuzzy matching), embedding-row SHA verification, merge-safety gate against QLoRA pitfalls.
- Ollama integration: Modelfile emission, `ollama create`, smoke validation, closed-loop token-identity verification against the HF Jinja reference.
- `.dlm.pack` format: byte-identical packs, symlink / tar-bomb / zstd-bomb defenses, per-file SHA-256 integrity (see the sketch after this list), pack-format migrations registry.
- Reproducibility contract: per-store `dlm.lock` with severity-table mismatch policy, `--strict-lock` / `--update-lock` / `--ignore-lock` CLI flags, determinism golden integration test.
- Documentation: getting started, `.dlm` format reference, CLI reference, six cookbook recipes, architecture overview, troubleshooting, determinism guide.
- Five starter templates: coding tutor, domain KB, writing partner, personal assistant, changelog.
- Weekly CI jobs: chat-template drift, slow integration suite.
- Pre-commit config: ruff, mypy `--strict`, non-slow pytest.
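
Per-file SHA-256 verification distills to a few lines; the manifest shape here is an assumption, not the actual `.dlm.pack` layout:

```python
import hashlib
from pathlib import Path

def verify_files(root: Path, manifest: dict[str, str]) -> None:
    """manifest maps relative path -> expected hex digest."""
    for rel, expected in sorted(manifest.items()):
        digest = hashlib.sha256((root / rel).read_bytes()).hexdigest()
        if digest != expected:
            raise ValueError(f"integrity failure: {rel}")
```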

### Thanks

Built by following `.docs/findings.md` and the 29-sprint plan closely. Every pitfall in the findings inventory corresponds to a test and an explicit guardrail somewhere in the codebase.


---

The complete per-sprint history lives in `.docs/sprints/` (local to the repo by user choice; planning artifacts stay out of git).
