# Changelog
All notable changes to DocumentLanguageModel are recorded here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); the project targets [Semantic Versioning](https://semver.org/).
## [Unreleased]

### Fixed
- **`gguf_arch` preflight probe was silently false-negative on every HF off-registry base.** Three compounding bugs surfaced while trying to train against `hf:Qwen/Qwen3-1.7B`:
  1. The probe's regex matched `@Model.register(...)` but upstream llama.cpp renamed the decorator to `@ModelBase.register(...)` mid-2024; the regex now accepts both forms.
  2. The regex captured only the *first* quoted arg, silently missing multi-arg decorators like `@ModelBase.register("Qwen3ForCausalLM", "Qwen3Model")`; the probe now extracts every quoted string inside the decorator's arg list.
  3. The probe compared `spec.gguf_arch` (short label like `"qwen3"`) against the decorator's arguments, but llama.cpp registers HF class names (`"Qwen3ForCausalLM"`) — different namespaces, will never match. Comparison now uses `spec.architecture`. The bug was invisible because registered models bypass the probe entirely; it only bit `hf:` paths.
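
A minimal sketch of the corrected scan, assuming a simplified regex (illustrative only, not the probe's actual source):

```python
import re

# Illustrative sketch of the corrected decorator scan (not the actual probe code):
# accept both @Model.register(...) and @ModelBase.register(...), and collect
# every quoted argument rather than just the first.
DECORATOR_RE = re.compile(r"@Model(?:Base)?\.register\(([^)]*)\)")
QUOTED_RE = re.compile(r'"([^"]+)"')

def registered_architectures(convert_hf_source: str) -> set[str]:
    """Return every quoted class name found in register decorators."""
    names: set[str] = set()
    for args in DECORATOR_RE.findall(convert_hf_source):
        names.update(QUOTED_RE.findall(args))
    return names

snippet = '@ModelBase.register("Qwen3ForCausalLM", "Qwen3Model")\nclass Qwen3Model: ...'
archs = registered_architectures(snippet)
# The fix compares the HF class name (spec.architecture), not the short GGUF
# label (spec.gguf_arch), against these entries.
assert "Qwen3ForCausalLM" in archs and "Qwen3Model" in archs
```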
### Added
- **`dlm export --emit-sway-json`** writes a ready-to-run `<export-dir>/sway.yaml` alongside the GGUF/Modelfile, eliminating the previous two-step `dlm export` → `sway autogen` ritual users had to do before evaluating an adapter via [sway](https://github.com/tenseleyFlow/sway). Calls into `dlm_sway.integrations.dlm.autogen.build_spec_dict` via a new `dlm.export.sway_json.write_sway_json` helper. Closes the X1 half of sway's Sprint 26 cross-repo integration; X3 (sway-side `sway pack` / `sway unpack`) ships in sway proper.
  - New `[sway]` optional extra (`pip install 'dlm[sway]'`) pulls `dlm-sway>=0.1.0`. Deliberately pulls plain `dlm-sway`, NOT `dlm-sway[dlm]`, because the round-trip extra would create a pip-resolver cycle (sway's `[dlm]` extra already pulls dlm).
  - Failures route through a new typed `SwayJsonExportError` (subclass of `ExportError`) so the CLI's existing exception handler renders them cleanly. The most common failure — user didn't install the `[sway]` extra — gets a message that names the install command verbatim. See the sketch after this list.
  - 5 unit tests in `tests/unit/cli/test_export_sway_json.py` cover the helper round-trip, missing-extra error, autogen failure wrapping, and CLI flag wiring.
- **`dlm train --skip-export-probes`** mirrors the flag on `dlm init` (it was missing from the train CLI; a user could `dlm init --skip-export-probes` a fresh `.dlm`, then have `dlm train` re-run the probes and fail). The flag threads into `resolve_base_model` identically on both paths; help text matches verbatim.
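
A rough sketch of the wrapping behavior described above. The helper signature, serialization, and messages are illustrative assumptions; only the module and class names come from the entries above.

```python
from pathlib import Path


class ExportError(Exception):
    """Base class for export failures (stand-in for dlm's real ExportError)."""


class SwayJsonExportError(ExportError):
    """Raised when --emit-sway-json cannot produce <export-dir>/sway.yaml."""


def write_sway_json(export_dir: Path, manifest: dict) -> Path:
    """Hypothetical shape of the helper: delegate to dlm-sway, wrap failures."""
    try:
        # The [sway] extra provides dlm_sway; absence is the most common failure.
        from dlm_sway.integrations.dlm.autogen import build_spec_dict
    except ImportError as exc:
        raise SwayJsonExportError(
            "dlm-sway is not installed; run: pip install 'dlm[sway]'"
        ) from exc

    try:
        spec = build_spec_dict(manifest)          # signature assumed for illustration
    except Exception as exc:
        raise SwayJsonExportError(f"sway autogen failed: {exc}") from exc

    out = export_dir / "sway.yaml"
    out.write_text(str(spec), encoding="utf-8")   # the real helper emits YAML, not repr()
    return out
```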
## [0.10.0] — 2026-04-21
Four phases of work in a single release: advanced training, expanded hardware coverage, the UX layer, and the ecosystem layer. 265 commits since v0.9.0, five additive schema migrations (v1 → v6), six brutal audits (audit-04 through audit-09) with remediations landed inline.
Still below 1.0 on purpose — the milestone for the semantic bump
remains the same as stated at v0.9.0: a human has to train + export +
`ollama run` a real document end-to-end and walk away satisfied. This
release is a broad feature expansion, not a stability claim.
### Breaking changes
None at the data level — schema migrations v1 → v2 → v3 → v4 → v5 → v6 are all additive and run automatically via `dlm migrate`. Existing `.dlm` files parse without modification.
One subtle CLI contract change: `dlm serve` now refuses an untrained `.dlm` with an actionable error instead of a low-level `ManifestCorruptError`. Scripts that relied on the previous behavior still receive the same exit code (1) but must adapt to the new message text.
### Advanced training
Preference tuning (Sprint 17) and its orchestration (Sprint 18):
- `::preference::` section fences with `### Prompt` / `### Chosen` / `### Rejected` grammar (see the example after this list).
- DPO via TRL's `DPOTrainer`, ORPO via `trl.experimental.orpo`.
- `training.preference` frontmatter block (method / β / reference mode / loss type / max lengths).
- Phase orchestrator runs SFT → DPO/ORPO in sequence when preference content is present; `--phase sft|preference|all` overrides.
- Replay corpus gains `sample_preference_rows` — preference sections sample with the same recency-weighted reservoir as CPT rows.
- Doctor halves the micro-batch estimate and scales VRAM estimates when a DPO phase is active.
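
For orientation, a hypothetical preference section body and a toy splitter, assuming the `### Prompt` / `### Chosen` / `### Rejected` grammar above (this is not dlm's parser):

```python
import re

# Hypothetical body of a ::preference:: section (the fence markers themselves
# are omitted here; the exact fence syntax follows the grammar described above).
SECTION = """\
### Prompt
Summarize the meeting notes in three bullets.
### Chosen
- Decided on the Q3 launch date.
- Assigned the migration runbook to Dana.
- Deferred the pricing change.
### Rejected
The meeting was about things. Several people talked.
"""

def split_preference(body: str) -> dict[str, str]:
    """Toy splitter: pull the Prompt/Chosen/Rejected blocks out of one section."""
    parts = re.split(r"^### (Prompt|Chosen|Rejected)\s*$", body, flags=re.MULTILINE)
    # re.split yields [preamble, header, text, header, text, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}

rows = split_preference(SECTION)
assert set(rows) == {"Prompt", "Chosen", "Rejected"}
```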
Continued pretraining refinements for DAPT (Sprint 19):
- `training.cpt` schema block — `CosineWithFloor` LR schedule (sketched after this list), embed-layer warm-up, mixed-mode loss split reporting, vocab-gap diagnostics.
- Embed-layer freeze/unfreeze context manager wrapping the first N steps so vocab extensions settle before the backbone moves.
- Training summary adds per-mode loss fields so DAPT runs report SFT vs CPT loss separately.
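
The schedule's shape, as the name suggests, is a cosine decay that bottoms out at a floor instead of zero. A minimal sketch, with the exact parameterization assumed:

```python
import math

def cosine_with_floor(step: int, total_steps: int, base_lr: float,
                      floor_lr: float, warmup_steps: int = 0) -> float:
    """Sketch of a cosine LR decay that never drops below a floor.

    The real training.cpt schedule may parameterize this differently; this only
    shows the shape implied by the name.
    """
    if warmup_steps and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps          # linear warm-up
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return floor_lr + (base_lr - floor_lr) * cosine         # decays to floor_lr, not 0

# At the end of training the LR sits on the floor instead of reaching zero.
assert abs(cosine_with_floor(1000, 1000, 2e-4, 2e-5) - 2e-5) < 1e-12
```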
Multi-adapter (Sprint 20a-c):
- `training.adapters: [name, config]` with mutual exclusion against the flat LoRA knobs.
- `dlm train --adapter <name>` / `dlm prompt --adapter <name>` / `dlm export --adapter <name>`.
- `dlm export --adapter-mix a:0.5,b:0.5` — weighted merge via `PEFT.add_weighted_adapter`, with QLoRA safety gate (see the sketch after this list).
- Per-adapter store layout: `adapter/{name}/versions/vNNNN/`.
- Finite-weight and finite-eval gates — a training run that produces NaN weights or loss is rejected (renamed `-rejected`) instead of committed.
- `training.precision` override (schema v5) lets a document override the doctor's precision pick; MPS fp16 warns and pins to fp32 after a real NaN reproduction.
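
A hedged sketch of the `--adapter-mix` flow: parse the `name:weight` pairs and hand them to PEFT's `add_weighted_adapter`. The parsing helper, combination type, and merged-adapter name are illustrative choices, not dlm's implementation:

```python
def parse_adapter_mix(spec: str) -> tuple[list[str], list[float]]:
    """Parse an --adapter-mix value like 'a:0.5,b:0.5' (illustrative helper)."""
    names, weights = [], []
    for part in spec.split(","):
        name, _, weight = part.partition(":")
        names.append(name.strip())
        weights.append(float(weight))
    return names, weights


def merge_adapters(model, spec: str, merged_name: str = "mix"):
    """Combine named LoRA adapters on a peft.PeftModel with a weighted merge.

    combination_type and the merged adapter name are illustrative; dlm's QLoRA
    safety gate (refusing merges on quantized bases) is not shown here.
    """
    names, weights = parse_adapter_mix(spec)
    model.add_weighted_adapter(
        adapters=names,
        weights=weights,
        adapter_name=merged_name,
        combination_type="linear",
    )
    model.set_adapter(merged_name)
    return model


assert parse_adapter_mix("a:0.5,b:0.5") == (["a", "b"], [0.5, 0.5])
```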
### Hardware
MLX inference backend (Sprint 21):
- PEFT safetensors → MLX `.npz` converter preserving adapter config.
- `MlxBackend` implementing the `InferenceBackend` protocol.
- `--backend mlx` flag on `dlm prompt`; doctor reports MLX availability.
ROCm training (Sprint 22):
- Tier-2 AMD GPU support via ROCm's HIP.
- bf16 + FlashAttention probes adapted for AMD.
- Custom llama.cpp ROCm build script.
- QLoRA-on-ROCm refusal with a precise error message.
Multi-GPU training (Sprint 23):
- `dlm train --gpus all|N|0,1` dispatches to `accelerate launch`.
- `rank_io.master_only` gates all trainer I/O so ranks don't duplicate writes (see the sketch after this list).
- `DlmLock` gains `world_size` + `accelerate_version` fields for reproducibility.
- Doctor's effective-batch-size math respects the selected rank count.
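
A minimal sketch of what a master-only gate does, assuming the `RANK` environment variable that `accelerate launch` sets per process; dlm's real `rank_io` helper may decide rank differently:

```python
import functools
import os


def master_only(fn):
    """Run fn only on the main process; other ranks return None.

    Illustrative only: accelerate launch exports RANK/LOCAL_RANK per process,
    and the real helper may consult Accelerate state instead of the environment.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if int(os.environ.get("RANK", "0")) != 0:
            return None
        return fn(*args, **kwargs)
    return wrapper


@master_only
def write_run_summary(path: str, payload: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(payload)   # without the gate, every rank would race on this file
```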
### UX
Interactive REPL (Sprint 24):
- `dlm repl <path>` — `prompt_toolkit` loop against the trained adapter.
- Slash-command parser: `/seed`, `/temp`, `/top_p`, `/max_tokens`, `/system`, `/reload`, `/quit`.
- Persistent per-store history file.
Save-to-train watch mode (Sprint 25):
- `dlm train --watch` — `watchfiles` wrapper with debounced retrain on settled saves.
- Rich live status line (step, loss, elapsed, files watched).
- Ctrl-C exits cleanly between cycles.
- `--watch --repl` bridge is honestly deferred (marked `[~]` pending a CI-capable test harness).
Observability (Sprint 26):
- Per-store SQLite metrics database at `~/.dlm/store/<id>/metrics.db`.
- Typed event dataclasses: `RunStart`, `Step`, `Eval`, `RunEnd`, `TokenizationEvent` (see the sketch after this list).
- `dlm metrics [--json|--csv]` — runs summary with filters.
- `dlm metrics watch <path>` — live tail of steps + evals.
- Optional sinks: TensorBoard (`[tb]` extra), W&B (`[wandb]` extra).
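
A hedged sketch of a typed step event feeding a per-store SQLite sink; the table schema and field names are assumptions, only the dataclass name and the `metrics.db` location come from the list above:

```python
import sqlite3
from dataclasses import dataclass, astuple
from pathlib import Path


@dataclass
class Step:
    """Illustrative shape of a per-step event; the real fields may differ."""
    run_id: str
    step: int
    loss: float
    lr: float
    elapsed_s: float


def record_step(db_path: Path, event: Step) -> None:
    """Append one step row to the per-store metrics database."""
    con = sqlite3.connect(db_path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS steps "
            "(run_id TEXT, step INTEGER, loss REAL, lr REAL, elapsed_s REAL)"
        )
        con.execute("INSERT INTO steps VALUES (?, ?, ?, ?, ?)", astuple(event))
        con.commit()
    finally:
        con.close()

# e.g. record_step(Path.home() / ".dlm/store/<id>/metrics.db",
#                  Step("run-0001", 10, 1.93, 2e-4, 12.4))
```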
### Ecosystem
Template gallery (Sprint 27):
- `dlm templates list` — eight bundled templates (coding tutor, domain KB, writing partner, personal assistant, meeting notes, regex buddy, shell one-liner, study guide).
- `meta.yaml` sidecars per template (title, summary, recommended base, tags, license).
- `dlm init --template <name>` — fresh ULID, adopts the template's recommended base, persists license acceptance for gated bases.
- Offline-first registry; `--refresh` reserved for a future upstream gallery.
Share protocol (Sprint 28):
- `dlm push --to hf:org/name | https://... | peer://host:port`.
- `dlm pull <source>` with signature verification on peer and URL pulls.
- `dlm serve <path>` — LAN-local peer endpoint with HMAC bearer tokens, per-token rate limit, explicit public-bind gate (token check sketched after this list).
- Optional minisign signing — sidecar `.minisig` next to the pack.
- HuggingFace Hub sink auto-generates a README from the manifest.
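
A minimal sketch of an HMAC bearer-token check of the kind the peer endpoint implies; the token derivation and key handling are assumptions, not the actual wire protocol:

```python
import hashlib
import hmac


def issue_token(secret: bytes, store_id: str) -> str:
    """Derive a bearer token bound to a store id (illustrative scheme only)."""
    return hmac.new(secret, store_id.encode(), hashlib.sha256).hexdigest()


def check_bearer(secret: bytes, store_id: str, presented: str) -> bool:
    """Constant-time comparison so token checks don't leak timing information."""
    expected = issue_token(secret, store_id)
    return hmac.compare_digest(expected, presented)


secret = b"per-store-secret"           # hypothetical; real key handling differs
token = issue_token(secret, "01ARZ3NDEKTSV4RRFFQ69G5FAV")
assert check_bearer(secret, "01ARZ3NDEKTSV4RRFFQ69G5FAV", token)
```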
Source directives (Sprint 29):
- `training.sources: [...]` — declare file or directory sources in frontmatter; the trainer descends the tree and ingests raw text through the existing CPT path. `include` / `exclude` glob filters, per-file and per-source size/count caps.
- `sources_policy: permissive | strict` — strict confines paths to descendants of the `.dlm`'s directory with a symlink-escape check (see the sketch after this list).
- Deterministic lexicographic enumeration; UTF-8 hygiene; binary detection via NUL sniff.
- Per-directive provenance in `TrainingRunSummary.source_directives` (file count, byte total, skip reasons).
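
A minimal sketch of the confinement check strict mode implies: resolve symlinks, then require the result to stay under the `.dlm`'s directory. The helper name and error type are illustrative:

```python
from pathlib import Path


def confine_to_root(candidate: Path, dlm_dir: Path) -> Path:
    """Resolve symlinks, then require the path to stay under the .dlm's directory.

    Illustrative check only; the real strict policy may report richer skip reasons.
    """
    resolved = candidate.resolve()
    root = dlm_dir.resolve()
    try:
        resolved.relative_to(root)       # raises ValueError on escape
    except ValueError:
        raise PermissionError(
            f"{candidate} escapes {root} (symlink or ..); refused under strict policy"
        ) from None
    return resolved
```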
.dlm/ descent + auto-scaffold (Sprint 30):
- Per-codebase `.dlm/training.yaml` + `.dlm/ignore` discovered on a directory walk; nearest-ancestor resolution with gitignore-subset last-match-wins semantics (`!` negation, anchored `/`, trailing `/`, globstar `**`); see the matching sketch after this list.
- Default-exclude set for VCS, caches, lockfiles, binaries.
- `Section.tags` flow from config metadata onto synthesized sections (loss weighting deferred to a future release).
- `dlm train <dir>` auto-scaffolds `<dir>/.dlm/corpus.dlm` on first run: ULID minted, `--base` + `--include` + `--exclude` + `--policy` baked in. Second invocation reuses the anchor.
- `--rescaffold` rewrites the scaffolded `.dlm` in place while preserving `dlm_id`.
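
A simplified sketch of last-match-wins matching with `!` negation; it leans on `fnmatch` and deliberately skips anchoring and `**` handling, so it is not the real matcher:

```python
from fnmatch import fnmatch


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Last matching pattern wins; '!' re-includes. Simplified sketch only;
    the real matcher also handles anchored '/', trailing '/', and '**'."""
    excluded = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if fnmatch(rel_path, body) or fnmatch(rel_path.rsplit("/", 1)[-1], body):
            excluded = not negated
    return excluded


patterns = ["*.log", "!keep.log"]
assert is_excluded("build/output.log", patterns) is True
assert is_excluded("build/keep.log", patterns) is False
```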
Tokenized-section cache (Sprint 31):
- Per-store cache at `~/.dlm/store/<id>/tokenized-cache/`, keyed by `(section_id, tokenizer_sha256, sequence_len)`.
- Atomic tmp+rename writes, LRU eviction with current-run protection, tokenizer-version invalidation on SHA bump.
- `dlm cache show | prune | clear` CLI.
- **Deferred:** trainer-side wiring into the SFTTrainer tokenization path requires pre-tokenization plus a custom collator (label-shift preservation is subtle). Module is shipped and unit-tested; the consumer lands in a future release. See the `src/dlm/directives/cache.py` module docstring.
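
A minimal sketch of the cache key and the tmp+rename write described above, with filenames and serialization assumed:

```python
import hashlib
import os
from pathlib import Path


def cache_key(section_id: str, tokenizer_sha256: str, sequence_len: int) -> str:
    """Stable filename for one tokenized section (key fields from the list above)."""
    raw = f"{section_id}:{tokenizer_sha256}:{sequence_len}".encode()
    return hashlib.sha256(raw).hexdigest()


def atomic_write(cache_dir: Path, key: str, payload: bytes) -> Path:
    """Write to a temp file, then rename, so readers never see a partial entry."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    final = cache_dir / f"{key}.bin"
    tmp = cache_dir / f"{key}.bin.tmp"
    tmp.write_bytes(payload)
    os.replace(tmp, final)               # atomic on the same filesystem
    return final
```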
### Audits + remediations
Six brutal audits ran during this window, each producing a
findings doc under `.docs/audits/` and remediation commits
referencing the finding IDs:
- Audit 04: replay-store integration, version-drift detection, tokenizer probe rename.
- Audit 05: pyproject runtime deps, license-acceptance record persistence, lock policy rules.
- Audit 06: 16 findings across GGUF parser hardening, ollama smoke tests, timezone-aware timestamps, pack hash determinism, vendor path resolution.
- Audit 07: forward-date schema rejection, ruff src-side cleanup.
- Audit 08: multi-GPU world_size plumbing, MLX adapter config fidelity, llama-cpp build env honoring, CLI reference drift.
- Audit 09 (Phase 7 brutal): `dlm train <dir>` end-to-end crash (B1 + B2), test-masks-bugs pattern (B3), orphan tokenized-cache (M1 + M2 documented deferral), `dlm serve` guard on untrained `.dlm` (M3), task-tracker drift (M4), seven minors + two polish. Empirical differential evidence in `09-sway-appendix.md` — 359σ delta_kl vs null-adapter baseline on Fortran-idiomatic prompts.
### Schema migrations
All additive. Identity migrators; no data loss.
| From | To | Added |
|---|---|---|
| v1 | v2 | `training.preference` (DPO/ORPO) rename |
| v2 | v3 | `training.cpt` block (schedule + warm-up) |
| v3 | v4 | `training.adapters` (named multi-adapter) |
| v4 | v5 | `training.precision` override |
| v5 | v6 | `training.sources` + `sources_policy` |
### New CLI surface
```
dlm templates list
dlm init --template <name>
dlm push --to <hf:...|https:...|peer://...> [--sign]
dlm pull <source>
dlm serve <path> [--public --i-know-this-is-public]
dlm repl <path>
dlm train --watch
dlm metrics [--json|--csv]
dlm metrics watch <path>
dlm train <dir> --base <key> --include <glob>
dlm cache show | prune | clear
```
### Test matrix
- 2,211 unit tests pass (≥95 % coverage on touched packages).
- ruff clean; mypy `--strict` clean across 215 source files.
- Slow integration matrix: two-adapter training, preference round trip, MLX adapter conversion, ROCm smoke, multi-GPU smoke, end-to-end auto-scaffold cycle, tokenized-cache unit suite, peer round-trip, directive fixture tree → finite adapter.
### Thanks
Five phases worth of work. Six audits caught real bugs, and the sway submodule's differential tests produced the empirical floor that the engine is behaviorally sound.
## [0.9.0] — target
First tagged release. Ships via the
[tenseleyFlow/homebrew-tap](https://github.com/tenseleyFlow/homebrew-tap)
(`brew tap tenseleyFlow/tap && brew install dlm`). Below v1.0 on
purpose — a human still needs to train + export + `ollama run` a real
document end-to-end before we claim the stable number.
### Highlights
- CLI: `init`, `train`, `prompt`, `export`, `pack`, `unpack`, `doctor`, `show`, `migrate`.
- Content-addressed store at `~/.dlm/store/<dlm_id>/` with atomic manifest updates and exclusive locking.
- Hardware-aware training plan (`dlm doctor`) across CUDA / MPS / ROCm / CPU tiers, with a refusal matrix that fails loudly on unsupported combinations.
- Curated base-model registry (10 entries) plus `hf:org/name` escape hatch with compatibility probes.
- LoRA + QLoRA training, replay-corpus retraining that retains prior sections, two-phase atomic version commits.
- Eval harness: val-loss, perplexity, early-stop.
- GGUF export with imatrix-calibrated quantization, explicit Go chat template (no fuzzy matching), embedding-row SHA verification, merge-safety gate against QLoRA pitfalls.
- Ollama integration: Modelfile emission, `ollama create`, smoke validation, closed-loop token-identity verification against the HF Jinja reference.
- `.dlm.pack` format: byte-identical packs, symlink / tar-bomb / zstd-bomb defenses, per-file SHA-256 integrity, pack-format migrations registry.
- Reproducibility contract: per-store `dlm.lock` with severity-table mismatch policy, `--strict-lock` / `--update-lock` / `--ignore-lock` CLI flags, determinism golden integration test.
- Documentation: getting started, `.dlm` format reference, CLI reference, six cookbook recipes, architecture overview, troubleshooting, determinism guide.
- Five starter templates: coding tutor, domain KB, writing partner, personal assistant, changelog.
- Weekly CI jobs: chat-template drift, slow integration suite.
- Pre-commit config: ruff, mypy `--strict`, non-slow pytest.
### Thanks
Built by following `.docs/findings.md` and the 29-sprint plan closely. Every pitfall in the findings inventory corresponds to a test and an explicit guardrail somewhere in the codebase.
---

The complete per-sprint history lives in `.docs/sprints/` (local to the repo by user choice; planning artifacts stay out of git).