
# Changelog

All notable changes to DocumentLanguageModel are recorded here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/); the project targets [Semantic Versioning](https://semver.org/).

## [Unreleased]

### Fixed

- **`gguf_arch` preflight probe was silently false-negative on every HF off-registry base.** Three compounding bugs surfaced while trying to train against `hf:Qwen/Qwen3-1.7B` (a minimal sketch of the fixed extraction follows this list):
  1. The probe's regex matched `@Model.register(...)`, but upstream llama.cpp renamed the decorator to `@ModelBase.register(...)` mid-2024; the regex now accepts both forms.
  2. The regex captured only the *first* quoted arg, silently missing multi-arg decorators like `@ModelBase.register("Qwen3ForCausalLM", "Qwen3Model")`; the probe now extracts every quoted string inside the decorator's arg list.
  3. The probe compared `spec.gguf_arch` (a short label like `"qwen3"`) against the decorator's arguments, but llama.cpp registers HF class names (`"Qwen3ForCausalLM"`) — different namespaces that will never match. The comparison now uses `spec.architecture`. The bug was invisible because registered models bypass the probe entirely; it only bit `hf:` paths.
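
A minimal sketch of the fixed extraction logic, using illustrative helper names rather than dlm's actual internals:

```python
import re

# Accept both decorator spellings and capture the full argument list.
DECORATOR_RE = re.compile(r"@(?:Model|ModelBase)\.register\(([^)]*)\)")
QUOTED_RE = re.compile(r"""["']([^"']+)["']""")

def registered_names(convert_source: str) -> set[str]:
    """Every quoted string inside every register() decorator."""
    names: set[str] = set()
    for m in DECORATOR_RE.finditer(convert_source):
        names.update(QUOTED_RE.findall(m.group(1)))
    return names

# The namespace fix: compare the HF class name, not the short GGUF label.
def probe_ok(spec_architecture: str, convert_source: str) -> bool:
    return spec_architecture in registered_names(convert_source)
```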

### Added

- **`dlm export --emit-sway-json`** writes a ready-to-run `<export-dir>/sway.yaml` alongside the GGUF/Modelfile, eliminating the previous two-step `dlm export` → `sway autogen` ritual users had to do before evaluating an adapter via [sway](https://github.com/tenseleyFlow/sway). Calls into `dlm_sway.integrations.dlm.autogen.build_spec_dict` via a new `dlm.export.sway_json.write_sway_json` helper (a sketch of the error-wrapping shape follows this list). Closes the X1 half of sway's Sprint 26 cross-repo integration; X3 (sway-side `sway pack` / `sway unpack`) ships in sway proper.
  - New `[sway]` optional extra (`pip install 'dlm[sway]'`) pulls `dlm-sway>=0.1.0`. Deliberately pulls plain `dlm-sway`, NOT `dlm-sway[dlm]`, because the round-trip extra would create a pip-resolver cycle (sway's `[dlm]` extra already pulls dlm).
  - Failures route through a new typed `SwayJsonExportError` (subclass of `ExportError`) so the CLI's existing exception handler renders them cleanly. The most common failure — the user didn't install the `[sway]` extra — gets a message that names the install command verbatim.
  - 5 unit tests in `tests/unit/cli/test_export_sway_json.py` cover the helper round-trip, missing-extra error, autogen failure wrapping, and CLI flag wiring.
- **`dlm train --skip-export-probes`** mirrors the flag on `dlm init` (it was missing from the train CLI; a user could `dlm init --skip-export-probes` a fresh `.dlm`, then have `dlm train` re-run the probes and fail). The flag threads into `resolve_base_model` identically on both paths; help text matches verbatim.
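
How the helper's error wrapping might look — a hedged sketch in which `build_spec_dict`'s call shape, the message text, and the YAML dump are assumptions, not the actual implementation:

```python
from pathlib import Path

import yaml

class ExportError(Exception): ...
class SwayJsonExportError(ExportError): ...

def write_sway_json(export_dir: Path, manifest: dict) -> Path:
    try:
        from dlm_sway.integrations.dlm.autogen import build_spec_dict
    except ImportError as exc:  # the most common failure: missing extra
        raise SwayJsonExportError(
            "dlm-sway is not installed; run: pip install 'dlm[sway]'"
        ) from exc
    try:
        spec = build_spec_dict(manifest)  # assumed call shape
    except Exception as exc:
        raise SwayJsonExportError(f"sway autogen failed: {exc}") from exc
    out = export_dir / "sway.yaml"
    out.write_text(yaml.safe_dump(spec))
    return out
```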

## [0.10.0] — 2026-04-21

Four phases of work in a single release: advanced training, expanded hardware coverage, the UX layer, and the ecosystem layer. 265 commits since v0.9.0, five additive schema migrations (v1 → v6), six brutal audits (audit-04 through audit-09) with remediations landed inline.

Still below 1.0 on purpose — the milestone for the semantic bump remains the same as stated at v0.9.0: a human has to train + export + ollama run a real document end-to-end and walk away satisfied. This release is a broad feature expansion, not a stability claim.

### Breaking changes

None at the data level — schema migrations v1 → v2 → v3 → v4 → v5 → v6 are all additive and run automatically via `dlm migrate`. Existing `.dlm` files parse without modification.

One subtle CLI contract change: `dlm serve` now refuses an untrained `.dlm` with an actionable error instead of a low-level `ManifestCorruptError`. Scripts that relied on the previous behavior still get the same exit code (1) but must match the new message text.

### Advanced training

Preference tuning (Sprint 17) and its orchestration (Sprint 18):

- `::preference::` section fences with `### Prompt` / `### Chosen` / `### Rejected` grammar.
- DPO via TRL's `DPOTrainer`, ORPO via `trl.experimental.orpo` (see the sketch after this list).
- `training.preference` frontmatter block (method / β / reference mode / loss type / max lengths).
- Phase orchestrator runs SFT → DPO/ORPO in sequence when preference content is present; `--phase sft|preference|all` overrides.
- Replay corpus gains `sample_preference_rows` — preference sections sample with the same recency-weighted reservoir as CPT rows.
- Doctor halves the micro-batch estimate and scales VRAM estimates when a DPO phase is active.
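
A minimal DPO sketch with TRL, assuming a pair dataset parsed out of `::preference::` sections; the model choice, hyperparameters, and recent-TRL `processing_class` argument are illustrative assumptions, not dlm's defaults:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

pairs = Dataset.from_list([{
    "prompt": "Explain LoRA in one sentence.",
    "chosen": "LoRA fine-tunes small low-rank matrices injected into ...",
    "rejected": "LoRA is a long-range radio protocol ...",
}])

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL builds a frozen reference copy when omitted
    args=DPOConfig(output_dir="out", beta=0.1, max_length=1024),
    train_dataset=pairs,
    processing_class=tokenizer,
)
trainer.train()
```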

Continued pretraining refinements for DAPT (Sprint 19):

- `training.cpt` schema block — `CosineWithFloor` LR schedule (sketched below), embed-layer warm-up, mixed-mode loss split reporting, vocab-gap diagnostics.
- Embed-layer freeze/unfreeze context manager wrapping the first N steps so vocab extensions settle before the backbone moves.
- Training summary adds per-mode loss fields so DAPT runs report SFT vs CPT loss separately.
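
What a cosine-with-floor schedule plausibly looks like — a sketch inferred from the name alone; dlm's actual `CosineWithFloor` parameters may differ:

```python
import math

from torch.optim.lr_scheduler import LambdaLR

def cosine_with_floor(optimizer, total_steps: int, floor_frac: float = 0.1):
    """Cosine decay that bottoms out at floor_frac of the base LR."""
    def factor(step: int) -> float:
        progress = min(step / max(total_steps, 1), 1.0)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return floor_frac + (1.0 - floor_frac) * cosine  # never below the floor
    return LambdaLR(optimizer, lr_lambda=factor)
```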

Multi-adapter (Sprint 20a-c):

- `training.adapters: [name, config]` with mutual exclusion against the flat LoRA knobs.
- `dlm train --adapter <name>` / `dlm prompt --adapter <name>` / `dlm export --adapter <name>`.
- `dlm export --adapter-mix a:0.5,b:0.5` — weighted merge via `PEFT.add_weighted_adapter`, with QLoRA safety gate (see the sketch after this list).
- Per-adapter store layout: `adapter/{name}/versions/vNNNN/`.
- Finite-weight and finite-eval gates — a training run that produces NaN weights or loss is rejected (renamed `-rejected`) instead of committed.
- `training.precision` override (schema v5) lets a document override the doctor's precision pick; MPS fp16 warns and pins to fp32 after a real NaN reproduction.
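
A hedged sketch of an `a:0.5,b:0.5` merge plus a finite-weight check, assuming two plain-LoRA adapters already on disk (the QLoRA safety gate described above would refuse quantized bases); paths and names are illustrative:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "adapter/a/versions/v0001")  # loads as "default"
model.load_adapter("adapter/b/versions/v0001", adapter_name="b")

model.add_weighted_adapter(
    adapters=["default", "b"],
    weights=[0.5, 0.5],
    adapter_name="mix",
    combination_type="linear",
)
model.set_adapter("mix")

# Finite-weight gate: refuse to commit a merge with non-finite tensors.
if not all(torch.isfinite(p).all() for p in model.parameters()):
    raise RuntimeError("non-finite weights after merge; rejecting")
```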

### Hardware

MLX inference backend (Sprint 21):

- PEFT safetensors → MLX `.npz` converter preserving adapter config (see the sketch after this list).
- `MlxBackend` implementing the `InferenceBackend` protocol.
- `--backend mlx` flag on `dlm prompt`; doctor reports MLX availability.
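
The core of the tensor conversion, as a hedged sketch — dlm's real converter also carries the adapter config and key renames across:

```python
import numpy as np
from safetensors.torch import load_file

tensors = load_file("adapter_model.safetensors")  # PEFT's adapter weights
np.savez(
    "adapters.npz",
    **{name: t.cpu().float().numpy() for name, t in tensors.items()},
)
```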

ROCm training (Sprint 22):

- Tier-2 AMD GPU support via ROCm's HIP.
- bf16 + FlashAttention probes adapted for AMD.
- Custom llama.cpp ROCm build script.
- QLoRA-on-ROCm refusal with a precise error message.

Multi-GPU training (Sprint 23):

- `dlm train --gpus all|N|0,1` dispatches to `accelerate launch`.
- `rank_io.master_only` gates all trainer I/O so ranks don't duplicate writes (a sketch follows this list).
- `DlmLock` gains `world_size` + `accelerate_version` fields for reproducibility.
- Doctor's effective-batch-size math respects the selected rank count.
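
The shape of a master-only I/O gate, as a hedged sketch; dlm's `rank_io.master_only` likely consults accelerate's process state rather than the raw environment variable:

```python
import functools
import os

def master_only(fn):
    """Run fn only on rank 0; other ranks silently skip the write."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if int(os.environ.get("RANK", "0")) == 0:
            return fn(*args, **kwargs)
        return None
    return wrapper

@master_only
def write_summary(path: str, payload: str) -> None:
    with open(path, "w") as f:
        f.write(payload)
```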

### UX

Interactive REPL (Sprint 24):

- `dlm repl <path>` — a `prompt_toolkit` loop against the trained adapter.
- Slash-command parser: `/seed`, `/temp`, `/top_p`, `/max_tokens`, `/system`, `/reload`, `/quit` (parsing sketched below).
- Persistent per-store history file.
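
The slash-command parse step is small enough to sketch; the command names come from the list above, but the function shape is an assumption:

```python
COMMANDS = {"seed", "temp", "top_p", "max_tokens", "system", "reload", "quit"}

def parse_slash(line: str) -> tuple[str, str] | None:
    """Return (command, argument) for slash input, None for plain prompts."""
    if not line.startswith("/"):
        return None
    name, _, arg = line[1:].partition(" ")
    if name not in COMMANDS:
        raise ValueError(f"unknown command: /{name}")
    return name, arg.strip()

assert parse_slash("/temp 0.8") == ("temp", "0.8")
assert parse_slash("tell me about LoRA") is None
```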

Save-to-train watch mode (Sprint 25):

- `dlm train --watch` — a `watchfiles` wrapper with debounced retrain on settled saves (see the sketch after this list).
- Rich live status line (step, loss, elapsed, files watched).
- Ctrl-C exits cleanly between cycles.
- `--watch --repl` bridge is honestly deferred (marked `[~]` pending a CI-capable test harness).
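
A hedged sketch of the debounced loop; `watchfiles.watch` takes a millisecond `debounce`, and `retrain` is a hypothetical stand-in for dlm's trainer entry point:

```python
from watchfiles import watch

def watch_and_retrain(doc_dir: str) -> None:
    # Each iteration yields a settled batch of file changes.
    for changes in watch(doc_dir, debounce=1600):
        print(f"{len(changes)} change(s) settled; retraining...")
        retrain(doc_dir)  # hypothetical hook into the trainer
```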

Observability (Sprint 26):

- Per-store SQLite metrics database at `~/.dlm/store/<id>/metrics.db`.
- Typed event dataclasses: `RunStart`, `Step`, `Eval`, `RunEnd`, `TokenizationEvent` (a sketch of the event → SQLite shape follows this list).
- `dlm metrics [--json|--csv]` — runs summary with filters.
- `dlm metrics watch <path>` — live tail of steps + evals.
- Optional sinks: TensorBoard (`[tb]` extra), W&B (`[wandb]` extra).
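
A minimal sketch of the typed-event → SQLite shape; the table schema and field set here are assumptions, with only the `Step` event name taken from the list above:

```python
import sqlite3
import time
from dataclasses import asdict, dataclass

@dataclass
class Step:
    run_id: str
    step: int
    loss: float
    ts: float

def record(db: sqlite3.Connection, ev: Step) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS steps (run_id TEXT, step INT, loss REAL, ts REAL)"
    )
    db.execute("INSERT INTO steps VALUES (:run_id, :step, :loss, :ts)", asdict(ev))
    db.commit()

record(sqlite3.connect("metrics.db"), Step("run-01", 1, 2.31, time.time()))
```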

### Ecosystem

Template gallery (Sprint 27):

- `dlm templates list` — eight bundled templates (coding tutor, domain KB, writing partner, personal assistant, meeting notes, regex buddy, shell one-liner, study guide).
- `meta.yaml` sidecars per template (title, summary, recommended base, tags, license).
- `dlm init --template <name>` — fresh ULID, adopts the template's recommended base, persists license acceptance for gated bases.
- Offline-first registry; `--refresh` reserved for a future upstream gallery.

Share protocol (Sprint 28):

- `dlm push --to hf:org/name | https://... | peer://host:port`.
- `dlm pull <source>` with signature verification on peer and URL pulls.
- `dlm serve <path>` — LAN-local peer endpoint with HMAC bearer tokens, per-token rate limit, explicit public-bind gate (token check sketched after this list).
- Optional minisign signing — sidecar `.minisig` next to the pack.
- HuggingFace Hub sink auto-generates a README from the manifest.
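
The constant-time token check at the heart of an HMAC bearer scheme, as a hedged sketch; the token layout is an assumption:

```python
import hashlib
import hmac

def verify_bearer(secret: bytes, token_id: str, presented_sig: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(secret, token_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, presented_sig)
```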

Source directives (Sprint 29):

- `training.sources: [...]` — declare file or directory sources in frontmatter; the trainer descends the tree and ingests raw text through the existing CPT path. `include` / `exclude` glob filters, per-file and per-source size/count caps.
- `sources_policy: permissive | strict` — strict confines paths to descendants of the `.dlm`'s directory with a symlink-escape check (sketched after this list).
- Deterministic lexicographic enumeration; UTF-8 hygiene; binary detection via NUL sniff.
- Per-directive provenance in `TrainingRunSummary.source_directives` (file count, byte total, skip reasons).
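
The strict-policy confinement check in miniature — resolve symlinks first, then require the result to stay under the `.dlm`'s own directory; a sketch, not dlm's actual code:

```python
from pathlib import Path

def confine(dlm_path: Path, source: Path) -> Path:
    root = dlm_path.parent.resolve()
    resolved = source.resolve()  # collapses any symlink escape attempt
    if not resolved.is_relative_to(root):
        raise PermissionError(f"{source} escapes {root} under strict policy")
    return resolved
```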

`.dlm/` descent + auto-scaffold (Sprint 30):

- Per-codebase `.dlm/training.yaml` + `.dlm/ignore` discovered on a directory walk; nearest-ancestor resolution with gitignore-subset last-match-wins semantics (`!` negation, anchored `/`, trailing `/`, globstar `**`) — see the simplified matcher after this list.
- Default-exclude set for VCS, caches, lockfiles, binaries.
- `Section.tags` flow from config metadata onto synthesized sections (loss weighting deferred to a future release).
- `dlm train <dir>` auto-scaffolds `<dir>/.dlm/corpus.dlm` on first run: ULID minted, `--base` + `--include` + `--exclude` + `--policy` baked in. Second invocation reuses the anchor.
- `--rescaffold` rewrites the scaffolded `.dlm` in place while preserving `dlm_id`.
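
Last-match-wins with `!` negation, reduced to its skeleton; this is a simplification (`fnmatch`'s `*` crosses path separators, and anchoring and trailing-`/` handling are omitted), not dlm's matcher:

```python
from fnmatch import fnmatch

def ignored(path: str, patterns: list[str]) -> bool:
    decision = False
    for pat in patterns:
        negated = pat.startswith("!")
        if negated:
            pat = pat[1:]
        if fnmatch(path, pat) or fnmatch(path, f"*/{pat}"):
            decision = not negated  # the last matching pattern wins
    return decision

assert ignored("build/cache.bin", ["build/*", "!build/keep.txt"])
assert not ignored("build/keep.txt", ["build/*", "!build/keep.txt"])
```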

Tokenized-section cache (Sprint 31):

- Per-store cache at `~/.dlm/store/<id>/tokenized-cache/`, keyed by `(section_id, tokenizer_sha256, sequence_len)`.
- Atomic tmp+rename writes (sketched after this list), LRU eviction with current-run protection, tokenizer-version invalidation on SHA bump.
- `dlm cache show | prune | clear` CLI.
- **Deferred:** trainer-side wiring into the SFTTrainer tokenization path requires pre-tokenization plus a custom collator (label-shift preservation is subtle). Module is shipped and unit-tested; the consumer lands in a future release. See `src/dlm/directives/cache.py` module docstring.
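
The tmp+rename idiom the cache relies on, as a sketch; the key format and file layout are assumptions:

```python
import os
import tempfile
from pathlib import Path

def cache_put(cache_dir: Path, section_id: str, tokenizer_sha256: str,
              sequence_len: int, payload: bytes) -> Path:
    key = f"{section_id}-{tokenizer_sha256[:12]}-{sequence_len}"
    final = cache_dir / f"{key}.bin"
    fd, tmp = tempfile.mkstemp(dir=cache_dir)  # same filesystem as the target
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)
    os.replace(tmp, final)  # atomic: readers never observe a partial file
    return final
```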

### Audits + remediations

Six brutal audits ran during this window, each producing a findings doc under .docs/audits/ and remediation commits referencing the finding IDs:

- Audit 04: replay-store integration, version-drift detection, tokenizer probe rename.
- Audit 05: pyproject runtime deps, license-acceptance record persistence, lock policy rules.
- Audit 06: 16 findings across GGUF parser hardening, ollama smoke tests, timezone-aware timestamps, pack hash determinism, vendor path resolution.
- Audit 07: forward-date schema rejection, ruff src-side cleanup.
- Audit 08: multi-GPU world_size plumbing, MLX adapter config fidelity, llama-cpp build env honoring, CLI reference drift.
- Audit 09 (Phase 7 brutal): `dlm train <dir>` end-to-end crash (B1 + B2), test-masks-bugs pattern (B3), orphan tokenized-cache (M1+M2 documented deferral), `dlm serve` guard on untrained `.dlm` (M3), task-tracker drift (M4), seven minors + two polish. Empirical differential evidence in `09-sway-appendix.md` — 359σ delta_kl vs null-adapter baseline on Fortran-idiomatic prompts.

### Schema migrations

All additive. Identity migrators; no data loss.

| From | To | Added |
|------|----|-------|
| v1 | v2 | `training.preference` (DPO/ORPO) rename |
| v2 | v3 | `training.cpt` block (schedule + warm-up) |
| v3 | v4 | `training.adapters` (named multi-adapter) |
| v4 | v5 | `training.precision` override |
| v5 | v6 | `training.sources` + `sources_policy` |

### New CLI surface

```
dlm templates list
dlm init --template <name>
dlm push --to <hf:...|https:...|peer://...> [--sign]
dlm pull <source>
dlm serve <path> [--public --i-know-this-is-public]
dlm repl <path>
dlm train --watch
dlm metrics [--json|--csv]
dlm metrics watch <path>
dlm train <dir> --base <key> --include <glob>
dlm cache show | prune | clear
```

### Test matrix

- 2,211 unit tests pass (≥95 % coverage on touched packages).
- ruff clean; mypy `--strict` clean across 215 source files.
- Slow integration matrix: two-adapter training, preference round trip, MLX adapter conversion, ROCm smoke, multi-GPU smoke, end-to-end auto-scaffold cycle, tokenized-cache unit suite, peer round-trip, directive fixture tree → finite adapter.

### Thanks

Five phases' worth of work. Six audits caught real bugs, and the sway submodule's differential tests established the empirical floor that the engine is behaviorally sound.

## [0.9.0] — target

First tagged release. Ships via the [tenseleyFlow/homebrew-tap](https://github.com/tenseleyFlow/homebrew-tap) (`brew tap tenseleyFlow/tap && brew install dlm`). Below v1.0 on purpose — a human still needs to train + export + `ollama run` a real document end-to-end before we claim the stable number.

### Highlights

- CLI: `init`, `train`, `prompt`, `export`, `pack`, `unpack`, `doctor`, `show`, `migrate`.
- Content-addressed store at `~/.dlm/store/<dlm_id>/` with atomic manifest updates and exclusive locking.
- Hardware-aware training plan (`dlm doctor`) across CUDA / MPS / ROCm / CPU tiers, with a refusal matrix that fails loudly on unsupported combinations.
- Curated base-model registry (10 entries) plus `hf:org/name` escape hatch with compatibility probes.
- LoRA + QLoRA training, replay-corpus retraining that retains prior sections, two-phase atomic version commits.
- Eval harness: val-loss, perplexity, early-stop.
- GGUF export with imatrix-calibrated quantization, explicit Go chat template (no fuzzy matching), embedding-row SHA verification, merge-safety gate against QLoRA pitfalls.
- Ollama integration: Modelfile emission, `ollama create`, smoke validation, closed-loop token-identity verification against the HF Jinja reference.
- `.dlm.pack` format: byte-identical packs, symlink / tar-bomb / zstd-bomb defenses, per-file SHA-256 integrity (see the sketch after this list), pack-format migrations registry.
- Reproducibility contract: per-store `dlm.lock` with severity-table mismatch policy, `--strict-lock` / `--update-lock` / `--ignore-lock` CLI flags, determinism golden integration test.
- Documentation: getting started, `.dlm` format reference, CLI reference, six cookbook recipes, architecture overview, troubleshooting, determinism guide.
- Five starter templates: coding tutor, domain KB, writing partner, personal assistant, changelog.
- Weekly CI jobs: chat-template drift, slow integration suite.
- Pre-commit config: ruff, mypy `--strict`, non-slow pytest.
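
Per-file SHA-256 verification distills to a few lines; the manifest shape here is an assumption, not the actual `.dlm.pack` layout:

```python
import hashlib
from pathlib import Path

def verify_files(root: Path, manifest: dict[str, str]) -> None:
    """manifest maps relative path -> expected hex digest."""
    for rel, expected in sorted(manifest.items()):
        digest = hashlib.sha256((root / rel).read_bytes()).hexdigest()
        if digest != expected:
            raise ValueError(f"integrity failure: {rel}")
```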

### Thanks

Built by following `.docs/findings.md` and the 29-sprint plan closely. Every pitfall in the findings inventory corresponds to a test and an explicit guardrail somewhere in the codebase.


---

The complete per-sprint history lives in `.docs/sprints/` (local to the repo by user choice; planning artifacts stay out of git).
