docs: architecture + troubleshooting (symptom/cause/fix) + determinism guide (sprint 16)
- SHA: 5e5a46e38cb08d470985e5e9e91d1082a44b5765
- Parent: 9fc495d
- Tree: 2ff888e

| Status | File | + | - |
|---|---|---|---|
| A | docs/architecture.md | 111 | 0 |
| A | docs/determinism.md | 139 | 0 |
| A | docs/troubleshooting.md | 206 | 0 |

docs/architecture.md (added) @@ -0,0 +1,111 @@

# Architecture

A compressed map of how DLM is organized. For the sprint-level
history, see `.docs/sprints/` in the repo (planning artifacts kept
local).

## The big idea

```
.dlm file ──▶ parser ──▶ dataset builder ──▶ SFTTrainer ──▶ LoRA adapter
     │                        ▲                                  │
     │                        │                                  ▼
     └──▶ replay corpus ──────┘                          GGUF + Modelfile
                                                                 │
                                                                 ▼
                                                           ollama create
```

The `.dlm` source is the input; a trained LoRA adapter is the output.
Everything in between is opinionated engineering: content-addressed
storage, a determinism contract, a hardware doctor, an explicit Go
chat template, and preflight checks against every footgun we've found.

## Module map

| Module | What it owns |
|---|---|
| `dlm.doc` | `.dlm` parser, serializer, Pydantic schema, section grammar. |
| `dlm.store` | Content-addressed store at `~/.dlm/store/<id>/`. Paths, manifest, exclusive lock, introspection. |
| `dlm.base_models` | Curated registry of launch-day bases; `hf:` escape hatch; compatibility probes; license acceptance. |
| `dlm.hardware` | Backend detection (CUDA / MPS / ROCm / CPU), capability probing, memory estimation, refusal matrix, `TrainingPlan` resolver. |
| `dlm.data` | Section → dataset row adapter, tokenizer bring-up (pad ≠ EOS rule), TRL formatting. |
| `dlm.replay` | Zstd-compressed append-only corpus + recency-weighted sampler + delta-against-manifest. |
| `dlm.train` | Orchestrator: preflight → determinism → load → train → two-phase commit → state sidecar → manifest update. |
| `dlm.eval` | Perplexity / val-loss callback + early-stop + training-summary writer. |
| `dlm.inference` | HF-heavy path for `dlm prompt`; `InferencePlan` resolver. |
| `dlm.export` | GGUF conversion, adapter GGUF, quantization, imatrix calibration, embedding-row sha, merge-safety gate. |
| `dlm.export.ollama` | Modelfile emission, Go template registry, `ollama create` + smoke, token-identity verification. |
| `dlm.pack` | `.dlm.pack` format (v1), packer, unpacker, integrity verification, migrations registry. |
| `dlm.lock` | Per-store `dlm.lock` schema, severity-table mismatch policy, validator, writer. |
| `dlm.cli` | Typer app + per-command glue; `dlm.cli.reporter` owns formatted error output. |
| `dlm.io` | `atomic` (write-and-rename), `text` (UTF-8 + LF normalization), `ulid`. |

## Storage layout

```
~/.dlm/store/<dlm_id>/
├── dlm.lock                  # Sprint 15 reproducibility contract
├── manifest.json             # training runs + exports + content hashes
├── adapter/
│   ├── current.txt           # → versions/v0001
│   └── versions/
│       ├── v0001/
│       │   ├── adapter_config.json
│       │   ├── adapter_model.safetensors
│       │   ├── training_state.pt         # optimizer/scheduler/RNG
│       │   ├── training_state.pt.sha256
│       │   ├── training_run.json         # human-readable run metadata
│       │   └── pinned_versions.json
│       └── v0002/
├── replay/
│   ├── corpus.zst            # append-only zstd-compressed section history
│   └── index.json
├── exports/
│   └── Q4_K_M/
│       ├── base.Q4_K_M.gguf
│       ├── adapter.gguf
│       ├── Modelfile
│       ├── export_manifest.json
│       └── imatrix.dat       # cached per-corpus-hash
├── cache/                    # scratch for convert scripts
└── logs/
    └── train-000001-*.jsonl  # per-step JSONL log
```
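
For orientation, here is a small sketch of how the `current.txt` pointer resolves to a version directory in this layout. The helper and the exact pointer contents are illustrative, not part of the `dlm` API.

```python
# Illustrative helper, not the real dlm API. Assumes current.txt holds a
# path relative to the adapter/ directory (e.g. "versions/v0001").
from pathlib import Path

def current_adapter_dir(store: Path) -> Path:
    pointer = store / "adapter" / "current.txt"
    target = (store / "adapter" / pointer.read_text().strip()).resolve()
    if not (target / "adapter_model.safetensors").exists():
        raise FileNotFoundError(f"{pointer} names a missing version: {target}")
    return target
```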

## Contract boundaries

Four load-bearing files; when editing, keep them distinct:

- **`manifest.json`** — running narrative of training runs, exports,
  and content hashes. Mutable on every run. Owned by Sprint 04.
- **`dlm.lock`** (per-store) — version pins + hardware tier +
  determinism flags + license acceptance. Owned by Sprint 15.
- **`training_state.pt`** — optimizer/scheduler/RNG for bit-exact
  resume. Owned by Sprint 09.
- **`exports/<quant>/export_manifest.json`** — per-export checksums,
  quant level, pinned llama.cpp tag, smoke output. Owned by Sprint 11.

## The determinism contract

Same `(.dlm source, base revision, hardware tier, pinned versions,
seed, determinism flags)` → same adapter SHA. Enforced by
`src/dlm/lock/` + the integration test under
`tests/integration/lock/test_determinism_golden.py`. See
[Determinism](determinism.md) for details.

## Sprint timeline

| Phase | Sprints | Release |
|---|---|---|
| 0 — Foundation | 01–05 (scaffolding → hardware doctor) | v0.1 |
| 1 — Core training | 06–10 (registry → replay → trainer → eval) | v0.5 |
| 2 — Export | 11–12 (+ 11.5, 11.6, 12.5, 12.6 follow-ups) | v0.8 |
| 3 — MVP release | 12b, 13, 14, 14.5, 15, 16 (this sprint) | **v1.0** |
| 4 — Advanced training | 17–20 (DPO, ORPO, CPT, multi-adapter) | v1.x |
| 5 — Performance & scale | 21–23 (MLX, ROCm, multi-GPU) | v1.x / v2 |
| 6 — UX polish | 24–26 (REPL, watch mode, observability) | v2 |
| 7 — Ecosystem | 27–28 (gallery, share protocol) | v2+ |

Every sprint has a binary Definition of Done; status snapshots live in
`.docs/sprints/00-index.md` in the repo (local-only by user choice).

docs/determinism.md (added) @@ -0,0 +1,139 @@

# Determinism & reproducibility

DLM treats determinism as a contract: same input → same adapter SHA.
The contract is enforced by `src/dlm/lock/` (Sprint 15), backed by a
golden integration test, and surfaced to users via three CLI flags.

## The contract

Given:

- the same `.dlm` source text (SHA-256 match),
- the same base model revision,
- the same pinned versions (torch, transformers, peft, trl,
  bitsandbytes, accelerate, llama.cpp tag),
- the same hardware tier,
- the same seed and determinism flags,

training produces a byte-identical `adapter_model.safetensors`.

Proved by `tests/integration/lock/test_determinism_golden.py`, which
runs two fresh training cycles on the tiny model and asserts the
adapter SHAs match.
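
The core assertion looks roughly like this; the helper names and the two run directories are illustrative stand-ins for the real test fixtures.

```python
# Minimal sketch of the golden test's assertion: hash two independently
# trained adapters and require identical digests.
import hashlib
from pathlib import Path

def adapter_sha(adapter_dir: Path) -> str:
    blob = (adapter_dir / "adapter_model.safetensors").read_bytes()
    return hashlib.sha256(blob).hexdigest()

def assert_bit_exact(run_a: Path, run_b: Path) -> None:
    a, b = adapter_sha(run_a), adapter_sha(run_b)
    assert a == b, f"determinism contract violated: {a} != {b}"
```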

## What's in `dlm.lock`

Each store has a `dlm.lock` next to `manifest.json`:

```json
{
  "lock_version": 1,
  "created_at": "2026-04-19T17:30:00",
  "dlm_id": "01HRZYQ2X0MB5K4VN7E9DNT5GH",
  "dlm_sha256": "0123…ef",
  "base_model_revision": "12fd25f77366fa6b3b4b768ec3050bf629380bac",
  "base_model_sha256": null,
  "pinned_versions": {
    "torch": "2.5.1",
    "transformers": "4.46.2",
    "peft": "0.14.0",
    "trl": "0.12.2",
    "bitsandbytes": "0.45.0"
  },
  "cuda_version": null,
  "rocm_version": null,
  "hardware_tier": "mps",
  "seed": 42,
  "determinism_flags": {},
  "determinism_class": "best-effort",
  "license_acceptance": null,
  "last_run_id": 3
}
```

Validated on every `dlm train`; written on success.

## Mismatch severity table

When the live runtime diverges from the recorded lock, each field is
classified:

| Field | Severity | Policy |
|---|---|---|
| `dlm_sha256` | ALLOW | Editing the doc is the point of DLM. |
| `base_model_revision` | ERROR | Breaks reproducibility; requires `--update-lock` to accept. |
| `torch` major version | ERROR | |
| `torch` minor/patch | WARN | |
| `transformers` / `peft` / `trl` / `accelerate` / `llama_cpp` | WARN | |
| `bitsandbytes` any | WARN | QLoRA kernels are version-sensitive. |
| `hardware_tier` | WARN | Re-plan recommended. |
| `determinism_class` | WARN | |
| `determinism_flags` | WARN | |

WARN mismatches print to stderr but don't block the run. ERROR
mismatches raise `LockValidationError` → exit code 1 with runbook
hints.
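
A compact sketch of that policy, using field names from `dlm.lock`; the real validator in `src/dlm/lock/` is the source of truth and may differ in detail.

```python
# Illustrative classification only; mirrors the severity table above.
WARN, ERROR, ALLOW = "WARN", "ERROR", "ALLOW"

def classify(field: str, recorded: str, live: str) -> str:
    if recorded == live:
        return ALLOW                       # no drift
    if field == "dlm_sha256":
        return ALLOW                       # editing the doc is the point
    if field == "base_model_revision":
        return ERROR                       # accepting requires --update-lock
    if field == "torch":
        same_major = recorded.split(".")[0] == live.split(".")[0]
        return WARN if same_major else ERROR
    return WARN                            # everything else in the table
```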

## CLI flags

| Flag | Behavior |
|---|---|
| *(default)* | Validate; abort on ERROR, warn on WARN, proceed + write. |
| `--strict-lock` | Upgrade every WARN to ERROR. |
| `--update-lock` | Skip validation, always write. For intentional drift acceptance. |
| `--ignore-lock` | Skip validation, don't write. For experimentation; the lock on disk stays stale. |

The three flags are mutually exclusive. See [CLI reference](cli/reference.md).
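
As a sketch, the three flags plus the default reduce to a small policy object; the names below are illustrative, not the real CLI plumbing.

```python
# Hypothetical resolution of the lock-handling modes described above.
from dataclasses import dataclass

@dataclass
class LockPolicy:
    validate: bool   # run the severity-table check
    write: bool      # refresh dlm.lock on success
    strict: bool     # promote WARN to ERROR

def resolve_lock_policy(strict_lock: bool, update_lock: bool, ignore_lock: bool) -> LockPolicy:
    if sum([strict_lock, update_lock, ignore_lock]) > 1:
        raise SystemExit("--strict-lock / --update-lock / --ignore-lock are mutually exclusive")
    if update_lock:
        return LockPolicy(validate=False, write=True, strict=False)
    if ignore_lock:
        return LockPolicy(validate=False, write=False, strict=False)
    return LockPolicy(validate=True, write=True, strict=strict_lock)
```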

## Determinism tiers

The `determinism_class` field records what tier the host supports:

- **`strong`** — CUDA with all deterministic kernels available. Bit-exact
  reproduction expected across runs.
- **`best-effort`** — MPS, ROCm, or CUDA without the full deterministic
  kernel set. Loss curves are close but not bit-identical.
- **`advisory`** — CPU-only or a configuration where DLM refuses to
  promise determinism (some MPS ops fall here).

The golden integration test runs on CPU (tier `advisory`) and still
passes because SmolLM2-135M doesn't exercise the nondeterministic
kernels. On larger bases the CPU tier stops being bit-exact; that's
honest and documented.
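
On CUDA, the `strong` tier corresponds roughly to the standard PyTorch determinism switches below. The exact flag set DLM records in `determinism_flags` may differ; treat this as an approximation, not the trainer's actual setup code.

```python
# Standard PyTorch determinism switches; an approximation of the "strong" tier.
import os
import random

import numpy as np
import torch

def enable_strong_determinism(seed: int = 42) -> None:
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")  # must be set before cuBLAS init
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)   # error out on nondeterministic kernels
    torch.backends.cudnn.benchmark = False     # no autotuned, run-dependent algorithms
```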

## Regenerating the golden

When a pinned version changes deliberately (dep bump, llama.cpp tag
move), the recorded adapter SHA must be refreshed:

```sh
# Dry run — report the old vs new SHA without writing.
$ uv run python scripts/regen-determinism-golden.py

# Review the diff; then approve:
$ uv run python scripts/regen-determinism-golden.py --approve
```

The script:

1. Samples `capture_runtime_versions()` to produce the current tuple.
2. Runs the tiny-model training twice; confirms the two SHAs match.
3. Writes `tests/golden/determinism/tuple-<hash>.json` keyed by a
   SHA-256 of the sorted version tuple + platform.

Each tuple gets its own golden; the tuple file is keyed by content so
running on a new platform simply writes a new golden file. The
reviewer checks in the new golden alongside the dep bump.
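
A sketch of how such a key could be derived; the real script owns the exact serialization.

```python
# Illustrative key derivation: hash the sorted version tuple plus the
# platform string, as described above.
import hashlib
import json
import platform

def golden_key(pinned_versions: dict[str, str]) -> str:
    payload = json.dumps(sorted(pinned_versions.items())) + platform.platform()
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

# → tests/golden/determinism/tuple-<golden_key(...)>.json
```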

## Non-goals

- **Byte-exact reproducibility from pure source.** DLM's replay corpus
  carries prior-run signal. Reconstructing a specific adapter without
  its replay history isn't possible — use `dlm pack` to archive.
- **Airgapped reproducibility.** The first `dlm train` against a new
  base pulls from HuggingFace. Subsequent runs use the local cache.
  We don't currently ship a fully-offline path; `--include-base` on
  `dlm pack` is the workaround.
- **MPS bit-exactness for large bases.** Apple's Metal kernels aren't
  deterministic for every op we use; the `best-effort` tier is an
  honest label, not a TODO.

docs/troubleshooting.md (added) @@ -0,0 +1,206 @@

# Troubleshooting

Structured as **symptom → cause → fix**. Seeded from the pitfall
inventory in `.docs/findings.md` (repo-local). Don't see your problem
here? Open an issue with the full `dlm doctor` output and the error.

## Training

### `OOMError: CUDA out of memory at step 12`

**Cause:** peak VRAM exceeded the device budget. The doctor picks
`grad_accum` to stay under ~85% of VRAM on CUDA / 50% of unified
memory on MPS, but some base + LoRA configurations push harder than
the estimator predicts.

**Fix:** DLM's OOM guard catches CUDA OOM, computes a recommended
`grad_accum` bump, and surfaces it in the error message. Apply the
recommendation in the `.dlm` frontmatter:

```yaml
training:
  micro_batch_size: 1
  grad_accum: 8  # was "auto", which picked 4; bump to 8
```

Rerun with `--fresh` (the interrupted run's state is incomplete) or
`--resume` if the partial run committed state before the OOM.

### `RuntimeError: pad_token is <|endoftext|>`

**Cause:** pitfall #4 — padding with EOS mid-sequence corrupts labels.

**Fix:** The tokenizer bring-up (Sprint 07) sets pad to `unk_token` or
adds `<|pad|>` as a learnable token (and forces
`modules_to_save=["embed_tokens", "lm_head"]` — adapter size inflates;
this is logged loudly). If you see this error raw from HF, the
bring-up didn't run — file a bug with the base model name.
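
For reference, the rule sketched with stock `transformers` calls; the actual bring-up lives in `dlm.data` and handles more cases, so treat this as illustrative.

```python
# Sketch of the pad != EOS rule using standard transformers APIs.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")  # example base
if tok.pad_token is None or tok.pad_token == tok.eos_token:
    if tok.unk_token is not None:
        tok.pad_token = tok.unk_token                     # cheap path: reuse UNK
    else:
        tok.add_special_tokens({"pad_token": "<|pad|>"})  # new token: embeddings must be
                                                          # resized and trained, e.g.:
        # model.resize_token_embeddings(len(tok))
        # lora_config.modules_to_save = ["embed_tokens", "lm_head"]
```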

### `ResumeIntegrityError: training_state.pt sha256 mismatch`

**Cause:** the state sidecar's bytes disagree with the recorded SHA.
Either the file was partially written (power loss) or modified out of
band.

**Fix:** `--resume` refuses to proceed. Use `--fresh` to discard the
state and start from scratch, or restore the sidecar from a backup /
`.dlm.pack`.
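
To check the sidecar by hand before deciding, hash it and compare against the recorded digest. Paths follow the storage layout in the architecture doc; the `<dlm_id>` and version number are placeholders.

```python
# Manual integrity check of the training-state sidecar.
import hashlib
from pathlib import Path

version_dir = Path.home() / ".dlm/store/<dlm_id>/adapter/versions/v0001"
actual = hashlib.sha256((version_dir / "training_state.pt").read_bytes()).hexdigest()
recorded = (version_dir / "training_state.pt.sha256").read_text().split()[0]
print("match" if actual == recorded else f"mismatch: {actual} != {recorded}")
```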

### Loss is flat / doesn't decrease

**Cause:** several possibilities.

**Fixes (check in order):**

1. **Dataset is too small.** Under ~500 tokens of training signal,
   20 steps won't move loss visibly (see the token-count check below).
   Add more sections.
2. **Learning rate too low.** Try `learning_rate: 5e-4` (up from the
   default 2e-4) for small documents.
3. **Wrong base.** Coder documents on a non-coder base (or vice
   versa) fight the base's pretraining. Switch to the appropriate
   base.
4. **Replay corpus dominates the mix.** If you've edited the
   document heavily, replay history outweighs the new content in the
   training mix; try `--fresh` to train only on current content.
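
For item 1, a quick way to count the training signal with a stock tokenizer; the model name and the ~500-token threshold here are illustrative.

```python
# Rough token count of the sections you are training on.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M")
sections = ["...your .dlm section text..."]  # placeholder content
n_tokens = sum(len(tok(s).input_ids) for s in sections)
print(n_tokens, "tokens of training signal")  # under ~500, expect a flat loss
```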

## Export

### `preflight: unknown pre-tokenizer hash`

**Cause:** pitfall #5 — the llama.cpp GGUF conversion can't recognize
the base's pre-tokenizer, which would otherwise silently produce a
broken tokenizer in the GGUF.

**Fix:** bump `vendor/llama.cpp` to a version that knows this
tokenizer:

```sh
$ cd vendor/llama.cpp
$ git fetch origin
$ git checkout b9200  # or newer
$ cd ../..
$ scripts/bump-llama-cpp.sh build
```

Then re-run `dlm export`. The registry probe (Sprint 06) will also
re-run the next time you `dlm init` with an `hf:` base.

### `ExportError: no current adapter`

**Cause:** export ran against a store with no trained adapter.
`adapter/current.txt` either doesn't exist or points nowhere.

**Fix:** run `dlm train` before `dlm export`. If you just packed /
unpacked, the adapter version number in the pointer file should still
be valid — confirm `adapter/versions/vNNNN/` exists under the store.

### `merge refused: adapter was trained with QLoRA`

**Cause:** pitfall #3 — merging LoRA into a 4-bit base is
precision-unsafe.

**Fix:** either drop `--merged` (ship base + adapter separately — the
recommended path) or add `--dequantize`:

```sh
$ uv run dlm export tutor.dlm --merged --dequantize --quant Q4_K_M
```

`--dequantize` dequantizes the base to fp16, then merges, then
requantizes for export. Bigger artifact, slower export; only worth it
for single-file deployments.
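
Conceptually, `--dequantize` is the standard dequantize-then-merge move. A rough sketch with stock `transformers`/`peft` calls follows; the model name and paths are placeholders, and the real export path also handles the GGUF requantize step via llama.cpp.

```python
# Illustrative dequantize-then-merge, not DLM's export code.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-135M",        # placeholder base
    torch_dtype=torch.float16,           # fp16, NOT a 4-bit quantized load
)
merged = PeftModel.from_pretrained(base, "path/to/adapter").merge_and_unload()
merged.save_pretrained("merged-fp16")    # then GGUF-convert + quantize this
```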

### `lock: base_model_revision changed`

**Cause:** the base model revision pinned in `dlm.lock` differs from
the current `BaseModelSpec.revision`. Happens on a base-registry bump.

**Fix:**

```sh
$ uv run dlm train tutor.dlm --update-lock
```

Retrain against the new revision and overwrite the lock. Or pass
`--ignore-lock` if you're experimenting and don't want to commit to
the new revision yet.

### Runaway generation in Ollama

**Cause:** the Modelfile's `PARAMETER stop` is missing or incomplete.
Sprint 12's template registry sets stops per dialect; if the base is
off-registry (`hf:` prefix) the template defaults kick in.

**Fix:** for a registered base, re-run `dlm export` — the export
registry was patched in Sprint 16 audit-06 Q4 to include all
per-family stop tokens. For `hf:` bases, open an issue; the template
registry needs a manual entry.

### `template drift: HF Jinja produced N, Ollama produced M`

**Cause:** Sprint 12.6's closed-loop verification caught a token-count
divergence between the HF `apply_chat_template` and Ollama's Go
template. Either the upstream base's `chat_template` changed or the Go
template has a bug.

**Fix:** regenerate the goldens (after review):

```sh
$ uv run python scripts/refresh-chat-template-goldens.py --dialect chatml
```

Then commit the updated goldens. If the token count is off for
multiple dialects, investigate the Go template in
`src/dlm/export/ollama/templates/`.
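
The HF side of that comparison is easy to reproduce by hand; the model name and messages below are illustrative, and comparing against Ollama's rendering is the part the closed-loop check automates.

```python
# Render the HF chat template and count tokens (one half of the drift check).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
messages = [{"role": "user", "content": "ping"}]
ids = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=True)
print(len(ids), "tokens from the HF Jinja template")
```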

## Hardware / doctor

### `dlm doctor: no viable plan`

**Cause:** the refusal matrix (Sprint 05) refused the combination.
Common cases: QLoRA requested on CPU, or training a 3B model on a
host with < 8 GB of memory.

**Fix:** `dlm doctor` prints the specific refusal reason. Either
switch to a smaller base (`smollm2-135m` always plans), drop `adapter:
qlora` from the frontmatter (falls back to plain LoRA), or add
`--force` if you deliberately want to try anyway (CPU training of
small models works; it's just slow).

### Chat template fuzzy-match warning from Ollama

**Cause:** Ollama is trying to guess the dialect because the
Modelfile lacks an explicit `TEMPLATE`. This shouldn't happen with
DLM — we always emit an explicit `TEMPLATE "..."` (pitfall #1).

**Fix:** this is a bug; open an issue with the export output + the
contents of the emitted Modelfile.

## Determinism

### Two fresh runs produce different adapters

**Cause:** either a version in the pinned tuple changed, or a CUDA
kernel decided to be nondeterministic despite our env settings.

**Fix:**

1. Compare `pinned_versions` in the two `dlm.lock` files — if they
   differ, the drift is expected and the regen-golden flow covers it.
2. On CUDA, confirm `CUBLAS_WORKSPACE_CONFIG=:4096:8` is set in the
   environment. DLM sets this internally for training, but subprocess
   tools that read the value may not inherit it.
3. On MPS, bit-exact determinism is not part of the contract —
   `determinism_class: best-effort` is honest.

## Nothing matches

Open an issue at
<https://github.com/tenseleyFlow/DocumentLanguageModel/issues> with:

- `uv run dlm doctor --json` output
- The full error message and stack (if any)
- The `.dlm` file (redact any sensitive content)
- Steps to reproduce

The more reproducible the report, the faster the fix.