# Troubleshooting

Structured as **symptom → cause → fix**. Seeded from the pitfall
inventory in `.docs/findings.md` (repo-local). Don't see your problem
here? Open an issue with the full `dlm doctor` output and the error.

## Training

### `OOMError: CUDA out of memory at step 12`

**Cause:** peak VRAM exceeded the device budget. The doctor picks
`grad_accum` to stay under ~85% of VRAM on CUDA / 50% of unified
memory on MPS, but some base+lora configurations push harder than the
estimator predicts.

**Fix:** DLM's OOM guard catches CUDA OOM, computes a recommended
`grad_accum` bump, and surfaces it in the error message. Apply the
recommendation in the `.dlm` frontmatter:

```yaml
training:
  micro_batch_size: 1
  grad_accum: 8  # was "auto" which picked 4; bump to 8
```

Rerun with `--fresh` (the first run's mock was incomplete) or
`--resume` if the partial run committed state before OOM.
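
For example, after applying the recommended bump (using `tutor.dlm`,
the example document from the export sections below):

```sh
# discard the partial state and start over
$ uv run dlm train tutor.dlm --fresh

# or, if the run committed state before the OOM:
$ uv run dlm train tutor.dlm --resume
```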
### `RuntimeError: pad_token is <|endoftext|>`

**Cause:** pitfall #4 — padding with EOS mid-sequence corrupts labels.

**Fix:** the tokenizer bring-up (Sprint 07) sets pad to `unk_token` or
adds `<|pad|>` as a learnable token (and forces
`modules_to_save=["embed_tokens", "lm_head"]` — adapter size inflates;
this is logged loudly). If you see this error raw from HF, the
bring-up didn't run — file a bug with the base model name.
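
A quick way to check whether the bring-up ran is to look at the pad
token it recorded. This is a sketch: `$STORE` is a placeholder for
your store path, and it assumes the bring-up writes a standard HF
`tokenizer_config.json` there.

```sh
# should show unk_token or <|pad|>, never <|endoftext|>
$ grep '"pad_token"' "$STORE/tokenizer_config.json"
```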
### `ResumeIntegrityError: training_state.pt sha256 mismatch`

**Cause:** the state sidecar's bytes disagree with the recorded SHA.
Either the file was partially written (power loss) or modified out of
band.

**Fix:** `--resume` refuses to proceed. Use `--fresh` to discard the
state and start from scratch, or restore the sidecar from a backup /
`.dlm.pack`.
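
To see the mismatch before discarding anything, hash the sidecar
yourself (`$STORE` is a placeholder for wherever your store keeps
`training_state.pt`):

```sh
# compare this digest against the SHA recorded at the last commit;
# a difference means the file was truncated or edited out of band
$ sha256sum "$STORE/training_state.pt"
```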
### Loss is flat / doesn't decrease

**Cause:** several possibilities.

**Fixes (check in order):**

1. **Dataset is too small.** Under ~500 tokens of training signal,
   20 steps won't move loss visibly. Add more sections.
2. **Learning rate too low.** Try `learning_rate: 5e-4` (up from the
   default 2e-4) for small documents.
3. **Wrong base.** Coder documents on a non-coder base (or vice
   versa) fight the base's pretraining. Switch to the appropriate
   base.
4. **`--fresh` would un-freeze replay weight.** If you've edited the
   document heavily, the replay corpus dominates the training mix;
   try `--fresh` to train only on current content.

## Export

### `preflight: unknown pre-tokenizer hash`

**Cause:** pitfall #5 — the llama.cpp GGUF conversion can't recognize
the base's pre-tokenizer, which silently produces a broken tokenizer
in the GGUF.

**Fix:** bump `vendor/llama.cpp` to a version that knows this
tokenizer:

```sh
$ cd vendor/llama.cpp
$ git fetch origin
$ git checkout b9200  # or newer
$ cd ../..
$ scripts/bump-llama-cpp.sh build
```

Then re-run `dlm export`. The registry probe (Sprint 06) will also
re-run on the next `dlm init` + `hf:` base.
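
Concretely, using the same example document as the export sections:

```sh
$ uv run dlm export tutor.dlm
```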
### `ExportError: no current adapter`

**Cause:** export ran against a store with no trained adapter.
`adapter/current.txt` either doesn't exist or points nowhere.

**Fix:** run `dlm train` before `dlm export`. If you just packed /
unpacked, the adapter version number in the pointer file should still
be valid — confirm `adapter/versions/vNNNN/` exists under the store.
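
To check the pointer by hand (`$STORE` stands in for wherever your
store lives):

```sh
# the pointer file names the current adapter version, e.g. v0003
$ cat "$STORE/adapter/current.txt"
# whatever version it names must exist here
$ ls "$STORE/adapter/versions/"
```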
### `merge refused: adapter was trained with QLoRA`

**Cause:** pitfall #3 — merging LoRA into a 4-bit base is
precision-unsafe.

**Fix:** either drop `--merged` (ship base + adapter separately — the
recommended path) or add `--dequantize`:

```sh
$ uv run dlm export tutor.dlm --merged --dequantize --quant Q4_K_M
```

`--dequantize` dequantizes the base to fp16, then merges, then
requantizes for export. Bigger artifact, slower export; only worth it
for single-file deployments.

### `lock: base_model_revision changed`

**Cause:** the base model revision pinned in `dlm.lock` differs from
the current `BaseModelSpec.revision`. Happens on a base-registry bump.

**Fix:**

```sh
$ uv run dlm train tutor.dlm --update-lock
```

Retrain against the new revision and overwrite the lock. Or
`--ignore-lock` if you're experimenting and don't want to commit to
the new revision yet.
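
The experimental path is the same command with the other flag:

```sh
$ uv run dlm train tutor.dlm --ignore-lock
```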
### Runaway generation in Ollama

**Cause:** the Modelfile's `PARAMETER stop` is missing or incomplete.
Sprint 12's template registry sets stops per dialect; if the base is
off-registry (`hf:` prefix) the template defaults kick in.

**Fix:** for a registered base, re-run `dlm export` — the export
registry was patched in Sprint 16 audit-06 Q4 to include all
per-family stop tokens. For `hf:` bases, open an issue; the template
registry needs a manual entry.
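
To confirm whether the stops made it into the installed model, dump
the Modelfile Ollama actually holds (`ollama show --modelfile` is
standard Ollama; `tutor` is a placeholder for your model name):

```sh
# expect an explicit TEMPLATE plus one PARAMETER stop per stop token
$ ollama show --modelfile tutor | grep -E '^(TEMPLATE|PARAMETER stop)'
```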
### `template drift: HF Jinja produced N, Ollama produced M`

**Cause:** Sprint 12.6's closed-loop verification caught a token-count
divergence between the HF `apply_chat_template` and Ollama's Go
template. Either the upstream base's `chat_template` changed or the Go
template has a bug.

**Fix:** regenerate the goldens (after review):

```sh
$ uv run python scripts/refresh-chat-template-goldens.py --dialect chatml
```

Then commit the updated goldens. If the token count is off for
multiple dialects, investigate the Go template in
`src/dlm/export/ollama/templates/`.

## Hardware / doctor

### `dlm doctor: no viable plan`

**Cause:** the refusal matrix (Sprint 05) refused the combination.
Common cases: QLoRA requested on CPU, or training a 3B model on a
host with < 8 GB of memory.

**Fix:** `dlm doctor` prints the specific refusal reason. Either
switch to a smaller base (`smollm2-135m` always plans), drop
`adapter: qlora` from the frontmatter (falls back to plain LoRA), or
add `--force` if you deliberately want to try anyway (CPU training of
small models works; it's just slow).
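
A typical recovery sequence, using only the flags described above
(`--force` is assumed here to be accepted by `dlm train`):

```sh
# read the specific refusal reason first
$ uv run dlm doctor
# then fix the frontmatter / pick a smaller base, or override deliberately:
$ uv run dlm train tutor.dlm --force
```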
### Chat template fuzzy-match warning from Ollama

**Cause:** Ollama is trying to guess the dialect because the
Modelfile lacks an explicit `TEMPLATE`. This shouldn't happen with
DLM — we always emit an explicit `TEMPLATE "..."` (pitfall #1).

**Fix:** this is a bug; open an issue with the export output + the
contents of the emitted Modelfile.

## Determinism

### Two fresh runs produce different adapters

**Cause:** either a version in the pinned tuple changed, or a CUDA
kernel decided to be nondeterministic despite our env settings.

**Fix:**

1. Compare `pinned_versions` in the two `dlm.lock` files — if they
   differ, the regen-golden flow expects the drift.
2. On CUDA, confirm `CUBLAS_WORKSPACE_CONFIG=:4096:8` is set in the
   environment. DLM sets this internally for training, but subprocess
   tools that read the value may not inherit it; see the sketch after
   this list.
3. On MPS, bit-exact determinism is not part of the contract —
   `determinism_class: best-effort` is honest.
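
A minimal sketch of the first two checks; `runA/` and `runB/` are
placeholders for the two run directories:

```sh
# 1. did the pinned tuple drift between runs?
$ diff runA/dlm.lock runB/dlm.lock

# 2. export the cuBLAS setting so subprocess tools inherit it, then rerun
$ export CUBLAS_WORKSPACE_CONFIG=:4096:8
$ uv run dlm train tutor.dlm --fresh
```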
## Nothing matches

Open an issue at
<https://github.com/tenseleyFlow/DocumentLanguageModel/issues> with:

- `uv run dlm doctor --json` output
- The full error message and stack (if any)
- The `.dlm` file (redact any sensitive content)
- Steps to reproduce

The more reproducible the report, the faster the fix.
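
To capture the first item in one go:

```sh
$ uv run dlm doctor --json > doctor.json
```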