
# Troubleshooting

Structured as **symptom → cause → fix**. Seeded from the pitfall inventory in `.docs/findings.md` (repo-local). Don't see your problem here? Open an issue with the full `dlm doctor` output and the error.

## Training

### `OOMError: CUDA out of memory at step 12`

**Cause:** peak VRAM exceeded the device budget. The doctor picks `grad_accum` to stay under ~85% of VRAM on CUDA / 50% of unified memory on MPS, but some base+LoRA configurations push harder than the estimator predicts.
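
To see that budget in absolute terms on your device, a minimal sketch (assumes PyTorch with CUDA; the peak estimator itself is DLM-internal):

```python
# Sketch only: what the ~85% CUDA budget works out to on this device.
import torch

free, total = torch.cuda.mem_get_info(0)   # bytes free / total on device 0
budget = 0.85 * total                      # the doctor's CUDA planning budget
print(f"budget {budget / 2**30:.2f} GiB of {total / 2**30:.2f} GiB;"
      f" {free / 2**30:.2f} GiB currently free")
```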

**Fix:** DLM's OOM guard catches CUDA OOM, computes a recommended `grad_accum` bump, and surfaces it in the error message. Apply the recommendation in the `.dlm` frontmatter:

```yaml
training:
  micro_batch_size: 1
  grad_accum: 8     # was "auto" which picked 4; bump to 8
```

Rerun with `--fresh` if the first run's state is incomplete, or `--resume` if the partial run committed state before the OOM.

### `RuntimeError: pad_token is <|endoftext|>`

**Cause:** pitfall #4 — padding with EOS mid-sequence corrupts labels.

**Fix:** The tokenizer bring-up (Sprint 07) sets pad to `unk_token` or adds `<|pad|>` as a learnable token (and forces `modules_to_save=["embed_tokens", "lm_head"]` — adapter size inflates; this is logged loudly). If you see this error raw from HF, the bring-up didn't run — file a bug with the base model name.
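
For context, a minimal sketch of what the bring-up does in plain Hugging Face terms (the base name is a placeholder; the real logic lives in DLM):

```python
# Sketch of the bring-up's pad-token logic, not DLM's actual code.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-base-model")    # placeholder name
model = AutoModelForCausalLM.from_pretrained("your-base-model")

if tok.pad_token is None or tok.pad_token == tok.eos_token:
    if tok.unk_token is not None:
        tok.pad_token = tok.unk_token                     # cheap: no new weights
    else:
        tok.add_special_tokens({"pad_token": "<|pad|>"})  # learnable pad token
        model.resize_token_embeddings(len(tok))
        # A new token makes the embeddings trainable state, hence
        # modules_to_save=["embed_tokens", "lm_head"] and the adapter-size hit.
```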

### `ResumeIntegrityError: training_state.pt sha256 mismatch`

**Cause:** the state sidecar's bytes disagree with the recorded SHA. Either the file was partially written (power loss) or modified out of band.

**Fix:** `--resume` refuses to proceed. Use `--fresh` to discard the state and start from scratch, or restore the sidecar from a backup / `.dlm.pack`.
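
To inspect the mismatch by hand before discarding anything, a sketch (the sidecar-hash filename is an assumption; adjust to wherever your store records it):

```python
# Sketch: recompute the sidecar hash and compare with the recorded one.
import hashlib
from pathlib import Path

state = Path("training_state.pt")                          # path illustrative
recorded = Path("training_state.pt.sha256").read_text().split()[0]  # assumed

actual = hashlib.sha256(state.read_bytes()).hexdigest()
print("match" if actual == recorded else f"mismatch:\n  {actual}\n  {recorded}")
```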

### Loss is flat / doesn't decrease

**Cause:** several possibilities.

**Fixes (check in order):**

1. **Dataset is too small.** Under ~500 tokens of training signal, 20 steps won't move loss visibly. Add more sections (see the token-count sketch after this list).
2. **Learning rate too low.** Try `learning_rate: 5e-4` (up from the default 2e-4) for small documents.
3. **Wrong base.** Coder documents on a non-coder base (or vice versa) fight the base's pretraining. Switch to the appropriate base.
4. **Replay weight dominates.** If you've edited the document heavily, the replay corpus dominates the training mix; try `--fresh` to train only on current content.
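
For the first check, a quick way to count training signal (assumes the base's HF tokenizer; the file path is illustrative):

```python
# Sketch: rough token count of the document's training signal.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-base-model")     # placeholder name
text = open("tutor.dlm", encoding="utf-8").read()
n = len(tok(text)["input_ids"])
print(f"{n} tokens", "(too thin to move loss in 20 steps)" if n < 500 else "")
```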

## Export

### `preflight: unknown pre-tokenizer hash`

**Cause:** pitfall #5 — the llama.cpp GGUF conversion can't recognize the base's pre-tokenizer, which silently produces a broken tokenizer in the GGUF.

**Fix:** bump `vendor/llama.cpp` to a version that knows this tokenizer:

```sh
$ cd vendor/llama.cpp
$ git fetch origin
$ git checkout b9200     # or newer
$ cd ../..
$ scripts/bump-llama-cpp.sh build
```

Then re-run `dlm export`. The registry probe (Sprint 06) will also re-run on the next `dlm init` + `hf:` base.

### `ExportError: no current adapter`

**Cause:** export ran against a store with no trained adapter. `adapter/current.txt` either doesn't exist or points nowhere.

**Fix:** run `dlm train` before `dlm export`. If you just packed / unpacked, the adapter version number in the pointer file should still be valid — confirm `adapter/versions/vNNNN/` exists under the store.
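
To check the pointer by hand, a sketch (the store root is illustrative; the `current.txt` → `versions/vNNNN/` layout is as described above):

```python
# Sketch: resolve adapter/current.txt and verify the target version exists.
from pathlib import Path

store = Path("tutor.dlm.store")                  # store root, illustrative
version = (store / "adapter" / "current.txt").read_text().strip()
target = store / "adapter" / "versions" / version
print(target, "exists" if target.is_dir() else "MISSING")
```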

### `merge refused: adapter was trained with QLoRA`

**Cause:** pitfall #3 — merging LoRA into a 4-bit base is precision-unsafe.

**Fix:** either drop `--merged` (ship base + adapter separately — the recommended path) or add `--dequantize`:

```sh
$ uv run dlm export tutor.dlm --merged --dequantize --quant Q4_K_M
```

`--dequantize` dequantizes the base to fp16, then merges, then requantizes for export. Bigger artifact, slower export; only worth it for single-file deployments.
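
Under the hood this is the standard PEFT merge, just on an fp16 base instead of a 4-bit one. A sketch of that step (paths are placeholders; requantization for GGUF happens downstream):

```python
# Sketch of the merge that --dequantize enables: fp16 base + adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "your-base-model", torch_dtype=torch.float16)   # fp16, not 4-bit
merged = PeftModel.from_pretrained(base, "adapter/versions/v0001")
merged = merged.merge_and_unload()                  # fold LoRA into the weights
merged.save_pretrained("merged-fp16")               # requantize after this
```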

### `lock: base_model_revision changed`

**Cause:** the base model revision pinned in `dlm.lock` differs from the current `BaseModelSpec.revision`. Happens on a base-registry bump.

**Fix:**

```sh
$ uv run dlm train tutor.dlm --update-lock
```

Retrain against the new revision and overwrite the lock. Or `--ignore-lock` if you're experimenting and don't want to commit to the new revision yet.

### Runaway generation in Ollama

**Cause:** the Modelfile's `PARAMETER stop` is missing or incomplete. Sprint 12's template registry sets stops per dialect; if the base is off-registry (`hf:` prefix) the template defaults kick in.

**Fix:** for a registered base, re-run `dlm export` — the export registry was patched in Sprint 16 audit-06 Q4 to include all per-family stop tokens. For `hf:` bases, open an issue; the template registry needs a manual entry.
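
To confirm what the export actually emitted, a quick check of the Modelfile (path illustrative):

```python
# Sketch: list the stop parameters in the emitted Modelfile.
from pathlib import Path

lines = Path("Modelfile").read_text().splitlines()   # path illustrative
stops = [l for l in lines if l.startswith("PARAMETER stop")]
print("\n".join(stops) or "no PARAMETER stop lines: runaway generation likely")
```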

### `template drift: HF Jinja produced N, Ollama produced M`

**Cause:** Sprint 12.6's closed-loop verification caught a token-count divergence between the HF `apply_chat_template` and Ollama's Go template. Either the upstream base's `chat_template` changed or the Go template has a bug.

**Fix:** regenerate the goldens (after review):

```sh
$ uv run python scripts/refresh-chat-template-goldens.py --dialect chatml
```

Then commit the updated goldens. If the token count is off for multiple dialects, investigate the Go template in `src/dlm/export/ollama/templates/`.
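
If you want to reproduce the HF side of the comparison, a sketch (the base name is a placeholder; the Ollama-side count comes from the verification harness):

```python
# Sketch: token count from the HF Jinja chat template (the "N" side).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("your-base-model")   # placeholder name
msgs = [{"role": "user", "content": "hello"}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True)
print(len(ids), "tokens from apply_chat_template")
```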

## Hardware / doctor

### `dlm doctor: no viable plan`

**Cause:** the refusal matrix (Sprint 05) refused the combination. Common cases: QLoRA requested on CPU, or training a 3B model on a host with < 8 GB of memory.

**Fix:** `dlm doctor` prints the specific refusal reason. Either switch to a smaller base (`smollm2-135m` always plans), drop `adapter: qlora` from the frontmatter (falls back to plain LoRA), or add `--force` if you deliberately want to try anyway (CPU training of small models works; it's just slow).
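
For intuition, the kind of rule the matrix encodes looks like this (an illustrative sketch only, not DLM's actual matrix):

```python
# Sketch: the two refusal cases named above, in the simplest possible form.
def viable(adapter: str, device: str, params_b: float, mem_gb: float) -> bool:
    if adapter == "qlora" and device == "cpu":
        return False                     # 4-bit quantized training wants CUDA
    if params_b >= 3.0 and mem_gb < 8.0:
        return False                     # 3B training on a < 8 GB host
    return True
```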

### Chat template fuzzy-match warning from Ollama

**Cause:** Ollama is trying to guess the dialect because the Modelfile lacks an explicit `TEMPLATE`. This shouldn't happen with DLM — we always emit an explicit `TEMPLATE "..."` (pitfall #1).

**Fix:** this is a bug; open an issue with the export output + the contents of the emitted Modelfile.

## Determinism

### Two fresh runs produce different adapters

**Cause:** either a version in the pinned tuple changed, or a CUDA kernel decided to be nondeterministic despite our env settings.

**Fix:**

1. Compare `pinned_versions` in the two `dlm.lock` files — if they differ, the regen-golden flow expects the drift.
2. On CUDA, confirm `CUBLAS_WORKSPACE_CONFIG=:4096:8` is set in the environment (see the sketch after this list). DLM sets this internally for training, but subprocess tools that read the value may not inherit it.
3. On MPS, bit-exact determinism is not part of the contract — `determinism_class: best-effort` is honest.
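
For a subprocess that doesn't inherit DLM's environment, the knobs from step 2 can be set by hand. A sketch (assumes PyTorch; the env var must be set before any CUDA work):

```python
# Sketch: the determinism settings discussed above, applied manually.
import os

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # before CUDA initializes
import torch  # imported after the env var on purpose

torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # raise instead of silently diverging
```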

## Nothing matches

Open an issue at <https://github.com/tenseleyFlow/DocumentLanguageModel/issues> with:

- `uv run dlm doctor --json` output
- The full error message and stack (if any)
- The `.dlm` file (redact any sensitive content)
- Steps to reproduce

The more reproducible the report, the faster the fix.
