
# Frontmatter reference

The YAML block between the two `---` lines at the top of every `.dlm` document. Validated with Pydantic in `dlm.doc.schema` (`extra="forbid"`, `frozen=True`) — unknown keys or wrong types fail fast with a `file:line:col` error.
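
A minimal sketch of that validation posture (illustrative only; the real models in `dlm.doc.schema` carry the full field set documented below):

```python
from pydantic import BaseModel, ConfigDict

class Frontmatter(BaseModel):
    """Strict, immutable frontmatter model: unknown keys are an error."""
    model_config = ConfigDict(extra="forbid", frozen=True)

    dlm_id: str
    dlm_version: int = 1
    base_model: str
    system_prompt: str | None = None

Frontmatter(dlm_id="01HRZYQ2X0MB5K4VN7E9DNT5GH", base_model="smollm2-135m")
# Frontmatter(..., typo_key=1) would raise a ValidationError instead.
```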

## Minimum required frontmatter

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
base_model: smollm2-135m
---
```

`dlm_id` is a 26-character Crockford base32 ULID. `dlm init` generates it; don't edit it by hand.
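
Crockford base32 excludes the easily confused letters I, L, O, and U, so a well-formed `dlm_id` matches a simple pattern. A quick sanity check, purely illustrative (the `looks_like_ulid` helper is ours, not dlm's):

```python
import re

# Crockford base32 alphabet: 0-9 plus A-Z without I, L, O, U.
ULID_RE = re.compile(r"[0-9A-HJKMNP-TV-Z]{26}")

def looks_like_ulid(value: str) -> bool:
    """True if `value` is a plausible 26-char Crockford base32 ULID."""
    return ULID_RE.fullmatch(value) is not None

assert looks_like_ulid("01HRZYQ2X0MB5K4VN7E9DNT5GH")
```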

`base_model` is either a registry key or `hf:org/name`:

| Registry key | HuggingFace id |
|---|---|
| `smollm2-135m` | HuggingFaceTB/SmolLM2-135M-Instruct |
| `smollm2-360m` | HuggingFaceTB/SmolLM2-360M-Instruct |
| `smollm2-1.7b` | HuggingFaceTB/SmolLM2-1.7B-Instruct |
| `qwen2.5-0.5b` | Qwen/Qwen2.5-0.5B-Instruct |
| `qwen2.5-1.5b` | Qwen/Qwen2.5-1.5B-Instruct |
| `qwen2.5-3b` | Qwen/Qwen2.5-3B-Instruct |
| `qwen2.5-coder-1.5b` | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| `llama-3.2-1b` | meta-llama/Llama-3.2-1B-Instruct (gated) |
| `llama-3.2-3b` | meta-llama/Llama-3.2-3B-Instruct (gated) |
| `phi-3.5-mini` | microsoft/Phi-3.5-mini-instruct |

The shipped registry is broader than this quick-start table. Current additions include:

- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-1.7b-thinking`, `qwen3-4b`, `qwen3-8b`, `llama-3.3-8b-instruct`, `phi-4-mini-reasoning`, `gemma-2-2b-it`, `gemma-2-9b-it`, `smollm3-3b`, `olmo-2-7b-instruct`, and `mixtral-8x7b-instruct`.
- Vision-language rows: `paligemma-3b-mix-224`, `qwen2-vl-2b-instruct`, `internvl2-2b`, `internvl3-2b`, and `mistral-small-3.1-24b-instruct`.
- Audio-language row: `qwen2-audio-7b-instruct`.

Off-registry bases use the `hf:` prefix, e.g. `base_model: hf:mistralai/Mistral-7B-Instruct-v0.3`. `dlm init` runs a compatibility probe; failures abort with a clear diagnostic.
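
A sketch of how the two spellings resolve (the `REGISTRY` dict reproduces a few rows from the table above; the function and its error text are illustrative, not dlm's actual resolver):

```python
REGISTRY = {
    "smollm2-135m": "HuggingFaceTB/SmolLM2-135M-Instruct",
    "qwen2.5-1.5b": "Qwen/Qwen2.5-1.5B-Instruct",
    "phi-3.5-mini": "microsoft/Phi-3.5-mini-instruct",
}

def resolve_base_model(spec: str) -> str:
    """Map a frontmatter `base_model` value to a HuggingFace repo id."""
    if spec.startswith("hf:"):
        # Off-registry base: everything after the prefix is the repo id.
        return spec[len("hf:"):]
    try:
        return REGISTRY[spec]
    except KeyError:
        raise ValueError(f"unknown registry key: {spec!r}") from None

assert resolve_base_model("hf:mistralai/Mistral-7B-Instruct-v0.3") \
    == "mistralai/Mistral-7B-Instruct-v0.3"
```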

## Full frontmatter

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
dlm_version: 1                    # bumped by `dlm migrate`; default: 1
base_model: qwen2.5-1.5b
system_prompt: |
  You are a concise assistant.
training:
  adapter: lora                   # or qlora (CUDA only)
  lora_r: 8                       # 1..256
  lora_alpha: 16
  lora_dropout: 0.05              # 0.0..0.5
  target_modules: auto            # or a list[str]
  sequence_len: 2048              # 64..32768
  micro_batch_size: auto          # or a positive int
  grad_accum: auto                # or a positive int
  learning_rate: 2e-4
  num_epochs: 3
  optimizer: adamw_torch          # or adamw_bnb_8bit / paged_adamw_8bit
  lr_scheduler: cosine            # or linear / constant
  warmup_ratio: 0.1               # 0.0..0.5
  # precision: fp16               # optional override; default lets the doctor pick
  seed: 42
export:
  default_quant: Q4_K_M           # or Q5_K_M / Q6_K / Q8_0
  default_temperature: 0.2        # optional; overrides dialect default
  default_top_p: null             # optional; null keeps dialect default
---
```

## Field-by-field

### Top-level

| Field | Type | Default | Notes |
|---|---|---|---|
| `dlm_id` | 26-char ULID | required | Assigned by `dlm init`. Never regenerated. |
| `dlm_version` | int ≥ 1 | `1` | Bumped by `dlm migrate` when the schema evolves. |
| `base_model` | non-empty str | required | Registry key or `hf:org/name`. |
| `system_prompt` | str or null | null | Emitted as `SYSTEM "…"` in the Modelfile on export. |
| `training` | object | defaults | See below. |
| `export` | object | defaults | See below. |

### `training`

| Field | Type | Default | Notes |
|---|---|---|---|
| `adapter` | `lora` / `qlora` / `dora` | `lora` | QLoRA requires CUDA + bitsandbytes. DoRA (weight-decomposed LoRA) requires `peft >= 0.8`; ~10% training wall-clock tax for 2-4% quality uplift on multi-task fine-tunes. See `docs/cookbook/dora-vs-lora.md`. |
| `lora_r` | int 1..256 | 8 | LoRA rank. |
| `lora_alpha` | int ≥ 1 | 16 | LoRA alpha (scaling). |
| `lora_dropout` | float 0..0.5 | 0.05 | |
| `target_modules` | `auto` or list | `auto` | `auto` uses the per-architecture registry from Sprint 06. Explicit lists override. |
| `sequence_len` | int 64..32768 | 2048 | Max token length per example. Also emitted as Ollama `PARAMETER num_ctx`. |
| `micro_batch_size` | `auto` or int ≥ 1 | `auto` | Doctor picks based on VRAM. |
| `grad_accum` | `auto` or int ≥ 1 | `auto` | Doctor picks to reach effective batch = 8 (see the sketch after this table). |
| `learning_rate` | float > 0 | 2e-4 | |
| `num_epochs` | int ≥ 1 | 3 | |
| `optimizer` | enum | `adamw_torch` | `adamw_bnb_8bit` / `paged_adamw_8bit` for CUDA + bnb. `galore_adamw` / `galore_adamw_8bit` for rank-projected optimizer state (~40% memory reduction, paper uplift at ≥ 7B bases; `dlm doctor` warns on sub-1B). See `docs/cookbook/dora-vs-lora.md`. |
| `lr_scheduler` | enum | `cosine` | |
| `warmup_ratio` | float 0..0.5 | 0.1 | |
| `precision` | `bf16` / `fp16` / `fp32` or null | null | Override the doctor's auto-pick. Defaults: bf16 on Ampere+/ROCm-bf16, fp16 on older CUDA, **fp32 on MPS** (the MPS fp16 attention kernels produce NaN LoRA weights on tiny-data SFT — see bug note below). Set `fp16` on MPS only if you need the memory headroom for a 7–8B base and your data isn't pathologically small; the post-train finite-weights gate will still refuse to persist a corrupt adapter. |
| `seed` | int | 42 | Determinism seed. Changing it invalidates the [determinism golden](../determinism.md). |
| `sources` | list[SourceDirective] or null | null | Declarative file-tree ingestion. Each entry is walked at train time; matching files become synthetic PROSE sections on the CPT path. See below. |
| `sources_policy` | `permissive` / `strict` | `permissive` | `strict` confines directive paths to the `.dlm`'s parent subtree; `permissive` allows absolute paths anywhere. Symlink escapes are refused under strict, warned under permissive. |
| `gate` | GateConfig | defaults | Learned MoE-style adapter gate (schema v8). See below. |
| `cache` | CacheConfig | defaults | Tokenized-section cache knobs (schema v9). See below. |
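
On the `auto` batch arithmetic: the doctor targets an effective batch of 8, so once `micro_batch_size` is fixed by VRAM, `grad_accum` follows. A sketch, with names of our own choosing:

```python
import math

EFFECTIVE_BATCH = 8  # target stated in the field table above

def pick_grad_accum(micro_batch_size: int) -> int:
    """Smallest accumulation count whose product reaches the target."""
    return max(1, math.ceil(EFFECTIVE_BATCH / micro_batch_size))

assert pick_grad_accum(2) == 4   # 2 x 4 = 8
assert pick_grad_accum(3) == 3   # 3 x 3 = 9, first product >= 8
```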

### `training.gate` — GateConfig

Learned adapter routing. A small MLP trained post-SFT that maps a prompt embedding to per-adapter weights, replacing the hand-set `--adapter-mix` for the `dlm prompt` path.
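
Concretely, a gate of this shape can be a single hidden layer. A minimal PyTorch-style sketch (assumed architecture; dlm's actual gate module may differ in detail):

```python
import torch
import torch.nn as nn

class AdapterGate(nn.Module):
    """Prompt embedding -> softmax weights over named adapters."""

    def __init__(self, embed_dim: int, n_adapters: int, hidden_proj_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_proj_dim),
            nn.GELU(),
            nn.Linear(hidden_proj_dim, n_adapters),
        )

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        # One weight per adapter, summing to 1: a soft routing decision.
        return torch.softmax(self.net(prompt_embedding), dim=-1)

# 2048*64 + 64*4 weights (plus biases) is ~131k params, ~0.5 MB at fp32,
# consistent with the table's size estimate for 4 adapters x 2048 hidden.
gate = AdapterGate(embed_dim=2048, n_adapters=4)
weights = gate(torch.randn(1, 2048))  # shape (1, 4)
```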

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Opt-in. Requires `training.adapters` with ≥2 named adapters. |
| `hidden_proj_dim` | int 8..2048 | `64` | Gate MLP internal width. Default is ~0.5MB for 4 adapters × 2048 hidden. |
| `steps` | int 1..10000 | `200` | AdamW iterations for the post-SFT gate training pass. |
| `lr` | float 0..1 | `3e-4` | AdamW learning rate. |
| `cold_start_floor` | int 1..1024 | `4` | Per-adapter minimum supervising sections. Below this, gate training is skipped and a uniform-mode `gate_config.json` is written instead. |
| `entropy_lambda` | float 0..1 | `0.01` | Shannon-entropy regularizer on the gate loss. Higher values discourage mode collapse; lower values let the gate commit harder (see the loss sketch after this table). |
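
The `entropy_lambda` knob enters as a bonus term on the routing distribution's Shannon entropy. A sketch of the regularized objective, assuming the base loss is cross-entropy against the supervising adapter (an assumption on our part; the exact target construction isn't specified here):

```python
import torch
import torch.nn.functional as F

def gate_loss(logits: torch.Tensor, target_adapter: torch.Tensor,
              entropy_lambda: float = 0.01) -> torch.Tensor:
    """Cross-entropy minus a Shannon-entropy bonus on the routing weights."""
    ce = F.cross_entropy(logits, target_adapter)
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(dim=-1).mean()
    # Subtracting lambda * entropy means a higher lambda rewards spread-out
    # routing (discourages mode collapse); lambda -> 0 lets the gate commit.
    return ce - entropy_lambda * entropy
```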

Enabling `gate` on a document without `training.adapters` (or with only one adapter) is refused at parse time — a router over a single adapter has nothing to route between. See `docs/cookbook/learned-adapter-gate.md` for the full workflow + Ollama-export fallback semantics.

### `training.audio` — AudioConfig

Opt-in knobs for audio-language training. Only consulted when the `base_model` is audio-language (e.g. `qwen2-audio-7b-instruct`). Defaults preserve the pre-v12 contract.

| Field | Type | Default | Notes |
|---|---|---|---|
| `auto_resample` | bool | `false` | When `true`, audio files whose native sample rate disagrees with the base's pinned rate resample on-the-fly via `dlm.data.audio_resample` (soxr preferred, scipy.signal.resample_poly fallback). Default `false` preserves the v11 refuse-on-mismatch contract. Cache keys carry the flag so resampled and native-rate entries never collide. |

Requires either `soxr` (`pip install dlm[audio]` pulls it in) or `scipy` to be importable when `auto_resample: true`; otherwise the preprocessor/collator raises `AudioResampleUnavailable` at first mismatched decode rather than training on the wrong rate.
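
The preferred-then-fallback shape that implies looks roughly like this (a sketch; only `AudioResampleUnavailable` and the soxr/scipy preference come from the text, the helper itself is ours):

```python
import math

class AudioResampleUnavailable(RuntimeError):
    """Neither soxr nor scipy is importable."""

def resample(samples, src_rate: int, dst_rate: int):
    """Resample 1-D audio, preferring soxr, falling back to scipy."""
    try:
        import soxr
        return soxr.resample(samples, src_rate, dst_rate)
    except ImportError:
        pass
    try:
        from scipy.signal import resample_poly
    except ImportError:
        raise AudioResampleUnavailable(
            "auto_resample: true requires soxr or scipy "
            "(pip install dlm[audio])"
        ) from None
    # resample_poly wants a rational up/down ratio in lowest terms.
    g = math.gcd(src_rate, dst_rate)
    return resample_poly(samples, up=dst_rate // g, down=src_rate // g)
```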

### `training.cache` — CacheConfig

Per-document knobs on the tokenized-section cache at `~/.dlm/store/<dlm_id>/tokenized-cache/`. Defaults match the behavior pre-v9 so upgrading a doc is a no-op.

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `true` | Set `false` to skip the cache on every run of this doc (equivalent to always passing `--no-cache`). |
| `max_bytes` | int ≥ 1 | `10_737_418_240` (10 GiB) | LRU cap, threaded to `TokenizedCache.open(..., max_bytes=...)`. After a put, least-recently-used entries are evicted until total size ≤ cap. |
| `prune_older_than_days` | int ≥ 1 | `90` | Default cutoff for `dlm cache prune` when the CLI `--older-than` flag is omitted. The flag still wins when passed. |

The CLI `--no-cache` flag and the `DLM_DISABLE_TOKENIZED_CACHE=1` env var both override `enabled: true` for a single invocation. See the cache cookbook for sizing advice.
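
The effective on/off decision is thus a conjunction of three switches. A tiny illustration of the precedence (helper name is ours):

```python
import os

def cache_enabled(doc_enabled: bool, cli_no_cache: bool) -> bool:
    """Doc-level `enabled` is overridden by either per-invocation switch."""
    if cli_no_cache:
        return False
    if os.environ.get("DLM_DISABLE_TOKENIZED_CACHE") == "1":
        return False
    return doc_enabled
```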

### `training.sources[]` — SourceDirective

One entry per external root to ingest. Paths resolve relative to the `.dlm` file's parent when not absolute; `~` expands to `$HOME`.

| Field | Type | Default | Notes |
|---|---|---|---|
| `path` | non-empty str | required | File or directory path. Relative → anchored at the `.dlm`'s parent. |
| `include` | list[str] | `["**/*"]` | Glob patterns (POSIX, `**` spans directories). At least one must match for a file to be ingested. |
| `exclude` | list[str] | `[]` | Glob patterns evaluated first; any match drops the file. |
| `max_bytes_per_file` | int ≥ 1 or null | null | Files larger than this are skipped with one log line. |
| `max_files` | int ≥ 1 or null | null | Deterministic truncation: lexicographic-sorted walk keeps the first-N. |

Behavior:

- **File enumeration is deterministic** (sketched after this list). Lexicographic sort on the resolved path list; identical trees on identical OSes produce identical Section order.
- **Binary files are skipped** (NUL byte in the first KiB — the standard grep heuristic). Skip count is recorded in the training summary.
- **UTF-8 decode failures are skipped**, not fatal. Use `exclude` for known-non-UTF-8 formats.
- **Each ingested file becomes a PROSE section** whose content is prefixed with `# source: <relpath>\n\n`. The path prefix ensures two files with identical bodies produce distinct `section_id`s — the replay corpus tracks per-file identity, not per-content.
- **Integration is seamless** with in-body sections. The CPT path, replay corpus, content-hash diff, and deterministic train/val split all treat directive-sourced sections identically.
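
A sketch of the enumeration contract the list above pins down: deterministic sort, exclude-then-include globs, the NUL heuristic, and first-N truncation. Pattern matching is approximated with `fnmatch` (which differs from a real `**` matcher on edge cases), and where truncation interleaves with skips is our guess; treat this as behavioral illustration, not dlm's code.

```python
from fnmatch import fnmatch
from pathlib import Path

def is_binary(p: Path) -> bool:
    """NUL byte in the first KiB: the standard grep heuristic."""
    with p.open("rb") as f:
        return b"\x00" in f.read(1024)

def enumerate_sources(root: Path, include: list[str], exclude: list[str],
                      max_files: int | None = None) -> list[Path]:
    """Deterministic file list: sort, exclude-first, include, then cap."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    kept: list[Path] = []
    for p in files:
        rel = p.relative_to(root).as_posix()
        if any(fnmatch(rel, pat) for pat in exclude):
            continue  # exclude wins: evaluated first
        if not any(fnmatch(rel, pat) for pat in include):
            continue  # at least one include pattern must match
        if is_binary(p):
            continue  # skipped, counted in the training summary
        kept.append(p)
        if max_files is not None and len(kept) >= max_files:
            break  # deterministic truncation: first N of the sorted walk
    return kept
```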

Example:

```yaml
training:
  sources_policy: permissive
  sources:
    - path: ~/code/quillstone-protocol
      include: ["**/*.py", "**/*.rs"]
      exclude: ["tests/**", "**/__pycache__/**"]
      max_bytes_per_file: 65536
      max_files: 5000
    - path: ~/notes/research.md
```

After `dlm train`, the training summary JSON carries a `source_directives: [...]` array with per-source file counts, byte totals, and skip breakdowns. `dlm show --json` reports the same under `training_sources`.

**Secrets warning:** directive ingestion has no implicit exclude list. Add explicit `exclude: ["**/.env", "**/credentials*", ...]` or use `sources_policy: strict` + a curated subtree to avoid training on `.env`, private keys, or other sensitive files that happen to live in your codebase.

### `export`

| Field | Type | Default | Notes |
|---|---|---|---|
| `default_quant` | `Q4_K_M`/`Q5_K_M`/`Q6_K`/`Q8_0` | `Q4_K_M` | Used when `dlm export --quant` isn't passed. |
| `default_temperature` | float 0..2 or null | null | Per-document sampling override. Emitted as Modelfile `PARAMETER temperature`. |
| `default_top_p` | float 0..1 or null | null | Per-document sampling override. |
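
Pulling together the Modelfile emissions this page mentions (`SYSTEM` from `system_prompt`, `PARAMETER num_ctx` from `training.sequence_len`, `PARAMETER temperature` from `default_temperature`), an export of the full example above would plausibly include lines like these. The `FROM` path is invented for illustration, and a null `default_top_p` emits no `top_p` line:

```
FROM ./model-Q4_K_M.gguf
SYSTEM "You are a concise assistant."
PARAMETER num_ctx 2048
PARAMETER temperature 0.2
```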

## Migrations

When a new version bumps `dlm_version` (e.g., adding a field), `dlm migrate` runs the registered migrators in order and rewrites the frontmatter in place. See Sprint 12b for the migration framework.

The parser refuses to load a document whose `dlm_version` exceeds the running CLI's `CURRENT_SCHEMA_VERSION`:

```
error: tutor.dlm:2:14 — dlm_version 2 is newer than this CLI supports (1).
       Upgrade dlm to continue.
```
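
The gate itself is a one-line comparison. A sketch of the contract (the constant name follows the text above; the wrapper function and exact exit path are ours):

```python
CURRENT_SCHEMA_VERSION = 1  # bumped in lockstep with registered migrators

def check_schema_version(doc_version: int, path: str, line: int, col: int) -> None:
    """Refuse documents written by a newer CLI than the one running."""
    if doc_version > CURRENT_SCHEMA_VERSION:
        raise SystemExit(
            f"error: {path}:{line}:{col} — dlm_version {doc_version} is newer "
            f"than this CLI supports ({CURRENT_SCHEMA_VERSION}).\n"
            "       Upgrade dlm to continue."
        )
```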