# Frontmatter reference

The YAML block between the two `---` lines at the top of every `.dlm` document. Validated with Pydantic in `dlm.doc.schema` (`extra="forbid"`, `frozen=True`) — unknown keys or wrong types fail fast with a `file:line:col` error.

## Minimum required frontmatter

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
base_model: smollm2-135m
---
```

`dlm_id` is a 26-character Crockford base32 ULID. `dlm init` generates it; don't edit it by hand.

`base_model` is either a registry key or `hf:org/name`:

| Registry key | HuggingFace id |
|---|---|
| `smollm2-135m` | HuggingFaceTB/SmolLM2-135M-Instruct |
| `smollm2-360m` | HuggingFaceTB/SmolLM2-360M-Instruct |
| `smollm2-1.7b` | HuggingFaceTB/SmolLM2-1.7B-Instruct |
| `qwen2.5-0.5b` | Qwen/Qwen2.5-0.5B-Instruct |
| `qwen2.5-1.5b` | Qwen/Qwen2.5-1.5B-Instruct |
| `qwen2.5-3b` | Qwen/Qwen2.5-3B-Instruct |
| `qwen2.5-coder-1.5b` | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| `llama-3.2-1b` | meta-llama/Llama-3.2-1B-Instruct (gated) |
| `llama-3.2-3b` | meta-llama/Llama-3.2-3B-Instruct (gated) |
| `phi-3.5-mini` | microsoft/Phi-3.5-mini-instruct |

The shipped registry is broader than this quick-start table. Current additions include:

- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-1.7b-thinking`, `qwen3-4b`, `qwen3-8b`, `llama-3.3-8b-instruct`, `phi-4-mini-reasoning`, `gemma-2-2b-it`, `gemma-2-9b-it`, `smollm3-3b`, `olmo-2-7b-instruct`, and `mixtral-8x7b-instruct`.
- Vision-language rows: `paligemma-3b-mix-224`, `qwen2-vl-2b-instruct`, `internvl2-2b`, `internvl3-2b`, and `mistral-small-3.1-24b-instruct`.
- Audio-language row: `qwen2-audio-7b-instruct`.

Off-registry bases use the `hf:` prefix, e.g. `base_model: hf:mistralai/Mistral-7B-Instruct-v0.3`. `dlm init` runs a compatibility probe; failures abort with a clear diagnostic.

## Full frontmatter

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
dlm_version: 1              # bumped by `dlm migrate`; default: 1
base_model: qwen2.5-1.5b
system_prompt: |
  You are a concise assistant.
training:
  adapter: lora             # or qlora (CUDA only)
  lora_r: 8                 # 1..256
  lora_alpha: 16
  lora_dropout: 0.05        # 0.0..0.5
  target_modules: auto      # or a list[str]
  sequence_len: 2048        # 64..32768
  micro_batch_size: auto    # or a positive int
  grad_accum: auto          # or a positive int
  learning_rate: 2e-4
  num_epochs: 3
  optimizer: adamw_torch    # or adamw_bnb_8bit / paged_adamw_8bit
  lr_scheduler: cosine      # or linear / constant
  warmup_ratio: 0.1         # 0.0..0.5
  # precision: fp16         # optional override; default lets the doctor pick
  seed: 42
export:
  default_quant: Q4_K_M     # or Q5_K_M / Q6_K / Q8_0
  default_temperature: 0.2  # optional; overrides dialect default
  default_top_p: null       # optional; null keeps dialect default
---
```

## Field-by-field

### Top-level

| Field | Type | Default | Notes |
|---|---|---|---|
| `dlm_id` | 26-char ULID | required | Assigned by `dlm init`. Never regenerated. |
| `dlm_version` | int ≥ 1 | `1` | Bumped by `dlm migrate` when the schema evolves. |
| `base_model` | non-empty str | required | Registry key or `hf:org/name`. |
| `system_prompt` | str or null | null | Emitted as `SYSTEM "…"` in the Modelfile on export. |
| `training` | object | defaults | See below. |
| `export` | object | defaults | See below. |

### `training`

| Field | Type | Default | Notes |
|---|---|---|---|
| `adapter` | `lora` / `qlora` / `dora` | `lora` | QLoRA requires CUDA + bitsandbytes. DoRA (weight-decomposed LoRA) requires `peft >= 0.8`; ~10% training wall-clock tax for 2-4% quality uplift on multi-task fine-tunes. See `docs/cookbook/dora-vs-lora.md`. |
| `lora_r` | int 1..256 | 8 | LoRA rank. |
| `lora_alpha` | int ≥ 1 | 16 | LoRA alpha (scaling). |
| `lora_dropout` | float 0..0.5 | 0.05 | |
| `target_modules` | `auto` or list | `auto` | `auto` uses the per-architecture registry from Sprint 06. Explicit lists override. |
| `sequence_len` | int 64..32768 | 2048 | Max token length per example. Also emitted as Ollama `PARAMETER num_ctx`. |
| `micro_batch_size` | `auto` or int ≥ 1 | `auto` | Doctor picks based on VRAM. |
| `grad_accum` | `auto` or int ≥ 1 | `auto` | Doctor picks to reach effective batch = 8. |
| `learning_rate` | float > 0 | 2e-4 | |
| `num_epochs` | int ≥ 1 | 3 | |
| `optimizer` | enum | `adamw_torch` | `adamw_bnb_8bit` / `paged_adamw_8bit` for CUDA + bnb. `galore_adamw` / `galore_adamw_8bit` for rank-projected optimizer state (~40% memory reduction, paper uplift at ≥ 7B bases; `dlm doctor` warns on sub-1B). See `docs/cookbook/dora-vs-lora.md`. |
| `lr_scheduler` | enum | `cosine` | |
| `warmup_ratio` | float 0..0.5 | 0.1 | |
| `precision` | `bf16` / `fp16` / `fp32` or null | null | Override the doctor's auto-pick. Defaults: bf16 on Ampere+/ROCm-bf16, fp16 on older CUDA, **fp32 on MPS** (the MPS fp16 attention kernels produce NaN LoRA weights on tiny-data SFT — see bug note below). Set `fp16` on MPS only if you need the memory headroom for a 7–8B base and your data isn't pathologically small; the post-train finite-weights gate will still refuse to persist a corrupt adapter. |
| `seed` | int | 42 | Determinism seed. Changing it invalidates the [determinism golden](../determinism.md). |
| `sources` | list[SourceDirective] or null | null | Declarative file-tree ingestion. Each entry is walked at train time; matching files become synthetic PROSE sections on the CPT path. See below. |
| `sources_policy` | `permissive` / `strict` | `permissive` | `strict` confines directive paths to the `.dlm`'s parent subtree; `permissive` allows absolute paths anywhere. Symlink escapes are refused under strict, warned under permissive. |
| `gate` | GateConfig | defaults | Learned MoE-style adapter gate (schema v8). See below. |
| `cache` | CacheConfig | defaults | Tokenized-section cache knobs (schema v9). See below. |
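Where the `auto` knobs need pinning (for example, to reproduce a run on a machine the doctor would size differently), the overrides compose like this; the values below are illustrative, not recommendations:

```yaml
training:
  adapter: qlora            # CUDA + bitsandbytes only
  optimizer: adamw_bnb_8bit # 8-bit optimizer state, also CUDA + bnb
  micro_batch_size: 2       # pin instead of letting the doctor size from VRAM
  grad_accum: 4             # effective batch = micro_batch_size x grad_accum = 8
  sequence_len: 4096        # also exported as Ollama PARAMETER num_ctx
```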
### `training.gate` — GateConfig

Learned adapter routing. A small MLP trained post-SFT that maps a prompt embedding to per-adapter weights, replacing the hand-set `--adapter-mix` for the `dlm prompt` path.

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Opt-in. Requires `training.adapters` with ≥2 named adapters. |
| `hidden_proj_dim` | int 8..2048 | `64` | Gate MLP internal width. Default is ~0.5MB for 4 adapters × 2048 hidden. |
| `steps` | int 1..10000 | `200` | AdamW iterations for the post-SFT gate training pass. |
| `lr` | float 0..1 | `3e-4` | AdamW learning rate. |
| `cold_start_floor` | int 1..1024 | `4` | Per-adapter minimum supervising sections. Below this, gate training is skipped and a uniform-mode `gate_config.json` is written instead. |
| `entropy_lambda` | float 0..1 | `0.01` | Shannon-entropy regularizer on the gate loss. Higher values discourage mode collapse; lower values let the gate commit harder. |

Enabling `gate` on a document without `training.adapters` (or with only one adapter) is refused at parse time — a router over a single adapter has nothing to route between. See `docs/cookbook/learned-adapter-gate.md` for the full workflow + Ollama-export fallback semantics.
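A minimal opt-in sketch, assuming the document also defines two or more named adapters under `training.adapters` (that schema is covered in the gate cookbook, not in this section):

```yaml
training:
  # requires >= 2 named entries under training.adapters (not shown here;
  # see docs/cookbook/learned-adapter-gate.md)
  gate:
    enabled: true
    hidden_proj_dim: 64   # default gate MLP width
    steps: 200
    lr: 3e-4
    cold_start_floor: 4   # fewer supervising sections per adapter => uniform-mode fallback
    entropy_lambda: 0.01  # raise to discourage mode collapse
```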
### `training.audio` — AudioConfig

Opt-in knobs for audio-language training. Only consulted when the `base_model` is audio-language (e.g. `qwen2-audio-7b-instruct`). Defaults preserve the pre-v12 contract.

| Field | Type | Default | Notes |
|---|---|---|---|
| `auto_resample` | bool | `false` | When `true`, audio files whose native sample rate disagrees with the base's pinned rate are resampled on the fly via `dlm.data.audio_resample` (soxr preferred, scipy.signal.resample_poly fallback). Default `false` preserves the v11 refuse-on-mismatch contract. Cache keys carry the flag, so resampled and native-rate entries never collide. |

Requires either `soxr` (`pip install dlm[audio]` pulls it in) or `scipy` to be importable when `auto_resample: true`; otherwise the preprocessor/collator raises `AudioResampleUnavailable` at the first mismatched decode rather than training on the wrong rate.

### `training.cache` — CacheConfig

Per-document knobs for the tokenized-section cache at `~/.dlm/store//tokenized-cache/`. Defaults match the pre-v9 behavior, so upgrading a doc is a no-op.

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `true` | Set `false` to skip the cache on every run of this doc (equivalent to always passing `--no-cache`). |
| `max_bytes` | int ≥ 1 | `10_737_418_240` (10 GiB) | LRU cap. Threaded to `TokenizedCache.open(..., max_bytes=...)`. After a put, least-recently-used entries are evicted until the cache size is ≤ the cap. |
| `prune_older_than_days` | int ≥ 1 | `90` | Default cutoff for `dlm cache prune` when the CLI `--older-than` flag is omitted. The flag still wins when passed. |

The CLI `--no-cache` flag and the `DLM_DISABLE_TOKENIZED_CACHE=1` env var both override `enabled: true` for a single invocation. See the cache cookbook for sizing advice.
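For instance, a document that trains on a large corpus but should never hold more than a couple of GiB of tokenized sections could pin the knobs like this (sizes are illustrative):

```yaml
training:
  cache:
    enabled: true
    max_bytes: 2147483648     # 2 GiB LRU cap for this document
    prune_older_than_days: 30 # cutoff used when `dlm cache prune --older-than` is omitted
```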
### `training.sources[]` — SourceDirective

One entry per external root to ingest. Paths resolve relative to the `.dlm` file's parent when not absolute; `~` expands to `$HOME`.

| Field | Type | Default | Notes |
|---|---|---|---|
| `path` | non-empty str | required | File or directory path. Relative → anchored at the `.dlm`'s parent. |
| `include` | list[str] | `["**/*"]` | Glob patterns (POSIX, `**` spans directories). At least one must match for a file to be ingested. |
| `exclude` | list[str] | `[]` | Glob patterns evaluated first; any match drops the file. |
| `max_bytes_per_file` | int ≥ 1 or null | null | Files larger than this are skipped with one log line. |
| `max_files` | int ≥ 1 or null | null | Deterministic truncation: the lexicographically sorted walk keeps the first N. |

Behavior:

- **File enumeration is deterministic.** Lexicographic sort on the resolved path list; identical trees on identical OSes produce identical Section order.
- **Binary files are skipped** (NUL byte in the first KiB — the standard grep heuristic). The skip count is recorded in the training summary.
- **UTF-8 decode failures are skipped**, not fatal. Use `exclude` for known non-UTF-8 formats.
- **Each ingested file becomes a PROSE section** whose content is prefixed with `# source: <path>\n\n`. The path prefix ensures two files with identical bodies produce distinct `section_id`s — the replay corpus tracks per-file identity, not per-content.
- **Integration with in-body sections is seamless.** The CPT path, replay corpus, content-hash diff, and deterministic train/val split all treat directive-sourced sections identically.

Example:

```yaml
training:
  sources_policy: permissive
  sources:
    - path: ~/code/quillstone-protocol
      include: ["**/*.py", "**/*.rs"]
      exclude: ["tests/**", "**/__pycache__/**"]
      max_bytes_per_file: 65536
      max_files: 5000
    - path: ~/notes/research.md
```

After `dlm train`, the training summary JSON carries a `source_directives: [...]` array with per-source file counts, byte totals, and skip breakdowns. `dlm show --json` reports the same under `training_sources`.

**Secrets warning:** directive ingestion has no implicit exclude list. Add explicit `exclude: ["**/.env", "**/credentials*", ...]` or use `sources_policy: strict` plus a curated subtree to avoid training on `.env`, private keys, or other sensitive files that happen to live in your codebase.

### `export`

| Field | Type | Default | Notes |
|---|---|---|---|
| `default_quant` | `Q4_K_M` / `Q5_K_M` / `Q6_K` / `Q8_0` | `Q4_K_M` | Used when `dlm export --quant` isn't passed. |
| `default_temperature` | float 0..2 or null | null | Per-document sampling override. Emitted as Modelfile `PARAMETER temperature`. |
| `default_top_p` | float 0..1 or null | null | Per-document sampling override. |

## Migrations

When a new dlm release bumps `dlm_version` (e.g., to add a field), `dlm migrate` runs the registered migrators in order and rewrites the frontmatter in place. See Sprint 12b for the migration framework.

The parser refuses to load a document whose `dlm_version` exceeds the running CLI's `CURRENT_SCHEMA_VERSION`:

```
error: tutor.dlm:2:14 — dlm_version 2 is newer than this CLI supports (1). Upgrade dlm to continue.
```
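As an illustration, a document written by a newer dlm might carry frontmatter like the hypothetical fragment below; loading it with a CLI whose `CURRENT_SCHEMA_VERSION` is 1 produces the error above (the id and base are reused from the minimum example purely for illustration):

```yaml
---
dlm_version: 2   # written by a newer dlm; a CLI at schema version 1 refuses to load it
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
base_model: smollm2-135m
---
```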