# Frontmatter reference

The YAML block between the two `---` lines at the top of every `.dlm`
document. Validated with Pydantic in `dlm.doc.schema` (`extra="forbid"`,
`frozen=True`) — unknown keys or wrong types fail fast with a
`file:line:col` error.

## Minimum required frontmatter

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
base_model: smollm2-135m
---
```

`dlm_id` is a 26-character Crockford base32 ULID. `dlm init` generates
it; don't edit it by hand.

`base_model` is either a registry key or `hf:org/name`:

| Registry key | HuggingFace id |
|---|---|
| `smollm2-135m` | HuggingFaceTB/SmolLM2-135M-Instruct |
| `smollm2-360m` | HuggingFaceTB/SmolLM2-360M-Instruct |
| `smollm2-1.7b` | HuggingFaceTB/SmolLM2-1.7B-Instruct |
| `qwen2.5-0.5b` | Qwen/Qwen2.5-0.5B-Instruct |
| `qwen2.5-1.5b` | Qwen/Qwen2.5-1.5B-Instruct |
| `qwen2.5-3b` | Qwen/Qwen2.5-3B-Instruct |
| `qwen2.5-coder-1.5b` | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| `llama-3.2-1b` | meta-llama/Llama-3.2-1B-Instruct (gated) |
| `llama-3.2-3b` | meta-llama/Llama-3.2-3B-Instruct (gated) |
| `phi-3.5-mini` | microsoft/Phi-3.5-mini-instruct |

The shipped registry is broader than this quick-start table. Current
additions include:

- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-1.7b-thinking`,
  `qwen3-4b`, `qwen3-8b`, `llama-3.3-8b-instruct`,
  `phi-4-mini-reasoning`, `gemma-2-2b-it`, `gemma-2-9b-it`,
  `smollm3-3b`, `olmo-2-7b-instruct`, and `mixtral-8x7b-instruct`.
- Vision-language rows: `paligemma-3b-mix-224`,
  `qwen2-vl-2b-instruct`, `internvl2-2b`, `internvl3-2b`, and
  `mistral-small-3.1-24b-instruct`.
- Audio-language row: `qwen2-audio-7b-instruct`.

Off-registry bases use the `hf:` prefix, e.g.
`base_model: hf:mistralai/Mistral-7B-Instruct-v0.3`. `dlm init` runs
a compatibility probe; failures abort with a clear diagnostic.
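For concreteness, a minimal header that pins an off-registry base looks like this (the `dlm_id` is the placeholder from the example above; `dlm init` generates a real one):

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH   # placeholder; `dlm init` writes a fresh ULID
base_model: hf:mistralai/Mistral-7B-Instruct-v0.3
---
```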
## Full frontmatter

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH
dlm_version: 1              # bumped by `dlm migrate`; default: 1
base_model: qwen2.5-1.5b
system_prompt: |
  You are a concise assistant.
training:
  adapter: lora             # or qlora (CUDA only)
  lora_r: 8                 # 1..256
  lora_alpha: 16
  lora_dropout: 0.05        # 0.0..0.5
  target_modules: auto      # or a list[str]
  sequence_len: 2048        # 64..32768
  micro_batch_size: auto    # or a positive int
  grad_accum: auto          # or a positive int
  learning_rate: 2e-4
  num_epochs: 3
  optimizer: adamw_torch    # or adamw_bnb_8bit / paged_adamw_8bit
  lr_scheduler: cosine      # or linear / constant
  warmup_ratio: 0.1         # 0.0..0.5
  # precision: fp16         # optional override; default lets the doctor pick
  seed: 42
export:
  default_quant: Q4_K_M     # or Q5_K_M / Q6_K / Q8_0
  default_temperature: 0.2  # optional; overrides dialect default
  default_top_p: null       # optional; null keeps dialect default
---
```
## Field-by-field

### Top-level

| Field | Type | Default | Notes |
|---|---|---|---|
| `dlm_id` | 26-char ULID | required | Assigned by `dlm init`. Never regenerated. |
| `dlm_version` | int ≥ 1 | `1` | Bumped by `dlm migrate` when the schema evolves. |
| `base_model` | non-empty str | required | Registry key or `hf:org/name`. |
| `system_prompt` | str or null | null | Emitted as `SYSTEM "…"` in the Modelfile on export. |
| `training` | object | defaults | See below. |
| `export` | object | defaults | See below. |

### `training`

| Field | Type | Default | Notes |
|---|---|---|---|
| `adapter` | `lora` / `qlora` / `dora` | `lora` | QLoRA requires CUDA + bitsandbytes. DoRA (weight-decomposed LoRA) requires `peft >= 0.8`; ~10% training wall-clock tax for 2-4% quality uplift on multi-task fine-tunes. See `docs/cookbook/dora-vs-lora.md`. |
| `lora_r` | int 1..256 | 8 | LoRA rank. |
| `lora_alpha` | int ≥ 1 | 16 | LoRA alpha (scaling). |
| `lora_dropout` | float 0..0.5 | 0.05 | |
| `target_modules` | `auto` or list | `auto` | `auto` uses the per-architecture registry from Sprint 06. Explicit lists override. |
| `sequence_len` | int 64..32768 | 2048 | Max token length per example. Also emitted as Ollama `PARAMETER num_ctx`. |
| `micro_batch_size` | `auto` or int ≥ 1 | `auto` | Doctor picks based on VRAM. |
| `grad_accum` | `auto` or int ≥ 1 | `auto` | Doctor picks to reach effective batch = 8. |
| `learning_rate` | float > 0 | 2e-4 | |
| `num_epochs` | int ≥ 1 | 3 | |
| `optimizer` | enum | `adamw_torch` | `adamw_bnb_8bit` / `paged_adamw_8bit` for CUDA + bnb. `galore_adamw` / `galore_adamw_8bit` for rank-projected optimizer state (~40% memory reduction, paper uplift at ≥ 7B bases; `dlm doctor` warns on sub-1B). See `docs/cookbook/dora-vs-lora.md`. |
| `lr_scheduler` | enum | `cosine` | |
| `warmup_ratio` | float 0..0.5 | 0.1 | |
| `precision` | `bf16` / `fp16` / `fp32` or null | null | Override the doctor's auto-pick. Defaults: bf16 on Ampere+/ROCm-bf16, fp16 on older CUDA, **fp32 on MPS** (the MPS fp16 attention kernels produce NaN LoRA weights on tiny-data SFT — see bug note below). Set `fp16` on MPS only if you need the memory headroom for a 7–8B base and your data isn't pathologically small; the post-train finite-weights gate will still refuse to persist a corrupt adapter. |
| `seed` | int | 42 | Determinism seed. Changing it invalidates the [determinism golden](../determinism.md). |
| `sources` | list[SourceDirective] or null | null | Declarative file-tree ingestion. Each entry is walked at train time; matching files become synthetic PROSE sections on the CPT path. See below. |
| `sources_policy` | `permissive` / `strict` | `permissive` | `strict` confines directive paths to the `.dlm`'s parent subtree; `permissive` allows absolute paths anywhere. Symlink escapes are refused under strict, warned under permissive. |
| `gate` | GateConfig | defaults | Learned MoE-style adapter gate (schema v8). See below. |
| `cache` | CacheConfig | defaults | Tokenized-section cache knobs (schema v9). See below. |
### `training.gate` — GateConfig

Learned adapter routing: a small MLP, trained post-SFT, that maps a
prompt embedding to per-adapter weights, replacing the hand-set
`--adapter-mix` for the `dlm prompt` path.

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `false` | Opt-in. Requires `training.adapters` with ≥2 named adapters. |
| `hidden_proj_dim` | int 8..2048 | `64` | Gate MLP internal width. The default weighs in at ~0.5 MB for 4 adapters × 2048 hidden. |
| `steps` | int 1..10000 | `200` | AdamW iterations for the post-SFT gate training pass. |
| `lr` | float 0..1 | `3e-4` | AdamW learning rate. |
| `cold_start_floor` | int 1..1024 | `4` | Per-adapter minimum of supervising sections. Below this, gate training is skipped and a uniform-mode `gate_config.json` is written instead. |
| `entropy_lambda` | float 0..1 | `0.01` | Shannon-entropy regularizer on the gate loss. Higher values discourage mode collapse; lower values let the gate commit harder. |

Enabling `gate` on a document without `training.adapters` (or with
only one adapter) is refused at parse time — a router over a single
adapter has nothing to route between. See
`docs/cookbook/learned-adapter-gate.md` for the full workflow and
Ollama-export fallback semantics.
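A sketch of the gate block with every documented knob spelled out (values are the defaults apart from `enabled: true`; the ≥2 named adapters it requires live under `training.adapters`, whose shape is covered in the gate cookbook rather than here):

```yaml
training:
  gate:
    enabled: true          # opt-in; requires ≥2 named adapters under training.adapters
    hidden_proj_dim: 64    # gate MLP internal width
    steps: 200             # AdamW iterations for the post-SFT gate pass
    lr: 3e-4
    cold_start_floor: 4    # fewer supervising sections per adapter → uniform gate instead
    entropy_lambda: 0.01   # higher resists mode collapse; lower lets the gate commit harder
```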
### `training.audio` — AudioConfig

Opt-in knobs for audio-language training. Only consulted when the
`base_model` is audio-language (e.g. `qwen2-audio-7b-instruct`).
Defaults preserve the pre-v12 contract.

| Field | Type | Default | Notes |
|---|---|---|---|
| `auto_resample` | bool | `false` | When `true`, audio files whose native sample rate disagrees with the base's pinned rate are resampled on the fly via `dlm.data.audio_resample` (soxr preferred, scipy.signal.resample_poly fallback). Default `false` preserves the v11 refuse-on-mismatch contract. Cache keys carry the flag, so resampled and native-rate entries never collide. |

`auto_resample: true` requires either `soxr` (`pip install dlm[audio]`
pulls it in) or `scipy` to be importable; otherwise the
preprocessor/collator raises `AudioResampleUnavailable` at the first
mismatched decode rather than training on the wrong rate.
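A minimal sketch of an audio-language document that opts into resampling (fields as documented above; the `dlm_id` is a placeholder):

```yaml
---
dlm_id: 01HRZYQ2X0MB5K4VN7E9DNT5GH   # placeholder
base_model: qwen2-audio-7b-instruct
training:
  audio:
    auto_resample: true   # resample mismatched files on the fly instead of refusing (needs soxr or scipy)
---
```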
### `training.cache` — CacheConfig

Per-document knobs for the tokenized-section cache at
`~/.dlm/store/<dlm_id>/tokenized-cache/`. Defaults match the pre-v9
behavior, so upgrading a doc is a no-op.

| Field | Type | Default | Notes |
|---|---|---|---|
| `enabled` | bool | `true` | Set `false` to skip the cache on every run of this doc (equivalent to always passing `--no-cache`). |
| `max_bytes` | int ≥ 1 | `10_737_418_240` (10 GiB) | LRU cap. Threaded to `TokenizedCache.open(..., max_bytes=...)`. After a put, least-recently-used entries are evicted until the cache size is ≤ the cap. |
| `prune_older_than_days` | int ≥ 1 | `90` | Default cutoff for `dlm cache prune` when the CLI `--older-than` flag is omitted. The flag still wins when passed. |

The CLI `--no-cache` flag and the `DLM_DISABLE_TOKENIZED_CACHE=1` env
var both override `enabled: true` for a single invocation. See the
cache cookbook for sizing advice.
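For example, to run a tighter cache than the defaults (illustrative values; both fields as documented above):

```yaml
training:
  cache:
    enabled: true
    max_bytes: 2_147_483_648      # 2 GiB LRU cap instead of the 10 GiB default
    prune_older_than_days: 30     # default cutoff for `dlm cache prune`
```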
### `training.sources[]` — SourceDirective

One entry per external root to ingest. Paths resolve relative to the
`.dlm` file's parent when not absolute; `~` expands to `$HOME`.

| Field | Type | Default | Notes |
|---|---|---|---|
| `path` | non-empty str | required | File or directory path. Relative → anchored at the `.dlm`'s parent. |
| `include` | list[str] | `["**/*"]` | Glob patterns (POSIX, `**` spans directories). At least one must match for a file to be ingested. |
| `exclude` | list[str] | `[]` | Glob patterns evaluated first; any match drops the file. |
| `max_bytes_per_file` | int ≥ 1 or null | null | Files larger than this are skipped with one log line. |
| `max_files` | int ≥ 1 or null | null | Deterministic truncation: lexicographic-sorted walk keeps the first N. |

Behavior:

- **File enumeration is deterministic.** Lexicographic sort on the
  resolved path list; identical trees on identical OSes produce
  identical Section order.
- **Binary files are skipped** (NUL byte in the first KiB — the
  standard grep heuristic). Skip count is recorded in the training
  summary.
- **UTF-8 decode failures are skipped**, not fatal. Use `exclude` for
  known-non-UTF-8 formats.
- **Each ingested file becomes a PROSE section** whose content is
  prefixed with `# source: <relpath>\n\n`. The path prefix ensures
  two files with identical bodies produce distinct `section_id`s —
  the replay corpus tracks per-file identity, not per-content.
- **Integration is seamless** with in-body sections. The CPT path,
  replay corpus, content-hash diff, and deterministic train/val
  split all treat directive-sourced sections identically.
Example:

```yaml
training:
  sources_policy: permissive
  sources:
    - path: ~/code/quillstone-protocol
      include: ["**/*.py", "**/*.rs"]
      exclude: ["tests/**", "**/__pycache__/**"]
      max_bytes_per_file: 65536
      max_files: 5000
    - path: ~/notes/research.md
```

After `dlm train`, the training summary JSON carries a
`source_directives: [...]` array with per-source file counts, byte
totals, and skip breakdowns. `dlm show --json` reports the same
under `training_sources`.
**Secrets warning:** directive ingestion has no implicit exclude
list. Add explicit `exclude: ["**/.env", "**/credentials*", ...]`
or use `sources_policy: strict` + a curated subtree to avoid
training on `.env`, private keys, or other sensitive files that
happen to live in your codebase.
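A sketch of the stricter setup that warning recommends: `strict` policy plus a curated subtree and explicit secret excludes (the directory name is hypothetical):

```yaml
training:
  sources_policy: strict              # directive paths must stay inside the .dlm's parent subtree
  sources:
    - path: corpus/                   # relative → anchored at the .dlm's parent; hypothetical directory
      include: ["**/*.md", "**/*.py"]
      exclude: ["**/.env", "**/credentials*", "**/*.pem"]
```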
### `export`

| Field | Type | Default | Notes |
|---|---|---|---|
| `default_quant` | `Q4_K_M` / `Q5_K_M` / `Q6_K` / `Q8_0` | `Q4_K_M` | Used when `dlm export --quant` isn't passed. |
| `default_temperature` | float 0..2 or null | null | Per-document sampling override. Emitted as Modelfile `PARAMETER temperature`. |
| `default_top_p` | float 0..1 or null | null | Per-document sampling override. |
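As an illustration, a document that prefers a heavier quant and pins both sampling overrides (values are illustrative and within the documented ranges):

```yaml
export:
  default_quant: Q8_0          # used when `dlm export --quant` isn't passed
  default_temperature: 0.7     # emitted as Modelfile PARAMETER temperature
  default_top_p: 0.9
```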
## Migrations

When a new version bumps `dlm_version` (e.g., adding a field),
`dlm migrate` runs the registered migrators in order and rewrites the
frontmatter in place. See Sprint 12b for the migration framework.

The parser refuses to load a document whose `dlm_version` exceeds the
running CLI's `CURRENT_SCHEMA_VERSION`:

```
error: tutor.dlm:2:14 — dlm_version 2 is newer than this CLI supports (1).
Upgrade dlm to continue.
```