tenseleyflow/documentlanguagemodel / 29a900f


Refresh Sprint 40 closeout proofs

Authored by espadonne
SHA: 29a900f3fb6e9f24b4830d7c0a9479b4f76a7a3d
Parents: fdc4063
Tree: 491766b

27 changed files

Status File + -
M docs/cookbook/audio-training.md 1 4
A docs/cookbook/choosing-a-base.md 37 0
M docs/cookbook/multimodal-training.md 11 5
M docs/format/frontmatter.md 5 5
A docs/hardware/memory-estimates.md 38 0
M docs/hardware/vl-memory.md 23 16
M docs/index.md 5 4
M mkdocs.yml 3 0
M src/dlm/base_models/registry.py 63 59
M src/dlm/base_models/resolver.py 1 0
M src/dlm/base_models/schema.py 1 0
A src/dlm/base_models/templates/qwen3thinking.jinja 14 0
M src/dlm/export/ollama/template_registry.py 21 1
A src/dlm/export/ollama/templates/qwen3thinking.gotmpl 5 0
A tests/integration/base_models/test_13_entries_scaffold.py 45 0
M tests/integration/cli/test_registry_refresh_init.py 19 25
A tests/integration/gate/test_mixtral_gate_smoke.py 54 0
M tests/unit/base_models/test_audio_registry.py 5 5
M tests/unit/base_models/test_registry.py 6 1
M tests/unit/base_models/test_registry_2026.py 41 25
M tests/unit/base_models/test_schema.py 11 1
M tests/unit/base_models/test_vl_registry.py 41 9
A tests/unit/doc/test_v12_migrator.py 36 0
M tests/unit/export/ollama/test_template_registry.py 33 3
A tests/unit/export/test_mixtral_template.py 38 0
A tests/unit/export/test_phi4_template.py 37 0
A tests/unit/export/test_qwen3_template.py 37 0
docs/cookbook/audio-training.md (modified)
@@ -10,9 +10,6 @@ spoken-corpus workflow end-to-end: scaffold → drop clips + transcripts
   24 GB VRAM. Qwen2-Audio-7B-Instruct fp16 weighs ~15 GB; the 16 GB
   consumer GPUs don't fit this base without quantization (4-bit audio
   training is deferred).
-- A Hugging Face account with the [Qwen2-Audio-7B-Instruct terms
-  accepted](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) and
-  `HF_TOKEN` exported.
 - Qwen2-Audio cached locally (`huggingface-cli download
   Qwen/Qwen2-Audio-7B-Instruct`). First train without this triggers
   the download automatically.
@@ -22,7 +19,7 @@ spoken-corpus workflow end-to-end: scaffold → drop clips + transcripts
 ## Step 1 — Scaffold an audio `.dlm`
 
 ```bash
-dlm init my-audio.dlm --audio --i-accept-license
+dlm init my-audio.dlm --audio
 ```
 
 `--audio` pins the base to `qwen2-audio-7b-instruct` and emits a
docs/cookbook/choosing-a-base.md (added)
@@ -0,0 +1,37 @@
+# Choosing a base
+
+The fastest way to pick a DLM base is to decide three things first:
+
+1. Do you need plain text, multimodal vision, or audio?
+2. Do you want the most permissive license possible, or are gated rows fine?
+3. Are you targeting Apple Silicon, a mid-size CUDA card, or a large CUDA box?
+
+## Quick picks
+
+| If you want… | Start with… | Why |
+|---|---|---|
+| Fast local iteration on almost any laptop | `smollm2-135m` | Tiny, cheap, and ideal for testing authoring loops. |
+| Best general-purpose 2026 text base around the 4B tier | `qwen3-4b` | Strong default quality, permissive license, and current-generation tokenizer/chat behavior. |
+| A reasoning-first 1.7B profile | `qwen3-1.7b-thinking` | Same upstream Qwen3 weights, but a curated reasoning-profile key with cooler defaults. |
+| Fully open-model story | `olmo-2-7b-instruct` | Open weights and open-data lineage make it the cleanest reproducibility pitch. |
+| Apache sparse-MoE experiments | `mixtral-8x7b-instruct` | First `text-moe` row in the registry; pairs with the learned gate work. |
+| Small gated text base | `gemma-2-2b-it` | Useful when Gemma’s instruction style or ecosystem matters more than license friction. |
+| Larger gated text base | `gemma-2-9b-it` | Upper-tier Gemma pick; large enough to want real GPU planning. |
+| Large multimodal capability | `mistral-small-3.1-24b-instruct` | Strongest shipped VL row, but large-CUDA-first. |
+| Safe default multimodal row on a smaller box | `qwen2-vl-2b-instruct` | Permissive, solid, and compatible with the current generic VL runtime. |
+| Audio-language training | `qwen2-audio-7b-instruct` | Current shipped audio row; open-license and no longer gated on HF. |
+
+## Notes on the sharp edges
+
+- `llama-3.3-8b-instruct` is still treated like the Llama family in DLM’s policy surface: acceptance required, not redistributable, and intended for users who already know they want the Llama line.
+- `internvl2-2b` and `internvl3-2b` are registry-visible planning targets, but the current generic VL runtime still refuses the InternVL family until DLM owns its custom processor/collator contract.
+- `mistral-small-3.1-24b-instruct` is intentionally refused on MPS by default. It is a real shipped row, just not a casual laptop target.
+
+## Hardware-first view
+
+- Apple Silicon, 16 GB: `smollm2-*`, `qwen2.5-*`, `qwen3-1.7b`, and `qwen3-4b` are the comfortable text picks; `qwen2-vl-2b-instruct` is the safer VL row.
+- Apple Silicon, 32 GB+: `qwen3-8b`, `gemma-2-2b-it`, and `phi-4-mini-reasoning` become practical. Large VL rows still need caution.
+- CUDA, 24 GB: this is where `gemma-2-9b-it`, `mixtral-8x7b-instruct`, and the heavier multimodal rows start becoming realistic.
+- CUDA, 48 GB+: this is the intended home for `mistral-small-3.1-24b-instruct`.
+
+See [hardware/memory-estimates](../hardware/memory-estimates.md) for the text-family budget table and [hardware/vl-memory](../hardware/vl-memory.md) for the VL rows.
docs/cookbook/multimodal-training.md (modified)
@@ -28,7 +28,7 @@ drop real images into that path before the first train).
 
 ### Picking a different VL base
 
-Four VL bases ship in the registry today:
+Five VL bases ship in the registry today:
 
 ```bash
 # Permissive + Apache-2.0 + strong general-purpose VL (pinned 672²):
@@ -37,6 +37,9 @@ dlm init my-diagrams.dlm --multimodal --base qwen2-vl-2b-instruct
 # MIT-licensed, smallest per-image footprint (448²):
 dlm init my-diagrams.dlm --multimodal --base internvl2-2b
 
+# Newer InternVL planning row (dynamic 448-tiling, still runtime-deferred):
+dlm init my-diagrams.dlm --multimodal --base internvl3-2b
+
 # Largest-capability VL row, CUDA-first (pinned 1540²):
 dlm init my-diagrams.dlm --multimodal --base mistral-small-3.1-24b-instruct
 
@@ -50,8 +53,10 @@ base-selection matrix. **Heads-up on InternVL2**: the row is visible in
 the registry, but on the current stack DLM now refuses it for actual
 prompt/train/HF-snapshot-export work. The upstream family still needs a
 custom processor/collator path for its tokenizer-only `AutoProcessor`,
-`<image>` expansion, and `image_flags` forward contract. That same
-family gap is the reason `internvl3-2b` has not been added yet.
+`<image>` expansion, and `image_flags` forward contract. The same
+family gap applies to `internvl3-2b` as well: it is now registry-
+visible and scaffoldable, but the generic runtime still refuses the
+whole InternVL family until DLM owns that custom contract.
 **Heads-up on Mistral Small 3.1**: it is a real VL registry row now,
 but it is intentionally treated as a large-CUDA-first base. `dlm
 doctor` refuses it on Apple Silicon by default unless you explicitly
@@ -147,8 +152,9 @@ coverage of the base's arch class and routes to one of three paths:
   None of the registered bases hit this verdict at the pinned tag.
 - **UNSUPPORTED** — llama.cpp doesn't know the arch at all. Falls
   back to HF-snapshot with an actionable banner naming the arch
-  class and the vendored tag. **paligemma-3b-mix-224** and
-  **internvl2-2b** are UNSUPPORTED at the pinned tag.
+  class and the vendored tag. **paligemma-3b-mix-224**,
+  **internvl2-2b**, and **internvl3-2b** are UNSUPPORTED at the
+  pinned tag.
 
 See [docs/hardware/vl-memory.md](../hardware/vl-memory.md#llamacpp-gguf-support-matrix-sprint-354)
 for the current support verdicts; bump the vendored tag with
docs/format/frontmatter.md (modified)
@@ -35,12 +35,12 @@ it; don't edit it by hand.
 The shipped registry is broader than this quick-start table. Current
 additions include:
 
-- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-4b`, `qwen3-8b`,
-  `llama-3.3-8b-instruct`, `phi-4-mini-reasoning`, `gemma-2-2b-it`,
-  `gemma-2-9b-it`, `smollm3-3b`, `olmo-2-7b-instruct`, and
-  `mixtral-8x7b-instruct`.
+- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-1.7b-thinking`,
+  `qwen3-4b`, `qwen3-8b`, `llama-3.3-8b-instruct`,
+  `phi-4-mini-reasoning`, `gemma-2-2b-it`, `gemma-2-9b-it`,
+  `smollm3-3b`, `olmo-2-7b-instruct`, and `mixtral-8x7b-instruct`.
 - Vision-language rows: `paligemma-3b-mix-224`,
-  `qwen2-vl-2b-instruct`, `internvl2-2b`, and
+  `qwen2-vl-2b-instruct`, `internvl2-2b`, `internvl3-2b`, and
   `mistral-small-3.1-24b-instruct`.
 - Audio-language row: `qwen2-audio-7b-instruct`.
 
docs/hardware/memory-estimates.md (added)
@@ -0,0 +1,38 @@
+# Memory estimates
+
+These are planning numbers, not a promise. DLM’s doctor still does the
+real refusal/fit decision, but the table below is the quick mental map
+for the Sprint 40 refresh rows that changed the most user expectations.
+
+## Text-family checkpoints
+
+| Base | fp16 weights | Practical target |
+|---|---:|---|
+| `qwen3-8b` | ~16 GB | 24 GB CUDA or high-memory Apple Silicon for LoRA; lighter inference on smaller boxes. |
+| `llama-3.3-8b-instruct` | ~16.5 GB | Same class as other 8B text rows: real GPU planning required for training. |
+| `gemma-2-9b-it` | ~18 GB | 24 GB CUDA is the comfortable floor. |
+| `mistral-small-3.1-24b-instruct` | ~48 GB | Large-CUDA-first. Refused on MPS by default unless forced. |
+
+## What the doctor is approximating
+
+For LoRA/QLoRA, the planner estimates:
+
+- base weights at the chosen load precision
+- activation memory from `sequence_len × micro_batch × layers`
+- optimizer state for the trainable adapter params
+- LoRA parameter storage
+- a 20% safety margin on top
+
+That estimator lives in `src/dlm/hardware/memory.py` and is intentionally conservative.
+
+## Rules of thumb
+
+- 8B-class rows are where laptop experimentation starts turning into real hardware planning.
+- 9B-class rows are usually fine on 24 GB CUDA, but not “casual” on smaller hosts.
+- 24B-class rows are not broad consumer defaults. In DLM they are treated as explicit high-capacity picks.
+- MPS can be surprisingly good for text LoRA, but DLM now refuses oversized bases like `mistral-small-3.1-24b-instruct` by default because unified memory headroom disappears too quickly.
+
+## Related
+
+- [Choosing a base](../cookbook/choosing-a-base.md)
+- [Vision-language memory budget](vl-memory.md)
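The estimator components the new page lists can be sketched as back-of-the-envelope arithmetic. This is not the code in `src/dlm/hardware/memory.py` (which this commit does not touch); the trainable ratio and the flat activation term are assumptions for illustration only:

```python
def estimate_lora_peak_gb(
    params: int,
    bytes_per_param: float = 2.0,    # fp16 load precision
    trainable_ratio: float = 0.01,   # assumed: LoRA adapters ~1% of base params
    activations_gb: float = 2.0,     # stand-in for the seq_len x micro_batch x layers term
    margin: float = 0.20,            # the 20% safety margin from the page
) -> float:
    weights_gb = params * bytes_per_param / 1e9
    lora_params = params * trainable_ratio
    adapter_gb = lora_params * 2 / 1e9    # adapter weights stored in fp16
    optimizer_gb = lora_params * 8 / 1e9  # Adam: two fp32 moments per trainable param
    return (weights_gb + activations_gb + adapter_gb + optimizer_gb) * (1 + margin)

# An 8B-class row: ~16 GB of fp16 weights before activations and optimizer state.
print(f"{estimate_lora_peak_gb(8_000_000_000):.1f} GB")  # → 22.6 GB
```

With these placeholder terms an 8B row lands in the low-20s GB range, which is consistent with the page's "24 GB CUDA" practical target for `qwen3-8b`.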
docs/hardware/vl-memory.md (modified)
@@ -1,7 +1,7 @@
 # Vision-language memory budget
 
-Four VL rows now ship in the registry: **PaliGemma-3B-mix-224**,
-**Qwen2-VL-2B-Instruct**, **InternVL2-2B**, and
+Five VL rows now ship in the registry: **PaliGemma-3B-mix-224**,
+**Qwen2-VL-2B-Instruct**, **InternVL2-2B**, **InternVL3-2B**, and
 **Mistral-Small-3.1-24B-Instruct-2503**. Each row carries a pinned
 preprocessing plan; dynamic-resolution support (Qwen2-VL's native
 capability, Mistral Small 3.1's longer-edge policy, and the broader
@@ -10,9 +10,10 @@ work so the current `VlPreprocessorPlan` cache key stays stable.
 
 **Reality check.** The generic VL train/prompt path is complete today
 for PaliGemma, Qwen2-VL, and Mistral Small 3.1. InternVL2 remains
-registry-visible for planning and future support, but on the current
-transformers stack its HF path still exposes a tokenizer-only
-`AutoProcessor` and needs a custom collator/runtime contract. DLM now
+registry-visible for planning and future support, and InternVL3 now
+joins it under the same honest caveat: on the current transformers
+stack the InternVL family still exposes a tokenizer-only
+`AutoProcessor` and needs a custom collator/runtime contract. DLM
 refuses that family with a clear error instead of pretending the
 generic VL path is enough.
 
@@ -23,6 +24,7 @@ generic VL path is enough.
 | paligemma-3b-mix-224      | Gemma (gated) | The cleanest PEFT path + proven chart/doc QA; accept the Gemma license first. |
 | qwen2-vl-2b-instruct      | Apache-2.0 | Permissive licensing + strong general-purpose VL; dynamic-res is capped to 672² in v1 but native runtime supports more. |
 | internvl2-2b              | MIT        | Registry-visible planning target for a future custom InternVL path; current train/prompt/export-snapshot flows refuse it on this stack. |
+| internvl3-2b              | Apache-2.0 | Newer InternVL planning target with dynamic 448-tiling and `trust_remote_code`; currently registry-visible but still refused by the generic runtime. |
 | mistral-small-3.1-24b-instruct | Apache-2.0 | Highest-capability VL row in the registry today; targets large CUDA boxes first and is refused on MPS by default unless you explicitly force it. |
 
 ## PaliGemma-3B-mix-224 (224×224, fp16)
@@ -70,11 +72,14 @@ between vision + text tokens. Gradient checkpointing on the tower
 trims ~30% of peak; `training.gradient_checkpointing: true` in
 frontmatter enables it.
 
-## InternVL2-2B (448×448, fp16)
+## InternVL2-2B / InternVL3-2B (448×448, fp16)
 
 InternVL2 uses ViT-L/14 + pixel-shuffle 2×2 so 448² input yields 256
 image tokens per 448-tile — the smallest InternVL-family budget and
-the cheapest of the four rows on paper.
+the cheapest of the registry rows on paper. InternVL3 keeps the same
+448 target size but switches the registry row to `resize_policy:
+dynamic` and a user-visible `<image>` placeholder while still
+expanding into the same hidden InternVL context window at runtime.
 
 | Config          | Base weights | Adapter | Activations | Total (peak) |
 |-----------------|-------------:|--------:|------------:|-------------:|
@@ -86,15 +91,15 @@ the cheapest of the four rows on paper.
 memory alone. 12 GB CUDA would handle batch=1; 16 GB CUDA would handle
 batch=4.
 
-**Current runtime status.** This row is not trainable/promptable via
-the generic VL path today. InternVL2 ships as `InternVLChatModel`, a
-custom remote-code family whose upstream runtime expands `<image>` into
-repeated `<IMG_CONTEXT>` spans and threads `image_flags` through the
-forward pass. On the current stack, `AutoProcessor.from_pretrained(...)`
-resolves to a tokenizer-only object, so DLM refuses the family early
-instead of failing later inside the model. Keep the budget numbers here
-for planning, but use PaliGemma, Qwen2-VL, or Mistral Small 3.1 for
-actual runs today.
+**Current runtime status.** These rows are not trainable/promptable via
+the generic VL path today. InternVL2 and InternVL3 both ship as
+`InternVLChatModel`, a custom remote-code family whose upstream runtime
+expands `<image>` into repeated `<IMG_CONTEXT>` spans and threads
+`image_flags` through the forward pass. On the current stack,
+`AutoProcessor.from_pretrained(...)` resolves to a tokenizer-only
+object, so DLM refuses the family early instead of failing later inside
+the model. Keep the budget numbers here for planning, but use
+PaliGemma, Qwen2-VL, or Mistral Small 3.1 for actual runs today.
 
 ## Mistral Small 3.1 24B Instruct (pinned 1540×1540, fp16)
 
@@ -129,6 +134,7 @@ by `scripts/bump-llama-cpp.sh bump <tag>`):
 | paligemma-3b-mix-224      | PaliGemmaForConditionalGeneration   | UNSUPPORTED  |
 | qwen2-vl-2b-instruct      | Qwen2VLForConditionalGeneration     | SUPPORTED    |
 | internvl2-2b              | InternVLChatModel                   | UNSUPPORTED  |
+| internvl3-2b              | InternVLChatModel                   | UNSUPPORTED  |
 
 **UNSUPPORTED** means `dlm export` falls back to the HF-snapshot path
 with an actionable banner. **SUPPORTED** means single-file VL GGUF
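The verdict-to-path behavior this hunk describes can be sketched as a small lookup. The `VERDICTS` values are copied from the support matrix above; the routing function and its return strings are hypothetical illustration, not DLM's actual export code:

```python
# Verdicts as listed in the support matrix for this vendored llama.cpp tag.
VERDICTS = {
    "paligemma-3b-mix-224": "UNSUPPORTED",
    "qwen2-vl-2b-instruct": "SUPPORTED",
    "internvl2-2b": "UNSUPPORTED",
    "internvl3-2b": "UNSUPPORTED",
}

def export_path(base_key: str) -> str:
    """Pick an export path from the GGUF support verdict (sketch only)."""
    if VERDICTS[base_key] == "SUPPORTED":
        return "single-file VL GGUF"
    # UNSUPPORTED: fall back to HF-snapshot with an actionable banner.
    return "HF-snapshot (banner names the arch class and vendored tag)"

print(export_path("qwen2-vl-2b-instruct"))  # → single-file VL GGUF
print(export_path("internvl3-2b"))          # → HF-snapshot fallback
```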
@@ -175,6 +181,7 @@ with the preprocessing plan:
 |---------------------------|------------:|----------------:|
 | paligemma-3b-mix-224      |     224×224 |        ~0.5 MB  |
 | internvl2-2b              |     448×448 |        ~2.0 MB  |
+| internvl3-2b              |     448×448 |        ~2.0 MB  |
 | qwen2-vl-2b-instruct      |     672×672 |        ~4.5 MB  |
 | mistral-small-3.1-24b-instruct | 1540×1540 |       ~23.5 MB  |
 
docs/index.md (modified)
@@ -19,10 +19,11 @@ you to run a 70B model you can't afford. DLM sits in the gap:
   control is both the prose you're training on and the configuration
   for how the training runs. Edit, retrain, share.
 - **Real pretrained bases.** SmolLM2-135M for fast iteration; newer
-  registry rows like Qwen3, Llama 3.3, Gemma 2, SmolLM3, Phi-4-mini-
-  reasoning, OLMo-2, Mixtral, and Mistral Small 3.1 cover current
-  text, sparse-MoE, and multimodal use cases. No from-scratch
-  transformers, no toy experiments.
+  registry rows like Qwen3 (including a reasoning-profile key),
+  Llama 3.3, Gemma 2, SmolLM3, Phi-4-mini-reasoning, OLMo-2, Mixtral,
+  Mistral Small 3.1, and InternVL3 cover current text, sparse-MoE,
+  and multimodal planning use cases. No from-scratch transformers,
+  no toy experiments.
 - **Deterministic by contract.** Same document + same hardware tier +
   pinned versions produce bit-identical adapters. [Determinism](determinism.md)
   is a first-class feature.
mkdocs.yml (modified)
@@ -62,6 +62,7 @@ nav:
       - .dlm/ignore: format/dlm-ignore.md
   - CLI reference: cli/reference.md
   - Cookbook:
+      - Choosing a base: cookbook/choosing-a-base.md
       - Coding tutor: cookbook/coding-tutor.md
       - Domain knowledge base: cookbook/domain-kb.md
       - Writing partner: cookbook/writing-partner.md
@@ -82,5 +83,7 @@ nav:
   - Architecture: architecture.md
   - Determinism: determinism.md
   - Hardware:
+      - Memory estimates: hardware/memory-estimates.md
+      - Vision-language memory: hardware/vl-memory.md
       - AMD ROCm: hardware/rocm.md
   - Troubleshooting: troubleshooting.md
src/dlm/base_models/registry.py (modified)
@@ -107,10 +107,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="qwen3-1.7b",
         hf_id="Qwen/Qwen3-1.7B",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="1a2b3c4d5e6f7890abcdeffedcba0987654321ab",
+        revision="70d244cc86ccca08cf5af4e1e306ecf908b1ad5e",
         architecture="Qwen3ForCausalLM",
         params=1_700_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -126,10 +123,29 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
         recommended_seq_len=2048,
         reasoning_tuned=True,
     ),
+    BaseModelSpec(
+        key="qwen3-1.7b-thinking",
+        hf_id="Qwen/Qwen3-1.7B",
+        revision="70d244cc86ccca08cf5af4e1e306ecf908b1ad5e",
+        architecture="Qwen3ForCausalLM",
+        params=1_700_000_000,
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+        template="qwen3thinking",
+        gguf_arch="qwen3",
+        tokenizer_pre="qwen2",
+        license_spdx="Apache-2.0",
+        license_url="https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE",
+        requires_acceptance=False,
+        redistributable=True,
+        size_gb_fp16=3.4,
+        context_length=32_768,
+        recommended_seq_len=2048,
+        reasoning_tuned=True,
+    ),
     BaseModelSpec(
         key="qwen3-4b",
         hf_id="Qwen/Qwen3-4B",
-        revision="2b3c4d5e6f7890abcdeffedcba0987654321abc2",
+        revision="1cfa9a7208912126459214e8b04321603b3df60c",
         architecture="Qwen3ForCausalLM",
         params=4_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
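The new `qwen3-1.7b-thinking` row intentionally reuses the base row's upstream pin and differs in its curated key and chat template. That invariant can be expressed with a stand-in spec; the `BaseModelSpec` below is a minimal hypothetical stand-in (and the base row's `template="qwen3"` value is assumed, since this hunk does not show it), not the real class:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BaseModelSpec:  # minimal stand-in for illustration only
    key: str
    hf_id: str
    revision: str
    template: str

base = BaseModelSpec("qwen3-1.7b", "Qwen/Qwen3-1.7B",
                     "70d244cc86ccca08cf5af4e1e306ecf908b1ad5e", "qwen3")
thinking = BaseModelSpec("qwen3-1.7b-thinking", "Qwen/Qwen3-1.7B",
                         "70d244cc86ccca08cf5af4e1e306ecf908b1ad5e", "qwen3thinking")

# Same upstream weights and pin; only the curated key and template differ.
assert (base.hf_id, base.revision) == (thinking.hf_id, thinking.revision)
assert base.key != thinking.key and base.template != thinking.template
```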
@@ -148,7 +164,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="qwen3-8b",
         hf_id="Qwen/Qwen3-8B",
-        revision="3c4d5e6f7890abcdeffedcba0987654321abc2d3",
+        revision="b968826d9c46dd6066d109eabc6255188de91218",
         architecture="Qwen3ForCausalLM",
         params=8_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -225,10 +241,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="smollm3-3b",
         hf_id="HuggingFaceTB/SmolLM3-3B",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="5e6f7890abcdeffedcba0987654321abc2d3e4f5",
+        revision="a07cc9a04f16550a088caea529712d1d335b0ac1",
         architecture="SmolLM3ForCausalLM",
         params=3_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -247,10 +260,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="olmo-2-7b-instruct",
         hf_id="allenai/OLMo-2-1124-7B-Instruct",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="6f7890abcdeffedcba0987654321abc2d3e4f5a6",
+        revision="470b1fba1ae01581f270116362ee4aa1b97f4c84",
         architecture="Olmo2ForCausalLM",
         params=7_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -268,10 +278,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="gemma-2-2b-it",
         hf_id="google/gemma-2-2b-it",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="7a890abcdeffedcba0987654321abc2d3e4f5a6b",
+        revision="299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8",
         architecture="Gemma2ForCausalLM",
         params=2_600_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -289,10 +296,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="gemma-2-9b-it",
         hf_id="google/gemma-2-9b-it",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="8f90abcdeffedcba0987654321abc2d3e4f5a6b7",
+        revision="11c9b309abf73637e4b6f9a3fa1e92e615547819",
         architecture="Gemma2ForCausalLM",
         params=9_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -382,10 +386,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="phi-4-mini-reasoning",
         hf_id="microsoft/Phi-4-mini-reasoning",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="9a0bcdeffedcba0987654321abc2d3e4f5a6b7c8",
+        revision="0e3b1e2d02ee478a3743abe3f629e9c0cb722e0a",
         architecture="Phi3ForCausalLM",
         params=3_800_000_000,
         target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
@@ -411,10 +412,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="mixtral-8x7b-instruct",
         hf_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="bc0deffedcba0987654321abc2d3e4f5a6b7c8d9",
+        revision="eba92302a2861cdc0098cc54bc9f17cb2c47eb61",
         architecture="MixtralForCausalLM",
         params=46_700_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -443,10 +441,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
443
     BaseModelSpec(
441
     BaseModelSpec(
444
         key="mistral-small-3.1-24b-instruct",
442
         key="mistral-small-3.1-24b-instruct",
445
         hf_id="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
443
         hf_id="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
446
-        # Placeholder SHA: format-valid, not a real HF commit. The
444
+        revision="68faf511d618ef198fef186659617cfd2eb8e33a",
447
-        # weekly `scripts/refresh-registry.py --check` run surfaces
448
-        # drift and prints the live value for manual review.
449
-        revision="ab0cdeffedcba0987654321abc2d3e4f5a6b7c8d",
450
         architecture="Mistral3ForConditionalGeneration",
445
         architecture="Mistral3ForConditionalGeneration",
451
         params=24_000_000_000,
446
         params=24_000_000_000,
452
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
447
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -484,14 +479,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
484
     BaseModelSpec(
479
     BaseModelSpec(
485
         key="paligemma-3b-mix-224",
480
         key="paligemma-3b-mix-224",
486
         hf_id="google/paligemma-3b-mix-224",
481
         hf_id="google/paligemma-3b-mix-224",
487
-        # Placeholder SHA: format-valid, not a real HF commit. The
482
+        revision="d1d8734c9c3ad0ccfeea4afc270faa356c2ba515",
488
-        # weekly `scripts/refresh-registry.py --check` run surfaces
489
-        # it as drift; a maintainer pastes in the observed SHA from
490
-        # the script's output. Offline probe tests skip cleanly
491
-        # until then (see tests/unit/base_models/test_vl_registry.py).
492
-        # To verify, run:
493
-        #     uv run python scripts/refresh-registry.py --check
494
-        revision="8d2f7bc9c15d71a00c14f9eb7e4c7b99c79e0a11",
495
         architecture="PaliGemmaForConditionalGeneration",
483
         architecture="PaliGemmaForConditionalGeneration",
496
         params=2_900_000_000,
484
         params=2_900_000_000,
497
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
485
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -529,10 +517,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
529
     BaseModelSpec(
517
     BaseModelSpec(
530
         key="qwen2-vl-2b-instruct",
518
         key="qwen2-vl-2b-instruct",
531
         hf_id="Qwen/Qwen2-VL-2B-Instruct",
519
         hf_id="Qwen/Qwen2-VL-2B-Instruct",
532
-        # Placeholder SHA (format-valid, not a real commit). See the
520
+        revision="895c3a49bc3fa70a340399125c650a463535e71c",
533
-        # paligemma entry for the self-healing workflow via
534
-        # `scripts/refresh-registry.py --check`.
535
-        revision="c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9",
536
         architecture="Qwen2VLForConditionalGeneration",
521
         architecture="Qwen2VLForConditionalGeneration",
537
         params=2_200_000_000,
522
         params=2_200_000_000,
538
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
523
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -570,8 +555,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
570
     BaseModelSpec(
555
     BaseModelSpec(
571
         key="internvl2-2b",
556
         key="internvl2-2b",
572
         hf_id="OpenGVLab/InternVL2-2B",
557
         hf_id="OpenGVLab/InternVL2-2B",
573
-        # Placeholder SHA (format-valid, not a real commit).
558
+        revision="e4f6747bd20f139e637642c6a058c6bd00b36919",
574
-        revision="d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0",
575
         architecture="InternVLChatModel",
559
         architecture="InternVLChatModel",
576
         params=2_200_000_000,
560
         params=2_200_000_000,
577
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
561
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -594,14 +578,37 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
594
             num_image_tokens=256,
578
             num_image_tokens=256,
595
         ),
579
         ),
596
     ),
580
     ),
581
+    BaseModelSpec(
582
+        key="internvl3-2b",
583
+        hf_id="OpenGVLab/InternVL3-2B",
584
+        revision="899155015275a9b7338c7f4677e19c784e0e5a21",
585
+        architecture="InternVLChatModel",
586
+        params=2_000_000_000,
587
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
588
+        template="internvl2",
589
+        gguf_arch="internvl3",
590
+        tokenizer_pre="internvl3",
591
+        license_spdx="Apache-2.0",
592
+        license_url="https://huggingface.co/OpenGVLab/InternVL3-2B",
593
+        requires_acceptance=False,
594
+        redistributable=True,
595
+        trust_remote_code=True,
596
+        size_gb_fp16=4.0,
597
+        context_length=32_768,
598
+        recommended_seq_len=2048,
599
+        modality="vision-language",
600
+        vl_preprocessor_plan=VlPreprocessorPlan(
601
+            target_size=(448, 448),
602
+            resize_policy="dynamic",
603
+            image_token="<image>",
604
+            num_image_tokens=256,
605
+        ),
606
+    ),
597
     # --- Audio-language bases -----------------------------------------------
607
     # --- Audio-language bases -----------------------------------------------
598
     # Qwen2-Audio-7B-Instruct — Alibaba's open audio-text model. Uses
608
     # Qwen2-Audio-7B-Instruct — Alibaba's open audio-text model. Uses
599
     # the Qwen2 LLM backbone + a dedicated audio encoder. Apache-2.0
609
     # the Qwen2 LLM backbone + a dedicated audio encoder. Apache-2.0
600
-    # but the 7B checkpoint is gated on HF via license acceptance, so
610
+    # and currently ungated on HF, so the registry keeps it open and
601
-    # `requires_acceptance=True` flows through the same pattern the
611
+    # redistributable like the other permissive Qwen rows.
602
-    # Llama-3.2 / PaliGemma entries use. Redistributable under
603
-    # Apache-2.0, but not-bundled-by-default because the pack size
604
-    # (~14 GB fp16) dominates the tarball.
605
     #
612
     #
606
     # The 16 kHz pin + 30 s max-length match the training-time
613
     # The 16 kHz pin + 30 s max-length match the training-time
607
     # defaults documented in the Qwen2-Audio card. Resampling support
614
     # defaults documented in the Qwen2-Audio card. Resampling support
@@ -614,10 +621,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
614
     BaseModelSpec(
621
     BaseModelSpec(
615
         key="qwen2-audio-7b-instruct",
622
         key="qwen2-audio-7b-instruct",
616
         hf_id="Qwen/Qwen2-Audio-7B-Instruct",
623
         hf_id="Qwen/Qwen2-Audio-7B-Instruct",
617
-        # Placeholder SHA (format-valid, not a real commit). See the
624
+        revision="0a095220c30b7b31434169c3086508ef3ea5bf0a",
618
-        # paligemma entry for the self-healing workflow via
619
-        # `scripts/refresh-registry.py --check`.
620
-        revision="a1b2c3d4e5f678901234567890abcdef01234567",
621
         architecture="Qwen2AudioForConditionalGeneration",
625
         architecture="Qwen2AudioForConditionalGeneration",
622
         params=8_400_000_000,
626
         params=8_400_000_000,
623
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
627
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -626,8 +630,8 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
626
         tokenizer_pre="qwen2",
630
         tokenizer_pre="qwen2",
627
         license_spdx="Apache-2.0",
631
         license_spdx="Apache-2.0",
628
         license_url="https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct",
632
         license_url="https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct",
629
-        requires_acceptance=True,
633
+        requires_acceptance=False,
630
-        redistributable=False,
634
+        redistributable=True,
631
         size_gb_fp16=15.5,
635
         size_gb_fp16=15.5,
632
         context_length=8_192,
636
         context_length=8_192,
633
         recommended_seq_len=2048,
637
         recommended_seq_len=2048,
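The registry hunks above swap every placeholder revision for a real 40-hex commit SHA. As a side note, the "format-valid" property those placeholder comments mention can be sketched as a one-line check; `is_pinned_revision` is a hypothetical helper, not part of this diff:

```python
import re

# Hypothetical helper (not in the diff): a registry row's revision should
# be a full 40-character lowercase-hex git commit SHA, never a branch ref.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_pinned_revision(revision: str) -> bool:
    """True only for a full-length commit SHA, so downloads stay reproducible."""
    return _SHA_RE.fullmatch(revision) is not None


# The refreshed Gemma row's SHA passes; a mutable ref like "main" does not.
print(is_pinned_revision("11c9b309abf73637e4b6f9a3fa1e92e615547819"))  # True
print(is_pinned_revision("main"))  # False
```

Pinning to a commit SHA rather than a branch is what lets the weekly refresh script detect drift instead of silently following upstream changes.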
src/dlm/base_models/resolver.py modified
@@ -30,6 +30,7 @@ from dlm.base_models.schema import BaseModelSpec
 
 TemplateDialect = Literal[
     "chatml",
+    "qwen3thinking",
    "gemma2",
     "smollm3",
     "olmo2",
src/dlm/base_models/schema.py modified
@@ -101,6 +101,7 @@ class BaseModelSpec(BaseModel):
     target_modules: list[str] = Field(..., min_length=1)
     template: Literal[
         "chatml",
+        "qwen3thinking",
         "gemma2",
         "smollm3",
         "olmo2",
src/dlm/base_models/templates/qwen3thinking.jinja added
@@ -0,0 +1,14 @@
+{#
+Qwen3 reasoning-profile reference template.
+
+Upstream keeps ChatML framing for request construction; the profile
+delta is the reasoning-first sampling/runtime behavior rather than a
+different turn wrapper.
+#}
+{%- for message in messages -%}
+<|im_start|>{{ message['role'] }}
+{{ message['content'] }}<|im_end|>
+{% endfor -%}
+{%- if add_generation_prompt -%}
+<|im_start|>assistant
+{%- endif -%}
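To make the new template's output concrete, here is a plain-Python mimic (not project code) of what the Jinja template renders: ChatML turn framing with an optional trailing assistant header for generation:

```python
# Stdlib-only mimic of the qwen3thinking.jinja rendering above; the real
# project renders this via Jinja, this is just to show the wire format.
def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool) -> str:
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "hi"}], add_generation_prompt=True)
print(prompt)
# <|im_start|>user
# hi<|im_end|>
# <|im_start|>assistant
```

This matches the comment in the template: the thinking profile changes sampling defaults, not the turn wrapper itself.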
src/dlm/export/ollama/template_registry.py modified
@@ -23,7 +23,17 @@ from typing import Final, Literal
 
 from dlm.export.ollama.errors import TemplateRegistryError
 
-Dialect = Literal["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"]
+Dialect = Literal[
+    "chatml",
+    "qwen3thinking",
+    "gemma2",
+    "smollm3",
+    "olmo2",
+    "llama3",
+    "phi3",
+    "phi4mini",
+    "mistral",
+]
 
 _TEMPLATES_DIR: Final[Path] = Path(__file__).resolve().parent / "templates"
 
@@ -62,6 +72,16 @@ _REGISTRY: Final[dict[Dialect, DialectTemplate]] = {
         # synthesize a new turn instead of yielding.
         default_stops=("<|im_end|>", "<|endoftext|>", "<|im_start|>"),
     ),
+    "qwen3thinking": DialectTemplate(
+        dialect="qwen3thinking",
+        template_path=_TEMPLATES_DIR / "qwen3thinking.gotmpl",
+        # Qwen3's reasoning profile still uses ChatML turn framing, but
+        # the upstream defaults run slightly broader sampling than the
+        # legacy ChatML family.
+        default_stops=("<|im_end|>", "<|endoftext|>", "<|im_start|>"),
+        default_temperature=0.6,
+        default_top_p=0.95,
+    ),
     "gemma2": DialectTemplate(
         dialect="gemma2",
         template_path=_TEMPLATES_DIR / "gemma2.gotmpl",
src/dlm/export/ollama/templates/qwen3thinking.gotmpl added
@@ -0,0 +1,5 @@
+{{- if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
+{{ .Content }}<|im_end|>
+{{ end }}<|im_start|>assistant
tests/integration/base_models/test_13_entries_scaffold.py added
@@ -0,0 +1,45 @@
+"""Sprint 40 closeout mirror for the named 13-entry scaffold deliverable."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+from tests.integration.cli.test_registry_refresh_init import SPRINT40_INIT_CASES
+from typer.testing import CliRunner
+
+from dlm.cli.app import app
+from dlm.doc.parser import parse_file
+from dlm.doc.sections import SectionType
+
+
+@pytest.mark.parametrize(("base_key", "extra_flags", "expect_image_section"), SPRINT40_INIT_CASES)
+def test_init_scaffolds_for_all_thirteen_registry_refresh_entries(
+    tmp_path: Path,
+    base_key: str,
+    extra_flags: list[str],
+    expect_image_section: bool,
+) -> None:
+    runner = CliRunner()
+    home = tmp_path / "home"
+    doc = tmp_path / f"{base_key}.dlm"
+
+    result = runner.invoke(
+        app,
+        [
+            "--home",
+            str(home),
+            "init",
+            str(doc),
+            "--base",
+            base_key,
+            *extra_flags,
+        ],
+    )
+    assert result.exit_code == 0, result.output
+    parsed = parse_file(doc)
+    section_types = {section.type for section in parsed.sections}
+    if expect_image_section:
+        assert SectionType.IMAGE in section_types
+    else:
+        assert SectionType.IMAGE not in section_types
tests/integration/cli/test_registry_refresh_init.py modified
@@ -1,11 +1,4 @@
-"""Scaffold coverage for the Sprint 40 registry refresh entries we ship.
-
-This is intentionally scoped to entries that currently exist in the
-registry. Two rows in the original sprint draft still need upstream
-reality work (`qwen3-1.7b-thinking`, `internvl3-2b`), so this test
-guards the refresh surface we have actually landed rather than baking
-stale assumptions into CI.
-"""
+"""Scaffold coverage for every Sprint 40 registry-refresh entry."""
 
 from __future__ import annotations
 
@@ -18,24 +11,25 @@ from dlm.cli.app import app
 from dlm.doc.parser import parse_file
 from dlm.doc.sections import SectionType
 
-
-@pytest.mark.parametrize(
-    ("base_key", "extra_flags", "expect_image_section"),
-    [
-        ("qwen3-1.7b", [], False),
-        ("qwen3-4b", [], False),
-        ("qwen3-8b", [], False),
-        ("llama-3.3-8b-instruct", ["--i-accept-license"], False),
-        ("phi-4-mini-reasoning", [], False),
-        ("gemma-2-2b-it", ["--i-accept-license"], False),
-        ("gemma-2-9b-it", ["--i-accept-license"], False),
-        ("mistral-small-3.1-24b-instruct", ["--multimodal"], True),
-        ("smollm3-3b", [], False),
-        ("olmo-2-7b-instruct", [], False),
-        ("mixtral-8x7b-instruct", [], False),
-    ],
-)
-def test_init_scaffolds_for_landed_registry_refresh_entries(
+SPRINT40_INIT_CASES: tuple[tuple[str, list[str], bool], ...] = (
+    ("qwen3-1.7b", [], False),
+    ("qwen3-1.7b-thinking", [], False),
+    ("qwen3-4b", [], False),
+    ("qwen3-8b", [], False),
+    ("llama-3.3-8b-instruct", ["--i-accept-license"], False),
+    ("phi-4-mini-reasoning", [], False),
+    ("gemma-2-2b-it", ["--i-accept-license"], False),
+    ("gemma-2-9b-it", ["--i-accept-license"], False),
+    ("mistral-small-3.1-24b-instruct", ["--multimodal"], True),
+    ("smollm3-3b", [], False),
+    ("olmo-2-7b-instruct", [], False),
+    ("mixtral-8x7b-instruct", [], False),
+    ("internvl3-2b", ["--multimodal"], True),
+)
+
+
+@pytest.mark.parametrize(("base_key", "extra_flags", "expect_image_section"), SPRINT40_INIT_CASES)
+def test_init_scaffolds_for_every_registry_refresh_entry(
     tmp_path: Path,
     base_key: str,
     extra_flags: list[str],
tests/integration/gate/test_mixtral_gate_smoke.py added
@@ -0,0 +1,54 @@
+"""Sprint 40 smoke proof that the Mixtral row still flows through Sprint 34 gate paths."""
+
+from __future__ import annotations
+
+from pathlib import Path
+from types import SimpleNamespace
+
+from dlm.base_models import BASE_MODELS
+from dlm.inference.gate import GateHandle, load_gate_handle
+from dlm.modality import modality_for
+from dlm.train.gate import GateTrainingSample, train_gate
+
+
+def _store(tmp_path: Path) -> SimpleNamespace:
+    return SimpleNamespace(root=tmp_path)
+
+
+def test_mixtral_text_moe_row_still_uses_text_gate_pipeline(tmp_path: Path) -> None:
+    import torch
+
+    spec = BASE_MODELS["mixtral-8x7b-instruct"]
+    dispatch = modality_for(spec)
+    assert spec.modality == "text-moe"
+    assert dispatch.accepts_images is False
+    assert dispatch.accepts_audio is False
+
+    store = _store(tmp_path)
+    samples: list[GateTrainingSample] = []
+    for _ in range(12):
+        samples.append(
+            GateTrainingSample(embedding=torch.ones(8) + 0.05 * torch.randn(8), adapter_name="a")
+        )
+        samples.append(
+            GateTrainingSample(embedding=-torch.ones(8) + 0.05 * torch.randn(8), adapter_name="b")
+        )
+
+    result = train_gate(
+        store,  # type: ignore[arg-type]
+        samples,
+        adapter_names=("a", "b"),
+        input_dim=8,
+        hidden_proj_dim=8,
+        steps=80,
+        lr=3e-3,
+        cold_start_floor=1,
+        batch_size=8,
+        seed=0,
+    )
+    assert result.mode == "trained"
+
+    handle = load_gate_handle(store)  # type: ignore[arg-type]
+    assert isinstance(handle, GateHandle)
+    assert handle.is_uniform is False
+    assert handle.metadata.adapter_names == ("a", "b")
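The smoke test above trains the gate on two well-separated synthetic clusters (embeddings near +1 route to adapter "a", near -1 to "b"). A stdlib-only stand-in shows why that data is trivially separable; `route` is an illustrative nearest-centroid rule, not the learned router `train_gate` actually fits:

```python
# Illustrative only: nearest-centroid routing over the same two-cluster
# geometry the smoke test uses. The real gate is trained, but any
# distance-based rule already separates +1-ish from -1-ish embeddings.
def route(embedding: list[float], centroids: dict[str, list[float]]) -> str:
    def dist2(a: list[float], b: list[float]) -> float:
        # squared L2 distance; no sqrt needed for an argmin
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda name: dist2(embedding, centroids[name]))


dim = 8
centroids = {"a": [1.0] * dim, "b": [-1.0] * dim}
print(route([0.9] * dim, centroids))   # a
print(route([-1.1] * dim, centroids))  # b
```

With 0.05-scale noise around unit-magnitude centroids, the clusters never overlap, which is what makes `result.mode == "trained"` and a non-uniform gate a reasonable smoke expectation.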
tests/unit/base_models/test_audio_registry.py modified
@@ -4,8 +4,8 @@ Mirrors `test_vl_registry.py` for the audio-language modality. Covers:
 
 - `qwen2-audio-7b-instruct` is present and has `modality="audio-language"`.
 - Its `AudioPreprocessorPlan` is pinned (16 kHz, 30 s, `<|AUDIO|>`, 750).
-- License is Apache-2.0 but the 7B weights are gated behind HF acceptance
-  and flagged non-redistributable (pack tarball size).
+- License is Apache-2.0 and the current HF row is no longer gated, so
+  the spec stays redistributable.
 - `modality="audio-language"` without a plan rejects at validate time;
   text bases cannot carry an audio plan; VL bases cannot carry an audio
   plan; audio bases cannot carry a VL plan.
@@ -48,10 +48,10 @@ class TestQwen2AudioRegistryEntry:
         spec = BASE_MODELS["qwen2-audio-7b-instruct"]
         assert spec.vl_preprocessor_plan is None
 
-    def test_license_gated_not_redistributable(self) -> None:
+    def test_license_open_and_redistributable(self) -> None:
         spec = BASE_MODELS["qwen2-audio-7b-instruct"]
-        assert spec.requires_acceptance is True
-        assert spec.redistributable is False
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
 
     def test_architecture_is_audio_conditional_generation(self) -> None:
         spec = BASE_MODELS["qwen2-audio-7b-instruct"]
tests/unit/base_models/test_registry.py modified
@@ -73,11 +73,13 @@ class TestLicenseFields:
             "qwen2.5-1.5b",
             "qwen2.5-coder-1.5b",
             "qwen3-1.7b",
+            "qwen3-1.7b-thinking",
             "qwen3-4b",
             "qwen3-8b",
             "mixtral-8x7b-instruct",
             "smollm3-3b",
             "olmo-2-7b-instruct",
+            "qwen2-audio-7b-instruct",
             "smollm2-135m",
             "smollm2-360m",
             "smollm2-1.7b",
@@ -137,7 +139,10 @@ class TestArchitectureShapes:
             BASE_MODELS[k].size_gb_fp16 for k in ("qwen2.5-0.5b", "qwen2.5-1.5b", "qwen2.5-3b")
         ]
         assert qwen_sizes == sorted(qwen_sizes)
-        qwen3_sizes = [BASE_MODELS[k].size_gb_fp16 for k in ("qwen3-1.7b", "qwen3-4b", "qwen3-8b")]
+        qwen3_sizes = [
+            BASE_MODELS[k].size_gb_fp16
+            for k in ("qwen3-1.7b", "qwen3-1.7b-thinking", "qwen3-4b", "qwen3-8b")
+        ]
         assert qwen3_sizes == sorted(qwen3_sizes)
         smol_sizes = [
             BASE_MODELS[k].size_gb_fp16 for k in ("smollm2-135m", "smollm2-360m", "smollm2-1.7b")
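One subtlety in the widened ordering check above: the thinking profile reuses the Qwen3-1.7B weights, so two rows share a size, and the assertion only holds because `sorted()` is non-strict. A toy illustration with hypothetical fp16 sizes (not the registry's real numbers):

```python
# Illustrative only: duplicate sizes still pass a non-strict sorted() check,
# which is what lets a weight-sharing profile row sit next to its base row.
qwen3_sizes = [3.4, 3.4, 8.0, 16.0]  # 1.7b, 1.7b-thinking (shared weights), 4b, 8b
assert qwen3_sizes == sorted(qwen3_sizes)
print("ordering holds")
```

A strict comparison (`a < b` pairwise) would have rejected the shared-weights row, so the non-strict form is the right invariant here.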
tests/unit/base_models/test_registry_2026.py modified
@@ -37,6 +37,27 @@ class TestQwen3RegistryEntries:
         assert spec.size_gb_fp16 == pytest.approx(16.0)
 
 
+class TestQwen3ThinkingRegistryEntry:
+    def test_entry_present(self) -> None:
+        assert "qwen3-1.7b-thinking" in BASE_MODELS
+
+    def test_reuses_live_qwen3_weights_with_reasoning_profile(self) -> None:
+        spec = BASE_MODELS["qwen3-1.7b-thinking"]
+        assert spec.hf_id == "Qwen/Qwen3-1.7B"
+        assert spec.architecture == "Qwen3ForCausalLM"
+        assert spec.template == "qwen3thinking"
+        assert spec.gguf_arch == "qwen3"
+        assert spec.tokenizer_pre == "qwen2"
+
+    def test_reasoning_profile_keeps_open_license_and_cooler_default(self) -> None:
+        spec = BASE_MODELS["qwen3-1.7b-thinking"]
+        assert spec.license_spdx == "Apache-2.0"
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
+        assert spec.reasoning_tuned is True
+        assert spec.suggested_prompt_temperature == pytest.approx(0.6)
+
+
 class TestLlama33RegistryEntry:
     def test_entry_present(self) -> None:
         assert "llama-3.3-8b-instruct" in BASE_MODELS
@@ -212,28 +233,23 @@ class TestMixtralRegistryEntry:
         assert spec.recommended_seq_len == 2048
 
 
-class TestStaleSprintDraftRows:
-    def test_qwen3_thinking_is_not_a_separate_registry_row(self) -> None:
-        """Upstream Qwen3-1.7B ships hybrid thinking in one model.
-
-        Sprint 40's draft listed a separate `qwen3-1.7b-thinking`
-        entry, but the live upstream contract exposes thinking mode as
-        a switch on `Qwen/Qwen3-1.7B` itself. Keep the registry honest:
-        reasoning defaults belong on the real base row, not a fake key.
-        """
-        assert "qwen3-1.7b-thinking" not in BASE_MODELS
-
-    def test_internvl3_not_shipped_until_remote_code_contract_is_pinned(self) -> None:
-        """Guard against copying the stale sprint draft into the registry.
-
-        The live `OpenGVLab/InternVL3-2B` model card still documents
-        `trust_remote_code=True`, and on the current stack the whole
-        InternVL family still exposes a tokenizer-only `AutoProcessor`
-        rather than a complete image processor. Upstream also expands
-        `<image>` into repeated `<IMG_CONTEXT>` spans and threads
-        `image_flags` through the forward pass. Adding InternVL3 later
-        is fine, but it needs an honest runtime contract instead of
-        assuming the old "cleaner than InternVL2" sprint note is still
-        true.
-        """
-        assert "internvl3-2b" not in BASE_MODELS
+class TestInternVL3RegistryEntry:
+    def test_entry_present(self) -> None:
+        assert "internvl3-2b" in BASE_MODELS
+
+    def test_entry_keeps_remote_code_contract_explicit(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.hf_id == "OpenGVLab/InternVL3-2B"
+        assert spec.architecture == "InternVLChatModel"
+        assert spec.template == "internvl2"
+        assert spec.trust_remote_code is True
+
+    def test_entry_is_registry_visible_but_not_pretending_runtime_is_generic(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.license_spdx == "Apache-2.0"
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
+        assert spec.modality == "vision-language"
+        assert spec.vl_preprocessor_plan is not None
+        assert spec.vl_preprocessor_plan.resize_policy == "dynamic"
+        assert spec.vl_preprocessor_plan.image_token == "<image>"
tests/unit/base_models/test_schema.py modified
@@ -89,7 +89,17 @@ class TestTargetModules:
 class TestLiteralConstraints:
     @pytest.mark.parametrize(
         "template",
-        ["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"],
+        [
+            "chatml",
+            "qwen3thinking",
+            "gemma2",
+            "smollm3",
+            "olmo2",
+            "llama3",
+            "phi3",
+            "phi4mini",
+            "mistral",
+        ],
     )
     def test_template_literals_accepted(self, template: str) -> None:
         spec = _minimal(template=template)
tests/unit/base_models/test_vl_registry.py modified
@@ -143,6 +143,7 @@ _VL_BASE_KEYS: tuple[str, ...] = (
     "paligemma-3b-mix-224",
     "qwen2-vl-2b-instruct",
     "internvl2-2b",
+    "internvl3-2b",
     "mistral-small-3.1-24b-instruct",
 )
 
@@ -160,7 +161,7 @@ class TestAllVlBasesShipModalityInvariants:
         assert spec.vl_preprocessor_plan is not None
         # Pinned identity fields — each one is part of the cache key,
         # so a silent default would silently invalidate caches.
-        assert spec.vl_preprocessor_plan.resize_policy == "fixed"
+        assert spec.vl_preprocessor_plan.resize_policy in {"fixed", "dynamic"}
         assert spec.vl_preprocessor_plan.num_image_tokens > 0
 
     @pytest.mark.parametrize("key", _VL_BASE_KEYS)
@@ -241,28 +242,56 @@ class TestInternVL2RegistryEntry:
         assert BASE_MODELS["internvl2-2b"].template == "internvl2"
 
 
+class TestInternVL3RegistryEntry:
+    """Sprint 40 refresh: InternVL3 lands with an explicit runtime caveat."""
+
+    def test_entry_present(self) -> None:
+        assert "internvl3-2b" in BASE_MODELS
+
+    def test_apache_permissive(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.license_spdx == "Apache-2.0"
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
+
+    def test_dynamic_preprocessing_plan_is_pinned(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        plan = spec.vl_preprocessor_plan
+        assert plan is not None
+        assert plan.target_size == (448, 448)
+        assert plan.resize_policy == "dynamic"
+        assert plan.image_token == "<image>"
+        assert plan.num_image_tokens == 256
+
+    def test_architecture_and_template(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.architecture == "InternVLChatModel"
+        assert spec.template == "internvl2"
+        assert spec.trust_remote_code is True
+
+
 class TestDistinctVlBases:
     """The VL bases occupy distinct rows with no silent duplicates."""
 
     def test_all_keys_unique(self) -> None:
-        assert len(set(_VL_BASE_KEYS)) == 4
+        assert len(set(_VL_BASE_KEYS)) == 5
 
     def test_hf_ids_distinct(self) -> None:
         hf_ids = {BASE_MODELS[k].hf_id for k in _VL_BASE_KEYS}
-        assert len(hf_ids) == 4
+        assert len(hf_ids) == 5
 
     def test_image_tokens_distinct_per_base(self) -> None:
-        """Each VL base uses its native image-token string.
+        """VL rows pin their native placeholder tokens explicitly.
 
-        Silently sharing a placeholder across bases would break the
-        cache-key invariant in vl_cache.py (cache key includes the
-        token via processor_sha256).
+        Some families legitimately reuse the same surface token
+        (`<image>`), so this checks the concrete set rather than
+        forcing uniqueness for uniqueness' sake.
         """
         tokens = {
             BASE_MODELS[k].vl_preprocessor_plan.image_token  # type: ignore[union-attr]
             for k in _VL_BASE_KEYS
         }
-        assert len(tokens) == 4
+        assert tokens == {"<image>", "<|image_pad|>", "<IMG_CONTEXT>", "[IMG]"}
 
 
 class TestCountVlRegistryEntries:
@@ -270,7 +299,7 @@ class TestCountVlRegistryEntries:
 
     def test_at_least_four_vl_bases_registered(self) -> None:
         vl_count = sum(1 for s in BASE_MODELS.values() if s.modality == "vision-language")
-        assert vl_count >= 4
+        assert vl_count >= 5
 
 
 class TestTrustRemoteCodeOptIn:
@@ -292,6 +321,9 @@ class TestTrustRemoteCodeOptIn:
         is defined in the model repo, not in transformers."""
         assert BASE_MODELS["internvl2-2b"].trust_remote_code is True
 
+    def test_internvl3_opts_in(self) -> None:
+        assert BASE_MODELS["internvl3-2b"].trust_remote_code is True
+
     def test_text_bases_default_false(self) -> None:
         """None of the text bases opt into trust_remote_code."""
         for key, spec in BASE_MODELS.items():
tests/unit/doc/test_v12_migrator.py (added)
@@ -0,0 +1,36 @@
+"""Named Sprint 40 closeout checks for the v12 → v13 migrator."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from dlm.doc.migrations.v12 import migrate
+from dlm.doc.schema import DlmFrontmatter
+
+_VALID_ULID = "01HZ4X7TGZM3J1A2B3C4D5E6F7"
+
+
+def test_v12_migrator_is_identity_for_existing_frontmatter() -> None:
+    raw: dict[str, Any] = {
+        "dlm_id": _VALID_ULID,
+        "base_model": "smollm2-135m",
+        "dlm_version": 12,
+        "training": {"audio": {"auto_resample": True}, "lora_r": 16},
+    }
+    out = migrate(raw)
+    assert out == raw
+    assert out is not raw
+
+
+def test_v12_migrator_output_validates_as_v13() -> None:
+    raw: dict[str, Any] = {
+        "dlm_id": _VALID_ULID,
+        "base_model": "smollm2-135m",
+        "dlm_version": 12,
+        "training": {"audio": {"auto_resample": True}},
+    }
+    out = migrate(raw)
+    out["dlm_version"] = 13
+    fm = DlmFrontmatter.model_validate(out)
+    assert fm.dlm_version == 13
+    assert fm.training.audio.auto_resample is True
tests/unit/export/ollama/test_template_registry.py (modified)
@@ -13,9 +13,10 @@ from dlm.export.ollama.template_registry import (
 
 
 class TestRegistryCoverage:
-    def test_all_eight_dialects_registered(self) -> None:
+    def test_all_nine_dialects_registered(self) -> None:
         assert set(registered_dialects()) == {
             "chatml",
+            "qwen3thinking",
             "gemma2",
             "smollm3",
             "olmo2",
@@ -27,7 +28,17 @@ class TestRegistryCoverage:
 
     @pytest.mark.parametrize(
         "dialect",
-        ["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"],
+        [
+            "chatml",
+            "qwen3thinking",
+            "gemma2",
+            "smollm3",
+            "olmo2",
+            "llama3",
+            "phi3",
+            "phi4mini",
+            "mistral",
+        ],
     )
     def test_each_template_file_exists(self, dialect: str) -> None:
         row = get_template(dialect)
@@ -37,7 +48,17 @@ class TestRegistryCoverage:
 
     @pytest.mark.parametrize(
         "dialect",
-        ["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"],
+        [
+            "chatml",
+            "qwen3thinking",
+            "gemma2",
+            "smollm3",
+            "olmo2",
+            "llama3",
+            "phi3",
+            "phi4mini",
+            "mistral",
+        ],
     )
     def test_each_has_default_stops(self, dialect: str) -> None:
         row = get_template(dialect)
@@ -47,6 +68,7 @@ class TestRegistryCoverage:
         ("dialect", "required"),
         [
             ("chatml", {"<|im_end|>", "<|im_start|>"}),
+            ("qwen3thinking", {"<|im_end|>", "<|im_start|>"}),
             ("gemma2", {"<end_of_turn>", "<start_of_turn>"}),
             ("smollm3", {"<|im_end|>", "<|im_start|>"}),
             ("olmo2", {"<|endoftext|>", "<|user|>", "<|assistant|>"}),
@@ -89,6 +111,14 @@ class TestDialectShapes:
         assert "<end_of_turn>" in text
         assert "model" in text
 
+    def test_qwen3thinking_keeps_chatml_markers_with_reasoning_defaults(self) -> None:
+        text = load_template_text("qwen3thinking")
+        assert "<|im_start|>" in text
+        assert "<|im_end|>" in text
+        row = get_template("qwen3thinking")
+        assert row.default_temperature == pytest.approx(0.6)
+        assert row.default_top_p == pytest.approx(0.95)
+
     def test_smollm3_has_reasoning_system_prompt(self) -> None:
         text = load_template_text("smollm3")
         assert "<|im_start|>system" in text
tests/unit/export/test_mixtral_template.py (added)
@@ -0,0 +1,38 @@
+"""Sprint 40 closeout checks for the Mixtral template row."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from dlm.base_models import BASE_MODELS
+from dlm.export.ollama.modelfile import ModelfileContext, render_modelfile
+from dlm.export.plan import ExportPlan
+
+
+def _adapter_dir(tmp_path: Path) -> Path:
+    adapter = tmp_path / "adapter"
+    adapter.mkdir()
+    (adapter / "tokenizer_config.json").write_text(
+        json.dumps({"eos_token": "</s>", "added_tokens_decoder": {}}),
+        encoding="utf-8",
+    )
+    return adapter
+
+
+def test_mixtral_registry_row_renders_through_mistral_template(tmp_path: Path) -> None:
+    spec = BASE_MODELS["mixtral-8x7b-instruct"]
+    text = render_modelfile(
+        ModelfileContext(
+            spec=spec,
+            plan=ExportPlan(quant="Q4_K_M", merged=False),
+            adapter_dir=_adapter_dir(tmp_path),
+            base_gguf_name="base.gguf",
+            adapter_gguf_name="adapter.gguf",
+            dlm_id="01TEST",
+            adapter_version=1,
+        )
+    )
+    assert spec.modality == "text-moe"
+    assert "[INST]" in text
+    assert 'PARAMETER stop "[INST]"' in text
tests/unit/export/test_phi4_template.py (added)
@@ -0,0 +1,37 @@
+"""Sprint 40 closeout checks for the Phi-4 reasoning template row."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from dlm.base_models import BASE_MODELS
+from dlm.export.ollama.modelfile import ModelfileContext, render_modelfile
+from dlm.export.plan import ExportPlan
+
+
+def _adapter_dir(tmp_path: Path) -> Path:
+    adapter = tmp_path / "adapter"
+    adapter.mkdir()
+    (adapter / "tokenizer_config.json").write_text(
+        json.dumps({"eos_token": "<|end|>", "added_tokens_decoder": {}}),
+        encoding="utf-8",
+    )
+    return adapter
+
+
+def test_phi4_reasoning_template_keeps_phi_system_preamble_and_stops(tmp_path: Path) -> None:
+    text = render_modelfile(
+        ModelfileContext(
+            spec=BASE_MODELS["phi-4-mini-reasoning"],
+            plan=ExportPlan(quant="Q4_K_M", merged=False),
+            adapter_dir=_adapter_dir(tmp_path),
+            base_gguf_name="base.gguf",
+            adapter_gguf_name="adapter.gguf",
+            dlm_id="01TEST",
+            adapter_version=1,
+        )
+    )
+    assert "Your name is Phi, an AI math expert developed by Microsoft." in text
+    assert 'PARAMETER stop "<|assistant|>"' in text
+    assert "PARAMETER temperature 0.6" in text
tests/unit/export/test_qwen3_template.py (added)
@@ -0,0 +1,37 @@
+"""Sprint 40 closeout checks for the Qwen3 reasoning-template row."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from dlm.base_models import BASE_MODELS
+from dlm.export.ollama.modelfile import ModelfileContext, render_modelfile
+from dlm.export.plan import ExportPlan
+
+
+def _adapter_dir(tmp_path: Path) -> Path:
+    adapter = tmp_path / "adapter"
+    adapter.mkdir()
+    (adapter / "tokenizer_config.json").write_text(
+        json.dumps({"eos_token": "<|im_end|>", "added_tokens_decoder": {}}),
+        encoding="utf-8",
+    )
+    return adapter
+
+
+def test_qwen3_thinking_row_uses_distinct_reasoning_template_defaults(tmp_path: Path) -> None:
+    text = render_modelfile(
+        ModelfileContext(
+            spec=BASE_MODELS["qwen3-1.7b-thinking"],
+            plan=ExportPlan(quant="Q4_K_M", merged=False),
+            adapter_dir=_adapter_dir(tmp_path),
+            base_gguf_name="base.gguf",
+            adapter_gguf_name="adapter.gguf",
+            dlm_id="01TEST",
+            adapter_version=1,
+        )
+    )
+    assert "PARAMETER temperature 0.6" in text
+    assert "PARAMETER top_p 0.95" in text
+    assert "<|im_start|>assistant" in text