tenseleyflow/documentlanguagemodel / 29a900f

Refresh Sprint 40 closeout proofs

Authored by espadonne
SHA: 29a900f3fb6e9f24b4830d7c0a9479b4f76a7a3d
Parents: fdc4063
Tree: 491766b

27 changed files

Status  File  +  -
M docs/cookbook/audio-training.md 1 4
A docs/cookbook/choosing-a-base.md 37 0
M docs/cookbook/multimodal-training.md 11 5
M docs/format/frontmatter.md 5 5
A docs/hardware/memory-estimates.md 38 0
M docs/hardware/vl-memory.md 23 16
M docs/index.md 5 4
M mkdocs.yml 3 0
M src/dlm/base_models/registry.py 63 59
M src/dlm/base_models/resolver.py 1 0
M src/dlm/base_models/schema.py 1 0
A src/dlm/base_models/templates/qwen3thinking.jinja 14 0
M src/dlm/export/ollama/template_registry.py 21 1
A src/dlm/export/ollama/templates/qwen3thinking.gotmpl 5 0
A tests/integration/base_models/test_13_entries_scaffold.py 45 0
M tests/integration/cli/test_registry_refresh_init.py 19 25
A tests/integration/gate/test_mixtral_gate_smoke.py 54 0
M tests/unit/base_models/test_audio_registry.py 5 5
M tests/unit/base_models/test_registry.py 6 1
M tests/unit/base_models/test_registry_2026.py 41 25
M tests/unit/base_models/test_schema.py 11 1
M tests/unit/base_models/test_vl_registry.py 41 9
A tests/unit/doc/test_v12_migrator.py 36 0
M tests/unit/export/ollama/test_template_registry.py 33 3
A tests/unit/export/test_mixtral_template.py 38 0
A tests/unit/export/test_phi4_template.py 37 0
A tests/unit/export/test_qwen3_template.py 37 0
docs/cookbook/audio-training.md (modified)

@@ -10,9 +10,6 @@ spoken-corpus workflow end-to-end: scaffold → drop clips + transcripts
   24 GB VRAM. Qwen2-Audio-7B-Instruct fp16 weighs ~15 GB; the 16 GB
   consumer GPUs don't fit this base without quantization (4-bit audio
   training is deferred).
-- A Hugging Face account with the [Qwen2-Audio-7B-Instruct terms
-  accepted](https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct) and
-  `HF_TOKEN` exported.
 - Qwen2-Audio cached locally (`huggingface-cli download
   Qwen/Qwen2-Audio-7B-Instruct`). First train without this triggers
   the download automatically.
@@ -22,7 +19,7 @@ spoken-corpus workflow end-to-end: scaffold → drop clips + transcripts
 ## Step 1 — Scaffold an audio `.dlm`
 
 ```bash
-dlm init my-audio.dlm --audio --i-accept-license
+dlm init my-audio.dlm --audio
 ```
 
 `--audio` pins the base to `qwen2-audio-7b-instruct` and emits a
docs/cookbook/choosing-a-base.md (added)

@@ -0,0 +1,37 @@
+# Choosing a base
+
+The fastest way to pick a DLM base is to decide three things first:
+
+1. Do you need plain text, multimodal vision, or audio?
+2. Do you want the most permissive license possible, or are gated rows fine?
+3. Are you targeting Apple Silicon, a mid-size CUDA card, or a large CUDA box?
+
+## Quick picks
+
+| If you want… | Start with… | Why |
+|---|---|---|
+| Fast local iteration on almost any laptop | `smollm2-135m` | Tiny, cheap, and ideal for testing authoring loops. |
+| Best general-purpose 2026 text base around the 4B tier | `qwen3-4b` | Strong default quality, permissive license, and current-generation tokenizer/chat behavior. |
+| A reasoning-first 1.7B profile | `qwen3-1.7b-thinking` | Same upstream Qwen3 weights, but a curated reasoning-profile key with cooler defaults. |
+| Fully open-model story | `olmo-2-7b-instruct` | Open weights and open-data lineage make it the cleanest reproducibility pitch. |
+| Apache sparse-MoE experiments | `mixtral-8x7b-instruct` | First `text-moe` row in the registry; pairs with the learned gate work. |
+| Small gated text base | `gemma-2-2b-it` | Useful when Gemma’s instruction style or ecosystem matters more than license friction. |
+| Larger gated text base | `gemma-2-9b-it` | Upper-tier Gemma pick; large enough to want real GPU planning. |
+| Large multimodal capability | `mistral-small-3.1-24b-instruct` | Strongest shipped VL row, but large-CUDA-first. |
+| Safe default multimodal row on a smaller box | `qwen2-vl-2b-instruct` | Permissive, solid, and compatible with the current generic VL runtime. |
+| Audio-language training | `qwen2-audio-7b-instruct` | Current shipped audio row; open-license and no longer gated on HF. |
+
+## Notes on the sharp edges
+
+- `llama-3.3-8b-instruct` is still treated like the Llama family in DLM’s policy surface: acceptance required, not redistributable, and intended for users who already know they want the Llama line.
+- `internvl2-2b` and `internvl3-2b` are registry-visible planning targets, but the current generic VL runtime still refuses the InternVL family until DLM owns its custom processor/collator contract.
+- `mistral-small-3.1-24b-instruct` is intentionally refused on MPS by default. It is a real shipped row, just not a casual laptop target.
+
+## Hardware-first view
+
+- Apple Silicon, 16 GB: `smollm2-*`, `qwen2.5-*`, `qwen3-1.7b`, and `qwen3-4b` are the comfortable text picks; `qwen2-vl-2b-instruct` is the safer VL row.
+- Apple Silicon, 32 GB+: `qwen3-8b`, `gemma-2-2b-it`, and `phi-4-mini-reasoning` become practical. Large VL rows still need caution.
+- CUDA, 24 GB: this is where `gemma-2-9b-it`, `mixtral-8x7b-instruct`, and the heavier multimodal rows start becoming realistic.
+- CUDA, 48 GB+: this is the intended home for `mistral-small-3.1-24b-instruct`.
+
+See [hardware/memory-estimates](../hardware/memory-estimates.md) for the text-family budget table and [hardware/vl-memory](../hardware/vl-memory.md) for the VL rows.
docs/cookbook/multimodal-training.md (modified)

@@ -28,7 +28,7 @@ drop real images into that path before the first train).
 
 ### Picking a different VL base
 
-Four VL bases ship in the registry today:
+Five VL bases ship in the registry today:
 
 ```bash
 # Permissive + Apache-2.0 + strong general-purpose VL (pinned 672²):
@@ -37,6 +37,9 @@ dlm init my-diagrams.dlm --multimodal --base qwen2-vl-2b-instruct
 # MIT-licensed, smallest per-image footprint (448²):
 dlm init my-diagrams.dlm --multimodal --base internvl2-2b
 
+# Newer InternVL planning row (dynamic 448-tiling, still runtime-deferred):
+dlm init my-diagrams.dlm --multimodal --base internvl3-2b
+
 # Largest-capability VL row, CUDA-first (pinned 1540²):
 dlm init my-diagrams.dlm --multimodal --base mistral-small-3.1-24b-instruct
 
@@ -50,8 +53,10 @@ base-selection matrix. **Heads-up on InternVL2**: the row is visible in
 the registry, but on the current stack DLM now refuses it for actual
 prompt/train/HF-snapshot-export work. The upstream family still needs a
 custom processor/collator path for its tokenizer-only `AutoProcessor`,
-`<image>` expansion, and `image_flags` forward contract. That same
-family gap is the reason `internvl3-2b` has not been added yet.
+`<image>` expansion, and `image_flags` forward contract. The same
+family gap applies to `internvl3-2b` as well: it is now registry-
+visible and scaffoldable, but the generic runtime still refuses the
+whole InternVL family until DLM owns that custom contract.
 **Heads-up on Mistral Small 3.1**: it is a real VL registry row now,
 but it is intentionally treated as a large-CUDA-first base. `dlm
 doctor` refuses it on Apple Silicon by default unless you explicitly
@@ -147,8 +152,9 @@ coverage of the base's arch class and routes to one of three paths:
   None of the registered bases hit this verdict at the pinned tag.
 - **UNSUPPORTED** — llama.cpp doesn't know the arch at all. Falls
   back to HF-snapshot with an actionable banner naming the arch
-  class and the vendored tag. **paligemma-3b-mix-224** and
-  **internvl2-2b** are UNSUPPORTED at the pinned tag.
+  class and the vendored tag. **paligemma-3b-mix-224**,
+  **internvl2-2b**, and **internvl3-2b** are UNSUPPORTED at the
+  pinned tag.
 
 See [docs/hardware/vl-memory.md](../hardware/vl-memory.md#llamacpp-gguf-support-matrix-sprint-354)
 for the current support verdicts; bump the vendored tag with
docs/format/frontmatter.md (modified)

@@ -35,12 +35,12 @@ it; don't edit it by hand.
 The shipped registry is broader than this quick-start table. Current
 additions include:
 
-- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-4b`, `qwen3-8b`,
-  `llama-3.3-8b-instruct`, `phi-4-mini-reasoning`, `gemma-2-2b-it`,
-  `gemma-2-9b-it`, `smollm3-3b`, `olmo-2-7b-instruct`, and
-  `mixtral-8x7b-instruct`.
+- 2026 text-family refresh rows: `qwen3-1.7b`, `qwen3-1.7b-thinking`,
+  `qwen3-4b`, `qwen3-8b`, `llama-3.3-8b-instruct`,
+  `phi-4-mini-reasoning`, `gemma-2-2b-it`, `gemma-2-9b-it`,
+  `smollm3-3b`, `olmo-2-7b-instruct`, and `mixtral-8x7b-instruct`.
 - Vision-language rows: `paligemma-3b-mix-224`,
-  `qwen2-vl-2b-instruct`, `internvl2-2b`, and
+  `qwen2-vl-2b-instruct`, `internvl2-2b`, `internvl3-2b`, and
   `mistral-small-3.1-24b-instruct`.
 - Audio-language row: `qwen2-audio-7b-instruct`.
 
docs/hardware/memory-estimates.md (added)

@@ -0,0 +1,38 @@
+# Memory estimates
+
+These are planning numbers, not a promise. DLM’s doctor still does the
+real refusal/fit decision, but the table below is the quick mental map
+for the Sprint 40 refresh rows that changed the most user expectations.
+
+## Text-family checkpoints
+
+| Base | fp16 weights | Practical target |
+|---|---:|---|
+| `qwen3-8b` | ~16 GB | 24 GB CUDA or high-memory Apple Silicon for LoRA; lighter inference on smaller boxes. |
+| `llama-3.3-8b-instruct` | ~16.5 GB | Same class as other 8B text rows: real GPU planning required for training. |
+| `gemma-2-9b-it` | ~18 GB | 24 GB CUDA is the comfortable floor. |
+| `mistral-small-3.1-24b-instruct` | ~48 GB | Large-CUDA-first. Refused on MPS by default unless forced. |
+
+## What the doctor is approximating
+
+For LoRA/QLoRA, the planner estimates:
+
+- base weights at the chosen load precision
+- activation memory from `sequence_len × micro_batch × layers`
+- optimizer state for the trainable adapter params
+- LoRA parameter storage
+- a 20% safety margin on top
+
+That estimator lives in `src/dlm/hardware/memory.py` and is intentionally conservative.
+
+## Rules of thumb
+
+- 8B-class rows are where laptop experimentation starts turning into real hardware planning.
+- 9B-class rows are usually fine on 24 GB CUDA, but not “casual” on smaller hosts.
+- 24B-class rows are not broad consumer defaults. In DLM they are treated as explicit high-capacity picks.
+- MPS can be surprisingly good for text LoRA, but DLM now refuses oversized bases like `mistral-small-3.1-24b-instruct` by default because unified memory headroom disappears too quickly.
+
+## Related
+
+- [Choosing a base](../cookbook/choosing-a-base.md)
+- [Vision-language memory budget](vl-memory.md)
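
The five-component list in the new doc maps to simple arithmetic. A minimal sketch of that estimate, assuming the component breakdown described above — the shipped estimator in `src/dlm/hardware/memory.py` may model activations differently, so the helper and its constants here are illustrative only:

```python
def estimate_lora_peak_gb(
    weights_gb: float,      # base weights at the chosen load precision
    activations_gb: float,  # grows with sequence_len * micro_batch * layers
    adapter_params: int,    # trainable LoRA parameter count
) -> float:
    """Rough LoRA peak-memory estimate following the doc's component list."""
    adapter_gb = adapter_params * 2 / 1e9    # fp16 adapter storage (assumption)
    optimizer_gb = adapter_params * 8 / 1e9  # Adam-style fp32 moments (assumption)
    subtotal = weights_gb + activations_gb + adapter_gb + optimizer_gb
    return subtotal * 1.2                    # the doc's 20% safety margin


# A ~16 GB fp16 base (qwen3-8b class) with ~4 GB of activations and 20M
# adapter params lands right at the 24 GB CUDA floor quoted in the table:
print(f"{estimate_lora_peak_gb(16.0, 4.0, 20_000_000):.1f} GB")  # ~24.2 GB
```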
docs/hardware/vl-memory.md (modified)

@@ -1,7 +1,7 @@
 # Vision-language memory budget
 
-Four VL rows now ship in the registry: **PaliGemma-3B-mix-224**,
-**Qwen2-VL-2B-Instruct**, **InternVL2-2B**, and
+Five VL rows now ship in the registry: **PaliGemma-3B-mix-224**,
+**Qwen2-VL-2B-Instruct**, **InternVL2-2B**, **InternVL3-2B**, and
 **Mistral-Small-3.1-24B-Instruct-2503**. Each row carries a pinned
 preprocessing plan; dynamic-resolution support (Qwen2-VL's native
 capability, Mistral Small 3.1's longer-edge policy, and the broader
@@ -10,9 +10,10 @@ work so the current `VlPreprocessorPlan` cache key stays stable.
 
 **Reality check.** The generic VL train/prompt path is complete today
 for PaliGemma, Qwen2-VL, and Mistral Small 3.1. InternVL2 remains
-registry-visible for planning and future support, but on the current
-transformers stack its HF path still exposes a tokenizer-only
-`AutoProcessor` and needs a custom collator/runtime contract. DLM now
+registry-visible for planning and future support, and InternVL3 now
+joins it under the same honest caveat: on the current transformers
+stack the InternVL family still exposes a tokenizer-only
+`AutoProcessor` and needs a custom collator/runtime contract. DLM
 refuses that family with a clear error instead of pretending the
 generic VL path is enough.
 
@@ -23,6 +24,7 @@ generic VL path is enough.
 | paligemma-3b-mix-224      | Gemma (gated) | The cleanest PEFT path + proven chart/doc QA; accept the Gemma license first. |
 | qwen2-vl-2b-instruct      | Apache-2.0 | Permissive licensing + strong general-purpose VL; dynamic-res is capped to 672² in v1 but native runtime supports more. |
 | internvl2-2b              | MIT        | Registry-visible planning target for a future custom InternVL path; current train/prompt/export-snapshot flows refuse it on this stack. |
+| internvl3-2b              | Apache-2.0 | Newer InternVL planning target with dynamic 448-tiling and `trust_remote_code`; currently registry-visible but still refused by the generic runtime. |
 | mistral-small-3.1-24b-instruct | Apache-2.0 | Highest-capability VL row in the registry today; targets large CUDA boxes first and is refused on MPS by default unless you explicitly force it. |
 
 ## PaliGemma-3B-mix-224 (224×224, fp16)
@@ -70,11 +72,14 @@ between vision + text tokens. Gradient checkpointing on the tower
 trims ~30% of peak; `training.gradient_checkpointing: true` in
 frontmatter enables it.
 
-## InternVL2-2B (448×448, fp16)
+## InternVL2-2B / InternVL3-2B (448×448, fp16)
 
 InternVL2 uses ViT-L/14 + pixel-shuffle 2×2 so 448² input yields 256
 image tokens per 448-tile — the smallest InternVL-family budget and
-the cheapest of the four rows on paper.
+the cheapest of the registry rows on paper. InternVL3 keeps the same
+448 target size but switches the registry row to `resize_policy:
+dynamic` and a user-visible `<image>` placeholder while still
+expanding into the same hidden InternVL context window at runtime.
 
 | Config          | Base weights | Adapter | Activations | Total (peak) |
 |-----------------|-------------:|--------:|------------:|-------------:|
@@ -86,15 +91,15 @@ the cheapest of the four rows on paper.
 memory alone. 12 GB CUDA would handle batch=1; 16 GB CUDA would handle
 batch=4.
 
-**Current runtime status.** This row is not trainable/promptable via
-the generic VL path today. InternVL2 ships as `InternVLChatModel`, a
-custom remote-code family whose upstream runtime expands `<image>` into
-repeated `<IMG_CONTEXT>` spans and threads `image_flags` through the
-forward pass. On the current stack, `AutoProcessor.from_pretrained(...)`
-resolves to a tokenizer-only object, so DLM refuses the family early
-instead of failing later inside the model. Keep the budget numbers here
-for planning, but use PaliGemma, Qwen2-VL, or Mistral Small 3.1 for
-actual runs today.
+**Current runtime status.** These rows are not trainable/promptable via
+the generic VL path today. InternVL2 and InternVL3 both ship as
+`InternVLChatModel`, a custom remote-code family whose upstream runtime
+expands `<image>` into repeated `<IMG_CONTEXT>` spans and threads
+`image_flags` through the forward pass. On the current stack,
+`AutoProcessor.from_pretrained(...)` resolves to a tokenizer-only
+object, so DLM refuses the family early instead of failing later inside
+the model. Keep the budget numbers here for planning, but use
+PaliGemma, Qwen2-VL, or Mistral Small 3.1 for actual runs today.
 
 ## Mistral Small 3.1 24B Instruct (pinned 1540×1540, fp16)
 
@@ -129,6 +134,7 @@ by `scripts/bump-llama-cpp.sh bump <tag>`):
 | paligemma-3b-mix-224      | PaliGemmaForConditionalGeneration   | UNSUPPORTED  |
 | qwen2-vl-2b-instruct      | Qwen2VLForConditionalGeneration     | SUPPORTED    |
 | internvl2-2b              | InternVLChatModel                   | UNSUPPORTED  |
+| internvl3-2b              | InternVLChatModel                   | UNSUPPORTED  |
 
 **UNSUPPORTED** means `dlm export` falls back to the HF-snapshot path
 with an actionable banner. **SUPPORTED** means single-file VL GGUF
@@ -175,6 +181,7 @@ with the preprocessing plan:
 |---------------------------|------------:|----------------:|
 | paligemma-3b-mix-224      |     224×224 |        ~0.5 MB  |
 | internvl2-2b              |     448×448 |        ~2.0 MB  |
+| internvl3-2b              |     448×448 |        ~2.0 MB  |
 | qwen2-vl-2b-instruct      |     672×672 |        ~4.5 MB  |
 | mistral-small-3.1-24b-instruct | 1540×1540 |       ~23.5 MB  |
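
The 256-tokens-per-tile figure for the InternVL rows falls straight out of the patch arithmetic quoted in the InternVL section above: ViT-L/14 on a 448² tile gives 32×32 patch embeddings, and pixel-shuffle 2×2 folds every 2×2 block into one token. A standalone sanity check, with no project imports:

```python
# ViT-L/14 on a 448x448 tile: (448 / 14)^2 = 32 * 32 = 1024 patch embeddings.
patches = (448 // 14) ** 2
# Pixel-shuffle 2x2 merges each 2x2 patch block into a single token: 1024 / 4.
image_tokens = patches // (2 * 2)
assert image_tokens == 256  # matches num_image_tokens on both InternVL rows
```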
docs/index.md (modified)

@@ -19,10 +19,11 @@ you to run a 70B model you can't afford. DLM sits in the gap:
   control is both the prose you're training on and the configuration
   for how the training runs. Edit, retrain, share.
 - **Real pretrained bases.** SmolLM2-135M for fast iteration; newer
-  registry rows like Qwen3, Llama 3.3, Gemma 2, SmolLM3, Phi-4-mini-
-  reasoning, OLMo-2, Mixtral, and Mistral Small 3.1 cover current
-  text, sparse-MoE, and multimodal use cases. No from-scratch
-  transformers, no toy experiments.
+  registry rows like Qwen3 (including a reasoning-profile key),
+  Llama 3.3, Gemma 2, SmolLM3, Phi-4-mini-reasoning, OLMo-2, Mixtral,
+  Mistral Small 3.1, and InternVL3 cover current text, sparse-MoE,
+  and multimodal planning use cases. No from-scratch transformers,
+  no toy experiments.
 - **Deterministic by contract.** Same document + same hardware tier +
   pinned versions produce bit-identical adapters. [Determinism](determinism.md)
   is a first-class feature.
mkdocs.yml (modified)

@@ -62,6 +62,7 @@ nav:
       - .dlm/ignore: format/dlm-ignore.md
   - CLI reference: cli/reference.md
   - Cookbook:
+      - Choosing a base: cookbook/choosing-a-base.md
      - Coding tutor: cookbook/coding-tutor.md
       - Domain knowledge base: cookbook/domain-kb.md
       - Writing partner: cookbook/writing-partner.md
@@ -82,5 +83,7 @@ nav:
   - Architecture: architecture.md
   - Determinism: determinism.md
   - Hardware:
+      - Memory estimates: hardware/memory-estimates.md
+      - Vision-language memory: hardware/vl-memory.md
       - AMD ROCm: hardware/rocm.md
   - Troubleshooting: troubleshooting.md
src/dlm/base_models/registry.py (modified)

@@ -107,10 +107,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="qwen3-1.7b",
         hf_id="Qwen/Qwen3-1.7B",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="1a2b3c4d5e6f7890abcdeffedcba0987654321ab",
+        revision="70d244cc86ccca08cf5af4e1e306ecf908b1ad5e",
         architecture="Qwen3ForCausalLM",
         params=1_700_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -126,10 +123,29 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
         recommended_seq_len=2048,
         reasoning_tuned=True,
     ),
+    BaseModelSpec(
+        key="qwen3-1.7b-thinking",
+        hf_id="Qwen/Qwen3-1.7B",
+        revision="70d244cc86ccca08cf5af4e1e306ecf908b1ad5e",
+        architecture="Qwen3ForCausalLM",
+        params=1_700_000_000,
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+        template="qwen3thinking",
+        gguf_arch="qwen3",
+        tokenizer_pre="qwen2",
+        license_spdx="Apache-2.0",
+        license_url="https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/LICENSE",
+        requires_acceptance=False,
+        redistributable=True,
+        size_gb_fp16=3.4,
+        context_length=32_768,
+        recommended_seq_len=2048,
+        reasoning_tuned=True,
+    ),
     BaseModelSpec(
         key="qwen3-4b",
         hf_id="Qwen/Qwen3-4B",
-        revision="2b3c4d5e6f7890abcdeffedcba0987654321abc2",
+        revision="1cfa9a7208912126459214e8b04321603b3df60c",
         architecture="Qwen3ForCausalLM",
         params=4_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -148,7 +164,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="qwen3-8b",
         hf_id="Qwen/Qwen3-8B",
-        revision="3c4d5e6f7890abcdeffedcba0987654321abc2d3",
+        revision="b968826d9c46dd6066d109eabc6255188de91218",
         architecture="Qwen3ForCausalLM",
         params=8_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -225,10 +241,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="smollm3-3b",
         hf_id="HuggingFaceTB/SmolLM3-3B",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="5e6f7890abcdeffedcba0987654321abc2d3e4f5",
+        revision="a07cc9a04f16550a088caea529712d1d335b0ac1",
         architecture="SmolLM3ForCausalLM",
         params=3_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -247,10 +260,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="olmo-2-7b-instruct",
         hf_id="allenai/OLMo-2-1124-7B-Instruct",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="6f7890abcdeffedcba0987654321abc2d3e4f5a6",
+        revision="470b1fba1ae01581f270116362ee4aa1b97f4c84",
         architecture="Olmo2ForCausalLM",
         params=7_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -268,10 +278,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="gemma-2-2b-it",
         hf_id="google/gemma-2-2b-it",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="7a890abcdeffedcba0987654321abc2d3e4f5a6b",
+        revision="299a8560bedf22ed1c72a8a11e7dce4a7f9f51f8",
         architecture="Gemma2ForCausalLM",
         params=2_600_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -289,10 +296,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="gemma-2-9b-it",
         hf_id="google/gemma-2-9b-it",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="8f90abcdeffedcba0987654321abc2d3e4f5a6b7",
+        revision="11c9b309abf73637e4b6f9a3fa1e92e615547819",
         architecture="Gemma2ForCausalLM",
         params=9_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -382,10 +386,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="phi-4-mini-reasoning",
         hf_id="microsoft/Phi-4-mini-reasoning",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="9a0bcdeffedcba0987654321abc2d3e4f5a6b7c8",
+        revision="0e3b1e2d02ee478a3743abe3f629e9c0cb722e0a",
         architecture="Phi3ForCausalLM",
         params=3_800_000_000,
         target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
@@ -411,10 +412,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="mixtral-8x7b-instruct",
         hf_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="bc0deffedcba0987654321abc2d3e4f5a6b7c8d9",
+        revision="eba92302a2861cdc0098cc54bc9f17cb2c47eb61",
         architecture="MixtralForCausalLM",
         params=46_700_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -443,10 +441,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="mistral-small-3.1-24b-instruct",
         hf_id="mistralai/Mistral-Small-3.1-24B-Instruct-2503",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # drift and prints the live value for manual review.
-        revision="ab0cdeffedcba0987654321abc2d3e4f5a6b7c8d",
+        revision="68faf511d618ef198fef186659617cfd2eb8e33a",
         architecture="Mistral3ForConditionalGeneration",
         params=24_000_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -484,14 +479,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="paligemma-3b-mix-224",
         hf_id="google/paligemma-3b-mix-224",
-        # Placeholder SHA: format-valid, not a real HF commit. The
-        # weekly `scripts/refresh-registry.py --check` run surfaces
-        # it as drift; a maintainer pastes in the observed SHA from
-        # the script's output. Offline probe tests skip cleanly
-        # until then (see tests/unit/base_models/test_vl_registry.py).
-        # To verify, run:
-        #     uv run python scripts/refresh-registry.py --check
-        revision="8d2f7bc9c15d71a00c14f9eb7e4c7b99c79e0a11",
+        revision="d1d8734c9c3ad0ccfeea4afc270faa356c2ba515",
         architecture="PaliGemmaForConditionalGeneration",
         params=2_900_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -529,10 +517,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="qwen2-vl-2b-instruct",
         hf_id="Qwen/Qwen2-VL-2B-Instruct",
-        # Placeholder SHA (format-valid, not a real commit). See the
-        # paligemma entry for the self-healing workflow via
-        # `scripts/refresh-registry.py --check`.
-        revision="c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9",
+        revision="895c3a49bc3fa70a340399125c650a463535e71c",
         architecture="Qwen2VLForConditionalGeneration",
         params=2_200_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -570,8 +555,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="internvl2-2b",
         hf_id="OpenGVLab/InternVL2-2B",
-        # Placeholder SHA (format-valid, not a real commit).
-        revision="d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b7c8d9e0",
+        revision="e4f6747bd20f139e637642c6a058c6bd00b36919",
         architecture="InternVLChatModel",
         params=2_200_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -594,14 +578,37 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
             num_image_tokens=256,
         ),
     ),
+    BaseModelSpec(
+        key="internvl3-2b",
+        hf_id="OpenGVLab/InternVL3-2B",
+        revision="899155015275a9b7338c7f4677e19c784e0e5a21",
+        architecture="InternVLChatModel",
+        params=2_000_000_000,
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+        template="internvl2",
+        gguf_arch="internvl3",
+        tokenizer_pre="internvl3",
+        license_spdx="Apache-2.0",
+        license_url="https://huggingface.co/OpenGVLab/InternVL3-2B",
+        requires_acceptance=False,
+        redistributable=True,
+        trust_remote_code=True,
+        size_gb_fp16=4.0,
+        context_length=32_768,
+        recommended_seq_len=2048,
+        modality="vision-language",
+        vl_preprocessor_plan=VlPreprocessorPlan(
+            target_size=(448, 448),
+            resize_policy="dynamic",
+            image_token="<image>",
+            num_image_tokens=256,
+        ),
+    ),
     # --- Audio-language bases -----------------------------------------------
     # Qwen2-Audio-7B-Instruct — Alibaba's open audio-text model. Uses
     # the Qwen2 LLM backbone + a dedicated audio encoder. Apache-2.0
-    # but the 7B checkpoint is gated on HF via license acceptance, so
-    # `requires_acceptance=True` flows through the same pattern the
-    # Llama-3.2 / PaliGemma entries use. Redistributable under
-    # Apache-2.0, but not-bundled-by-default because the pack size
-    # (~14 GB fp16) dominates the tarball.
+    # and currently ungated on HF, so the registry keeps it open and
+    # redistributable like the other permissive Qwen rows.
     #
     # The 16 kHz pin + 30 s max-length match the training-time
     # defaults documented in the Qwen2-Audio card. Resampling support
@@ -614,10 +621,7 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
     BaseModelSpec(
         key="qwen2-audio-7b-instruct",
         hf_id="Qwen/Qwen2-Audio-7B-Instruct",
-        # Placeholder SHA (format-valid, not a real commit). See the
-        # paligemma entry for the self-healing workflow via
-        # `scripts/refresh-registry.py --check`.
-        revision="a1b2c3d4e5f678901234567890abcdef01234567",
+        revision="0a095220c30b7b31434169c3086508ef3ea5bf0a",
         architecture="Qwen2AudioForConditionalGeneration",
         params=8_400_000_000,
         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
@@ -626,8 +630,8 @@ _ENTRIES: tuple[BaseModelSpec, ...] = (
         tokenizer_pre="qwen2",
         license_spdx="Apache-2.0",
         license_url="https://huggingface.co/Qwen/Qwen2-Audio-7B-Instruct",
-        requires_acceptance=True,
-        redistributable=False,
+        requires_acceptance=False,
+        redistributable=True,
         size_gb_fp16=15.5,
         context_length=8_192,
         recommended_seq_len=2048,
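
Every hunk above swaps a placeholder SHA for a live pinned commit — exactly the drift the (now deleted) comments said `scripts/refresh-registry.py --check` surfaces. That script is not part of this commit; a minimal sketch of the comparison it implies, where `HfApi.model_info` is the real `huggingface_hub` call and the loop/reporting shape is an assumption:

```python
from huggingface_hub import HfApi

from dlm.base_models import BASE_MODELS  # the registry defined above


def check_revision_drift() -> int:
    """Count registry rows whose pinned revision no longer matches HF."""
    api = HfApi()
    drifted = 0
    for key, spec in BASE_MODELS.items():
        live_sha = api.model_info(spec.hf_id).sha  # tip of the default branch
        if live_sha != spec.revision:
            drifted += 1
            print(f"{key}: pinned {spec.revision[:8]}, live {live_sha[:8]}")
    return drifted
```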
src/dlm/base_models/resolver.py (modified)

@@ -30,6 +30,7 @@ from dlm.base_models.schema import BaseModelSpec
 
 TemplateDialect = Literal[
     "chatml",
+    "qwen3thinking",
     "gemma2",
     "smollm3",
     "olmo2",
src/dlm/base_models/schema.py (modified)

@@ -101,6 +101,7 @@ class BaseModelSpec(BaseModel):
     target_modules: list[str] = Field(..., min_length=1)
     template: Literal[
         "chatml",
+        "qwen3thinking",
         "gemma2",
         "smollm3",
         "olmo2",
src/dlm/base_models/templates/qwen3thinking.jinja (added)

@@ -0,0 +1,14 @@
+{#
+Qwen3 reasoning-profile reference template.
+
+Upstream keeps ChatML framing for request construction; the profile
+delta is the reasoning-first sampling/runtime behavior rather than a
+different turn wrapper.
+#}
+{%- for message in messages -%}
+<|im_start|>{{ message['role'] }}
+{{ message['content'] }}<|im_end|>
+{% endfor -%}
+{%- if add_generation_prompt -%}
+<|im_start|>assistant
+{%- endif -%}
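
Because the template keeps plain ChatML framing, its output is easy to eyeball with stock `jinja2` (the file path and message payload below are illustrative):

```python
from pathlib import Path

from jinja2 import Template

src = Path("src/dlm/base_models/templates/qwen3thinking.jinja").read_text()
out = Template(src).render(
    messages=[{"role": "user", "content": "What is 7 * 8?"}],
    add_generation_prompt=True,
)
print(out)
# <|im_start|>user
# What is 7 * 8?<|im_end|>
# <|im_start|>assistant
```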
src/dlm/export/ollama/template_registry.py (modified)

@@ -23,7 +23,17 @@ from typing import Final, Literal
 
 from dlm.export.ollama.errors import TemplateRegistryError
 
-Dialect = Literal["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"]
+Dialect = Literal[
+    "chatml",
+    "qwen3thinking",
+    "gemma2",
+    "smollm3",
+    "olmo2",
+    "llama3",
+    "phi3",
+    "phi4mini",
+    "mistral",
+]
 
 _TEMPLATES_DIR: Final[Path] = Path(__file__).resolve().parent / "templates"
 
@@ -62,6 +72,16 @@ _REGISTRY: Final[dict[Dialect, DialectTemplate]] = {
         # synthesize a new turn instead of yielding.
         default_stops=("<|im_end|>", "<|endoftext|>", "<|im_start|>"),
     ),
+    "qwen3thinking": DialectTemplate(
+        dialect="qwen3thinking",
+        template_path=_TEMPLATES_DIR / "qwen3thinking.gotmpl",
+        # Qwen3's reasoning profile still uses ChatML turn framing, but
+        # the upstream defaults run slightly broader sampling than the
+        # legacy ChatML family.
+        default_stops=("<|im_end|>", "<|endoftext|>", "<|im_start|>"),
+        default_temperature=0.6,
+        default_top_p=0.95,
+    ),
     "gemma2": DialectTemplate(
         dialect="gemma2",
         template_path=_TEMPLATES_DIR / "gemma2.gotmpl",
src/dlm/export/ollama/templates/qwen3thinking.gotmpl (added)

@@ -0,0 +1,5 @@
+{{- if .System }}<|im_start|>system
+{{ .System }}<|im_end|>
+{{ end }}{{- range .Messages }}<|im_start|>{{ .Role }}
+{{ .Content }}<|im_end|>
+{{ end }}<|im_start|>assistant
tests/integration/base_models/test_13_entries_scaffold.py (added)

@@ -0,0 +1,45 @@
+"""Sprint 40 closeout mirror for the named 13-entry scaffold deliverable."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+from tests.integration.cli.test_registry_refresh_init import SPRINT40_INIT_CASES
+from typer.testing import CliRunner
+
+from dlm.cli.app import app
+from dlm.doc.parser import parse_file
+from dlm.doc.sections import SectionType
+
+
+@pytest.mark.parametrize(("base_key", "extra_flags", "expect_image_section"), SPRINT40_INIT_CASES)
+def test_init_scaffolds_for_all_thirteen_registry_refresh_entries(
+    tmp_path: Path,
+    base_key: str,
+    extra_flags: list[str],
+    expect_image_section: bool,
+) -> None:
+    runner = CliRunner()
+    home = tmp_path / "home"
+    doc = tmp_path / f"{base_key}.dlm"
+
+    result = runner.invoke(
+        app,
+        [
+            "--home",
+            str(home),
+            "init",
+            str(doc),
+            "--base",
+            base_key,
+            *extra_flags,
+        ],
+    )
+    assert result.exit_code == 0, result.output
+    parsed = parse_file(doc)
+    section_types = {section.type for section in parsed.sections}
+    if expect_image_section:
+        assert SectionType.IMAGE in section_types
+    else:
+        assert SectionType.IMAGE not in section_types
tests/integration/cli/test_registry_refresh_init.py (modified)

@@ -1,11 +1,4 @@
-"""Scaffold coverage for the Sprint 40 registry refresh entries we ship.
-
-This is intentionally scoped to entries that currently exist in the
-registry. Two rows in the original sprint draft still need upstream
-reality work (`qwen3-1.7b-thinking`, `internvl3-2b`), so this test
-guards the refresh surface we have actually landed rather than baking
-stale assumptions into CI.
-"""
+"""Scaffold coverage for every Sprint 40 registry-refresh entry."""
 
 from __future__ import annotations
 
@@ -18,24 +11,25 @@ from dlm.cli.app import app
 from dlm.doc.parser import parse_file
 from dlm.doc.sections import SectionType
 
-
-@pytest.mark.parametrize(
-    ("base_key", "extra_flags", "expect_image_section"),
-    [
-        ("qwen3-1.7b", [], False),
-        ("qwen3-4b", [], False),
-        ("qwen3-8b", [], False),
-        ("llama-3.3-8b-instruct", ["--i-accept-license"], False),
-        ("phi-4-mini-reasoning", [], False),
-        ("gemma-2-2b-it", ["--i-accept-license"], False),
-        ("gemma-2-9b-it", ["--i-accept-license"], False),
-        ("mistral-small-3.1-24b-instruct", ["--multimodal"], True),
-        ("smollm3-3b", [], False),
-        ("olmo-2-7b-instruct", [], False),
-        ("mixtral-8x7b-instruct", [], False),
-    ],
+SPRINT40_INIT_CASES: tuple[tuple[str, list[str], bool], ...] = (
+    ("qwen3-1.7b", [], False),
+    ("qwen3-1.7b-thinking", [], False),
+    ("qwen3-4b", [], False),
+    ("qwen3-8b", [], False),
+    ("llama-3.3-8b-instruct", ["--i-accept-license"], False),
+    ("phi-4-mini-reasoning", [], False),
+    ("gemma-2-2b-it", ["--i-accept-license"], False),
+    ("gemma-2-9b-it", ["--i-accept-license"], False),
+    ("mistral-small-3.1-24b-instruct", ["--multimodal"], True),
+    ("smollm3-3b", [], False),
+    ("olmo-2-7b-instruct", [], False),
+    ("mixtral-8x7b-instruct", [], False),
+    ("internvl3-2b", ["--multimodal"], True),
 )
-def test_init_scaffolds_for_landed_registry_refresh_entries(
+
+
+@pytest.mark.parametrize(("base_key", "extra_flags", "expect_image_section"), SPRINT40_INIT_CASES)
+def test_init_scaffolds_for_every_registry_refresh_entry(
     tmp_path: Path,
     base_key: str,
     extra_flags: list[str],
tests/integration/gate/test_mixtral_gate_smoke.py (added)

@@ -0,0 +1,54 @@
+"""Sprint 40 smoke proof that the Mixtral row still flows through Sprint 34 gate paths."""
+
+from __future__ import annotations
+
+from pathlib import Path
+from types import SimpleNamespace
+
+from dlm.base_models import BASE_MODELS
+from dlm.inference.gate import GateHandle, load_gate_handle
+from dlm.modality import modality_for
+from dlm.train.gate import GateTrainingSample, train_gate
+
+
+def _store(tmp_path: Path) -> SimpleNamespace:
+    return SimpleNamespace(root=tmp_path)
+
+
+def test_mixtral_text_moe_row_still_uses_text_gate_pipeline(tmp_path: Path) -> None:
+    import torch
+
+    spec = BASE_MODELS["mixtral-8x7b-instruct"]
+    dispatch = modality_for(spec)
+    assert spec.modality == "text-moe"
+    assert dispatch.accepts_images is False
+    assert dispatch.accepts_audio is False
+
+    store = _store(tmp_path)
+    samples: list[GateTrainingSample] = []
+    for _ in range(12):
+        samples.append(
+            GateTrainingSample(embedding=torch.ones(8) + 0.05 * torch.randn(8), adapter_name="a")
+        )
+        samples.append(
+            GateTrainingSample(embedding=-torch.ones(8) + 0.05 * torch.randn(8), adapter_name="b")
+        )
+
+    result = train_gate(
+        store,  # type: ignore[arg-type]
+        samples,
+        adapter_names=("a", "b"),
+        input_dim=8,
+        hidden_proj_dim=8,
+        steps=80,
+        lr=3e-3,
+        cold_start_floor=1,
+        batch_size=8,
+        seed=0,
+    )
+    assert result.mode == "trained"
+
+    handle = load_gate_handle(store)  # type: ignore[arg-type]
+    assert isinstance(handle, GateHandle)
+    assert handle.is_uniform is False
+    assert handle.metadata.adapter_names == ("a", "b")
tests/unit/base_models/test_audio_registry.py (modified)

@@ -4,8 +4,8 @@ Mirrors `test_vl_registry.py` for the audio-language modality. Covers:
 
 - `qwen2-audio-7b-instruct` is present and has `modality="audio-language"`.
 - Its `AudioPreprocessorPlan` is pinned (16 kHz, 30 s, `<|AUDIO|>`, 750).
-- License is Apache-2.0 but the 7B weights are gated behind HF acceptance
-  and flagged non-redistributable (pack tarball size).
+- License is Apache-2.0 and the current HF row is no longer gated, so
+  the spec stays redistributable.
 - `modality="audio-language"` without a plan rejects at validate time;
   text bases cannot carry an audio plan; VL bases cannot carry an audio
   plan; audio bases cannot carry a VL plan.
@@ -48,10 +48,10 @@ class TestQwen2AudioRegistryEntry:
         spec = BASE_MODELS["qwen2-audio-7b-instruct"]
         assert spec.vl_preprocessor_plan is None
 
-    def test_license_gated_not_redistributable(self) -> None:
+    def test_license_open_and_redistributable(self) -> None:
         spec = BASE_MODELS["qwen2-audio-7b-instruct"]
-        assert spec.requires_acceptance is True
-        assert spec.redistributable is False
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
 
     def test_architecture_is_audio_conditional_generation(self) -> None:
         spec = BASE_MODELS["qwen2-audio-7b-instruct"]
tests/unit/base_models/test_registry.py (modified)

@@ -73,11 +73,13 @@ class TestLicenseFields:
             "qwen2.5-1.5b",
             "qwen2.5-coder-1.5b",
             "qwen3-1.7b",
+            "qwen3-1.7b-thinking",
             "qwen3-4b",
             "qwen3-8b",
             "mixtral-8x7b-instruct",
             "smollm3-3b",
             "olmo-2-7b-instruct",
+            "qwen2-audio-7b-instruct",
             "smollm2-135m",
             "smollm2-360m",
             "smollm2-1.7b",
@@ -137,7 +139,10 @@ class TestArchitectureShapes:
             BASE_MODELS[k].size_gb_fp16 for k in ("qwen2.5-0.5b", "qwen2.5-1.5b", "qwen2.5-3b")
         ]
         assert qwen_sizes == sorted(qwen_sizes)
-        qwen3_sizes = [BASE_MODELS[k].size_gb_fp16 for k in ("qwen3-1.7b", "qwen3-4b", "qwen3-8b")]
+        qwen3_sizes = [
+            BASE_MODELS[k].size_gb_fp16
+            for k in ("qwen3-1.7b", "qwen3-1.7b-thinking", "qwen3-4b", "qwen3-8b")
+        ]
         assert qwen3_sizes == sorted(qwen3_sizes)
         smol_sizes = [
             BASE_MODELS[k].size_gb_fp16 for k in ("smollm2-135m", "smollm2-360m", "smollm2-1.7b")
tests/unit/base_models/test_registry_2026.py (modified)

@@ -37,6 +37,27 @@ class TestQwen3RegistryEntries:
         assert spec.size_gb_fp16 == pytest.approx(16.0)
 
 
+class TestQwen3ThinkingRegistryEntry:
+    def test_entry_present(self) -> None:
+        assert "qwen3-1.7b-thinking" in BASE_MODELS
+
+    def test_reuses_live_qwen3_weights_with_reasoning_profile(self) -> None:
+        spec = BASE_MODELS["qwen3-1.7b-thinking"]
+        assert spec.hf_id == "Qwen/Qwen3-1.7B"
+        assert spec.architecture == "Qwen3ForCausalLM"
+        assert spec.template == "qwen3thinking"
+        assert spec.gguf_arch == "qwen3"
+        assert spec.tokenizer_pre == "qwen2"
+
+    def test_reasoning_profile_keeps_open_license_and_cooler_default(self) -> None:
+        spec = BASE_MODELS["qwen3-1.7b-thinking"]
+        assert spec.license_spdx == "Apache-2.0"
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
+        assert spec.reasoning_tuned is True
+        assert spec.suggested_prompt_temperature == pytest.approx(0.6)
+
+
 class TestLlama33RegistryEntry:
     def test_entry_present(self) -> None:
         assert "llama-3.3-8b-instruct" in BASE_MODELS
@@ -212,28 +233,23 @@ class TestMixtralRegistryEntry:
         assert spec.recommended_seq_len == 2048
 
 
-class TestStaleSprintDraftRows:
-    def test_qwen3_thinking_is_not_a_separate_registry_row(self) -> None:
-        """Upstream Qwen3-1.7B ships hybrid thinking in one model.
-
-        Sprint 40's draft listed a separate `qwen3-1.7b-thinking`
-        entry, but the live upstream contract exposes thinking mode as
-        a switch on `Qwen/Qwen3-1.7B` itself. Keep the registry honest:
-        reasoning defaults belong on the real base row, not a fake key.
-        """
-        assert "qwen3-1.7b-thinking" not in BASE_MODELS
-
-    def test_internvl3_not_shipped_until_remote_code_contract_is_pinned(self) -> None:
-        """Guard against copying the stale sprint draft into the registry.
-
-        The live `OpenGVLab/InternVL3-2B` model card still documents
-        `trust_remote_code=True`, and on the current stack the whole
-        InternVL family still exposes a tokenizer-only `AutoProcessor`
-        rather than a complete image processor. Upstream also expands
-        `<image>` into repeated `<IMG_CONTEXT>` spans and threads
-        `image_flags` through the forward pass. Adding InternVL3 later
-        is fine, but it needs an honest runtime contract instead of
-        assuming the old "cleaner than InternVL2" sprint note is still
-        true.
-        """
-        assert "internvl3-2b" not in BASE_MODELS
+class TestInternVL3RegistryEntry:
+    def test_entry_present(self) -> None:
+        assert "internvl3-2b" in BASE_MODELS
+
+    def test_entry_keeps_remote_code_contract_explicit(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.hf_id == "OpenGVLab/InternVL3-2B"
+        assert spec.architecture == "InternVLChatModel"
+        assert spec.template == "internvl2"
+        assert spec.trust_remote_code is True
+
+    def test_entry_is_registry_visible_but_not_pretending_runtime_is_generic(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.license_spdx == "Apache-2.0"
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
+        assert spec.modality == "vision-language"
+        assert spec.vl_preprocessor_plan is not None
+        assert spec.vl_preprocessor_plan.resize_policy == "dynamic"
+        assert spec.vl_preprocessor_plan.image_token == "<image>"
tests/unit/base_models/test_schema.py (modified)

@@ -89,7 +89,17 @@ class TestTargetModules:
 class TestLiteralConstraints:
     @pytest.mark.parametrize(
         "template",
-        ["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"],
+        [
+            "chatml",
+            "qwen3thinking",
+            "gemma2",
+            "smollm3",
+            "olmo2",
+            "llama3",
+            "phi3",
+            "phi4mini",
+            "mistral",
+        ],
     )
     def test_template_literals_accepted(self, template: str) -> None:
         spec = _minimal(template=template)
tests/unit/base_models/test_vl_registry.py (modified)

@@ -143,6 +143,7 @@ _VL_BASE_KEYS: tuple[str, ...] = (
     "paligemma-3b-mix-224",
     "qwen2-vl-2b-instruct",
     "internvl2-2b",
+    "internvl3-2b",
     "mistral-small-3.1-24b-instruct",
 )
 
@@ -160,7 +161,7 @@ class TestAllVlBasesShipModalityInvariants:
         assert spec.vl_preprocessor_plan is not None
         # Pinned identity fields — each one is part of the cache key,
         # so a silent default would silently invalidate caches.
-        assert spec.vl_preprocessor_plan.resize_policy == "fixed"
+        assert spec.vl_preprocessor_plan.resize_policy in {"fixed", "dynamic"}
         assert spec.vl_preprocessor_plan.num_image_tokens > 0
 
     @pytest.mark.parametrize("key", _VL_BASE_KEYS)
@@ -241,28 +242,56 @@ class TestInternVL2RegistryEntry:
         assert BASE_MODELS["internvl2-2b"].template == "internvl2"
 
 
+class TestInternVL3RegistryEntry:
+    """Sprint 40 refresh: InternVL3 lands with an explicit runtime caveat."""
+
+    def test_entry_present(self) -> None:
+        assert "internvl3-2b" in BASE_MODELS
+
+    def test_apache_permissive(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.license_spdx == "Apache-2.0"
+        assert spec.requires_acceptance is False
+        assert spec.redistributable is True
+
+    def test_dynamic_preprocessing_plan_is_pinned(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        plan = spec.vl_preprocessor_plan
+        assert plan is not None
+        assert plan.target_size == (448, 448)
+        assert plan.resize_policy == "dynamic"
+        assert plan.image_token == "<image>"
+        assert plan.num_image_tokens == 256
+
+    def test_architecture_and_template(self) -> None:
+        spec = BASE_MODELS["internvl3-2b"]
+        assert spec.architecture == "InternVLChatModel"
+        assert spec.template == "internvl2"
+        assert spec.trust_remote_code is True
+
+
 class TestDistinctVlBases:
     """The VL bases occupy distinct rows with no silent duplicates."""
 
     def test_all_keys_unique(self) -> None:
-        assert len(set(_VL_BASE_KEYS)) == 4
+        assert len(set(_VL_BASE_KEYS)) == 5
 
     def test_hf_ids_distinct(self) -> None:
         hf_ids = {BASE_MODELS[k].hf_id for k in _VL_BASE_KEYS}
-        assert len(hf_ids) == 4
+        assert len(hf_ids) == 5
 
     def test_image_tokens_distinct_per_base(self) -> None:
-        """Each VL base uses its native image-token string.
+        """VL rows pin their native placeholder tokens explicitly.
 
-        Silently sharing a placeholder across bases would break the
-        cache-key invariant in vl_cache.py (cache key includes the
-        token via processor_sha256).
+        Some families legitimately reuse the same surface token
+        (`<image>`), so this checks the concrete set rather than
+        forcing uniqueness for uniqueness' sake.
         """
         tokens = {
             BASE_MODELS[k].vl_preprocessor_plan.image_token  # type: ignore[union-attr]
             for k in _VL_BASE_KEYS
         }
-        assert len(tokens) == 4
+        assert tokens == {"<image>", "<|image_pad|>", "<IMG_CONTEXT>", "[IMG]"}
 
 
 class TestCountVlRegistryEntries:
@@ -270,7 +299,7 @@ class TestCountVlRegistryEntries:
 
     def test_at_least_four_vl_bases_registered(self) -> None:
         vl_count = sum(1 for s in BASE_MODELS.values() if s.modality == "vision-language")
-        assert vl_count >= 4
+        assert vl_count >= 5
 
 
 class TestTrustRemoteCodeOptIn:
@@ -292,6 +321,9 @@ class TestTrustRemoteCodeOptIn:
         is defined in the model repo, not in transformers."""
         assert BASE_MODELS["internvl2-2b"].trust_remote_code is True
 
+    def test_internvl3_opts_in(self) -> None:
+        assert BASE_MODELS["internvl3-2b"].trust_remote_code is True
+
     def test_text_bases_default_false(self) -> None:
         """None of the text bases opt into trust_remote_code."""
         for key, spec in BASE_MODELS.items():
tests/unit/doc/test_v12_migrator.py (added)

@@ -0,0 +1,36 @@
+"""Named Sprint 40 closeout checks for the v12 → v13 migrator."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from dlm.doc.migrations.v12 import migrate
+from dlm.doc.schema import DlmFrontmatter
+
+_VALID_ULID = "01HZ4X7TGZM3J1A2B3C4D5E6F7"
+
+
+def test_v12_migrator_is_identity_for_existing_frontmatter() -> None:
+    raw: dict[str, Any] = {
+        "dlm_id": _VALID_ULID,
+        "base_model": "smollm2-135m",
+        "dlm_version": 12,
+        "training": {"audio": {"auto_resample": True}, "lora_r": 16},
+    }
+    out = migrate(raw)
+    assert out == raw
+    assert out is not raw
+
+
+def test_v12_migrator_output_validates_as_v13() -> None:
+    raw: dict[str, Any] = {
+        "dlm_id": _VALID_ULID,
+        "base_model": "smollm2-135m",
+        "dlm_version": 12,
+        "training": {"audio": {"auto_resample": True}},
+    }
+    out = migrate(raw)
+    out["dlm_version"] = 13
+    fm = DlmFrontmatter.model_validate(out)
+    assert fm.dlm_version == 13
+    assert fm.training.audio.auto_resample is True
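
These two tests pin the migrator's whole contract: an equal-but-not-identical copy of v12 frontmatter whose output validates as v13 after a version bump. The `migrate` implementation itself is not shown in this commit; under that contract it could be as small as this sketch:

```python
from typing import Any


def migrate(raw: dict[str, Any]) -> dict[str, Any]:
    """v12 -> v13 carries no field changes; return a defensive copy.

    Copying (rather than returning `raw` itself) is what satisfies the
    test's `out is not raw` assertion.
    """
    return dict(raw)
```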
tests/unit/export/ollama/test_template_registry.py (modified)

@@ -13,9 +13,10 @@ from dlm.export.ollama.template_registry import (
 
 
 class TestRegistryCoverage:
-    def test_all_eight_dialects_registered(self) -> None:
+    def test_all_nine_dialects_registered(self) -> None:
         assert set(registered_dialects()) == {
             "chatml",
+            "qwen3thinking",
             "gemma2",
             "smollm3",
             "olmo2",
@@ -27,7 +28,17 @@ class TestRegistryCoverage:
 
     @pytest.mark.parametrize(
         "dialect",
-        ["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"],
+        [
+            "chatml",
+            "qwen3thinking",
+            "gemma2",
+            "smollm3",
+            "olmo2",
+            "llama3",
+            "phi3",
+            "phi4mini",
+            "mistral",
+        ],
     )
     def test_each_template_file_exists(self, dialect: str) -> None:
         row = get_template(dialect)
@@ -37,7 +48,17 @@ class TestRegistryCoverage:
 
     @pytest.mark.parametrize(
         "dialect",
-        ["chatml", "gemma2", "smollm3", "olmo2", "llama3", "phi3", "phi4mini", "mistral"],
+        [
+            "chatml",
+            "qwen3thinking",
+            "gemma2",
+            "smollm3",
+            "olmo2",
+            "llama3",
+            "phi3",
+            "phi4mini",
+            "mistral",
+        ],
     )
     def test_each_has_default_stops(self, dialect: str) -> None:
         row = get_template(dialect)
@@ -47,6 +68,7 @@ class TestRegistryCoverage:
         ("dialect", "required"),
         [
             ("chatml", {"<|im_end|>", "<|im_start|>"}),
+            ("qwen3thinking", {"<|im_end|>", "<|im_start|>"}),
             ("gemma2", {"<end_of_turn>", "<start_of_turn>"}),
             ("smollm3", {"<|im_end|>", "<|im_start|>"}),
             ("olmo2", {"<|endoftext|>", "<|user|>", "<|assistant|>"}),
@@ -89,6 +111,14 @@ class TestDialectShapes:
         assert "<end_of_turn>" in text
         assert "model" in text
 
+    def test_qwen3thinking_keeps_chatml_markers_with_reasoning_defaults(self) -> None:
+        text = load_template_text("qwen3thinking")
+        assert "<|im_start|>" in text
+        assert "<|im_end|>" in text
+        row = get_template("qwen3thinking")
+        assert row.default_temperature == pytest.approx(0.6)
+        assert row.default_top_p == pytest.approx(0.95)
+
     def test_smollm3_has_reasoning_system_prompt(self) -> None:
         text = load_template_text("smollm3")
         assert "<|im_start|>system" in text
tests/unit/export/test_mixtral_template.py (added)

@@ -0,0 +1,38 @@
+"""Sprint 40 closeout checks for the Mixtral template row."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from dlm.base_models import BASE_MODELS
+from dlm.export.ollama.modelfile import ModelfileContext, render_modelfile
+from dlm.export.plan import ExportPlan
+
+
+def _adapter_dir(tmp_path: Path) -> Path:
+    adapter = tmp_path / "adapter"
+    adapter.mkdir()
+    (adapter / "tokenizer_config.json").write_text(
+        json.dumps({"eos_token": "</s>", "added_tokens_decoder": {}}),
+        encoding="utf-8",
+    )
+    return adapter
+
+
+def test_mixtral_registry_row_renders_through_mistral_template(tmp_path: Path) -> None:
+    spec = BASE_MODELS["mixtral-8x7b-instruct"]
+    text = render_modelfile(
+        ModelfileContext(
+            spec=spec,
+            plan=ExportPlan(quant="Q4_K_M", merged=False),
+            adapter_dir=_adapter_dir(tmp_path),
+            base_gguf_name="base.gguf",
+            adapter_gguf_name="adapter.gguf",
+            dlm_id="01TEST",
+            adapter_version=1,
+        )
+    )
+    assert spec.modality == "text-moe"
+    assert "[INST]" in text
+    assert 'PARAMETER stop "[INST]"' in text
tests/unit/export/test_phi4_template.py (added)

@@ -0,0 +1,37 @@
+"""Sprint 40 closeout checks for the Phi-4 reasoning template row."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from dlm.base_models import BASE_MODELS
+from dlm.export.ollama.modelfile import ModelfileContext, render_modelfile
+from dlm.export.plan import ExportPlan
+
+
+def _adapter_dir(tmp_path: Path) -> Path:
+    adapter = tmp_path / "adapter"
+    adapter.mkdir()
+    (adapter / "tokenizer_config.json").write_text(
+        json.dumps({"eos_token": "<|end|>", "added_tokens_decoder": {}}),
+        encoding="utf-8",
+    )
+    return adapter
+
+
+def test_phi4_reasoning_template_keeps_phi_system_preamble_and_stops(tmp_path: Path) -> None:
+    text = render_modelfile(
+        ModelfileContext(
+            spec=BASE_MODELS["phi-4-mini-reasoning"],
+            plan=ExportPlan(quant="Q4_K_M", merged=False),
+            adapter_dir=_adapter_dir(tmp_path),
+            base_gguf_name="base.gguf",
+            adapter_gguf_name="adapter.gguf",
+            dlm_id="01TEST",
+            adapter_version=1,
+        )
+    )
+    assert "Your name is Phi, an AI math expert developed by Microsoft." in text
+    assert 'PARAMETER stop "<|assistant|>"' in text
+    assert "PARAMETER temperature 0.6" in text
tests/unit/export/test_qwen3_template.py (added)

@@ -0,0 +1,37 @@
+"""Sprint 40 closeout checks for the Qwen3 reasoning-template row."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from dlm.base_models import BASE_MODELS
+from dlm.export.ollama.modelfile import ModelfileContext, render_modelfile
+from dlm.export.plan import ExportPlan
+
+
+def _adapter_dir(tmp_path: Path) -> Path:
+    adapter = tmp_path / "adapter"
+    adapter.mkdir()
+    (adapter / "tokenizer_config.json").write_text(
+        json.dumps({"eos_token": "<|im_end|>", "added_tokens_decoder": {}}),
+        encoding="utf-8",
+    )
+    return adapter
+
+
+def test_qwen3_thinking_row_uses_distinct_reasoning_template_defaults(tmp_path: Path) -> None:
+    text = render_modelfile(
+        ModelfileContext(
+            spec=BASE_MODELS["qwen3-1.7b-thinking"],
+            plan=ExportPlan(quant="Q4_K_M", merged=False),
+            adapter_dir=_adapter_dir(tmp_path),
+            base_gguf_name="base.gguf",
+            adapter_gguf_name="adapter.gguf",
+            dlm_id="01TEST",
+            adapter_version=1,
+        )
+    )
+    assert "PARAMETER temperature 0.6" in text
+    assert "PARAMETER top_p 0.95" in text
+    assert "<|im_start|>assistant" in text