# Choosing a base

The fastest way to pick a DLM base is to decide three things first:

  1. Do you need plain text, multimodal vision, or audio?
  2. Do you want the most permissive license possible, or are gated rows fine?
  3. Are you targeting Apple Silicon, a mid-size CUDA card, or a large CUDA box?
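
The first two questions can be sketched as a simple filter. This is an illustration only, not DLM's registry API: `BaseRow`, `CATALOG`, and `shortlist` are hypothetical names, and the rows are hand-copied from the quick-picks table, limited to those whose modality and gating the table states outright. Rows described as permissive or open-license are assumed to be ungated. The third question (hardware) is left out because it depends on memory budgets rather than a yes/no flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseRow:
    """Hypothetical record for one base; not DLM's actual registry type."""
    key: str
    modality: str  # "text", "vision", or "audio"
    gated: bool    # assumption: permissive/open-license rows count as ungated


# Hand-copied from the quick-picks table in this page.
CATALOG = [
    BaseRow("qwen3-4b", "text", gated=False),
    BaseRow("gemma-2-2b-it", "text", gated=True),
    BaseRow("gemma-2-9b-it", "text", gated=True),
    BaseRow("qwen2-vl-2b-instruct", "vision", gated=False),
    BaseRow("qwen2-audio-7b-instruct", "audio", gated=False),
]


def shortlist(modality: str, allow_gated: bool) -> list[str]:
    """Return catalog keys matching the modality, dropping gated rows if asked."""
    return [
        row.key
        for row in CATALOG
        if row.modality == modality and (allow_gated or not row.gated)
    ]


print(shortlist("text", allow_gated=False))
print(shortlist("text", allow_gated=True))
```

Keeping the answers as plain filter arguments makes it easy to see which question eliminated a candidate.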

## Quick picks

| If you want… | Start with… | Why |
|---|---|---|
| Fast local iteration on almost any laptop | `smollm2-135m` | Tiny, cheap, and ideal for testing authoring loops. |
| Best general-purpose 2026 text base around the 4B tier | `qwen3-4b` | Strong default quality, permissive license, and current-generation tokenizer/chat behavior. |
| A reasoning-first 1.7B profile | `qwen3-1.7b-thinking` | Same upstream Qwen3 weights, but a curated reasoning-profile key with cooler defaults. |
| Fully open-model story | `olmo-2-7b-instruct` | Open weights and open-data lineage make it the cleanest reproducibility pitch. |
| Apache sparse-MoE experiments | `mixtral-8x7b-instruct` | First `text-moe` row in the registry; pairs with the learned gate work. |
| Small gated text base | `gemma-2-2b-it` | Useful when Gemma’s instruction style or ecosystem matters more than license friction. |
| Larger gated text base | `gemma-2-9b-it` | Upper-tier Gemma pick; large enough to want real GPU planning. |
| Large multimodal capability | `mistral-small-3.1-24b-instruct` | Strongest shipped VL row, but large-CUDA-first. |
| Safe default multimodal row on a smaller box | `qwen2-vl-2b-instruct` | Permissive, solid, and compatible with the current generic VL runtime. |
| Audio-language training | `qwen2-audio-7b-instruct` | Current shipped audio row; open-license and no longer gated on HF. |

## Notes on the sharp edges

- `llama-3.3-8b-instruct` is still treated like the Llama family in DLM’s policy surface: acceptance required, not redistributable, and intended for users who already know they want the Llama line. Today it resolves through a community HF mirror while DLM pins provenance against Meta’s official LlamaCon/newsroom announcement, because Meta has not published a first-party HF repo for this row.
- `internvl2-2b` and `internvl3-2b` are registry-visible planning targets, but the current generic VL runtime still refuses the InternVL family until DLM owns its custom processor/collator contract.
- `mistral-small-3.1-24b-instruct` is intentionally refused on MPS by default. It is a real shipped row, just not a casual laptop target.
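
The three notes above amount to a small set of preflight rules. The sketch below restates them as a check you could run before committing to a base. It is a reading aid only: `preflight`, its parameters, and the message strings are hypothetical, and DLM's real refusal logic and wording may differ.

```python
from typing import Optional

# Keys taken from the notes above; the set names are illustrative.
INTERNVL_KEYS = {"internvl2-2b", "internvl3-2b"}
MPS_REFUSED = {"mistral-small-3.1-24b-instruct"}
ACCEPTANCE_REQUIRED = {"llama-3.3-8b-instruct"}


def preflight(base: str, device: str, license_accepted: bool = False) -> Optional[str]:
    """Return a refusal reason, or None if none of the notes apply.

    `device` is assumed to be a short backend label such as "mps" or "cuda".
    """
    if base in INTERNVL_KEYS:
        return "InternVL rows are refused by the current generic VL runtime"
    if base in MPS_REFUSED and device == "mps":
        return "refused on MPS by default; plan for a large CUDA box"
    if base in ACCEPTANCE_REQUIRED and not license_accepted:
        return "license acceptance required before use"
    return None


print(preflight("mistral-small-3.1-24b-instruct", "mps"))
print(preflight("qwen3-4b", "mps"))
```

Returning a reason string instead of raising keeps the check usable for dry-run listings as well as hard gates.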

## Hardware-first view

- Apple Silicon, 16 GB: `smollm2-*`, `qwen2.5-*`, `qwen3-1.7b`, and `qwen3-4b` are the comfortable text picks; `qwen2-vl-2b-instruct` is the safer VL row.
- Apple Silicon, 32 GB+: `qwen3-8b`, `gemma-2-2b-it`, and `phi-4-mini-reasoning` become practical. Large VL rows still need caution.
- CUDA, 24 GB: this is where `gemma-2-9b-it`, `mixtral-8x7b-instruct`, and the heavier multimodal rows start becoming realistic.
- CUDA, 48 GB+: this is the intended home for `mistral-small-3.1-24b-instruct`.
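
The tier list above can also be read in reverse: given a base key, which tier first mentions it? The sketch below uses hypothetical names (`TIERS`, `tiers_for`) and copies only the keys and wildcard patterns named in the list. The unnamed "heavier multimodal rows" are not modeled, and a key that no tier mentions simply returns an empty list rather than a guess.

```python
from fnmatch import fnmatchcase

# Patterns copied from the hardware-first list; wildcards cover whole families.
TIERS = {
    "Apple Silicon, 16 GB": [
        "smollm2-*", "qwen2.5-*", "qwen3-1.7b", "qwen3-4b", "qwen2-vl-2b-instruct",
    ],
    "Apple Silicon, 32 GB+": ["qwen3-8b", "gemma-2-2b-it", "phi-4-mini-reasoning"],
    "CUDA, 24 GB": ["gemma-2-9b-it", "mixtral-8x7b-instruct"],
    "CUDA, 48 GB+": ["mistral-small-3.1-24b-instruct"],
}


def tiers_for(base: str) -> list[str]:
    """Return the tier labels whose patterns match the given base key."""
    return [
        tier
        for tier, patterns in TIERS.items()
        if any(fnmatchcase(base, pattern) for pattern in patterns)
    ]


print(tiers_for("smollm2-135m"))
print(tiers_for("mistral-small-3.1-24b-instruct"))
```

Matching is exact apart from the `*` wildcards, so a profile key such as `qwen3-1.7b-thinking` is not assumed to share the tier of `qwen3-1.7b`.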

See [hardware/memory-estimates](../hardware/memory-estimates.md) for the text-family budget table and [hardware/vl-memory](../hardware/vl-memory.md) for the VL rows.
