# Choosing a base
The fastest way to pick a DLM base is to decide three things first:

1. Do you need plain text, multimodal vision, or audio?
2. Do you want the most permissive license possible, or are gated rows fine?
3. Are you targeting Apple Silicon, a mid-size CUDA card, or a large CUDA box?
## Quick picks
| If you want… | Start with… | Why |
|---|---|---|
| Fast local iteration on almost any laptop | `smollm2-135m` | Tiny, cheap, and ideal for testing authoring loops. |
| Best general-purpose 2026 text base around the 4B tier | `qwen3-4b` | Strong default quality, permissive license, and current-generation tokenizer/chat behavior. |
| A reasoning-first 1.7B profile | `qwen3-1.7b-thinking` | Same upstream Qwen3 weights, but a curated reasoning-profile key with cooler defaults. |
| Fully open-model story | `olmo-2-7b-instruct` | Open weights and open-data lineage make it the cleanest reproducibility pitch. |
| Apache sparse-MoE experiments | `mixtral-8x7b-instruct` | First `text-moe` row in the registry; pairs with the learned gate work. |
| Small gated text base | `gemma-2-2b-it` | Useful when Gemma’s instruction style or ecosystem matters more than license friction. |
| Larger gated text base | `gemma-2-9b-it` | Upper-tier Gemma pick; large enough to want real GPU planning. |
| Large multimodal capability | `mistral-small-3.1-24b-instruct` | Strongest shipped VL row, but large-CUDA-first. |
| Safe default multimodal row on a smaller box | `qwen2-vl-2b-instruct` | Permissive, solid, and compatible with the current generic VL runtime. |
| Audio-language training | `qwen2-audio-7b-instruct` | Current shipped audio row; open-license and no longer gated on HF. |
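
If you want to smoke-test the smallest quick pick outside DLM before committing to it, plain `transformers` is enough. This is a minimal sketch, not DLM's own loading path, and it assumes the upstream HF repo for `smollm2-135m` is `HuggingFaceTB/SmolLM2-135M-Instruct`; DLM's registry key may pin a different repo or revision.

```python
# Minimal smoke test for the smallest quick pick, using plain transformers.
# Assumes the upstream repo id HuggingFaceTB/SmolLM2-135M-Instruct; DLM's
# registry may pin a different repo or revision for smollm2-135m.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Exercise the chat template the same way an authoring loop would.
messages = [{"role": "user", "content": "Say hello in five words."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```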
## Notes on the sharp edges
- `llama-3.3-8b-instruct` is still treated like the Llama family in DLM’s policy surface: acceptance required, not redistributable, and intended for users who already know they want the Llama line. Today it resolves through a community HF mirror while DLM pins provenance against Meta’s official LlamaCon/newsroom announcement, because Meta has not published a first-party HF repo for this row.
- `internvl2-2b` and `internvl3-2b` are registry-visible planning targets, but the current generic VL runtime still refuses the InternVL family until DLM owns its custom processor/collator contract.
- `mistral-small-3.1-24b-instruct` is intentionally refused on MPS by default. It is a real shipped row, just not a casual laptop target.
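
For the gated rows (`gemma-2-*` and the Llama line), Hugging Face access has to be sorted out before any tool, DLM included, can download weights: accept the license on the model page, then authenticate the machine. A minimal sketch using the standard `huggingface_hub` API, assuming the upstream repo for `gemma-2-2b-it` is `google/gemma-2-2b-it`:

```python
# One-time setup for gated rows such as gemma-2-2b-it: accept the license
# on the Hugging Face model page first, then authenticate this machine.
from huggingface_hub import login, model_info

login()  # prompts for a token; alternatively set the HF_TOKEN env var

# If authentication and license acceptance worked, metadata resolves.
# Repo id is an assumption about the upstream row, not a DLM registry key.
info = model_info("google/gemma-2-2b-it")
print(info.id, info.gated)  # gated repos report their gating mode here
```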
## Hardware-first view
- Apple Silicon, 16 GB: `smollm2-*`, `qwen2.5-*`, `qwen3-1.7b`, and `qwen3-4b` are the comfortable text picks; `qwen2-vl-2b-instruct` is the safer VL row.
- Apple Silicon, 32 GB+: `qwen3-8b`, `gemma-2-2b-it`, and `phi-4-mini-reasoning` become practical. Large VL rows still need caution.
- CUDA, 24 GB: this is where `gemma-2-9b-it`, `mixtral-8x7b-instruct`, and the heavier multimodal rows start becoming realistic.
- CUDA, 48 GB+: this is the intended home for `mistral-small-3.1-24b-instruct`.
See [hardware/memory-estimates](../hardware/memory-estimates.md) for the text-family budget table and [hardware/vl-memory](../hardware/vl-memory.md) for the VL rows.
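
If you only need a back-of-envelope number before opening those tables, weight memory is roughly parameter count times bytes per parameter, plus headroom for the KV cache and activations (and optimizer state when training). The sketch below shows that arithmetic; the 1.3x headroom factor is an illustrative guess, not a measured DLM figure.

```python
# Rough weight-memory arithmetic behind the hardware tiers above.
# The headroom factor is an illustrative guess, not a measured DLM number.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_gb(params_billions: float, dtype: str = "bf16") -> float:
    """GB needed just to hold the weights, before cache or optimizer state."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

for name, b in [("qwen3-4b", 4), ("gemma-2-9b-it", 9),
                ("mistral-small-3.1-24b-instruct", 24)]:
    w = weight_gb(b)
    # ~1.3x approximates inference headroom for KV cache and activations.
    print(f"{name}: ~{w:.0f} GB weights (bf16), ~{w * 1.3:.0f} GB with headroom")
```

Run against the tier sizes, this reproduces the guidance above: a 9B row in bf16 wants most of a 24 GB card, and a 24B row lands squarely in 48 GB+ territory.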