# Memory estimates
These are planning numbers, not a promise. DLM’s doctor still does the real refusal/fit decision, but the table below is the quick mental map for the Sprint 40 refresh rows that changed the most user expectations.
## Text-family checkpoints
| Base | fp16 weights | Practical target |
|---|---:|---|
| `qwen3-8b` | ~16 GB | 24 GB CUDA or high-memory Apple Silicon for LoRA; lighter inference on smaller boxes. |
| `llama-3.3-8b-instruct` | ~16.5 GB | Same class as other 8B text rows: real GPU planning required for training. |
| `gemma-2-9b-it` | ~18 GB | 24 GB CUDA is the comfortable floor. |
| `mistral-small-3.1-24b-instruct` | ~48 GB | Large-CUDA-first. Refused on MPS by default unless forced. |
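The fp16 column is just the usual two-bytes-per-parameter approximation. A quick back-of-the-envelope check (nominal parameter counts assumed, decimal GB) looks like this:

```python
def fp16_weight_gb(n_params: float) -> float:
    """Approximate fp16 weight footprint: 2 bytes per parameter, in decimal GB."""
    return n_params * 2 / 1e9

# Nominal parameter counts; real checkpoints land close to the table above.
print(fp16_weight_gb(8e9))   # ~16 GB  -> qwen3-8b class
print(fp16_weight_gb(9e9))   # ~18 GB  -> gemma-2-9b-it class
print(fp16_weight_gb(24e9))  # ~48 GB  -> mistral-small-3.1-24b-instruct class
```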
## What the doctor is approximating
For LoRA/QLoRA, the planner estimates:
- base weights at the chosen load precision
- activation memory from `sequence_len × micro_batch × layers`
- optimizer state for the trainable adapter params
- LoRA parameter storage
- a 20% safety margin on top
That estimator lives in `src/dlm/hardware/memory.py` and is intentionally conservative.
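As a rough illustration of how those terms combine, here is a minimal sketch of the same arithmetic. The function name, argument list, and per-term constants are assumptions for this page, not the actual API in `memory.py`:

```python
def estimate_lora_train_bytes(
    n_base_params: float,     # base model parameter count
    bytes_per_weight: float,  # 2.0 for fp16 loads, lower for quantized QLoRA loads
    sequence_len: int,
    micro_batch: int,
    n_layers: int,
    hidden_size: int,
    n_lora_params: float,     # trainable adapter parameter count
) -> float:
    """Illustrative sketch only; the real estimator lives in src/dlm/hardware/memory.py."""
    base_weights = n_base_params * bytes_per_weight
    # Activation term: scales with sequence_len x micro_batch x layers (fp16 activations assumed).
    activations = sequence_len * micro_batch * n_layers * hidden_size * 2
    # Adam-style optimizer state: roughly two fp32 moments per trainable adapter param.
    optimizer_state = n_lora_params * 2 * 4
    # The LoRA parameters themselves, stored in fp16.
    lora_weights = n_lora_params * 2
    subtotal = base_weights + activations + optimizer_state + lora_weights
    return subtotal * 1.20  # 20% safety margin on top
```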
## Rules of thumb
- 8B-class rows are where laptop experimentation starts turning into real hardware planning.
- 9B-class rows are usually fine on 24 GB CUDA, but not “casual” on smaller hosts.
- 24B-class rows are not broad consumer defaults. In DLM they are treated as explicit high-capacity picks.
- MPS can be surprisingly good for text LoRA, but DLM now refuses oversized bases like `mistral-small-3.1-24b-instruct` by default because unified memory headroom disappears too quickly (sketched after this list).
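For the MPS case specifically, the refusal reduces to comparing the planner's estimate against unified memory with some headroom reserved. The helper below is a hypothetical sketch of that shape; the 75% headroom factor and the `force` argument name are assumptions, not DLM's actual options:

```python
def fits_on_mps(estimated_bytes: float, unified_memory_bytes: float,
                force: bool = False) -> bool:
    """Hypothetical sketch of the refusal decision, not DLM's real doctor logic."""
    if force:
        # "Unless forced": the user explicitly accepts the risk of running out of memory.
        return True
    # Reserve headroom because unified memory is shared with the OS and other processes.
    return estimated_bytes <= unified_memory_bytes * 0.75
```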
## Related

- [Choosing a base](../cookbook/choosing-a-base.md)
- [Vision-language memory budget](vl-memory.md)