# Memory estimates

These are planning numbers, not a promise. DLM’s doctor still makes the
real fit/refusal decision, but the table below is the quick mental map
for the Sprint 40 refresh rows that changed user expectations the most.

## Text-family checkpoints

| Base | fp16 weights | Practical target |
|---|---:|---|
| `qwen3-8b` | ~16 GB | 24 GB CUDA or high-memory Apple Silicon for LoRA; lighter inference on smaller boxes. |
| `llama-3.3-8b-instruct` | ~16.5 GB | Same class as other 8B text rows: real GPU planning required for training. |
| `gemma-2-9b-it` | ~18 GB | 24 GB CUDA is the comfortable floor. |
| `mistral-small-3.1-24b-instruct` | ~48 GB | Large-CUDA-first. Refused on MPS by default unless forced. |
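
The fp16 column is essentially parameter count × 2 bytes. A quick back-of-the-envelope check, using approximate parameter counts rather than the registry’s exact values:

```python
# Rough fp16 weight sizes: 2 bytes per parameter, reported in GB.
# Parameter counts are approximate, not the registry's exact values.
def fp16_weight_gb(num_params: float) -> float:
    return num_params * 2 / 1e9

for name, params in [
    ("qwen3-8b", 8.2e9),
    ("gemma-2-9b-it", 9.2e9),
    ("mistral-small-3.1-24b-instruct", 24e9),
]:
    print(f"{name}: ~{fp16_weight_gb(params):.0f} GB")
```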

## What the doctor is approximating

For LoRA/QLoRA, the planner estimates:

- base weights at the chosen load precision
- activation memory from `sequence_len × micro_batch × layers`
- optimizer state for the trainable adapter params
- LoRA parameter storage
- a 20% safety margin on top

That estimator lives in `src/dlm/hardware/memory.py` and is intentionally conservative.
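
A minimal sketch of that accounting, with illustrative names and constants rather than the actual module’s API:

```python
# Illustrative sketch of the doctor's LoRA/QLoRA accounting; the real
# estimator in src/dlm/hardware/memory.py differs in detail.
from dataclasses import dataclass

BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "nf4": 0.5}

@dataclass
class TrainPlan:
    num_params: float      # base-model parameters
    load_precision: str    # "fp16", "int8", "nf4" (QLoRA), ...
    sequence_len: int
    micro_batch: int
    layers: int
    hidden_size: int       # assumed per-token activation width
    lora_params: float     # trainable adapter parameters

def estimate_bytes(p: TrainPlan) -> float:
    base = p.num_params * BYTES_PER_PARAM[p.load_precision]
    # Activations scale with sequence_len x micro_batch x layers; the
    # per-token width and 2-byte activations are assumptions here.
    activations = p.sequence_len * p.micro_batch * p.layers * p.hidden_size * 2
    optimizer = p.lora_params * 8  # Adam-style fp32 moments for adapter params
    lora = p.lora_params * 2       # the adapter weights themselves (fp16)
    return (base + activations + optimizer + lora) * 1.2  # 20% safety margin

plan = TrainPlan(num_params=8.2e9, load_precision="fp16", sequence_len=2048,
                 micro_batch=1, layers=36, hidden_size=4096, lora_params=2e7)
print(f"~{estimate_bytes(plan) / 1e9:.1f} GB")  # ≈ 20.6 GB, near the 24 GB target
```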

## Rules of thumb

- 8B-class rows are where laptop experimentation starts turning into real hardware planning.
- 9B-class rows are usually fine on 24 GB CUDA, but not “casual” on smaller hosts.
- 24B-class rows are not broad consumer defaults. In DLM they are treated as explicit high-capacity picks.
- MPS can be surprisingly good for text LoRA, but DLM now refuses oversized bases like `mistral-small-3.1-24b-instruct` by default because unified memory headroom disappears too quickly (the arithmetic below shows why).
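
For a concrete sense of that last point, here is the same back-of-the-envelope arithmetic against a 64 GB unified-memory machine; the usable-memory fraction is an assumption, not a DLM constant:

```python
# Why a ~48 GB fp16 base is refused on MPS by default: unified memory is
# shared with the OS and everything else, so only part of it is usable.
weights_gb = 24e9 * 2 / 1e9        # ~48 GB of fp16 base weights alone
with_margin_gb = weights_gb * 1.2  # add the doctor's 20% safety margin
usable_gb = 64 * 0.75              # assumed usable share of a 64 GB machine
print(f"need ~{with_margin_gb:.0f} GB, have ~{usable_gb:.0f} GB usable")
```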

## Related

- [Choosing a base](../cookbook/choosing-a-base.md)
- [Vision-language memory budget](vl-memory.md)