# Memory estimates
These are planning numbers, not a promise. DLM’s doctor still does the real refusal/fit decision, but the table below is the quick mental map for the Sprint 40 refresh rows that changed the most user expectations.
## Text-family checkpoints
| Base | fp16 weights | Practical target |
|---|---:|---|
| `qwen3-8b` | ~16 GB | 24 GB CUDA or high-memory Apple Silicon for LoRA; lighter inference on smaller boxes. |
| `llama-3.3-8b-instruct` | ~16.5 GB | Same class as other 8B text rows: real GPU planning required for training. |
| `gemma-2-9b-it` | ~18 GB | 24 GB CUDA is the comfortable floor. |
| `mistral-small-3.1-24b-instruct` | ~48 GB | Large-CUDA-first. Refused on MPS by default unless forced. |
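The fp16 column is just the usual two-bytes-per-parameter approximation. A quick back-of-the-envelope check (nominal parameter counts assumed, decimal GB) looks like this:

```python
def fp16_weight_gb(n_params: float) -> float:
    """Approximate fp16 weight footprint: 2 bytes per parameter, in decimal GB."""
    return n_params * 2 / 1e9

# Nominal parameter counts; real checkpoints land close to the table above.
print(fp16_weight_gb(8e9))   # ~16 GB  -> qwen3-8b class
print(fp16_weight_gb(9e9))   # ~18 GB  -> gemma-2-9b-it class
print(fp16_weight_gb(24e9))  # ~48 GB  -> mistral-small-3.1-24b-instruct class
```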
## What the doctor is approximating
For LoRA/QLoRA, the planner estimates:
- base weights at the chosen load precision
- activation memory from `sequence_len × micro_batch × layers`
- optimizer state for the trainable adapter params
- LoRA parameter storage
- a 20% safety margin on top
That estimator lives in `src/dlm/hardware/memory.py` and is intentionally conservative.
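As a rough illustration of how those terms combine, here is a minimal sketch of the same arithmetic. The function name, argument list, and per-term constants are assumptions for this page, not the actual API in `memory.py`:

```python
def estimate_lora_train_bytes(
    n_base_params: float,     # base model parameter count
    bytes_per_weight: float,  # 2.0 for fp16 loads, lower for quantized QLoRA loads
    sequence_len: int,
    micro_batch: int,
    n_layers: int,
    hidden_size: int,
    n_lora_params: float,     # trainable adapter parameter count
) -> float:
    """Illustrative sketch only; the real estimator lives in src/dlm/hardware/memory.py."""
    base_weights = n_base_params * bytes_per_weight
    # Activation term: scales with sequence_len x micro_batch x layers (fp16 activations assumed).
    activations = sequence_len * micro_batch * n_layers * hidden_size * 2
    # Adam-style optimizer state: roughly two fp32 moments per trainable adapter param.
    optimizer_state = n_lora_params * 2 * 4
    # The LoRA parameters themselves, stored in fp16.
    lora_weights = n_lora_params * 2
    subtotal = base_weights + activations + optimizer_state + lora_weights
    return subtotal * 1.20  # 20% safety margin on top
```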
## Rules of thumb
- 8B-class rows are where laptop experimentation starts turning into real hardware planning.
- 9B-class rows are usually fine on 24 GB CUDA, but not “casual” on smaller hosts.
- 24B-class rows are not broad consumer defaults. In DLM they are treated as explicit high-capacity picks.
- MPS can be surprisingly good for text LoRA, but DLM now refuses oversized bases like `mistral-small-3.1-24b-instruct` by default because unified memory headroom disappears too quickly (sketched after this list).
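For the MPS case specifically, the refusal reduces to comparing the planner's estimate against unified memory with some headroom reserved. The helper below is a hypothetical sketch of that shape; the 75% headroom factor and the `force` argument name are assumptions, not DLM's actual options:

```python
def fits_on_mps(estimated_bytes: float, unified_memory_bytes: float,
                force: bool = False) -> bool:
    """Hypothetical sketch of the refusal decision, not DLM's real doctor logic."""
    if force:
        # "Unless forced": the user explicitly accepts the risk of running out of memory.
        return True
    # Reserve headroom because unified memory is shared with the OS and other processes.
    return estimated_bytes <= unified_memory_bytes * 0.75
```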
## Related

- [Choosing a base](../cookbook/choosing-a-base.md)
- [Vision-language memory budget](vl-memory.md)