# Memory estimates

These are planning numbers, not a promise. DLM's doctor still makes the real refusal/fit decision, but the table below is the quick mental map for the Sprint 40 refresh rows that changed user expectations the most.

## Text-family checkpoints

| Base | fp16 weights | Practical target |
|---|---:|---|
| `qwen3-8b` | ~16 GB | 24 GB CUDA or high-memory Apple Silicon for LoRA; lighter inference on smaller boxes. |
| `llama-3.3-8b-instruct` | ~16.5 GB | Same class as the other 8B text rows: real GPU planning required for training. |
| `gemma-2-9b-it` | ~18 GB | 24 GB CUDA is the comfortable floor. |
| `mistral-small-3.1-24b-instruct` | ~48 GB | Large-CUDA-first. Refused on MPS by default unless forced. |

## What the doctor is approximating

For LoRA/QLoRA, the planner estimates:

- base weights at the chosen load precision
- activation memory from `sequence_len × micro_batch × layers`
- optimizer state for the trainable adapter parameters
- LoRA parameter storage
- a 20% safety margin on top

That estimator lives in `src/dlm/hardware/memory.py` and is intentionally conservative. A rough sketch of the same arithmetic appears at the end of this page.

## Rules of thumb

- 8B-class rows are where laptop experimentation starts turning into real hardware planning.
- 9B-class rows are usually fine on 24 GB CUDA, but not “casual” on smaller hosts.
- 24B-class rows are not broad consumer defaults. In DLM they are treated as explicit high-capacity picks.
- MPS can be surprisingly good for text LoRA, but DLM now refuses oversized bases like `mistral-small-3.1-24b-instruct` by default because unified-memory headroom disappears too quickly.

## Related

- [Choosing a base](../cookbook/choosing-a-base.md)
- [Vision-language memory budget](vl-memory.md)
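
## Estimator sketch

For readers who want to see the arithmetic behind the component list above, here is a minimal, hedged sketch of how such an estimate could be put together. This is not the code in `src/dlm/hardware/memory.py`: the function name `estimate_lora_memory_gb`, the `TrainPlan` fields, the per-parameter byte counts, the activation fudge factor, and the example numbers are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Assumed bytes per parameter at common load precisions (illustrative, not DLM's table).
BYTES_PER_PARAM = {"fp16": 2, "bf16": 2, "int8": 1, "nf4": 0.5}


@dataclass
class TrainPlan:
    base_params: float      # total base-model parameters, e.g. 8e9
    load_precision: str     # "fp16", "bf16", "int8", "nf4"
    sequence_len: int
    micro_batch: int
    layers: int
    hidden_size: int
    lora_params: float      # trainable adapter parameters


def estimate_lora_memory_gb(p: TrainPlan) -> float:
    """Back-of-envelope LoRA/QLoRA fit estimate mirroring the components listed above."""
    gib = 1024 ** 3

    # 1. Base weights at the chosen load precision.
    weights = p.base_params * BYTES_PER_PARAM[p.load_precision]

    # 2. Activation memory from sequence_len x micro_batch x layers;
    #    hidden_size and 2 bytes/element stand in for per-token activation width.
    activations = p.sequence_len * p.micro_batch * p.layers * p.hidden_size * 2

    # 3. Adam-style optimizer state (~8 bytes/param), only for the trainable adapter.
    optimizer_state = p.lora_params * 8

    # 4. LoRA adapter weights themselves, assumed stored in fp16.
    lora_storage = p.lora_params * 2

    subtotal = weights + activations + optimizer_state + lora_storage

    # 5. 20% safety margin on top.
    return subtotal * 1.2 / gib


# Example: an 8B base loaded in fp16 with a small adapter lands near the
# "24 GB CUDA for LoRA" guidance in the table above.
plan = TrainPlan(base_params=8e9, load_precision="fp16",
                 sequence_len=2048, micro_batch=2, layers=32,
                 hidden_size=4096, lora_params=2e7)
print(f"~{estimate_lora_memory_gb(plan):.1f} GiB")   # roughly 19 GiB under these assumptions
```

The sketch deliberately skips KV-cache, gradient-checkpointing savings, and framework overhead, which is part of why a conservative safety margin is applied; the real doctor is the authority on whether a run fits.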