# Multi-adapter composition

Train a single `.dlm` with more than one named adapter — keep knowledge and tone orthogonal, mix them at export time, or prompt against one at a time. Reach for this when you want separate "what the model knows" and "how it says things" training signals without spinning up two documents.

## When to use it

- A handbook where the **knowledge** is stable but the **tone** evolves (you might rewrite just the style examples next month).
- A single base model that needs to serve two personas — one customer-facing, one internal-engineering — where the instruction sets diverge.
- Experiments where you want to A/B two training recipes against the same prose corpus without forking the document.

If the answer is "one adapter is fine," skip this. Multi-adapter trades simplicity for composition flexibility — pay that cost when you need it.

## Document shape

```dlm
---
dlm_id: 01KPM618S7NXSPAY10BHKVECYX
base_model: qwen2.5-1.5b
training:
  sequence_len: 2048
  num_epochs: 2
  adapters:
    knowledge:
      adapter: lora
      lora_r: 8
    tone:
      adapter: lora
      lora_r: 4
      target_modules: [q_proj, v_proj]
  learning_rate: 1e-4
export:
  default_quant: Q4_K_M
---

# Domain prose

This prose trains BOTH adapters by default — prose without a `#name` suffix fans out to every declared adapter. Most documents keep prose shared so both adapters pick up the same domain vocabulary.

::instruction#knowledge::

### Q
What is the capital of France?

### A
Paris.

::instruction#tone::

### Q
How should I phrase things?

### A
Crisply. One sentence.
```

### Routing rules

| Section | Fence | Trains |
|---|---|---|
| Prose (no suffix) | `# heading` / plain prose | all adapters |
| Prose (pinned) | `::prose#knowledge::` | only `knowledge` |
| Instruction (no suffix) | `::instruction::` | first-declared adapter |
| Instruction (pinned) | `::instruction#tone::` | only `tone` |
| Preference | same — `::preference#name::` | only `name` |

The first-declared adapter acts as the implicit "default" for untagged non-prose sections. Declaration order is the order you write them in the YAML block.

## Training

One `dlm train` invocation trains all declared adapters:

```sh
$ uv run dlm train mydoc.dlm
```

Each adapter gets its own version history under `~/.dlm/store//adapter//versions/vNNNN/` with an independent `current.txt` pointer. The manifest grows one `TrainingRunSummary` per adapter per invocation — running `dlm train` again commits fresh `v0002` directories for each, never mixing lanes.

Each adapter is trained as a fresh LoRA from the base on its routed rows; the base model loads once per adapter. Shared hyperparameters (`sequence_len`, `num_epochs`, `seed`, optimizer, scheduler, warmup) live at the `training` top level — per-adapter overrides are intentionally limited to the LoRA-specific knobs.

## Prompting a specific adapter

```sh
$ uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter knowledge
$ uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter tone
```

`--adapter` is required on multi-adapter documents and rejected on single-adapter ones. Unknown names get a clear error listing the declared adapters.

## Exporting a specific adapter

```sh
$ uv run dlm export mydoc.dlm --adapter knowledge
```

One adapter → one Ollama model. The GGUF bundle + Modelfile embeds that adapter only; `manifest.exports[-1].adapter_name` records which one.

## Weighted composition at export

To ship a single Ollama model that combines both adapters:

```sh
$ uv run dlm export mydoc.dlm --adapter-mix knowledge:1.0,tone:0.5
```

This uses PEFT's `add_weighted_adapter` with linear combination to produce a composite adapter, which is then converted to GGUF and registered with Ollama as one unit.

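Conceptually, the mix step is close to the following PEFT sketch — a minimal illustration, not `dlm`'s actual export code. The adapter directories, the `mix` adapter name, and the Hugging Face id for the base are assumptions made for the example:

```python
# Illustrative sketch of weighted LoRA composition with PEFT.
# Paths and the "mix" name are hypothetical; dlm does this internally.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base once, then attach both trained adapters under their names.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")  # assumed HF id for qwen2.5-1.5b
model = PeftModel.from_pretrained(base, "store/knowledge/current", adapter_name="knowledge")
model.load_adapter("store/tone/current", adapter_name="tone")

# Linear combination: composite delta = 1.0 * knowledge + 0.5 * tone.
# PEFT's "linear" mode expects matching LoRA ranks (see the caveats below).
model.add_weighted_adapter(
    adapters=["knowledge", "tone"],
    weights=[1.0, 0.5],
    adapter_name="mix",
    combination_type="linear",
)
model.set_adapter("mix")

# Bake the composite into the base weights; GGUF conversion and the
# Modelfile/Ollama registration happen after this point.
merged = model.merge_and_unload()
merged.save_pretrained("export/mix-merged")
```
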
Caveats (from the PEFT reference):

- **LoRA-only.** `add_weighted_adapter` doesn't support prefix / prompt tuning, and it can't merge across different LoRA ranks robustly. Keep all adapters in the mix on the same `adapter: lora` shape.
- **QLoRA requires dequantize.** Combining adapters trained against a 4-bit quantized base into a composite is precision-unsafe; `dlm` refuses unless you pass `--dequantize` and `--merged` explicitly.
- **Mix is frozen in the export.** Once the Ollama model is built, the weights are baked. To change the mix, re-run `dlm export` with a new `--adapter-mix`. Ollama doesn't support hot-swapping adapter weights at runtime — keep the separate per-adapter exports around if you need dynamic composition at inference time.

## Hardware notes

`dlm doctor` refuses multi-adapter + QLoRA plans whose estimated VRAM exceeds the device's 85% headroom (roughly: `base_4bit + 1 GB/adapter + 25% activations`). The failure message points at two fixes: drop to `adapter: lora` across the board, or reduce the adapter count. LoRA multi-adapter plans are always accepted — each adapter's extra state is negligible next to the base weights.

## When to fold back to a single adapter

Multi-adapter adds cognitive load and per-adapter training cost. Fold back when:

- The adapters converge on similar behavior despite separate routing — the extra structure isn't doing work.
- One adapter's training set is so small (<10 rows) that it's adding noise instead of signal.
- Your export pipeline is always `--adapter-mix name:1.0,other:1.0` — a single adapter trained on the union is equivalent and cheaper.

## See also

- [Preference tuning (DPO vs ORPO)](preference-dpo-vs-orpo.md) — applies per-adapter on multi-adapter docs via `::preference#name::` routing.
- [Domain knowledge base](domain-kb.md) — the single-adapter story.