
# Multi-adapter composition

Train a single `.dlm` with more than one named adapter — keep knowledge and tone orthogonal, mix them at export time, or prompt against one at a time. Reach for this when you want separate "what the model knows" and "how it says things" training signals without spinning up two documents.

## When to use it

- A handbook where the **knowledge** is stable but the **tone** evolves (you might rewrite just the style examples next month).
- A single base model that needs to serve two personas — one customer-facing, one internal-engineering — where the instruction sets diverge.
- Experiments where you want to A/B two training recipes against the same prose corpus without forking the document.

If the answer is "one adapter is fine," skip this. Multi-adapter trades simplicity for composition flexibility — pay that cost when you need it.

## Document shape

```dlm
---
dlm_id: 01KPM618S7NXSPAY10BHKVECYX
base_model: qwen2.5-1.5b
training:
  sequence_len: 2048
  num_epochs: 2
  adapters:
    knowledge:
      adapter: lora
      lora_r: 8
    tone:
      adapter: lora
      lora_r: 4
      target_modules: [q_proj, v_proj]
      learning_rate: 1e-4
export:
  default_quant: Q4_K_M
---

# Domain prose

This prose trains BOTH adapters by default — prose without a `#name`
suffix fans out to every declared adapter. Most documents keep prose
shared so both adapters pick up the same domain vocabulary.

::instruction#knowledge::
### Q
What is the capital of France?
### A
Paris.

::instruction#tone::
### Q
How should I phrase things?
### A
Crisply. One sentence.
```

### Routing rules

| Section | Fence | Trains |
|---|---|---|
| Prose (no suffix) | `# heading` / plain prose | all adapters |
| Prose (pinned) | `::prose#knowledge::` | only `knowledge` |
| Instruction (no suffix) | `::instruction::` | first-declared adapter |
| Instruction (pinned) | `::instruction#tone::` | only `tone` |
| Preference | same — `::preference#name::` | only `name` |

The first-declared adapter acts as the implicit "default" for untagged non-prose sections. Declaration order is the order you write them in the YAML block.
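If the table reads easier as code, here is a minimal sketch of the resolution logic. The `Section` type and `resolve_adapters` function are hypothetical names for illustration, not part of dlm's API:

```python
# Minimal sketch of the routing table above. Section and
# resolve_adapters are hypothetical, not dlm's actual internals.
from dataclasses import dataclass

@dataclass
class Section:
    kind: str                # "prose", "instruction", or "preference"
    pin: str | None = None   # the "#name" suffix, if any

def resolve_adapters(section: Section, declared: list[str]) -> list[str]:
    if section.pin is not None:
        return [section.pin]       # pinned: only the named adapter
    if section.kind == "prose":
        return list(declared)      # unpinned prose fans out to all
    return [declared[0]]           # unpinned non-prose: first-declared

# Declaration order follows the YAML block: knowledge first, then tone.
declared = ["knowledge", "tone"]
assert resolve_adapters(Section("prose"), declared) == ["knowledge", "tone"]
assert resolve_adapters(Section("instruction"), declared) == ["knowledge"]
assert resolve_adapters(Section("instruction", pin="tone"), declared) == ["tone"]
```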

## Training

One `dlm train` invocation trains all declared adapters:

```sh
$ uv run dlm train mydoc.dlm
```

Each adapter gets its own version history under `~/.dlm/store/<dlm_id>/adapter/<name>/versions/vNNNN/` with an independent `current.txt` pointer. The manifest grows one `TrainingRunSummary` per adapter per invocation — running `dlm train` again commits fresh `v0002` directories for each, never mixing lanes.
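As a concrete illustration of that layout, a short sketch that resolves the active version directory for one lane. It assumes `current.txt` holds just the version id (e.g. `v0002`), which is an assumption about the file's contents, not documented behavior:

```python
# Sketch: resolve the active version directory for one adapter lane.
# Assumes current.txt contains the version id, e.g. "v0002".
from pathlib import Path

def current_version_dir(dlm_id: str, adapter: str) -> Path:
    lane = Path.home() / ".dlm" / "store" / dlm_id / "adapter" / adapter
    version = (lane / "current.txt").read_text().strip()
    return lane / "versions" / version

print(current_version_dir("01KPM618S7NXSPAY10BHKVECYX", "knowledge"))
```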

Each adapter is trained as a fresh LoRA from the base on its routed rows; the base model loads once per adapter. Shared hyperparameters (`sequence_len`, `num_epochs`, `seed`, optimizer, scheduler, warmup) live at the `training` top level — per-adapter overrides are intentionally limited to the LoRA-specific knobs.
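In PEFT terms the loop looks roughly like the sketch below. The configs mirror the YAML front-matter above; it assumes `Qwen/Qwen2.5-1.5B` as the Hugging Face id for the `qwen2.5-1.5b` base, and the data routing and trainer wiring are elided — this is illustrative, not dlm's source:

```python
# Rough shape of the per-adapter training loop described above.
# Configs mirror the YAML front-matter; training plumbing is elided.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

adapter_configs = {
    "knowledge": LoraConfig(r=8, task_type="CAUSAL_LM"),
    "tone": LoraConfig(
        r=4, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
    ),
}

for name, config in adapter_configs.items():
    # Base loads once per adapter; each LoRA is trained fresh from it.
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
    model = get_peft_model(base, config, adapter_name=name)
    # ... train on the rows routed to `name`, then commit the lane:
    model.save_pretrained(f"out/{name}")
```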

## Prompting a specific adapter

```sh
$ uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter knowledge
$ uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter tone
```

`--adapter` is required on multi-adapter documents and rejected on single-adapter ones. Unknown names get a clear error listing the declared adapters.

## Exporting a specific adapter

```sh
$ uv run dlm export mydoc.dlm --adapter knowledge
```

One adapter → one Ollama model. The GGUF bundle + Modelfile embeds that adapter only; `manifest.exports[-1].adapter_name` records which one.

## Weighted composition at export

To ship a single Ollama model that combines both adapters:

```sh
$ uv run dlm export mydoc.dlm --adapter-mix knowledge:1.0,tone:0.5
```

This uses PEFT's `add_weighted_adapter` with linear combination to produce a composite adapter, which is then converted to GGUF and registered with Ollama as one unit.
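Under the hood that corresponds to roughly the following PEFT calls. The paths and the composite name `mix` are illustrative; only `add_weighted_adapter` itself is the documented PEFT API:

```python
# Roughly what --adapter-mix knowledge:1.0,tone:0.5 does via PEFT.
# Paths and the composite adapter name "mix" are illustrative.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
model = PeftModel.from_pretrained(base, "out/knowledge", adapter_name="knowledge")
model.load_adapter("out/tone", adapter_name="tone")

# combination_type="linear" requires equal LoRA ranks across the mix
# (see the caveats below); mismatched ranks need a different strategy.
model.add_weighted_adapter(
    adapters=["knowledge", "tone"],
    weights=[1.0, 0.5],
    adapter_name="mix",
    combination_type="linear",
)
model.set_adapter("mix")
model.save_pretrained("out/mix")  # then converted to GGUF downstream
```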

Caveats (from the PEFT reference):

- **LoRA-only.** `add_weighted_adapter` doesn't support prefix / prompt tuning, and it can't merge across different LoRA ranks robustly. Keep all adapters in the mix on the same `adapter: lora` shape.
- **QLoRA requires dequantize.** Combining 4-bit quantized adapters into a composite is precision-unsafe; `dlm` refuses unless you pass `--dequantize` and `--merged` explicitly.
- **Mix is frozen in the export.** Once the Ollama model is built, the weights are baked. To change the mix, re-run `dlm export` with a new `--adapter-mix`. Ollama doesn't support hot-swapping adapter weights at runtime — keep the separate per-adapter exports around if you need dynamic composition at inference time.

## Hardware notes

`dlm doctor` refuses multi-adapter + QLoRA plans whose estimated VRAM exceeds the device's 85% headroom (roughly: `base_4bit + 1 GB/adapter + 25% activations`). The failure points at two fixes: drop to `adapter: lora` across the board, or reduce the adapter count. LoRA multi-adapter plans are always accepted — each adapter's extra state is negligible next to the base weights.
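The headroom arithmetic is simple enough to write down. The constants below follow the stated rule of thumb, not dlm's exact accounting, and the 25% activation term is read here as a multiplier over the estimate:

```python
# Rough sketch of the headroom check described above; constants come
# from the stated rule of thumb, not dlm's actual accounting.
GiB = 1024**3

def qlora_plan_fits(base_4bit: int, n_adapters: int, device_vram: int) -> bool:
    estimate = (base_4bit + n_adapters * 1 * GiB) * 1.25  # +25% activations
    return estimate <= 0.85 * device_vram                 # 85% headroom

# e.g. a ~0.9 GiB 4-bit 1.5B base, two adapters, on an 8 GiB card:
print(qlora_plan_fits(int(0.9 * GiB), 2, 8 * GiB))  # True
```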

## When to fold back to a single adapter

Multi-adapter adds cognitive load and per-adapter training cost. Fold back when:

- The adapters converge on similar behavior despite separate routing — the extra structure isn't doing work.
- One adapter's training set is so small (<10 rows) that it's adding noise instead of signal.
- Your export pipeline is always `--adapter-mix name:1.0,other:1.0` — a single adapter trained on the union is equivalent and cheaper.

## See also

- [Preference tuning (DPO vs ORPO)](preference-dpo-vs-orpo.md) — applies per-adapter on multi-adapter docs via `::preference#name::` routing.
- [Domain knowledge base](domain-kb.md) — the single-adapter story.
