
# Multi-adapter composition

Train a single `.dlm` with more than one named adapter — keep knowledge and tone orthogonal, mix them at export time, or prompt against one at a time. Reach for this when you want separate "what the model knows" and "how it says things" training signals without spinning up two documents.

## When to use it

- A handbook where the **knowledge** is stable but the **tone** evolves (you might rewrite just the style examples next month).
- A single base model that needs to serve two personas — one customer-facing, one internal-engineering — where the instruction sets diverge.
- Experiments where you want to A/B two training recipes against the same prose corpus without forking the document.

If the answer is "one adapter is fine," skip this. Multi-adapter trades simplicity for composition flexibility — pay that cost when you need it.

## Document shape

```dlm
---
dlm_id: 01KPM618S7NXSPAY10BHKVECYX
base_model: qwen2.5-1.5b
training:
  sequence_len: 2048
  num_epochs: 2
  adapters:
    knowledge:
      adapter: lora
      lora_r: 8
    tone:
      adapter: lora
      lora_r: 4
      target_modules: [q_proj, v_proj]
      learning_rate: 1e-4
export:
  default_quant: Q4_K_M
---

# Domain prose

This prose trains BOTH adapters by default — prose without a `#name`
suffix fans out to every declared adapter. Most documents keep prose
shared so both adapters pick up the same domain vocabulary.

::instruction#knowledge::
### Q
What is the capital of France?
### A
Paris.

::instruction#tone::
### Q
How should I phrase things?
### A
Crisply. One sentence.
```

### Routing rules

| Section | Fence | Trains |
|---|---|---|
| Prose (no suffix) | `# heading` / plain prose | all adapters |
| Prose (pinned) | `::prose#knowledge::` | only `knowledge` |
| Instruction (no suffix) | `::instruction::` | first-declared adapter |
| Instruction (pinned) | `::instruction#tone::` | only `tone` |
| Preference | same — `::preference#name::` | only `name` |

The first-declared adapter acts as the implicit "default" for untagged non-prose sections. Declaration order is the order you write them in the YAML block.
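If the table reads easier as code, here is a minimal sketch of the resolution logic. The `Section` type and `resolve_adapters` function are hypothetical names for illustration, not part of dlm's API:

```python
# Minimal sketch of the routing table above. Section and
# resolve_adapters are hypothetical, not dlm's actual internals.
from dataclasses import dataclass

@dataclass
class Section:
    kind: str                # "prose", "instruction", or "preference"
    pin: str | None = None   # the "#name" suffix, if any

def resolve_adapters(section: Section, declared: list[str]) -> list[str]:
    if section.pin is not None:
        return [section.pin]       # pinned: only the named adapter
    if section.kind == "prose":
        return list(declared)      # unpinned prose fans out to all
    return [declared[0]]           # unpinned non-prose: first-declared

# Declaration order follows the YAML block: knowledge first, then tone.
declared = ["knowledge", "tone"]
assert resolve_adapters(Section("prose"), declared) == ["knowledge", "tone"]
assert resolve_adapters(Section("instruction"), declared) == ["knowledge"]
assert resolve_adapters(Section("instruction", pin="tone"), declared) == ["tone"]
```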

## Training

One `dlm train` invocation trains all declared adapters:

```sh
$ uv run dlm train mydoc.dlm
```

Each adapter gets its own version history under `~/.dlm/store/<dlm_id>/adapter/<name>/versions/vNNNN/` with an independent `current.txt` pointer. The manifest grows one `TrainingRunSummary` per adapter per invocation — running `dlm train` again commits fresh `v0002` directories for each, never mixing lanes.
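As a concrete illustration of that layout, a short sketch that resolves the active version directory for one lane. It assumes `current.txt` holds just the version id (e.g. `v0002`), which is an assumption about the file's contents, not documented behavior:

```python
# Sketch: resolve the active version directory for one adapter lane.
# Assumes current.txt contains the version id, e.g. "v0002".
from pathlib import Path

def current_version_dir(dlm_id: str, adapter: str) -> Path:
    lane = Path.home() / ".dlm" / "store" / dlm_id / "adapter" / adapter
    version = (lane / "current.txt").read_text().strip()
    return lane / "versions" / version

print(current_version_dir("01KPM618S7NXSPAY10BHKVECYX", "knowledge"))
```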

Each adapter is trained as a fresh LoRA from the base on its routed rows; the base model loads once per adapter. Shared hyperparameters (`sequence_len`, `num_epochs`, `seed`, optimizer, scheduler, warmup) live at the `training` top level — per-adapter overrides are intentionally limited to the LoRA-specific knobs.
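In PEFT terms the loop looks roughly like the sketch below. The configs mirror the YAML front-matter above; it assumes `Qwen/Qwen2.5-1.5B` as the Hugging Face id for the `qwen2.5-1.5b` base, and the data routing and trainer wiring are elided — this is illustrative, not dlm's source:

```python
# Rough shape of the per-adapter training loop described above.
# Configs mirror the YAML front-matter; training plumbing is elided.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

adapter_configs = {
    "knowledge": LoraConfig(r=8, task_type="CAUSAL_LM"),
    "tone": LoraConfig(
        r=4, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
    ),
}

for name, config in adapter_configs.items():
    # Base loads once per adapter; each LoRA is trained fresh from it.
    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
    model = get_peft_model(base, config, adapter_name=name)
    # ... train on the rows routed to `name`, then commit the lane:
    model.save_pretrained(f"out/{name}")
```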

## Prompting a specific adapter

```sh
$ uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter knowledge
$ uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter tone
```

`--adapter` is required on multi-adapter documents and rejected on single-adapter ones. Unknown names get a clear error listing the declared adapters.

## Exporting a specific adapter

```sh
$ uv run dlm export mydoc.dlm --adapter knowledge
```

One adapter → one Ollama model. The GGUF bundle + Modelfile embeds that adapter only; `manifest.exports[-1].adapter_name` records which one.

## Weighted composition at export

To ship a single Ollama model that combines both adapters:

```sh
$ uv run dlm export mydoc.dlm --adapter-mix knowledge:1.0,tone:0.5
```

This uses PEFT's `add_weighted_adapter` with linear combination to produce a composite adapter, which is then converted to GGUF and registered with Ollama as one unit.
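Under the hood that corresponds to roughly the following PEFT calls. The paths and the composite name `mix` are illustrative; only `add_weighted_adapter` itself is the documented PEFT API:

```python
# Roughly what --adapter-mix knowledge:1.0,tone:0.5 does via PEFT.
# Paths and the composite adapter name "mix" are illustrative.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B")
model = PeftModel.from_pretrained(base, "out/knowledge", adapter_name="knowledge")
model.load_adapter("out/tone", adapter_name="tone")

# combination_type="linear" requires equal LoRA ranks across the mix
# (see the caveats below); mismatched ranks need a different strategy.
model.add_weighted_adapter(
    adapters=["knowledge", "tone"],
    weights=[1.0, 0.5],
    adapter_name="mix",
    combination_type="linear",
)
model.set_adapter("mix")
model.save_pretrained("out/mix")  # then converted to GGUF downstream
```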

Caveats (from the PEFT reference):

- **LoRA-only.** `add_weighted_adapter` doesn't support prefix / prompt tuning, and it can't merge across different LoRA ranks robustly. Keep all adapters in the mix on the same `adapter: lora` shape.
- **QLoRA requires dequantize.** Combining 4-bit quantized adapters into a composite is precision-unsafe; `dlm` refuses unless you pass `--dequantize` and `--merged` explicitly.
- **Mix is frozen in the export.** Once the Ollama model is built, the weights are baked. To change the mix, re-run `dlm export` with a new `--adapter-mix`. Ollama doesn't support hot-swapping adapter weights at runtime — keep the separate per-adapter exports around if you need dynamic composition at inference time.

## Hardware notes

`dlm doctor` refuses multi-adapter + QLoRA plans whose estimated VRAM exceeds the device's 85% headroom (roughly: `base_4bit + 1 GB/adapter + 25% activations`). The failure points at two fixes: drop to `adapter: lora` across the board, or reduce the adapter count. LoRA multi-adapter plans are always accepted — each adapter's extra state is negligible next to the base weights.
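The headroom arithmetic is simple enough to write down. The constants below follow the stated rule of thumb, not dlm's exact accounting, and the 25% activation term is read here as a multiplier over the estimate:

```python
# Rough sketch of the headroom check described above; constants come
# from the stated rule of thumb, not dlm's actual accounting.
GiB = 1024**3

def qlora_plan_fits(base_4bit: int, n_adapters: int, device_vram: int) -> bool:
    estimate = (base_4bit + n_adapters * 1 * GiB) * 1.25  # +25% activations
    return estimate <= 0.85 * device_vram                 # 85% headroom

# e.g. a ~0.9 GiB 4-bit 1.5B base, two adapters, on an 8 GiB card:
print(qlora_plan_fits(int(0.9 * GiB), 2, 8 * GiB))  # True
```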

## When to fold back to a single adapter

Multi-adapter adds cognitive load and per-adapter training cost. Fold back when:

- The adapters converge on similar behavior despite separate routing — the extra structure isn't doing work.
- One adapter's training set is so small (<10 rows) that it's adding noise instead of signal.
- Your export pipeline is always `--adapter-mix name:1.0,other:1.0` — a single adapter trained on the union is equivalent and cheaper.

## See also

- [Preference tuning (DPO vs ORPO)](preference-dpo-vs-orpo.md) — applies per-adapter on multi-adapter docs via `::preference#name::` routing.
- [Domain knowledge base](domain-kb.md) — the single-adapter story.
