# Bootstrap self-improving
The self-teacher loop is the most interesting version of Sprint 43:
your current adapter writes new `::instruction::` sections for its own
document, then the next train run folds them back in.
This is not magic. It works because DLM already has:
- replay-backed retraining
- synthesized instruction provenance (`auto_synth`)
- a local `sway` judge for filtering weak candidates
Used carefully, it turns one trained document into a steadily better instruction corpus.
## The honest starting point
`--teacher self` uses the current adapter for that `.dlm`. That means
the loop starts **after** there is already a trainable local adapter.
A good bootstrap pattern is:
1. Start with prose plus at least some useful seed supervision, or do an
   initial train from prose and existing sections.
2. Run `dlm synth instructions --teacher self`.
3. Retrain on the accepted synth sections.
4. Repeat in small batches.
If the adapter still cannot answer basic questions about the document, synthetic instruction generation will mostly amplify noise.
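The "can it answer basic questions" check can be made mechanical. Below is a minimal sketch of such a readiness gate, assuming a caller-supplied `ask` function that wraps `uv run dlm prompt`; the probe format, keyword matching, and `min_hits` threshold are all illustrative assumptions, not part of DLM:

```python
# Hypothetical readiness gate for the self-teacher loop: probe the
# adapter with a few questions about its own document and only proceed
# to `--teacher self` synthesis if enough answers contain an expected
# keyword. `ask`, the probes, and `min_hits` are assumptions here.
def ready_for_self_synth(ask, probes, min_hits=2):
    hits = 0
    for question, expected_keyword in probes:
        answer = ask(question)
        # crude containment check; a real gate could use the sway judge
        if expected_keyword.lower() in answer.lower():
            hits += 1
    return hits >= min_hits
```

If the gate fails, add more prose or seed supervision before synthesizing.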
## Minimal loop
Train once:
```sh
uv run dlm train notes.dlm
```
Generate a small accepted batch from the current adapter and write it back immediately:
```sh
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --strategy extraction \
  --max-pairs 4 \
  --apply
```
Retrain on the expanded instruction set:
```sh
uv run dlm train notes.dlm
```
Then inspect real output quality:
```sh
uv run dlm prompt notes.dlm "What does DGEMM do?"
```
That is the basic self-improving loop.
## Safer staged version
If you want to inspect before writing:
```sh
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --strategy extraction

uv run dlm synth list notes.dlm
```
The current implementation stages accepted synth sections for
inspection, but it does not yet have a separate `dlm synth apply`
subcommand. Use `--apply` on the synth run when you want the sections
written straight into the document.
## Why `sway` stays the default
The self-teacher path is the place where the default `--filter sway`
matters most.
Without filtering, a weak adapter can happily generate:
- duplicates
- overly generic answers
- plausible but wrong extrapolations
The current synth filter stack is:

1. dedup
2. optional judge pass
3. optional threshold cut
The CLI prints those counts so you can tell whether the loop is getting better or just louder.
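The three-stage stack can be sketched as a plain function. This is an illustrative model of the behavior described above, not DLM's actual implementation; the candidate dict shape and the judge signature are assumptions:

```python
# Illustrative sketch of the synth filter stack: dedup, then an
# optional judge pass, then an optional threshold cut, with counts
# returned so the caller can report them like the CLI does.
def filter_candidates(candidates, judge=None, threshold=None):
    counts = {"input": len(candidates)}

    # 1. dedup: drop exact duplicate instruction/answer pairs
    seen, unique = set(), []
    for c in candidates:
        key = (c["instruction"].strip().lower(), c["answer"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(c)
    counts["after_dedup"] = len(unique)

    # 2. optional judge pass: score each surviving candidate
    if judge is not None:
        for c in unique:
            c["score"] = judge(c)

    # 3. optional threshold cut: keep candidates at or above the cut
    if threshold is not None:
        unique = [c for c in unique if c.get("score", 0.0) >= threshold]
    counts["accepted"] = len(unique)
    return unique, counts
```

Watching `after_dedup` and `accepted` shrink or grow round over round is exactly the "better or just louder" signal the CLI counts give you.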
## A conservative rhythm
This is a healthy local rhythm for a real project:
```sh
uv run dlm train notes.dlm
uv run dlm synth instructions notes.dlm \
  --teacher self \
  --per-section 1 \
  --max-pairs 4 \
  --apply
uv run dlm train notes.dlm
uv run dlm prompt notes.dlm "Explain the core idea."
```
Keep the accepted batch small at first. The point is to improve the document's instruction surface, not flood it with speculative rows.
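That rhythm is easy to wrap in a small driver. Here is a hedged sketch that shells out to the same commands shown above; the `run` parameter is injectable so the sequence can be inspected or dry-run, and nothing about this helper is part of the DLM API:

```python
import subprocess

# Hypothetical driver for one conservative round: train, synth a small
# accepted batch with the self teacher, retrain. Returns the command
# list so callers can log or verify the sequence.
def run_round(doc="notes.dlm", run=None):
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)
    commands = [
        ["uv", "run", "dlm", "train", doc],
        ["uv", "run", "dlm", "synth", "instructions", doc,
         "--teacher", "self", "--per-section", "1",
         "--max-pairs", "4", "--apply"],
        ["uv", "run", "dlm", "train", doc],
    ]
    for cmd in commands:
        run(cmd)
    return commands
```

Passing a fake `run` (for example `seen.append`) lets you check the sequence without touching a real document.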
## When to switch away from `self`
The self-teacher is convenient, but not always the right teacher.
Prefer an external teacher when:
- the local adapter is still very early and weak
- you need broader general knowledge than the current adapter can supply
- you want to compare local-vs-external synth quality on the same prose
That usually looks like:
```sh
uv run dlm synth instructions notes.dlm \
  --teacher hf:Qwen/Qwen2.5-1.5B-Instruct \
  --per-section 1 \
  --apply
```
and then later moving back to `--teacher self` once the adapter has real
domain traction.
## Pairing Sprint 43 with Sprint 42
Instruction synthesis and preference mining are complementary:
- `dlm synth instructions` grows the SFT side of the document
- `dlm synth preferences` / `dlm preference mine` sharpens ranking and
  behavior once the adapter can already produce multiple plausible
  answers
A practical sequence is:
1. train
2. synth instructions
3. train
4. mine preferences
5. train preference phase
That is the closest current DLM path to a fully local self-improving document loop.
## Failure modes to watch
### The second pass is not better
That usually means one of:
- the first synth batch was too weak
- the document still lacks enough domain prose
- the adapter is too small for the domain
Do not assume "more synthetic rows" automatically means "better model."
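One way to make "is the second pass better" concrete is to score a fixed probe set after each retrain and stop synthesizing when the average stops improving. A minimal sketch; the 0-to-1 scoring scale and the `min_gain` margin are assumptions for illustration:

```python
# Hypothetical stopping rule for the loop: given per-probe scores from
# the previous and current rounds, continue only if the mean score
# improved by at least `min_gain`.
def should_continue(prev_scores, curr_scores, min_gain=0.01):
    prev_mean = sum(prev_scores) / len(prev_scores)
    curr_mean = sum(curr_scores) / len(curr_scores)
    return (curr_mean - prev_mean) >= min_gain
```

A flat or falling mean is the cue to stop adding rows and fix the inputs instead.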
### Expansion mode gets weird
`--strategy expansion` is useful, but it is also the fastest route to
polished nonsense. Prefer `extraction` for early loops and only widen to
`both` or `expansion` once the adapter is already grounded.
### Prompt quality improves but factuality does not
That is a signal to go back to better prose or hand-authored instructional supervision. Self-improvement cannot invent missing source knowledge.
## See also

- [Synthesize training data](synthesize-training-data.md)
- [Instruction section reference](../format/instruction-section.md)
- [Self-improving loop](self-improving-loop.md)
- [Reward-model integration](reward-model-integration.md)