# Instruction section reference

`::instruction::` sections are the supervised fine-tuning format DLM
uses for prompt/answer training data.

They are valid in hand-authored `.dlm` files and in synthetic output
written by `dlm synth instructions --apply`.

## Basic shape

Each instruction section contains one or more `Q` / `A` pairs:

```dlm
::instruction::
### Q
What is a decorator?

### A
A function that takes a function and returns a wrapped function.

### Q
When should I use `functools.wraps`?

### A
Whenever a decorator returns another callable and you want to preserve
the wrapped function's metadata.
```

DLM splits those into individual supervised rows at parse time.
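
The split into rows can be pictured with a minimal Python sketch. This is not DLM's parser: `split_pairs` and the `prompt` / `answer` row keys are hypothetical names chosen for illustration, and the sketch assumes every `### Q` heading is followed by one `### A` heading.

```python
import re


def split_pairs(section_text: str) -> list[dict[str, str]]:
    """Split one ::instruction:: section into prompt/answer rows (illustrative only)."""
    # re.split with a capture group yields: [preamble, kind, text, kind, text, ...]
    parts = re.split(r"^### (Q|A)[ \t]*$", section_text, flags=re.MULTILINE)
    rows: list[dict[str, str]] = []
    question = None
    for kind, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if kind == "Q":
            question = text
        elif question is not None:
            # An A heading closes the pending question and emits one row.
            rows.append({"prompt": question, "answer": text})
            question = None
    return rows


SAMPLE = """::instruction::
### Q
What is a decorator?

### A
A function that takes a function and returns a wrapped function.

### Q
When should I use `functools.wraps`?

### A
Whenever a decorator returns another callable and you want to preserve
the wrapped function's metadata.
"""

rows = split_pairs(SAMPLE)
```

Under those assumptions, the sample section yields two rows, one per `Q` / `A` pair.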

## Semantics

- `Q` is the prompt shown to the model.
- `A` is the target response.

At train time, DLM uses the question as context and the answer as the
supervised target. This is the section type that most directly shapes
assistant behavior.
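
One common way to train on "question as context, answer as target" is to mask the prompt tokens out of the loss. The sketch below shows that general technique only; whether DLM builds its labels this way is an assumption, and `build_example` and the token IDs are made up for illustration.

```python
# Conventional "no loss here" label; it matches the default ignore_index
# of PyTorch's CrossEntropyLoss. Its use by DLM is an assumption.
IGNORE_INDEX = -100


def build_example(prompt_ids: list[int], answer_ids: list[int]) -> dict[str, list[int]]:
    """Concatenate prompt and answer, supervising only the answer tokens."""
    input_ids = prompt_ids + answer_ids
    # Prompt positions are visible as context but contribute nothing to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return {"input_ids": input_ids, "labels": labels}


example = build_example(prompt_ids=[11, 12, 13], answer_ids=[21, 22])
```

Here the three prompt positions carry the ignore label, so only the two answer tokens are scored.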

## Auto-synth instruction sections

When `dlm synth instructions` writes sections back into a document, it
adds an HTML marker immediately after the section fence:

```dlm
::instruction::
<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" -->
### Q
What does DGEMM do?

### A
It multiplies dense matrices and can optionally accumulate the result.
```

That marker corresponds to these parsed fields on the section:

- `auto_synth: true`
- `synth_teacher`
- `synth_strategy`
- `synth_at`
- `source_section_id`

Hand-authored instruction sections omit the marker and keep
`auto_synth=false`.
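
The mapping from marker to fields can be sketched in Python as follows. The regular expressions and the `parse_marker` helper are assumptions made for illustration, not DLM's own parsing code; only the marker text and field names come from the example above.

```python
import re

# Illustrative patterns; DLM's real grammar may be stricter or looser.
MARKER = re.compile(r"<!--\s*dlm-auto-synth:\s*(.*?)\s*-->")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_marker(line: str) -> dict[str, object]:
    """Return the parsed synth fields, or auto_synth=False when no marker is present."""
    match = MARKER.search(line)
    if match is None:
        return {"auto_synth": False}
    fields: dict[str, object] = {"auto_synth": True}
    fields.update(ATTRIBUTE.findall(match.group(1)))
    return fields


marker_line = (
    '<!-- dlm-auto-synth: synth_teacher="self" synth_strategy="extraction" '
    'synth_at="2026-04-24T10:18:42Z" source_section_id="b6b7d8a2f4b3f9c0" -->'
)
parsed = parse_marker(marker_line)
hand_authored = parse_marker("### Q")
```

The timestamp is kept as a string here; parsing it into a datetime is left out of the sketch.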

## Validation rules

- The auto-synth marker is only valid on `::instruction::` sections.
- Auto-synth sections must provide all metadata fields together.
- `synth_teacher` and `synth_strategy` must be non-empty strings.
- `source_section_id` must be a valid referenced section ID.
- Section identity ignores the synth metadata, so the same logical
  question/answer pair keeps the same content identity whether it was
  written by hand or synthesized automatically.
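
As a rough illustration of the rules above, the checks could look like the Python sketch below. Every name here (`validate_auto_synth`, `content_identity`, `known_section_ids`) is hypothetical. The sketch also makes two assumptions: that a "valid referenced section ID" means an ID that exists among known sections, and that identity is a SHA-256 hash of the section text with the marker line removed. DLM's actual checks and identity scheme may differ.

```python
import hashlib
import re

# Field names come from the marker; everything else is illustrative.
REQUIRED_FIELDS = ("synth_teacher", "synth_strategy", "synth_at", "source_section_id")
MARKER_LINE = re.compile(r"^<!--\s*dlm-auto-synth:.*?-->[ \t]*\n?", re.MULTILINE)


def validate_auto_synth(section_type: str, fields: dict[str, str],
                        known_section_ids: set[str]) -> list[str]:
    """Return rule violations for a marker; an empty list means it passes."""
    errors = []
    if section_type != "instruction":
        errors.append("auto-synth marker is only valid on ::instruction:: sections")
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        errors.append("missing metadata fields: " + ", ".join(missing))
    for name in ("synth_teacher", "synth_strategy"):
        if name in fields and not fields[name].strip():
            errors.append(f"{name} must be a non-empty string")
    source_id = fields.get("source_section_id")
    if source_id is not None and source_id not in known_section_ids:
        errors.append("source_section_id does not reference a known section")
    return errors


def content_identity(section_text: str) -> str:
    """Hash the section with any marker line stripped (assumed identity scheme)."""
    without_marker = MARKER_LINE.sub("", section_text)
    return hashlib.sha256(without_marker.encode("utf-8")).hexdigest()


good_fields = {
    "synth_teacher": "self",
    "synth_strategy": "extraction",
    "synth_at": "2026-04-24T10:18:42Z",
    "source_section_id": "b6b7d8a2f4b3f9c0",
}
ok = validate_auto_synth("instruction", good_fields, {"b6b7d8a2f4b3f9c0"})
bad = validate_auto_synth("prose", {"synth_teacher": ""}, set())

hand = "::instruction::\n### Q\nX?\n\n### A\nY.\n"
synth = '::instruction::\n<!-- dlm-auto-synth: synth_teacher="self" -->\n### Q\nX?\n\n### A\nY.\n'
same_identity = content_identity(hand) == content_identity(synth)
```

Because the marker line is removed before hashing, the hand-authored and synthesized versions of the same pair get the same identity in this sketch.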

## Interaction with training

- `dlm train` includes synthesized instruction sections by default.
- There is currently no separate "ignore auto-synth instructions" train
  flag; they flow through the normal SFT path once they are present in
  the document.
- `dlm synth revert` strips every `auto_synth: true` instruction section
  from the file without touching hand-authored rows.
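
The effect of a revert can be approximated with the Python sketch below. It is not the implementation behind `dlm synth revert`. It assumes every section starts at a fence line of the form `::name::` and that the marker sits on the line right after the fence; `revert_auto_synth` is a hypothetical helper.

```python
import re

# Assumed fence shape: a line consisting only of ::name::
FENCE = re.compile(r"^::[\w-]+::[ \t]*$", re.MULTILINE)


def revert_auto_synth(document: str) -> str:
    """Drop instruction sections whose first line after the fence is the synth marker."""
    starts = [match.start() for match in FENCE.finditer(document)]
    if not starts:
        return document
    bounds = starts + [len(document)]
    kept = [document[: starts[0]]]  # anything before the first fence is preserved
    for begin, end in zip(bounds, bounds[1:]):
        chunk = document[begin:end]
        lines = chunk.splitlines()
        is_synth = (
            chunk.startswith("::instruction::")
            and len(lines) > 1
            and lines[1].lstrip().startswith("<!-- dlm-auto-synth:")
        )
        if not is_synth:
            kept.append(chunk)
    return "".join(kept)


doc = (
    "::instruction::\n### Q\nA?\n\n### A\nB.\n\n"
    '::instruction::\n<!-- dlm-auto-synth: synth_teacher="self" -->\n'
    "### Q\nC?\n\n### A\nD.\n"
)
reverted = revert_auto_synth(doc)
```

In this sample only the hand-authored section survives; the section carrying the marker is removed whole.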

## Interaction with `dlm synth`

Relevant commands:

- `dlm synth instructions <path>`
- `dlm synth list <path>`
- `dlm synth revert <path>`

The current `instructions` command can:

- stage accepted synth sections for inspection
- write accepted synth sections directly with `--apply`
- preview only with `--dry-run`

## Choosing a good instruction section

Hand-authored or synthesized, good instruction sections tend to have:

- a clear prompt with one task
- an answer that matches the tone you want the adapter to learn
- enough domain specificity that the pair teaches something real

Weak instruction sections tend to be:

- generic
- repetitive
- too broad to answer well
- stylistically inconsistent with the rest of the document

## See also

- [Section grammar](sections.md)
- [Synthesize training data](../cookbook/synthesize-training-data.md)
- [Bootstrap self-improving](../cookbook/bootstrap-self-improving.md)
- [CLI reference](../cli/reference.md)