# Preference section reference

`::preference::` sections are the pairwise alignment format DLM feeds
into the preference-training path (`dpo` / `orpo`). They are valid in
hand-authored `.dlm` files and in auto-mined output written by
`dlm preference mine --apply`.

## Basic shape

Each record contains three labeled blocks:

```dlm
::preference::
### Prompt
Explain recursion to a beginner.

### Chosen
Recursion is when a function calls itself on a smaller version of the
same problem.

### Rejected
Recursion is a self-referential computational strategy implemented with
stack-managed frame expansion.
```

One `::preference::` section can hold one or more Prompt/Chosen/Rejected
triples. DLM splits them into preference rows at parse time.
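
To make the triple-splitting step concrete, here is a minimal Python sketch. It is not DLM's parser: `PreferenceRow` and `split_preference_rows` are hypothetical names, and the sketch assumes the section body has already been separated from the `::preference::` fence and from any marker comment, and that the headings always appear in Prompt, Chosen, Rejected order.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Matches the three block headings used inside a ::preference:: section.
HEADING = re.compile(r"^### (Prompt|Chosen|Rejected)\s*$")


@dataclass(frozen=True)
class PreferenceRow:
    """One preference row (hypothetical type, for illustration only)."""
    prompt: str
    chosen: str
    rejected: str


def split_preference_rows(body: str) -> List[PreferenceRow]:
    """Split the body of one ::preference:: section into rows."""
    blocks: List[Tuple[str, List[str]]] = []
    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            blocks.append((match.group(1), []))
        elif blocks:
            blocks[-1][1].append(line)
        elif line.strip():
            raise ValueError(f"text before the first heading: {line!r}")

    if not blocks or len(blocks) % 3 != 0:
        raise ValueError("expected complete Prompt/Chosen/Rejected triples")

    rows = []
    for start in range(0, len(blocks), 3):
        triple = blocks[start:start + 3]
        labels = [label for label, _ in triple]
        if labels != ["Prompt", "Chosen", "Rejected"]:
            raise ValueError(f"unexpected block order: {labels}")
        prompt, chosen, rejected = (
            "\n".join(lines).strip() for _, lines in triple
        )
        rows.append(PreferenceRow(prompt, chosen, rejected))
    return rows
```

A section holding two triples yields two `PreferenceRow` values. An incomplete or misordered triple raises `ValueError` in this sketch; whether DLM reports the error the same way is not stated here.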

## Semantics

- `Prompt` is the input shown to the model.
- `Chosen` is the preferred response.
- `Rejected` is the lower-quality alternative.

Preference training does not try to predict the `Rejected` text.
Instead, it learns to increase the model's relative preference for the
Chosen response over the Rejected one.
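
As an illustration of "relative preference", the sketch below computes the standard DPO objective for a single row from summed token log-probabilities. This is the textbook formulation, not code taken from DLM; the function name, argument names, and the `beta` default are assumptions. ORPO uses a different, reference-free odds-ratio term and is not shown.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; the real value depends on the trainer config
) -> float:
    """Standard DPO loss for one (prompt, chosen, rejected) row.

    Each argument is the summed log-probability of a response given the
    prompt, under either the policy being trained or a frozen reference.
    """
    # How much more the policy favors Chosen over Rejected than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

The loss depends only on the margin between the two responses, so it falls as the policy shifts probability toward Chosen relative to Rejected. When the policy and the reference agree, the margin is zero and the loss is `log(2)`.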

## Auto-mined sections

When `dlm preference mine` writes sections back into a document, it
marks them with an HTML comment immediately after the section fence:

```dlm
::preference::
<!-- dlm-auto-mined: judge_name="sway" judge_score_chosen="0.82" judge_score_rejected="0.31" mined_at="2026-04-23T18:42:11Z" mined_run_id="7" -->
### Prompt
What is 2 + 2?
### Chosen
4.
### Rejected
The sum of two and two is four.
```

That marker corresponds to these parsed fields on the section:

- `auto_mined: true`
- `judge_name`
- `judge_score_chosen`
- `judge_score_rejected`
- `mined_at`
- `mined_run_id`

These metadata fields are required together for auto-mined preference
sections. Hand-authored sections omit the marker and keep
`auto_mined=false`.
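
The marker is a flat list of `key="value"` attributes, so it can be read with two regular expressions. The sketch below is one possible reading of the format shown above, not DLM's parser; `parse_auto_mined_marker` is a hypothetical helper, and it assumes values never contain a double quote.

```python
import re
from typing import Dict, Optional

# The whole comment line, capturing everything after the "dlm-auto-mined:" tag.
MARKER = re.compile(r"^<!--\s*dlm-auto-mined:\s*(?P<attrs>.*?)\s*-->$")
# One key="value" attribute; assumes values contain no double quotes.
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_auto_mined_marker(line: str) -> Optional[Dict[str, str]]:
    """Return the marker's attributes as strings, or None if the line is
    not an auto-mined marker (as for a hand-authored section)."""
    match = MARKER.match(line.strip())
    if match is None:
        return None
    return dict(ATTR.findall(match.group("attrs")))


marker = (
    '<!-- dlm-auto-mined: judge_name="sway" judge_score_chosen="0.82" '
    'judge_score_rejected="0.31" mined_at="2026-04-23T18:42:11Z" '
    'mined_run_id="7" -->'
)
fields = parse_auto_mined_marker(marker)
```

Values stay as strings at this stage; converting and checking them is a separate validation step.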

## Validation rules

- The auto-mined marker is only valid on `::preference::` sections.
- Auto-mined sections must provide all metadata fields together.
- The parser rejects malformed score/timestamp/run-id values rather than
  silently guessing.
- Section identity ignores the auto-mined metadata, so the same logical
  preference pair keeps the same content identity whether it was written
  by hand or mined automatically.
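
The rules above can be sketched in Python as follows. Everything here is illustrative: `validate_auto_mined` and `content_identity` are hypothetical names, the timestamp format is inferred from the single example in this document, the run id is assumed to be an integer, and the SHA-256 over a JSON array is just one way to build an identity that ignores metadata, not necessarily the one DLM uses.

```python
import hashlib
import json
import math
from datetime import datetime, timezone

REQUIRED_FIELDS = (
    "judge_name",
    "judge_score_chosen",
    "judge_score_rejected",
    "mined_at",
    "mined_run_id",
)


def validate_auto_mined(fields: dict) -> dict:
    """Convert raw marker strings to typed values, rejecting bad input."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError("auto-mined marker is missing: " + ", ".join(missing))
    try:
        scores = {
            name: float(fields[name])
            for name in ("judge_score_chosen", "judge_score_rejected")
        }
        # Assumed format: UTC timestamp with a trailing "Z", as in the example.
        mined_at = datetime.strptime(
            fields["mined_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        run_id = int(fields["mined_run_id"])  # assumed to be an integer
    except (TypeError, ValueError) as exc:
        raise ValueError(f"malformed auto-mined metadata: {exc}") from exc
    if not all(math.isfinite(score) for score in scores.values()):
        raise ValueError("judge scores must be finite numbers")
    return {
        "auto_mined": True,
        "judge_name": fields["judge_name"],
        **scores,
        "mined_at": mined_at,
        "mined_run_id": run_id,
    }


def content_identity(prompt: str, chosen: str, rejected: str) -> str:
    """Hash only the triple's text, so metadata never changes identity."""
    payload = json.dumps([prompt, chosen, rejected], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because `content_identity` never sees the judge or run metadata, a hand-authored pair and a mined pair with the same text map to the same identity.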

## Interaction with training

- `dlm train` includes auto-mined preference sections by default.
- `dlm train --no-mined` excludes only `auto_mined=true` sections and
  still uses hand-authored preference pairs.
- Replay snapshots also preserve the `auto_mined` bit so future
  preference runs can opt in or out consistently.
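
The selection behavior described above amounts to a filter on the `auto_mined` bit. A minimal sketch, assuming a hypothetical `PreferenceSection` record and an `include_mined` switch standing in for the absence of `--no-mined`:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PreferenceSection:
    """Hypothetical in-memory form of one parsed preference row."""
    prompt: str
    chosen: str
    rejected: str
    auto_mined: bool = False  # hand-authored sections keep the default


def select_training_sections(
    sections: Iterable[PreferenceSection], include_mined: bool = True
) -> List[PreferenceSection]:
    """Keep everything by default; drop only auto-mined rows otherwise."""
    if include_mined:
        return list(sections)
    return [section for section in sections if not section.auto_mined]
```

Hand-authored rows pass through in both modes, which matches the statement that `--no-mined` excludes only `auto_mined=true` sections.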

## Related commands

- `dlm preference mine <path>`
- `dlm preference apply <path>`
- `dlm preference revert <path>`
- `dlm train <path> --no-mined`

## See also

- [Section grammar](sections.md)
- [CLI reference](../cli/reference.md)
- [Self-improving loop cookbook](../cookbook/self-improving-loop.md)