# Preference section reference

`::preference::` sections are the pairwise alignment format DLM feeds
into the preference-training path (`dpo` / `orpo`). They are valid in
hand-authored `.dlm` files and in auto-mined output written by
`dlm preference mine --apply`.

## Basic shape

Each record contains three labeled blocks:

```dlm
::preference::
### Prompt
Explain recursion to a beginner.

### Chosen
Recursion is when a function calls itself on a smaller version of the
same problem.

### Rejected
Recursion is a self-referential computational strategy implemented with
stack-managed frame expansion.
```

One `::preference::` section can hold one or more Prompt/Chosen/Rejected
triples. DLM splits them into preference rows at parse time.
29 ## Semantics
30
31 - `Prompt` is the input shown to the model.
32 - `Chosen` is the preferred response.
33 - `Rejected` is the lower-quality alternative.
34
35 Preference training does not try to predict the `Rejected` text.
36 Instead, it learns to increase the model's relative preference for the
37 Chosen response over the Rejected one.
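
For the `dpo` path, "relative preference" is commonly formalized by the
standard DPO objective. The sketch below computes that loss for a single
pair from summed log-probabilities. It illustrates the general
technique, not DLM's training code: `dpo_loss` and the `beta` default
are assumptions, and ORPO uses a different objective that is not shown.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (Chosen, Rejected) pair.

    Inputs are log-probabilities of each full response under the policy
    being trained and under a frozen reference model.
    """
    # How much more the policy favors Chosen over Rejected than the
    # reference model does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)) rewritten as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy shifted toward Chosen and away from Rejected: the loss drops.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

The loss depends only on the gap between the two responses, which is
why the `Rejected` text is never a prediction target by itself.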

## Auto-mined sections

When `dlm preference mine` writes sections back into a document, it
marks them with an HTML comment immediately after the section fence:

```dlm
::preference::
<!-- dlm-auto-mined: judge_name="sway" judge_score_chosen="0.82" judge_score_rejected="0.31" mined_at="2026-04-23T18:42:11Z" mined_run_id="7" -->
### Prompt
What is 2 + 2?
### Chosen
4.
### Rejected
The sum of two and two is four.
```

That marker corresponds to these parsed fields on the section:

- `auto_mined: true`
- `judge_name`
- `judge_score_chosen`
- `judge_score_rejected`
- `mined_at`
- `mined_run_id`

These metadata fields are required together for auto-mined preference
sections. Hand-authored sections omit the marker and keep
`auto_mined=false`.
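
As a rough illustration of how the marker maps onto those fields, the
sketch below pulls the `key="value"` pairs out of the comment with
regular expressions. `parse_auto_mined_marker` is a hypothetical helper,
and the exact marker grammar DLM accepts may be stricter or looser than
these patterns assume.

```python
import re
from typing import Dict, Optional

# Assumed shape of the marker, inferred from the example above.
MARKER = re.compile(r"<!--\s*dlm-auto-mined:\s*(?P<attrs>.*?)\s*-->")
ATTR = re.compile(r'(\w+)="([^"]*)"')

REQUIRED = {
    "judge_name",
    "judge_score_chosen",
    "judge_score_rejected",
    "mined_at",
    "mined_run_id",
}


def parse_auto_mined_marker(line: str) -> Optional[Dict[str, str]]:
    """Return the raw marker fields, or None if the line is not a marker."""
    match = MARKER.fullmatch(line.strip())
    if match is None:
        return None  # hand-authored section: auto_mined stays false
    fields = dict(ATTR.findall(match.group("attrs")))
    # The metadata fields are required together.
    missing = REQUIRED - fields.keys()
    if missing:
        raise ValueError(f"auto-mined marker is missing: {sorted(missing)}")
    return fields


marker = (
    '<!-- dlm-auto-mined: judge_name="sway" judge_score_chosen="0.82" '
    'judge_score_rejected="0.31" mined_at="2026-04-23T18:42:11Z" '
    'mined_run_id="7" -->'
)
fields = parse_auto_mined_marker(marker)
```

Values come back as strings at this stage; converting them to typed
values is a separate step.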

## Validation rules

- The auto-mined marker is only valid on `::preference::` sections.
- Auto-mined sections must provide all metadata fields together.
- The parser rejects malformed score/timestamp/run-id values rather than
  silently guessing.
- Section identity ignores the auto-mined metadata, so the same logical
  preference pair keeps the same content identity whether it was written
  by hand or mined automatically.
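
The last two rules can be sketched as follows. Both helpers are
hypothetical: the timestamp format is inferred from the example marker,
no score range is enforced because this page does not state one, and
the hashing scheme stands in for whatever identity function DLM
actually uses.

```python
import hashlib
from datetime import datetime, timezone
from typing import Any, Dict


def validate_marker_fields(fields: Dict[str, str]) -> Dict[str, Any]:
    """Convert raw marker strings to typed values; raise ValueError on bad input."""
    try:
        return {
            "auto_mined": True,
            "judge_name": fields["judge_name"],
            # float(), int(), and strptime() all raise ValueError on
            # malformed text, so nothing is silently guessed.
            "judge_score_chosen": float(fields["judge_score_chosen"]),
            "judge_score_rejected": float(fields["judge_score_rejected"]),
            "mined_at": datetime.strptime(
                fields["mined_at"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc),
            "mined_run_id": int(fields["mined_run_id"]),
        }
    except KeyError as exc:
        raise ValueError(f"missing metadata field: {exc.args[0]}") from exc


def content_identity(prompt: str, chosen: str, rejected: str) -> str:
    """Hash only the triple, so mining metadata never affects identity."""
    digest = hashlib.sha256()
    for part in (prompt, chosen, rejected):
        data = part.encode("utf-8")
        # Length prefix keeps field boundaries unambiguous.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


typed = validate_marker_fields({
    "judge_name": "sway",
    "judge_score_chosen": "0.82",
    "judge_score_rejected": "0.31",
    "mined_at": "2026-04-23T18:42:11Z",
    "mined_run_id": "7",
})
identity = content_identity(
    "What is 2 + 2?", "4.", "The sum of two and two is four."
)
```

Because `content_identity` never sees the judge or run metadata, a
hand-authored copy of the same triple gets the same identity.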

## Interaction with training

- `dlm train` includes auto-mined preference sections by default.
- `dlm train --no-mined` excludes only `auto_mined=true` sections and
  still uses hand-authored preference pairs.
- Replay snapshots also preserve the `auto_mined` bit so future
  preference runs can opt in or out consistently.
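
The effect of `--no-mined` amounts to a filter on the `auto_mined` bit.
A minimal sketch, assuming a hypothetical `PreferenceSection` record
and `select_for_training` helper rather than DLM's real data model:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PreferenceSection:
    # Hypothetical record; only auto_mined matters for this filter.
    prompt: str
    chosen: str
    rejected: str
    auto_mined: bool = False


def select_for_training(sections: Iterable[PreferenceSection],
                        no_mined: bool = False) -> List[PreferenceSection]:
    """Keep every section by default; drop auto-mined ones when no_mined is set."""
    if not no_mined:
        return list(sections)
    return [section for section in sections if not section.auto_mined]


sections = [
    PreferenceSection("Explain recursion to a beginner.", "simple", "jargon"),
    PreferenceSection("What is 2 + 2?", "4.", "The sum of two and two is four.",
                      auto_mined=True),
]

default_run = select_for_training(sections)
hand_only_run = select_for_training(sections, no_mined=True)
```

The default run keeps both sections, while the `no_mined` run keeps
only the hand-authored one.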

## Related commands

- `dlm preference mine <path>`
- `dlm preference apply <path>`
- `dlm preference revert <path>`
- `dlm train <path> --no-mined`

## See also

- [Section grammar](sections.md)
- [CLI reference](../cli/reference.md)
- [Self-improving loop cookbook](../cookbook/self-improving-loop.md)