# Preference section reference

`::preference::` sections are the pairwise alignment format DLM feeds
into the preference-training path (`dpo` / `orpo`). They are valid in
hand-authored `.dlm` files and in auto-mined output written by
`dlm preference mine --apply`.

## Basic shape

Each record contains three labeled blocks:

```dlm
::preference::
### Prompt
Explain recursion to a beginner.

### Chosen
Recursion is when a function calls itself on a smaller version of the
same problem.

### Rejected
Recursion is a self-referential computational strategy implemented with
stack-managed frame expansion.
```

A single `::preference::` section can hold one or more
Prompt/Chosen/Rejected triples. At parse time, DLM splits the section
into preference rows, one per triple.
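
The splitting step can be sketched in Python. This is a minimal illustration, not DLM's parser: it assumes the body below the fence uses exactly the `### Prompt` / `### Chosen` / `### Rejected` headings in that order, and it tolerates an optional HTML-comment marker line before the first heading. `PreferenceRow` and `split_preference_body` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PreferenceRow:
    """Hypothetical row type: one Prompt/Chosen/Rejected triple."""
    prompt: str
    chosen: str
    rejected: str


# Assumption: labels are level-3 headings spelled exactly like this.
_HEADING = re.compile(r"^### (Prompt|Chosen|Rejected)\s*$")
_ORDER = ["Prompt", "Chosen", "Rejected"]


def split_preference_body(body: str) -> List[PreferenceRow]:
    """Split the text below a ::preference:: fence into one row per triple."""
    blocks: List[Tuple[str, List[str]]] = []
    for line in body.splitlines():
        match = _HEADING.match(line)
        if match:
            blocks.append((match.group(1), []))  # start a new labeled block
        elif blocks:
            blocks[-1][1].append(line)  # text belongs to the latest heading
        elif line.strip() and not line.lstrip().startswith("<!--"):
            # Only blank lines or a marker comment may precede the first heading.
            raise ValueError(f"text before first heading: {line!r}")
    labels = [label for label, _ in blocks]
    # Labels must be Prompt, Chosen, Rejected repeated one or more times.
    if not labels or labels != _ORDER * (len(labels) // 3):
        raise ValueError(f"expected repeated Prompt/Chosen/Rejected, got {labels}")
    texts = ["\n".join(lines).strip() for _, lines in blocks]
    return [PreferenceRow(*texts[i:i + 3]) for i in range(0, len(texts), 3)]
```

An incomplete triple (for example a `Prompt` and `Chosen` with no `Rejected`) raises instead of producing a partial row, since a preference pair needs both responses.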

## Semantics

- `Prompt` is the input shown to the model.
- `Chosen` is the preferred response.
- `Rejected` is the lower-quality alternative.

Preference training does not teach the model to produce the `Rejected`
text. Instead, it increases the model's relative preference for the
`Chosen` response over the `Rejected` one, given the same `Prompt`.
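
For the `dpo` path, the standard DPO loss makes "relative preference" concrete. The sketch below assumes DLM's `dpo` trainer follows that standard formulation, which this page does not state (`orpo` uses a different, odds-ratio-based objective). Inputs are summed log-probabilities of each response under the policy being trained and under a frozen reference model; `beta=0.1` is only an illustrative value.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative strength, not a documented DLM default
) -> float:
    """Standard DPO loss for one Prompt/Chosen/Rejected row."""
    # How far the policy moved each response's log-probability from the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen shift outgrows the rejected shift.
    return -log_sigmoid(beta * (chosen_shift - rejected_shift))
```

When the policy equals the reference the loss is `log 2`. Raising the `Chosen` log-probability or lowering the `Rejected` one drives it toward zero, so the `Rejected` text only ever appears as something to move probability away from.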

## Auto-mined sections

When `dlm preference mine` writes sections back into a document, it
marks each one with an HTML comment on the line immediately after the
`::preference::` fence:

```dlm
::preference::
<!-- dlm-auto-mined: judge_name="sway" judge_score_chosen="0.82" judge_score_rejected="0.31" mined_at="2026-04-23T18:42:11Z" mined_run_id="7" -->
### Prompt
What is 2 + 2?
### Chosen
4.
### Rejected
The sum of two and two is four.
```

That marker corresponds to these parsed fields on the section:

- `auto_mined: true`
- `judge_name`
- `judge_score_chosen`
- `judge_score_rejected`
- `mined_at`
- `mined_run_id`

These metadata fields are required together for auto-mined preference
sections. Hand-authored sections omit the marker and have
`auto_mined=false`.
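
A parser for the marker could map it onto the fields listed above while enforcing the all-or-nothing rule. This sketch assumes the attribute syntax shown in the example (double-quoted `key="value"` pairs), UTC timestamps ending in `Z`, and an integer run ID. `parse_auto_mined_marker` and `REQUIRED_FIELDS` are hypothetical names; DLM's real parser may differ.

```python
import math
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical constant mirroring the documented marker attributes.
REQUIRED_FIELDS = (
    "judge_name",
    "judge_score_chosen",
    "judge_score_rejected",
    "mined_at",
    "mined_run_id",
)
_MARKER = re.compile(r"^<!--\s*dlm-auto-mined:(?P<attrs>.*?)-->$")
_ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_auto_mined_marker(line: Optional[str]) -> dict:
    """Map the line after a ::preference:: fence onto section metadata."""
    stripped = (line or "").strip()
    if not (stripped.startswith("<!--") and "dlm-auto-mined:" in stripped):
        return {"auto_mined": False}  # no marker: hand-authored section
    match = _MARKER.match(stripped)
    if match is None:
        raise ValueError("malformed auto-mined marker")
    attrs_text = match.group("attrs")
    if _ATTR.sub("", attrs_text).strip():
        raise ValueError("unexpected text inside auto-mined marker")
    attrs = dict(_ATTR.findall(attrs_text))
    missing = [name for name in REQUIRED_FIELDS if name not in attrs]
    if missing:
        # All-or-nothing: a partial marker is an error, never filled with defaults.
        raise ValueError(f"auto-mined marker is missing {missing}")
    parsed = {"auto_mined": True, "judge_name": attrs["judge_name"]}
    for key in ("judge_score_chosen", "judge_score_rejected"):
        score = float(attrs[key])  # ValueError on non-numeric text
        if not math.isfinite(score):
            raise ValueError(f"{key} must be a finite number")
        parsed[key] = score
    # ValueError if the timestamp does not fit the ...THH:MM:SSZ layout.
    parsed["mined_at"] = datetime.strptime(
        attrs["mined_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    parsed["mined_run_id"] = int(attrs["mined_run_id"])  # ValueError on non-integers
    return parsed
```

Raising on bad values, rather than substituting defaults, matches the no-guessing behavior the validation rules describe.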

## Validation rules

- The auto-mined marker is only valid on `::preference::` sections.
- Auto-mined sections must provide all metadata fields together.
- The parser rejects malformed score/timestamp/run-id values rather than
  silently guessing.
- Section identity ignores the auto-mined metadata, so the same logical
  preference pair keeps the same content identity whether it was written
  by hand or mined automatically.
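
Identity that ignores the marker can be illustrated by hashing the section text with the marker line removed. This is a hypothetical scheme: SHA-256 over the remaining lines, with trailing whitespace trimmed, is an assumption, and DLM's actual identity function and normalization rules are not described on this page.

```python
import hashlib

# Assumed prefix of the auto-mined marker line, taken from the example above.
MARKER_PREFIX = "<!-- dlm-auto-mined:"


def section_identity(section_text: str) -> str:
    """Hash a section's content, ignoring any auto-mined marker line."""
    kept = [
        line.rstrip()  # trailing whitespace is not treated as content (assumption)
        for line in section_text.strip().splitlines()
        if not line.lstrip().startswith(MARKER_PREFIX)
    ]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()
```

Because the marker never enters the hash, re-mining a pair with a new judge score or run ID leaves its identity unchanged, while editing the `Prompt`, `Chosen`, or `Rejected` text changes it.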

## Interaction with training

- `dlm train` includes auto-mined preference sections by default.
- `dlm train --no-mined` excludes only `auto_mined=true` sections and
  still uses hand-authored preference pairs.
- Replay snapshots also preserve the `auto_mined` bit so future
  preference runs can opt in or out consistently.
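
The `--no-mined` behavior amounts to a filter on the `auto_mined` bit. A minimal sketch, with hypothetical `PreferenceSection` and `select_preference_sections` names standing in for whatever DLM uses internally; the `no_mined` parameter mirrors the CLI flag but is not DLM's API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PreferenceSection:
    """Hypothetical parsed section carrying the auto_mined bit."""
    prompt: str
    chosen: str
    rejected: str
    auto_mined: bool = False  # hand-authored sections default to False


def select_preference_sections(
    sections: Iterable[PreferenceSection], no_mined: bool = False
) -> List[PreferenceSection]:
    """Default: keep every section. With no_mined=True, drop only auto-mined ones."""
    if not no_mined:
        return list(sections)
    return [section for section in sections if not section.auto_mined]
```

Since replay snapshots keep the same bit, the same predicate can be applied to replayed rows, which is what lets later runs opt in or out consistently.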

## Related commands

- `dlm preference mine <path>`
- `dlm preference apply <path>`
- `dlm preference revert <path>`
- `dlm train <path> --no-mined`

## See also

- [Section grammar](sections.md)
- [CLI reference](../cli/reference.md)
- [Self-improving loop cookbook](../cookbook/self-improving-loop.md)