# Probe-driven training

Close the loop between a differential-testing eval harness and the trainer: failing probes flow back into the document, the adapter retrains, and the next eval run measures improvement. Two directions:

- **Pull**: `dlm harvest --sway-json <report>` reads a sway JSON report and appends failing probes as `::instruction::` sections tagged `!probe`, with `auto_harvest: true` for provenance.
- **Push**: `dlm train --listen-rpc <host:port>` opens a JSON-RPC endpoint that accepts `inject_probe` pushes during `--watch` mode; probes enter a queue and drain at the next cycle boundary.

Both paths assume you run the eval harness (sway or equivalent) separately; dlm owns the document edit and retrain, not the eval.

## Pull path: harvesting a sway report

Sway emits a JSON report describing per-probe outcomes. Extract failing probes with reference answers back into the document:

```bash
# Dry-run first (shows what would be added, no writes):
dlm harvest mydoc.dlm --sway-json sway-run-1.json

# Apply after review:
dlm harvest mydoc.dlm --sway-json sway-run-1.json --apply
```

What lands on disk: for each failing probe that has both `evidence.prompt` and `evidence.reference`, one `::instruction::` section of this shape:

```
::instruction::
### Q !probe
<prompt from sway>

### A
<reference from sway>
::
```

The section carries `auto_harvest: true` and `harvest_source: "<tag>/<probe_name>"` for traceability.
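The section shape above can be sketched as a small rendering function. This is an illustration only, not dlm's actual writer: `render_probe_section` and `harvest_source` are hypothetical names, and how the `auto_harvest` and `harvest_source` metadata are stored alongside the section is not shown here because the format is defined elsewhere.

```python
def render_probe_section(prompt: str, reference: str) -> str:
    """Render one failing probe as an ::instruction:: section (hypothetical helper)."""
    return (
        "::instruction::\n"
        "### Q !probe\n"
        f"{prompt.strip()}\n"
        "\n"
        "### A\n"
        f"{reference.strip()}\n"
        "::\n"
    )


def harvest_source(tag: str, probe_name: str) -> str:
    """Build the "<tag>/<probe_name>" traceability string (hypothetical helper)."""
    return f"{tag}/{probe_name}"


if __name__ == "__main__":
    print(render_probe_section(
        "What does DGEMM compute?",
        "A double-precision general matrix multiplication.",
    ))
    print(harvest_source("auto-harvest", "dgemm"))
```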

### Harvest flags

| Flag | Effect |
|---|---|
| `--sway-json PATH` | Required unless `--revert` is given. Path to the sway report. |
| `--apply` | Write changes to disk. Default: dry-run. |
| `--dry-run` | Explicit dry-run (default). |
| `--revert` | Strip all `auto_harvest: true` sections. Mutually exclusive with `--sway-json`. |
| `--tag NAME` | Override the default `auto-harvest` tag in `harvest_source`. |
| `--min-confidence F` | Drop candidates below this confidence threshold. |
| `--strict` / `--lax` | Strict: fail if any failing probe lacks a reference. Lax: skip + log. |
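To make the interaction of `--min-confidence` and `--strict` / `--lax` concrete, here is a minimal selection sketch. It is not dlm's implementation, and the report layout is assumed: only `evidence.prompt` and `evidence.reference` come from this document, while the `probes`, `name`, `passed`, and `confidence` keys and the order of the checks are hypothetical.

```python
from dataclasses import dataclass


class MissingReferenceError(Exception):
    """Raised in strict mode when a failing probe has no usable reference."""


@dataclass(frozen=True)
class Candidate:
    name: str
    prompt: str
    reference: str


def select_candidates(report: dict, *, min_confidence: float = 0.0,
                      strict: bool = False) -> list:
    """Pick failing probes that can become training sections (assumed schema)."""
    selected = []
    for probe in report.get("probes", []):       # "probes" key is an assumption
        if probe.get("passed", False):           # "passed" key is an assumption
            continue
        name = probe.get("name", "<unnamed>")
        evidence = probe.get("evidence") or {}
        prompt = evidence.get("prompt")
        reference = evidence.get("reference")
        if not prompt or not reference:
            if strict:
                raise MissingReferenceError(name)
            continue                             # lax: skip (real tool also logs)
        if probe.get("confidence", 1.0) < min_confidence:
            continue                             # "confidence" key is an assumption
        selected.append(Candidate(name, prompt, reference))
    return selected
```

An empty result corresponds to the "no candidates" case described under Refusals.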

### Refusals

- `--sway-json` missing → exit 1
- Sway JSON malformed → exit 1
- No failing probes with references → exit 2 (no candidates)
- `--revert` + `--sway-json` → exit 1 (mutually exclusive)
- Strict mode + probe without reference → exit 1 (hint: `--lax`)
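A CI wrapper will usually want to treat exit 2 ("no candidates") differently from exit 1. The sketch below shows one way to do that around the commands documented above. The helper names are hypothetical, and treating exit 0 as success is an assumption based on convention rather than something stated here.

```python
import subprocess


def harvest_command(doc: str, report: str, apply: bool = False) -> list:
    """Build the argv for a harvest run using only flags documented above."""
    cmd = ["dlm", "harvest", doc, "--sway-json", report]
    if apply:
        cmd.append("--apply")
    return cmd


def classify_harvest_exit(code: int) -> str:
    """Map an exit code to a coarse outcome (exit 0 == success is assumed)."""
    if code == 0:
        return "ok"
    if code == 2:
        return "no-candidates"   # nothing to harvest; often not a CI failure
    return "error"               # exit 1 and anything unexpected


def run_harvest(doc: str, report: str, apply: bool = False) -> str:
    """Run dlm harvest and return the classified outcome."""
    completed = subprocess.run(harvest_command(doc, report, apply))
    return classify_harvest_exit(completed.returncode)
```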

### Revert path

If a harvest pass pulls in noise (bad prompt wording, duplicated content), revert in one command:

```bash
dlm harvest mydoc.dlm --revert
```

All sections with `auto_harvest: true` are stripped; hand-authored sections stay. This is deliberately coarser than "undo the last harvest": users audit the diff before `--apply`, so "undo all auto-edits" is the safe escape hatch.

## Push path: live probe injection

For a long-running `--watch` session, open an RPC endpoint so an external sway (or equivalent) process can push failing probes as they arrive:

```bash
export DLM_PROBE_TOKEN=$(openssl rand -hex 16)
dlm train mydoc.dlm --watch --listen-rpc 127.0.0.1:7429
```

The server accepts POSTs at `/rpc`:

```http
POST /rpc HTTP/1.1
Authorization: Bearer <token>
Content-Type: application/json

{
  "method": "inject_probe",
  "params": {
    "prompt": "What does DGEMM compute?",
    "reference": "A double-precision general matrix multiplication.",
    "tags": ["nightly-ci"]
  }
}
```

Successful response:

```json
{"accepted": true, "next_cycle_eta_s": 0, "queue_depth": 1}
```
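The request and response above can be exercised from a standard-library Python client. This is a sketch under the assumption that the server speaks plain HTTP on the address passed to `--listen-rpc`; `build_inject_request` and `push_probe` are hypothetical helper names, not part of dlm.

```python
import json
import os
import urllib.request


def build_inject_request(base_url: str, token: str, prompt: str,
                         reference: str, tags=()) -> urllib.request.Request:
    """Build the POST /rpc request for inject_probe without sending it."""
    body = json.dumps({
        "method": "inject_probe",
        "params": {"prompt": prompt, "reference": reference, "tags": list(tags)},
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/rpc",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def push_probe(base_url: str, prompt: str, reference: str, tags=()) -> dict:
    """Send one probe; the token is read from DLM_PROBE_TOKEN."""
    token = os.environ["DLM_PROBE_TOKEN"]
    req = build_inject_request(base_url, token, prompt, reference, tags)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    reply = push_probe(
        "http://127.0.0.1:7429",
        "What does DGEMM compute?",
        "A double-precision general matrix multiplication.",
        tags=["nightly-ci"],
    )
    print(reply.get("accepted"), reply.get("queue_depth"))
```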

### Status codes

- `200` accepted + queued
- `400` malformed payload (bad JSON, missing fields, non-string tags)
- `401` missing or invalid bearer token
- `404` unknown method or path
- `429` queue past capacity (default 1000)
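On the client side, only `429` is worth retrying; the other error codes point at a bug or a misconfiguration in the caller. A minimal retry loop might look like the following, where `push_with_retry`, the attempt limit, and the fixed wait are all illustrative choices rather than anything dlm prescribes.

```python
import time
from typing import Callable


def push_with_retry(send: Callable[[], int], *, max_attempts: int = 5,
                    wait_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns something other than 429.

    send() performs one push and returns the HTTP status code. Any status
    other than 429 (200, 400, 401, 404, ...) is returned immediately.
    max_attempts must be at least 1.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt + 1 < max_attempts:
            sleep(wait_s)  # give the trainer time to drain the queue
    return status
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.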

### Security notes

- **Localhost-only in v1.** The endpoint binds whatever host you pass, so use `127.0.0.1` unless you know what you're doing. Remote pushes are a training-data-poisoning vector.
- **Bearer token is mandatory.** Without `DLM_PROBE_TOKEN` set, the flag refuses at startup. The server compares tokens in constant time.
- **Body size capped at 64 KiB.** This bounds the DoS surface.
- **Queue is bounded.** Past capacity, the server returns `429`; the client should retry after the next cycle drain.
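The safeguards above can be illustrated with a short server-side sketch. This is not dlm's code: `token_ok`, `body_within_cap`, and `ProbeQueue` are hypothetical names, and `hmac.compare_digest` is simply the standard-library way to compare secrets in constant time.

```python
import hmac
from collections import deque
from typing import Optional

MAX_BODY_BYTES = 64 * 1024   # 64 KiB cap from the notes above
QUEUE_CAPACITY = 1000        # documented default capacity


def token_ok(auth_header: Optional[str], expected: str) -> bool:
    """Check a 'Bearer <token>' header against the expected token."""
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    presented = auth_header[len(prefix):]
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(presented.encode(), expected.encode())


def body_within_cap(body: bytes) -> bool:
    """Reject oversized bodies before parsing them."""
    return len(body) <= MAX_BODY_BYTES


class ProbeQueue:
    """Bounded queue that is emptied at each training-cycle boundary."""

    def __init__(self, capacity: int = QUEUE_CAPACITY) -> None:
        self._items = deque()
        self._capacity = capacity

    def offer(self, probe: dict) -> int:
        """Return 200 if queued, 429 if the queue is at capacity."""
        if len(self._items) >= self._capacity:
            return 429
        self._items.append(probe)
        return 200

    def drain(self) -> list:
        """Hand all queued probes to the trainer and empty the queue."""
        items = list(self._items)
        self._items.clear()
        return items
```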

### Combining pull and push

You can use both: push for real-time streaming during a `--watch` session, then harvest the accumulated sway reports later to capture anything that didn't reach the live endpoint. The two paths share the same on-disk shape, so the retrain behavior is identical.

## What the trainer sees

A harvested or injected probe becomes a `### Q !probe` pair in the document. At training time:

- **Row building**: the `!probe` marker is stripped before the strict instruction parser runs, so the pair trains as a normal SFT example.
- **Probe extraction**: `dlm.eval.probes` picks up the same marker and uses the pair as an explicit probe prompt for post-train eval.

The effect: every harvested probe both *trains the model to answer it right* and *gets reused as an eval prompt on the retrained adapter*. That's the closed loop: sway's complaint becomes a training example and a regression check in one section.
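The dual use of one section can be sketched as a toy parser. This does not reproduce dlm's instruction parser or `dlm.eval.probes`; `parse_pair` and the returned field names are hypothetical, and the sketch handles only the single Q/A shape shown earlier.

```python
import re

# Matches "### Q" with an optional trailing "!probe" marker.
_Q_HEADER = re.compile(r"^### Q(?P<marker>\s+!probe)?\s*$")


def parse_pair(section: str) -> dict:
    """Split one ::instruction:: section into an SFT row and an optional probe."""
    prompt_lines, answer_lines = [], []
    current = None
    is_probe = False
    for line in section.strip().splitlines():
        if line.strip() in ("::instruction::", "::"):
            continue                      # section delimiters carry no content
        match = _Q_HEADER.match(line)
        if match:
            is_probe = match.group("marker") is not None
            current = prompt_lines        # marker is dropped from the row
        elif line.strip() == "### A":
            current = answer_lines
        elif current is not None:
            current.append(line)
    prompt = "\n".join(prompt_lines).strip()
    answer = "\n".join(answer_lines).strip()
    return {
        "sft_row": {"prompt": prompt, "response": answer},
        "probe": {"prompt": prompt, "reference": answer} if is_probe else None,
    }
```

A section without the marker yields the same training row but no probe, which matches the description of hand-authored pairs training normally.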

## Reference

- `dlm harvest`: `docs/cli/reference.md`
- Section schema (`auto_harvest`, `harvest_source`): `docs/format/frontmatter.md`
- Sway report format: upstream sway docs
