# Probe-driven training

Close the loop between a differential-testing eval harness and the trainer: failing probes flow back into the document, the adapter retrains, and the next eval run measures improvement. Two directions:

- **Pull**: `dlm harvest --sway-json <report>` reads a sway JSON report and appends failing probes as `::instruction::` sections tagged `!probe`, with `auto_harvest: true` for provenance.
- **Push**: `dlm train --listen-rpc <host:port>` opens a JSON-RPC endpoint that accepts `inject_probe` pushes during `--watch` mode; probes enter a queue and drain at the next cycle boundary.

Both paths assume you run the eval harness (sway or equivalent) separately; dlm owns the document edit and retrain, not the eval.

## Pull path: harvesting a sway report

Sway emits a JSON report describing per-probe outcomes. Extract failing probes with reference answers back into the document:

```bash
# Dry-run first: shows what would be added, no writes.
dlm harvest mydoc.dlm --sway-json sway-run-1.json

# Apply after review:
dlm harvest mydoc.dlm --sway-json sway-run-1.json --apply
```

What lands on disk: for each failing probe with `evidence.prompt` and `evidence.reference`, one `::instruction::` section in the shape

```
::instruction::
### Q !probe
<prompt from sway>

### A
<reference from sway>
::
```

The section carries `auto_harvest: true` and `harvest_source: "<tag>/<probe_name>"` for traceability.

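
To make the on-disk shape concrete, here is a minimal Python sketch that renders one probe into that layout. The function names `render_probe_section` and `build_harvest_source` are hypothetical and not part of dlm; the sketch also does not show where `auto_harvest` and `harvest_source` are stored, since the text above does not specify that.

```python
def render_probe_section(prompt: str, reference: str) -> str:
    """Render one failing probe in the ::instruction:: shape shown above.

    Hypothetical helper for illustration; dlm's own writer may differ.
    """
    return (
        "::instruction::\n"
        "### Q !probe\n"
        f"{prompt.strip()}\n"
        "\n"
        "### A\n"
        f"{reference.strip()}\n"
        "::\n"
    )


def build_harvest_source(probe_name: str, tag: str = "auto-harvest") -> str:
    """Build the traceability value in the "<tag>/<probe_name>" form.

    The default tag mirrors the documented default that --tag overrides.
    """
    return f"{tag}/{probe_name}"


if __name__ == "__main__":
    print(render_probe_section(
        "What does DGEMM compute?",
        "A double-precision general matrix multiplication.",
    ))
    print(build_harvest_source("dgemm_basics", tag="nightly-ci"))
```
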
### Harvest flags

| Flag | Effect |
|---|---|
| `--sway-json PATH` | Required unless `--revert` is given. Path to the sway report. |
| `--apply` | Write changes to disk. Default: dry-run. |
| `--dry-run` | Explicit dry-run (default). |
| `--revert` | Strip all `auto_harvest: true` sections. Mutually exclusive with `--sway-json`. |
| `--tag NAME` | Override the default `auto-harvest` tag in `harvest_source`. |
| `--min-confidence F` | Drop candidates below this confidence threshold. |
| `--strict` / `--lax` | Strict: fail if any failing probe lacks a reference. Lax: skip + log. |

### Refusals

- `--sway-json` missing → exit 1
- Sway JSON malformed → exit 1
- No failing probes with references → exit 2 (no candidates)
- `--revert` + `--sway-json` → exit 1 (mutually exclusive)
- Strict mode + probe without reference → exit 1 (hint: `--lax`)

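
A CI wrapper usually wants to treat "no candidates" differently from a real failure. The sketch below shows one way to do that in Python. The helper names are hypothetical, and treating exit code 0 as success is an assumption based on the usual CLI convention; only codes 1 and 2 are documented above.

```python
import subprocess


def build_harvest_command(doc: str, report: str, apply: bool = False) -> list[str]:
    """Assemble the documented `dlm harvest` invocation as an argv list."""
    cmd = ["dlm", "harvest", doc, "--sway-json", report]
    if apply:
        cmd.append("--apply")
    return cmd


def interpret_harvest_exit(code: int) -> str:
    """Map an exit code to a coarse outcome.

    1 and 2 come from the refusal list; 0 meaning success is an assumption.
    """
    if code == 0:
        return "ok"
    if code == 2:
        return "no-candidates"
    return "error"


def run_harvest(doc: str, report: str, apply: bool = False) -> str:
    """Run the harvest and return the interpreted outcome (requires dlm on PATH)."""
    result = subprocess.run(build_harvest_command(doc, report, apply), check=False)
    return interpret_harvest_exit(result.returncode)
```

A nightly job could call `run_harvest(...)` and skip the retrain step on `"no-candidates"` while failing the build on `"error"`.
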
### Revert path

If a harvest pass pulls in noise (bad prompt wording, duplicated content), revert in one command:

```bash
dlm harvest mydoc.dlm --revert
```

All sections with `auto_harvest: true` are stripped; hand-authored sections stay. This is coarser than "undo the last harvest" by design: users audit the diff before `--apply`, so "undo all auto-edits" is the safe escape hatch.

## Push path: live probe injection

For a long-running `--watch` session, open an RPC endpoint so an external sway (or equivalent) process can push failing probes as they arrive:

```bash
export DLM_PROBE_TOKEN=$(openssl rand -hex 16)
dlm train mydoc.dlm --watch --listen-rpc 127.0.0.1:7429
```

The server accepts POSTs at `/rpc`:

```http
POST /rpc HTTP/1.1
Authorization: Bearer <token>
Content-Type: application/json

{
  "method": "inject_probe",
  "params": {
    "prompt": "What does DGEMM compute?",
    "reference": "A double-precision general matrix multiplication.",
    "tags": ["nightly-ci"]
  }
}
```

Successful response:

```json
{"accepted": true, "next_cycle_eta_s": 0, "queue_depth": 1}
```

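
The request above can be issued from any HTTP client. As a minimal sketch using only the Python standard library, the helper below builds the same POST; `build_inject_request` and `inject_probe` are hypothetical names, and the sketch assumes the client sends the same value that was exported as `DLM_PROBE_TOKEN` on the trainer side.

```python
import json
import os
import urllib.request


def build_inject_request(base_url, token, prompt, reference, tags=()):
    """Build (but do not send) the inject_probe POST shown above."""
    payload = {
        "method": "inject_probe",
        "params": {
            "prompt": prompt,
            "reference": reference,
            "tags": list(tags),  # tags must be strings, or the server answers 400
        },
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rpc",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def inject_probe(base_url, prompt, reference, tags=()):
    """Send one probe and return the decoded JSON response.

    Assumes DLM_PROBE_TOKEN is set in the client's environment.
    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    token = os.environ["DLM_PROBE_TOKEN"]
    request = build_inject_request(base_url, token, prompt, reference, tags)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

With the trainer from the example running, `inject_probe("http://127.0.0.1:7429", "What does DGEMM compute?", "A double-precision general matrix multiplication.", ["nightly-ci"])` would push one probe.
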
### Status codes

- `200` accepted + queued
- `400` malformed payload (bad JSON, missing fields, non-string tags)
- `401` missing or invalid bearer token
- `404` unknown method or path
- `429` queue past capacity (default 1000)

### Security notes

- **Localhost-only in v1.** The endpoint binds whatever host you pass; use `127.0.0.1` unless you know what you're doing. Remote pushes are a training-data-poisoning vector.
- **Bearer token is mandatory.** Without `DLM_PROBE_TOKEN` set, the flag refuses at startup. The server compares tokens in constant time.
- **Body size capped at 64 KiB.** Bounds the DoS surface.
- **Queue is bounded.** Past capacity, the server returns 429; the client should retry after the next cycle drain.

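
Two of the points above lend themselves to a short illustration: how a client might react to each status code, and what a constant-time bearer check looks like. This is a sketch under assumptions, not dlm's actual server code; `classify_status` and `token_matches` are hypothetical names, and `hmac.compare_digest` is simply the standard-library way to compare secrets without leaking timing information.

```python
import hmac


def classify_status(status: int) -> str:
    """Decide what a pushing client should do with a response status."""
    if status == 200:
        return "queued"
    if status == 429:
        return "retry"  # queue full: try again after the next cycle drain
    if status in (400, 401, 404):
        return "fatal"  # fix the payload, token, or path; retrying will not help
    return "unexpected"


def token_matches(authorization_header: str, expected_token: str) -> bool:
    """Check an `Authorization: Bearer <token>` value in constant time."""
    prefix = "Bearer "
    if not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    # compare_digest takes time independent of where the inputs differ
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))
```
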
### Combining pull and push

You can use both: push for real-time streaming during a `--watch` session, then harvest the accumulated sway reports later to capture anything that didn't reach the live endpoint. The two paths share the same on-disk shape, so the retrain behavior is identical.

## What the trainer sees

A harvested or injected probe becomes a `### Q !probe` pair in the document. At training time:

- **Row building**: the `!probe` marker is stripped before the strict instruction parser runs, so the pair trains as a normal SFT example.
- **Probe extraction**: `dlm.eval.probes` picks up the same marker and uses the pair as an explicit probe prompt for post-train eval.

The effect: every harvested probe both *trains the model to answer it right* and *gets reused as an eval prompt on the retrained adapter*. That's the closed loop: sway's complaint becomes a training example and a regression check in one section.

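
The dual use of a `!probe` pair can be sketched in a few lines of Python. This is an illustration of the idea only, not the parser in dlm or `dlm.eval.probes`; the function names and the returned dictionary keys are assumptions.

```python
def split_pairs(text: str):
    """Return (is_probe, prompt, answer) tuples from ### Q / ### A blocks."""
    pairs = []
    current = None  # the pair currently being filled, or None outside a pair
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("### Q", "### Q !probe"):
            current = {"is_probe": stripped.endswith("!probe"),
                       "q": [], "a": [], "field": "q"}
            pairs.append(current)
        elif stripped == "### A" and current is not None:
            current["field"] = "a"
        elif stripped in ("::instruction::", "::"):
            current = None  # section fence: close any open pair
        elif current is not None:
            current[current["field"]].append(line)
    return [(p["is_probe"], "\n".join(p["q"]).strip(), "\n".join(p["a"]).strip())
            for p in pairs]


def training_rows(text: str):
    """Every pair trains as a normal example; the !probe marker is dropped."""
    return [{"prompt": q, "response": a} for _, q, a in split_pairs(text)]


def eval_probes(text: str):
    """Only pairs marked !probe are reused as post-train eval prompts."""
    return [{"prompt": q, "reference": a}
            for is_probe, q, a in split_pairs(text) if is_probe]
```

Running both functions over the same document yields one training row per pair and one eval probe per `!probe` pair, which is the "training example and regression check in one section" behavior described above.
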
## Reference

- `dlm harvest`: `docs/cli/reference.md`
- Section schema (`auto_harvest`, `harvest_source`): `docs/format/frontmatter.md`
- Sway report format: upstream sway docs