
# Metrics & observability

Every `dlm train` cycle writes its step and eval metrics into a per-store SQLite database at `~/.dlm/store/<dlm_id>/metrics.sqlite`. `dlm metrics` reads from that DB. Optional TensorBoard / W&B sinks are available behind the `observability` extra.

## What gets recorded

- **runs**: one row per `trainer.run` invocation — `run_id`, `started_at`, `ended_at`, `adapter_version`, `phase`, `seed`, `status` (`running` / `ok` / `failed` / `cancelled`).
- **steps**: one row per logged training step — `run_id`, `step`, `loss`, `lr`, `grad_norm`, timestamp.
- **evals**: one row per eval cadence hit — `run_id`, `step`, `val_loss`, `perplexity`, optional `retention`.
- **exports**: one row per `dlm export` completion.

Writes are best-effort: a metrics failure never takes down training.
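
For a quick sanity check on what landed, you can query those tables directly with `sqlite3`. A minimal sketch (table names match the list above; substitute your store's `<dlm_id>`):

```bash
# count logged steps and evals per run, newest first (column names as described above)
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite \
  "SELECT r.run_id, r.phase, r.status,
          (SELECT COUNT(*) FROM steps s WHERE s.run_id = r.run_id) AS n_steps,
          (SELECT COUNT(*) FROM evals e WHERE e.run_id = r.run_id) AS n_evals
   FROM runs r ORDER BY r.run_id DESC LIMIT 5;"
```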

## `dlm metrics <path>`

Default view lists the most-recent runs:

```bash
$ dlm metrics mydoc.dlm
Runs: 3
  run_id=3  phase=sft  seed=42  status=ok  started=2026-04-20T17:12:04Z
  run_id=2  phase=sft  seed=42  status=ok  started=2026-04-20T16:58:11Z
  run_id=1  phase=sft  seed=42  status=ok  started=2026-04-20T16:40:22Z
```

Drill into one run with `--run-id N` to see step and eval counts. `--json` emits a machine-readable object; `--csv` emits the steps + evals table for spreadsheet import.

### Filters

- `--phase sft|dpo|orpo|cpt` — restrict to one training phase.
- `--since 24h|7d|30m|10s` — time window on `started_at`.
- `--run-id N` — drill down into a specific run.
- `--limit N` — cap the number of runs returned (default 20).
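
Each flag is documented above; combining them as shown here is illustrative, assuming the filters compose:

```bash
# last five sft runs from the past week
dlm metrics mydoc.dlm --phase sft --since 7d --limit 5

# one run as JSON for scripting, or CSV for a spreadsheet
dlm metrics mydoc.dlm --run-id 3 --json
dlm metrics mydoc.dlm --run-id 3 --csv > run3.csv
```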

## `dlm metrics watch <path>`

Tails the metrics DB — prints new step and eval rows as they land. Useful in a second terminal while `dlm train` (or `dlm train --watch`) runs in the first.

```bash
$ dlm metrics watch mydoc.dlm
metrics watch: polling ~/.dlm/store/01HZ.../ every 1.0s (Ctrl-C to exit)
→ following run_id=4
  step    10  loss=1.87  lr=0.0002  grad_norm=0.31
  step    20  loss=1.73  lr=0.00018  grad_norm=0.27
  eval @ step 20  val_loss=1.81  perplexity=6.11
```

`--poll-seconds N` tunes how often the DB is re-read (default 1.0).
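
A typical two-terminal workflow, using only the flags documented here (the faster poll interval is just an example):

```bash
# terminal 1: run training as usual
dlm train mydoc.dlm

# terminal 2: tail the metrics DB, re-reading every 0.5s instead of the 1.0s default
dlm metrics watch mydoc.dlm --poll-seconds 0.5
```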

## TensorBoard sink

```bash
uv sync --extra observability
dlm train mydoc.dlm --tensorboard
tensorboard --logdir ~/.dlm/store/<dlm_id>/tensorboard
```

The sink writes one run directory per `trainer.run` under `store/tensorboard/run_NNNN/`. Scalars logged: `train/loss`, `train/lr`, `train/grad_norm`, `eval/val_loss`, `eval/perplexity`.
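
To look at a single run in isolation, point TensorBoard at one of those per-run directories instead of the parent (the `run_0003` name below is illustrative, following the documented `run_NNNN` pattern):

```bash
# inspect just one trainer.run instead of overlaying every run in the store
tensorboard --logdir ~/.dlm/store/<dlm_id>/tensorboard/run_0003
```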

Skipped cleanly if the `observability` extra isn't installed — you get the SQLite DB either way.

## W&B sink (opt-in)

```bash
uv sync --extra observability
dlm train mydoc.dlm --wandb my-project
```

Runs W&B in **offline mode** by default. The run directory sits at `store/wandb/offline-run-*/`. To upload, run `wandb sync <dir>` explicitly — we never upload automatically. If you haven't logged in to W&B, offline mode still captures the run locally for later review.
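
A sketch of that explicit upload step (the run directory name is illustrative; pick the `offline-run-*` directory you actually want to push):

```bash
# see which offline runs were captured, then upload one by hand
ls ~/.dlm/store/<dlm_id>/wandb/
wandb sync ~/.dlm/store/<dlm_id>/wandb/offline-run-20260420_171204-abc123
```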

Privacy posture: no network calls from the training process. Uploading is always a separate, explicit step.

## SQLite schema

The database at `metrics.sqlite` is queryable directly:

```bash
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite
sqlite> .tables
evals    exports  runs     steps
sqlite> SELECT run_id, phase, status FROM runs;
```

WAL mode is on: readers (including the CLI) don't block the trainer, and a Ctrl-C mid-write leaves a recoverable DB.
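
That makes it safe to pull data out mid-training, for example a per-step loss curve for one run as CSV (column names as listed under "What gets recorded"; `run_id = 3` is just an example):

```bash
# export one run's step metrics to CSV without stopping the trainer
sqlite3 -header -csv ~/.dlm/store/<dlm_id>/metrics.sqlite \
  "SELECT step, loss, lr, grad_norm FROM steps WHERE run_id = 3 ORDER BY step;" > run3_steps.csv
```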

## Pruning

No auto-prune today. If the DB grows past comfort, drop older rows:

```sql
DELETE FROM steps WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
DELETE FROM evals WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
DELETE FROM runs  WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
VACUUM;
```
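
An age-based variant of the same idea, run as a one-shot shell invocation. This assumes `started_at` is stored as ISO-8601 UTC text (matching the timestamps `dlm metrics` prints), so treat it as a sketch:

```bash
# drop every run older than 30 days, plus its step/eval rows, then reclaim space
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite <<'SQL'
DELETE FROM steps WHERE run_id IN (SELECT run_id FROM runs WHERE started_at < strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-30 days'));
DELETE FROM evals WHERE run_id IN (SELECT run_id FROM runs WHERE started_at < strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-30 days'));
DELETE FROM runs  WHERE started_at < strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-30 days');
VACUUM;
SQL
```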

A built-in `dlm metrics prune` is on the backlog.
