# Metrics & observability
Every `dlm train` cycle writes its step and eval metrics into a
per-store SQLite database at `~/.dlm/store/<dlm_id>/metrics.sqlite`.
`dlm metrics` reads from that DB. Optional TensorBoard / W&B sinks
are available behind the `observability` extra.
## What gets recorded
- **runs**: one row per `trainer.run` invocation —
  `run_id`, `started_at`, `ended_at`, `adapter_version`, `phase`,
  `seed`, `status` (`running` / `ok` / `failed` / `cancelled`).
- **steps**: one row per logged training step —
  `run_id`, `step`, `loss`, `lr`, `grad_norm`, timestamp.
- **evals**: one row per eval cadence hit —
  `run_id`, `step`, `val_loss`, `perplexity`, optional `retention`.
- **exports**: one row per `dlm export` completion.
Writes are best-effort: a metrics failure never takes down training.
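A best-effort writer can be sketched in a few lines. This is not dlm's actual implementation — the function name and the inline `CREATE TABLE` are illustrative (the real trainer presumably sets up the schema once at run start) — but it shows the contract: any SQLite error is swallowed so the training loop never sees it.

```python
# Sketch of a best-effort metrics write (hypothetical helper, not dlm source).
# Column names follow the `steps` table described above.
import sqlite3


def log_step(db_path, run_id, step, loss, lr, grad_norm):
    """Record one training step; never raise into the training loop."""
    try:
        con = sqlite3.connect(db_path, timeout=1.0)
        with con:  # commit on success, roll back on error
            con.execute(
                "CREATE TABLE IF NOT EXISTS steps "
                "(run_id INTEGER, step INTEGER, loss REAL, lr REAL, "
                "grad_norm REAL, timestamp TEXT DEFAULT CURRENT_TIMESTAMP)"
            )
            con.execute(
                "INSERT INTO steps (run_id, step, loss, lr, grad_norm) "
                "VALUES (?, ?, ?, ?, ?)",
                (run_id, step, loss, lr, grad_norm),
            )
        con.close()
        return True
    except sqlite3.Error:
        return False  # a metrics failure must never take down training
```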
## `dlm metrics <path>`
Default view lists the most-recent runs:
```bash
$ dlm metrics mydoc.dlm
Runs: 3
run_id=3 phase=sft seed=42 status=ok started=2026-04-20T17:12:04Z
run_id=2 phase=sft seed=42 status=ok started=2026-04-20T16:58:11Z
run_id=1 phase=sft seed=42 status=ok started=2026-04-20T16:40:22Z
```
Drill into one run with `--run-id N` to see step + eval counts.
`--json` emits a machine-readable object; `--csv` emits the steps +
eval table for spreadsheet import.
### Filters
- `--phase sft|dpo|orpo|cpt` — restrict to one training phase.
- `--since 24h|7d|30m|10s` — time window on `started_at`.
- `--run-id N` — drill-down on a specific run.
- `--limit N` — cap the number of runs returned (default 20).
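The `--since` grammar (an integer plus one of `s`/`m`/`h`/`d`) resolves to an absolute cutoff on `started_at`. A minimal sketch of that resolution, with a hypothetical helper name:

```python
# Sketch: turn a --since spec like "24h" / "7d" / "30m" / "10s" into a
# UTC cutoff timestamp. Not dlm's actual parser.
import datetime
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def since_to_cutoff(spec, now=None):
    """Return the earliest started_at that '--since <spec>' admits."""
    m = re.fullmatch(r"(\d+)([smhd])", spec)
    if not m:
        raise ValueError(f"bad --since value: {spec!r}")
    seconds = int(m.group(1)) * _UNITS[m.group(2)]
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return now - datetime.timedelta(seconds=seconds)
```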
## `dlm metrics watch <path>`
Tails the metrics DB — prints new step and eval rows as they land.
Useful in a second terminal while `dlm train` (or `dlm train --watch`)
runs in the first.
```bash
$ dlm metrics watch mydoc.dlm
metrics watch: polling ~/.dlm/store/01HZ.../ every 1.0s (Ctrl-C to exit)
→ following run_id=4
step 10 loss=1.87 lr=0.0002 grad_norm=0.31
step 20 loss=1.73 lr=0.00018 grad_norm=0.27
eval @ step 20 val_loss=1.81 perplexity=6.11
```
`--poll-seconds N` tunes how often the DB is re-read (default 1.0).
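The core of a poll-based tail is simple because SQLite rowids are monotonic: remember the highest rowid seen and fetch only newer rows each pass. A sketch under that assumption (hypothetical helper, not dlm's source; `polls` exists only so the example can terminate):

```python
# Sketch of the watch loop: poll the steps table, yielding only rows
# inserted since the previous poll.
import sqlite3
import time


def tail_steps(db_path, poll_seconds=1.0, polls=None):
    """Yield new (run_id, step, loss) rows; poll forever if polls is None."""
    last_rowid = 0
    n = 0
    while polls is None or n < polls:
        con = sqlite3.connect(db_path)
        rows = con.execute(
            "SELECT rowid, run_id, step, loss FROM steps "
            "WHERE rowid > ? ORDER BY rowid",
            (last_rowid,),
        ).fetchall()
        con.close()
        for rowid, run_id, step, loss in rows:
            last_rowid = rowid  # high-water mark: never re-print a row
            yield (run_id, step, loss)
        n += 1
        if polls is None or n < polls:
            time.sleep(poll_seconds)
```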
## TensorBoard sink
```bash
uv sync --extra observability
dlm train mydoc.dlm --tensorboard
tensorboard --logdir ~/.dlm/store/<dlm_id>/tensorboard
```
The sink writes one run directory per `trainer.run` under
`store/tensorboard/run_NNNN/`. Scalars logged: `train/loss`,
`train/lr`, `train/grad_norm`, `eval/val_loss`, `eval/perplexity`.
Skipped cleanly if the `observability` extra isn't installed — you
get the SQLite DB either way.
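"Skipped cleanly" is the standard optional-dependency pattern: probe the import at startup and substitute a no-op sink. A sketch of that pattern — the class and factory names are ours, not dlm's API:

```python
# Sketch: fall back to a silent no-op sink when the optional
# TensorBoard dependency is absent.
class NullSink:
    def log_scalar(self, tag, value, step):
        pass  # observability extra missing: drop the metric silently


def make_tensorboard_sink(logdir):
    try:
        from torch.utils.tensorboard import SummaryWriter  # needs the extra
    except ImportError:
        return NullSink()

    writer = SummaryWriter(logdir)

    class TBSink:
        def log_scalar(self, tag, value, step):
            writer.add_scalar(tag, value, step)

    return TBSink()
```

Either way the caller holds an object with the same `log_scalar` surface, so the training loop never branches on whether the extra is installed.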
## W&B sink (opt-in)
```bash
uv sync --extra observability
dlm train mydoc.dlm --wandb my-project
```
Runs W&B in **offline mode** by default. The run directory sits at
`store/wandb/offline-run-*/`. To upload, run `wandb sync <dir>`
explicitly — we never upload automatically. If you haven't logged
in to W&B, offline mode still captures the run locally for later
review.
Privacy posture: no network calls from the training process. Uploading is always a separate, explicit step.
## SQLite schema
The database at `metrics.sqlite` is queryable directly:
```bash
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite
sqlite> .tables
evals    exports  runs     steps
sqlite> SELECT run_id, phase, status FROM runs;
```
WAL mode is on: readers (including the CLI) don't block the trainer, and a Ctrl-C mid-write leaves a recoverable DB.
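Enabling WAL is a one-line pragma at connection setup. A sketch of what that looks like from Python (hypothetical helper; the `synchronous=NORMAL` pairing is a common WAL convention, assumed rather than confirmed for dlm):

```python
# Sketch: open the metrics DB with WAL enabled so readers don't block
# the writing trainer.
import sqlite3


def open_metrics_db(path):
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    con.execute("PRAGMA synchronous=NORMAL")  # common WAL pairing (assumption)
    return con
```

WAL is a property of the database file, so once the trainer has set it, `dlm metrics` and ad-hoc `sqlite3` sessions read under WAL too.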
## Pruning
No auto-prune today. If the DB grows past comfort, drop older rows:
```sql
DELETE FROM steps WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
DELETE FROM evals WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
DELETE FROM runs WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
VACUUM;
```
A built-in `dlm metrics prune` is on the backlog.
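Until then, the keep-newest-N SQL can be wrapped in a small script so the three deletes stay in sync. A sketch (function name is ours; `keep=10` mirrors the SQL above):

```python
# Sketch: prune all but the newest `keep` runs, then reclaim disk space.
import sqlite3


def prune(db_path, keep=10):
    """Delete all but the newest `keep` runs (and their steps/evals)."""
    con = sqlite3.connect(db_path)
    keep_runs = "SELECT run_id FROM runs ORDER BY run_id DESC LIMIT ?"
    with con:  # one transaction for all three deletes
        for table in ("steps", "evals", "runs"):
            con.execute(
                f"DELETE FROM {table} WHERE run_id NOT IN ({keep_runs})",
                (keep,),
            )
    con.execute("VACUUM")  # must run outside the transaction
    con.close()
```

Note that child tables (`steps`, `evals`) are pruned before `runs`, matching the SQL above, so an interrupted prune never orphans rows whose parent run is already gone.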