
# Metrics & observability

Every `dlm train` cycle writes its step and eval metrics into a per-store SQLite database at `~/.dlm/store/<dlm_id>/metrics.sqlite`. `dlm metrics` reads from that DB. Optional TensorBoard / W&B sinks are available behind the `observability` extra.

## What gets recorded

- **runs**: one row per `trainer.run` invocation — `run_id`, `started_at`, `ended_at`, `adapter_version`, `phase`, `seed`, `status` (`running` / `ok` / `failed` / `cancelled`).
- **steps**: one row per logged training step — `run_id`, `step`, `loss`, `lr`, `grad_norm`, timestamp.
- **evals**: one row per eval cadence hit — `run_id`, `step`, `val_loss`, `perplexity`, optional `retention`.
- **exports**: one row per `dlm export` completion.

Writes are best-effort: a metrics failure never takes down training.
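
For a quick sanity check on what landed, you can query those tables directly with `sqlite3`. A minimal sketch (table names match the list above; substitute your store's `<dlm_id>`):

```bash
# count logged steps and evals per run, newest first (column names as described above)
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite \
  "SELECT r.run_id, r.phase, r.status,
          (SELECT COUNT(*) FROM steps s WHERE s.run_id = r.run_id) AS n_steps,
          (SELECT COUNT(*) FROM evals e WHERE e.run_id = r.run_id) AS n_evals
   FROM runs r ORDER BY r.run_id DESC LIMIT 5;"
```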

## `dlm metrics <path>`

Default view lists the most-recent runs:

```bash
$ dlm metrics mydoc.dlm
Runs: 3
  run_id=3  phase=sft  seed=42  status=ok  started=2026-04-20T17:12:04Z
  run_id=2  phase=sft  seed=42  status=ok  started=2026-04-20T16:58:11Z
  run_id=1  phase=sft  seed=42  status=ok  started=2026-04-20T16:40:22Z
```

Drill into one run with `--run-id N` to see step and eval counts. `--json` emits a machine-readable object; `--csv` emits the steps + evals table for spreadsheet import.

### Filters

- `--phase sft|dpo|orpo|cpt` — restrict to one training phase.
- `--since 24h|7d|30m|10s` — time window on `started_at`.
- `--run-id N` — drill down into a specific run.
- `--limit N` — cap the number of runs returned (default 20).
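
Each flag is documented above; combining them as shown here is illustrative, assuming the filters compose:

```bash
# last five sft runs from the past week
dlm metrics mydoc.dlm --phase sft --since 7d --limit 5

# one run as JSON for scripting, or CSV for a spreadsheet
dlm metrics mydoc.dlm --run-id 3 --json
dlm metrics mydoc.dlm --run-id 3 --csv > run3.csv
```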

## `dlm metrics watch <path>`

Tails the metrics DB — prints new step and eval rows as they land. Useful in a second terminal while `dlm train` (or `dlm train --watch`) runs in the first.

```bash
$ dlm metrics watch mydoc.dlm
metrics watch: polling ~/.dlm/store/01HZ.../ every 1.0s (Ctrl-C to exit)
→ following run_id=4
  step    10  loss=1.87  lr=0.0002  grad_norm=0.31
  step    20  loss=1.73  lr=0.00018  grad_norm=0.27
  eval @ step 20  val_loss=1.81  perplexity=6.11
```

`--poll-seconds N` tunes how often the DB is re-read (default 1.0).
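
A typical two-terminal workflow, using only the flags documented here (the faster poll interval is just an example):

```bash
# terminal 1: run training as usual
dlm train mydoc.dlm

# terminal 2: tail the metrics DB, re-reading every 0.5s instead of the 1.0s default
dlm metrics watch mydoc.dlm --poll-seconds 0.5
```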

## TensorBoard sink

```bash
uv sync --extra observability
dlm train mydoc.dlm --tensorboard
tensorboard --logdir ~/.dlm/store/<dlm_id>/tensorboard
```

The sink writes one run directory per `trainer.run` under `store/tensorboard/run_NNNN/`. Scalars logged: `train/loss`, `train/lr`, `train/grad_norm`, `eval/val_loss`, `eval/perplexity`.
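
To look at a single run in isolation, point TensorBoard at one of those per-run directories instead of the parent (the `run_0003` name below is illustrative, following the documented `run_NNNN` pattern):

```bash
# inspect just one trainer.run instead of overlaying every run in the store
tensorboard --logdir ~/.dlm/store/<dlm_id>/tensorboard/run_0003
```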

Skipped cleanly if the `observability` extra isn't installed — you get the SQLite DB either way.

## W&B sink (opt-in)

```bash
uv sync --extra observability
dlm train mydoc.dlm --wandb my-project
```

Runs W&B in **offline mode** by default. The run directory sits at `store/wandb/offline-run-*/`. To upload, run `wandb sync <dir>` explicitly — we never upload automatically. If you haven't logged in to W&B, offline mode still captures the run locally for later review.
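
A sketch of that explicit upload step (the run directory name is illustrative; pick the `offline-run-*` directory you actually want to push):

```bash
# see which offline runs were captured, then upload one by hand
ls ~/.dlm/store/<dlm_id>/wandb/
wandb sync ~/.dlm/store/<dlm_id>/wandb/offline-run-20260420_171204-abc123
```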

Privacy posture: no network calls from the training process. Uploading is always a separate, explicit step.

## SQLite schema

The database at `metrics.sqlite` is queryable directly:

```bash
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite
sqlite> .tables
evals    exports  runs     steps
sqlite> SELECT run_id, phase, status FROM runs;
```

WAL mode is on: readers (including the CLI) don't block the trainer, and a Ctrl-C mid-write leaves a recoverable DB.
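
That makes it safe to pull data out mid-training, for example a per-step loss curve for one run as CSV (column names as listed under "What gets recorded"; `run_id = 3` is just an example):

```bash
# export one run's step metrics to CSV without stopping the trainer
sqlite3 -header -csv ~/.dlm/store/<dlm_id>/metrics.sqlite \
  "SELECT step, loss, lr, grad_norm FROM steps WHERE run_id = 3 ORDER BY step;" > run3_steps.csv
```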

## Pruning

No auto-prune today. If the DB grows past comfort, drop older rows:

```sql
DELETE FROM steps WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
DELETE FROM evals WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
DELETE FROM runs  WHERE run_id NOT IN (SELECT run_id FROM runs ORDER BY run_id DESC LIMIT 10);
VACUUM;
```
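
An age-based variant of the same idea, run as a one-shot shell invocation. This assumes `started_at` is stored as ISO-8601 UTC text (matching the timestamps `dlm metrics` prints), so treat it as a sketch:

```bash
# drop every run older than 30 days, plus its step/eval rows, then reclaim space
sqlite3 ~/.dlm/store/<dlm_id>/metrics.sqlite <<'SQL'
DELETE FROM steps WHERE run_id IN (SELECT run_id FROM runs WHERE started_at < strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-30 days'));
DELETE FROM evals WHERE run_id IN (SELECT run_id FROM runs WHERE started_at < strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-30 days'));
DELETE FROM runs  WHERE started_at < strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-30 days');
VACUUM;
SQL
```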

A built-in `dlm metrics prune` is on the backlog.
