
# First training cycle

This walks you through creating a `.dlm` document, training a LoRA adapter against `smollm2-135m`, and confirming the artifacts on disk.

## 1. Create a document

```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
created: tutor.dlm
dlm_id: 01KC…                (26-character ULID)
base:   smollm2-135m         (HuggingFaceTB/SmolLM2-135M-Instruct)
store:  ~/.dlm/store/01KC…/
```

`dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter and provisions the store directory.

Open `tutor.dlm` in your editor and add some training signal:

```dlm
---
dlm_id: 01KC...
dlm_version: 1
base_model: smollm2-135m
training:
  seed: 42
---

# Python decorators primer

::instruction::
### Q
What is a Python decorator?

### A
A decorator is a function that takes another function as input and
returns a new function that wraps extra behavior around the original.
The `@decorator_name` syntax above a `def` is equivalent to
`name = decorator_name(name)`.

### Q
When should I use `functools.wraps`?

### A
Always use `@functools.wraps(func)` inside a decorator so the wrapped
function keeps its `__name__`, `__doc__`, and `__wrapped__` attributes.
Without it, debugging and introspection get confused.
```

Prose outside section fences trains via continued pretraining; instruction blocks (`### Q` / `### A`) train via SFT.
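
To make the split concrete, here is a minimal sketch of how a document body could be partitioned into the two signals. It is illustrative only, not DLM's actual parser: it recognizes just the `::instruction::` marker and the `### Q` / `### A` headings used above.

```python
import re

def split_training_signal(body: str):
    """Partition a .dlm body into prose (continued pretraining) and Q/A pairs (SFT).

    Minimal sketch; DLM's real parser handles more than this.
    """
    # Everything after a ::instruction:: marker is treated as one instruction block.
    blocks = body.split("::instruction::")
    prose = blocks[0].strip()

    qa_pairs = []
    for block in blocks[1:]:
        # re.split with a capture group interleaves the Q/A tags with their text.
        pieces = re.split(r"^### (Q|A)\s*$", block, flags=re.MULTILINE)
        tagged = list(zip(pieces[1::2], (text.strip() for text in pieces[2::2])))
        questions = [text for tag, text in tagged if tag == "Q"]
        answers = [text for tag, text in tagged if tag == "A"]
        qa_pairs.extend(zip(questions, answers))

    return prose, qa_pairs
```

With the example above, `prose` ends up holding the primer heading and any surrounding text, while `qa_pairs` holds the two question/answer tuples.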

## 2. Run the training loop

```sh
$ uv run dlm train tutor.dlm
```

DLM runs the hardware doctor, resolves the plan (precision, batch size, grad accumulation), downloads the base model (cached on re-runs), and kicks off the SFTTrainer. On a Mac M-series with MPS, 20 steps of SmolLM2-135M take about two minutes.
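
For orientation, the sketch below shows roughly the shape of a LoRA SFT run like the one this command drives. It is not DLM's code: the dataset, LoRA settings, step count, and output path are placeholders, and the exact trl/peft keyword arguments depend on the installed versions.

```python
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset standing in for the SFT examples extracted from tutor.dlm.
train_dataset = Dataset.from_list([
    {"text": "### Q\nWhat is a Python decorator?\n\n### A\nA function that wraps another function."},
])

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # what `smollm2-135m` resolves to
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]),
    args=SFTConfig(
        output_dir="adapter-out",                 # placeholder, not the DLM store path
        max_steps=20,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        seed=42,
        dataset_text_field="text",
    ),
)
trainer.train()
trainer.save_model("adapter-out")  # writes adapter_config.json + adapter_model.safetensors
```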

Output — the CLI prints the summary lines; per-step metrics go to a JSONL log for programmatic consumption (Sprint 09's StepLogger):

```
trained:   v0001 (20 steps, seed=42, determinism=best-effort)
adapter:   ~/.dlm/store/01KC…/adapter/versions/v0001
log:       ~/.dlm/store/01KC…/logs/train-000001-…jsonl
```

Tail the JSONL log to see per-step loss; each record has this shape:

{"type": "banner", "run_id": 1, "seed": 42, "determinism_class": "best-effort", ...}
{"type": "step", "step": 5, "loss": 3.421, "lr": 0.0005, "grad_norm": 2.14, "timestamp": "..."}
{"type": "step", "step": 10, "loss": 2.887, "lr": 0.000447, ...}
...

A pretty-print `dlm metrics` command lands in Phase 6 (Sprint 26).
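
Until then, a few lines of Python can pull the step records out of the log. The path below is a placeholder; use the `log:` path that `dlm train` printed.

```python
import json
from pathlib import Path

# Placeholder path: substitute the `log:` line printed by `dlm train`.
log_path = Path("~/.dlm/store/<dlm_id>/logs/train-000001-<timestamp>.jsonl").expanduser()

for line in log_path.read_text().splitlines():
    record = json.loads(line)
    if record.get("type") == "step":
        print(f"step {record['step']:>4}  loss {record['loss']:.3f}  lr {record['lr']:.2e}")
```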

## 3. Inspect the store

```sh
$ uv run dlm show tutor.dlm
dlm_id:        01KC…
base_model:    smollm2-135m
training_runs: 1
    run 1 → v0001, 20 steps, seed=42, loss 2.30
adapter:       v0001
manifest:      ~/.dlm/store/01KC…/manifest.json
lock:          ~/.dlm/store/01KC…/dlm.lock
```

Under the hood, each run produced:

- `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights
- `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume; see the sketch after this list)
- `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta
- `logs/train-000001-*.jsonl` — per-step metrics
- `dlm.lock` — pinned versions + hardware tier + determinism contract
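
If you want to check the resume sidecar yourself, a minimal sketch follows. It assumes the sidecar sits next to the checkpoint as `training_state.pt.sha256` and starts with a plain hex digest of `training_state.pt`; the paths are placeholders.

```python
import hashlib
from pathlib import Path

# Placeholder path: substitute your store's dlm_id and adapter version.
state = Path("~/.dlm/store/<dlm_id>/adapter/versions/v0001/training_state.pt").expanduser()

# Assumption: the sidecar is <checkpoint>.sha256 and begins with the hex digest.
recorded = state.with_name(state.name + ".sha256").read_text().split()[0]
actual = hashlib.sha256(state.read_bytes()).hexdigest()

print("sidecar matches" if actual == recorded else "sidecar MISMATCH")
```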

## 4. Retrain after edits

Edit the document, add more Q&A pairs, then:

```sh
$ uv run dlm train tutor.dlm
```

The delta system (audit-04 M1/M2) compares `content_hashes` in the manifest against the current sections, so only new content drives the new training signal — everything from v0001 is still in the replay corpus and gets sampled into the v0002 training mix.
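
As a rough illustration of the idea (not DLM's actual hashing scheme or manifest layout): hash each section body, compare against the hashes recorded after the previous run, and only changed or new sections form the delta.

```python
import hashlib

def section_hash(text: str) -> str:
    # Illustrative content hash; DLM's real scheme and section keys may differ.
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

# Hashes recorded after the v0001 run (placeholder section names and bodies).
previous = {"python-decorators-primer": section_hash("...v0001 body...")}

# Hashes of the sections in tutor.dlm right now.
current = {
    "python-decorators-primer": section_hash("...v0001 body..."),  # unchanged
    "context-managers": section_hash("...newly added Q&A..."),     # new in v0002
}

delta = {name for name, h in current.items() if previous.get(name) != h}
print(sorted(delta))  # ['context-managers']
```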

Want to force a clean restart instead?

```sh
$ uv run dlm train tutor.dlm --fresh
```

## Next

You have a trained adapter. [Prompt it](first-prompt.md) next.
