# First training cycle
This walks you through creating a `.dlm` document, training a LoRA
adapter against `smollm2-135m`, and confirming the artifacts on disk.
## 1. Create a document
```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
created: tutor.dlm
dlm_id: 01KC… (26-character ULID)
base: smollm2-135m (HuggingFaceTB/SmolLM2-135M-Instruct)
store: ~/.dlm/store/01KC…/
```
`dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter
and provisions the store directory.
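If you want to double-check what landed on disk, here is a small
sketch. The `tutor.dlm` filename and the `~/.dlm/store/<dlm_id>/`
layout come from the output above; everything else is illustrative
and not part of the CLI:

```python
# Sanity-check `dlm init` output: read the ULID back out of the
# frontmatter and confirm the store directory exists.
from pathlib import Path

doc = Path("tutor.dlm").read_text()
dlm_id = next(
    line.split(":", 1)[1].strip()
    for line in doc.splitlines()
    if line.startswith("dlm_id:")
)
assert len(dlm_id) == 26, "ULIDs are 26 characters"

store = Path.home() / ".dlm" / "store" / dlm_id
print(f"store {store} exists: {store.exists()}")
```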
Open `tutor.dlm` in your editor and add some training signal:
```dlm
---
dlm_id: 01KC...
dlm_version: 1
base_model: smollm2-135m
training:
  seed: 42
---

# Python decorators primer

::instruction::
### Q
What is a Python decorator?

### A
A decorator is a function that takes another function as input and
returns a new function that wraps extra behavior around the original.
The `@decorator_name` syntax above a `def` is equivalent to
`name = decorator_name(name)`.

### Q
When should I use `functools.wraps`?

### A
Always use `@functools.wraps(func)` inside a decorator so the wrapped
function keeps its `__name__`, `__doc__`, and `__wrapped__` attributes.
Without it, debugging and introspection get confused.
```
Prose outside section fences trains via continued pretraining;
instruction blocks (`### Q` / `### A`) train via SFT.
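To make the split concrete, here is a rough sketch of how a document
could be partitioned into the two signal types. It is not the dlm
parser, just an illustration of the rule above: drop the frontmatter,
treat everything before `::instruction::` as prose, and split the
instruction block on `### Q` / `### A`:

```python
# Illustrative only: split a .dlm body into pretraining prose and
# SFT (question, answer) pairs. The real dlm parser may differ.
import re
from pathlib import Path

text = Path("tutor.dlm").read_text()
body = text.split("---", 2)[2]                  # drop YAML frontmatter
prose, _, instruction = body.partition("::instruction::")

pairs = []
for block in re.split(r"^### Q\s*$", instruction, flags=re.M)[1:]:
    question, _, answer = block.partition("### A")
    pairs.append((question.strip(), answer.strip()))

print("prose chars for continued pretraining:", len(prose.strip()))
print("Q/A pairs for SFT:", len(pairs))
```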
## 2. Run the training loop
```sh
$ uv run dlm train tutor.dlm
```
DLM runs the hardware doctor, resolves the plan (precision,
batch size, grad accumulation), downloads the base model (cached on
re-runs), and kicks off the SFTTrainer. On a Mac M-series with MPS,
20 steps of SmolLM2-135M take about two minutes.
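For orientation, this is roughly the shape of LoRA SFT run that
`dlm train` drives. It is a sketch, not dlm's actual code: the rank,
alpha, and output paths are placeholders, and TRL/PEFT argument names
shift between releases.

```python
# Sketch of a 20-step LoRA SFT run on SmolLM2-135M; not dlm's
# implementation. Adjust for your installed TRL/PEFT versions.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_dataset = Dataset.from_list([
    {"text": "### Q\nWhat is a Python decorator?\n### A\nA decorator is ..."},
])

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # downloaded and cached by transformers
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="scratch", max_steps=20, seed=42),
)
trainer.train()
trainer.save_model("scratch/adapter")             # writes the adapter safetensors
```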
Output — the CLI prints the summary lines; per-step metrics go to a JSONL log for programmatic consumption (Sprint 09's StepLogger):
```
trained: v0001 (20 steps, seed=42, determinism=best-effort)
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
log: ~/.dlm/store/01KC…/logs/train-000001-…jsonl
```
Tail the JSONL log to see per-step loss; each record has this shape:
{"type": "banner", "run_id": 1, "seed": 42, "determinism_class": "best-effort", ...}
{"type": "step", "step": 5, "loss": 3.421, "lr": 0.0005, "grad_norm": 2.14, "timestamp": "..."}
{"type": "step", "step": 10, "loss": 2.887, "lr": 0.000447, ...}
...
A pretty-print `dlm metrics` command lands in Phase 6 (Sprint 26).
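Until it does, a few lines of Python give you the same view. This
sketch relies only on the record shape shown above; substitute the
log path that `dlm train` printed for you.

```python
# Print per-step loss from a dlm training log, assuming only the
# JSONL record shape shown above.
import json
from pathlib import Path

log_path = Path("~/.dlm/store/<dlm_id>/logs/train-000001-<run>.jsonl").expanduser()
for line in log_path.read_text().splitlines():
    record = json.loads(line)
    if record.get("type") == "step":
        print(f"step {record['step']:>4}  loss {record['loss']:.3f}")
```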
## 3. Inspect the store
```sh
$ uv run dlm show tutor.dlm
dlm_id: 01KC…
base_model: smollm2-135m
training_runs: 1
run 1 → v0001, 20 steps, seed=42, loss 2.30
adapter: v0001
manifest: ~/.dlm/store/01KC…/manifest.json
lock: ~/.dlm/store/01KC…/dlm.lock
```
Under the hood, each run produced:
- `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights
- `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume)
- `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta
- `logs/train-000001-*.jsonl` — per-step metrics
- `dlm.lock` — pinned versions + hardware tier + determinism contract
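The `.sha256` sidecar makes the resume state checkable by hand. A
minimal sketch, assuming the sidecar sits next to the `.pt` file as
`training_state.pt.sha256` and holds the hex digest as its first
token (the exact naming and format are assumptions, not documented
behavior):

```python
# Verify training_state.pt against its .sha256 sidecar (assumed
# layout; adjust the paths to what your store actually contains).
import hashlib
from pathlib import Path

version_dir = Path("~/.dlm/store/<dlm_id>/adapter/versions/v0001").expanduser()
state = version_dir / "training_state.pt"
expected = (version_dir / "training_state.pt.sha256").read_text().split()[0]

actual = hashlib.sha256(state.read_bytes()).hexdigest()
print("match" if actual == expected else "MISMATCH", actual)
```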
## 4. Retrain after edits
Edit the document, add more Q&A pairs, then:
```sh
$ uv run dlm train tutor.dlm
```
The delta system (audit-04 M1/M2) compares `content_hashes` in the
manifest against the current sections, so only new content drives the
new training signal — everything from v0001 is still in the replay
corpus and gets sampled into the v0002 training mix.
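The idea is easy to picture: hash each section's text, compare with
what the manifest recorded at v0001, and only the new or changed
hashes feed the delta. A conceptual sketch, not dlm's actual hashing
scheme or manifest schema (the section names and `@property` question
are made up for illustration):

```python
# Conceptual content-hash delta; dlm's real hash inputs and manifest
# layout may differ.
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Sections as they looked when v0001 trained, and as they look now
# (contents shortened; a real run hashes the full section text).
v0001 = {
    "decorators": "What is a Python decorator? ...",
    "wraps": "When should I use functools.wraps? ...",
}
current = {**v0001, "properties": "What does @property do? ..."}

recorded = {name: content_hash(text) for name, text in v0001.items()}
delta = [name for name, text in current.items()
         if recorded.get(name) != content_hash(text)]
print("new training signal for v0002 comes from:", delta)  # ['properties']
```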
Want to force a clean restart instead?
```sh
$ uv run dlm train tutor.dlm --fresh
```
## Next
You have a trained adapter. [Prompt it](first-prompt.md) next.