# First training cycle This walks you through creating a `.dlm` document, training a LoRA adapter against `smollm2-135m`, and confirming the artifacts on disk. ## 1. Create a document ```sh $ uv run dlm init tutor.dlm --base smollm2-135m created: tutor.dlm dlm_id: 01KC… (26-character ULID) base: smollm2-135m (HuggingFaceTB/SmolLM2-135M-Instruct) store: ~/.dlm/store/01KC…/ ``` `dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter and provisions the store directory. Open `tutor.dlm` in your editor and add some training signal: ```dlm --- dlm_id: 01KC... dlm_version: 1 base_model: smollm2-135m training: seed: 42 --- # Python decorators primer ::instruction:: ### Q What is a Python decorator? ### A A decorator is a function that takes another function as input and returns a new function that wraps extra behavior around the original. The `@decorator_name` syntax above a `def` is equivalent to `name = decorator_name(name)`. ### Q When should I use `functools.wraps`? ### A Always use `@functools.wraps(func)` inside a decorator so the wrapped function keeps its `__name__`, `__doc__`, and `__wrapped__` attribute. Without it, debugging and introspection get confused. ``` Prose outside section fences trains via continued pretraining; instruction blocks (`### Q` / `### A`) train via SFT. ## 2. Run the training loop ```sh $ uv run dlm train tutor.dlm ``` DLM runs the hardware doctor, resolves the plan (precision, batch size, grad accumulation), downloads the base model (cached on re-runs), and kicks off the SFTTrainer. On a Mac M-series with MPS, 20 steps of SmolLM2-135M take about two minutes. Output — the CLI prints the summary lines; per-step metrics go to a JSONL log for programmatic consumption (Sprint 09's StepLogger): ``` trained: v0001 (20 steps, seed=42, determinism=best-effort) adapter: ~/.dlm/store/01KC…/adapter/versions/v0001 log: ~/.dlm/store/01KC…/logs/train-000001-…jsonl ``` Tail the JSONL log to see per-step loss in the shape: ``` {"type": "banner", "run_id": 1, "seed": 42, "determinism_class": "best-effort", ...} {"type": "step", "step": 5, "loss": 3.421, "lr": 0.0005, "grad_norm": 2.14, "timestamp": "..."} {"type": "step", "step": 10, "loss": 2.887, "lr": 0.000447, ...} ... ``` A pretty-print `dlm metrics` command lands in Phase 6 (Sprint 26). ## 3. Inspect the store ```sh $ uv run dlm show tutor.dlm dlm_id: 01KC… base_model: smollm2-135m training_runs: 1 run 1 → v0001, 20 steps, seed=42, loss 2.30 adapter: v0001 manifest: ~/.dlm/store/01KC…/manifest.json lock: ~/.dlm/store/01KC…/dlm.lock ``` Under the hood, each run produced: - `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights - `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume) - `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta - `logs/train-000001-*.jsonl` — per-step metrics - `dlm.lock` — pinned versions + hardware tier + determinism contract ## 4. Retrain after edits Edit the document, add more Q&A pairs, then: ```sh $ uv run dlm train tutor.dlm ``` The delta system (audit-04 M1/M2) compares `content_hashes` in the manifest against the current sections, so only new content drives the new training signal — everything from v0001 is still in the replay corpus and gets sampled into the v0002 training mix. Want to force a clean restart instead? ```sh $ uv run dlm train tutor.dlm --fresh ``` ## Next You have a trained adapter. [Prompt it](first-prompt.md) next.