# First training cycle
This walks you through creating a `.dlm` document, training a LoRA
adapter against `smollm2-135m`, and confirming the artifacts on disk.
## 1. Create a document
```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
created: tutor.dlm
dlm_id: 01KC… (26-character ULID)
base: smollm2-135m (HuggingFaceTB/SmolLM2-135M-Instruct)
store: ~/.dlm/store/01KC…/
```
`dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter
and provisions the store directory.
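If you want to double-check what landed on disk, here is a small
sketch. The `tutor.dlm` filename and the `~/.dlm/store/<dlm_id>/`
layout come from the output above; everything else is illustrative
and not part of the CLI:

```python
# Sanity-check `dlm init` output: read the ULID back out of the
# frontmatter and confirm the store directory exists.
from pathlib import Path

doc = Path("tutor.dlm").read_text()
dlm_id = next(
    line.split(":", 1)[1].strip()
    for line in doc.splitlines()
    if line.startswith("dlm_id:")
)
assert len(dlm_id) == 26, "ULIDs are 26 characters"

store = Path.home() / ".dlm" / "store" / dlm_id
print(f"store {store} exists: {store.exists()}")
```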
Open `tutor.dlm` in your editor and add some training signal:
```dlm
---
dlm_id: 01KC...
dlm_version: 1
base_model: smollm2-135m
training:
  seed: 42
---

# Python decorators primer

::instruction::
### Q
What is a Python decorator?

### A
A decorator is a function that takes another function as input and
returns a new function that wraps extra behavior around the original.
The `@decorator_name` syntax above a `def` is equivalent to
`name = decorator_name(name)`.

### Q
When should I use `functools.wraps`?

### A
Always use `@functools.wraps(func)` inside a decorator so the wrapped
function keeps its `__name__`, `__doc__`, and `__wrapped__` attributes.
Without it, debugging and introspection get confused.
```
Prose outside section fences trains via continued pretraining;
instruction blocks (`### Q` / `### A`) train via SFT.
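To make the split concrete, here is a rough sketch of how a document
could be partitioned into the two signal types. It is not the dlm
parser, just an illustration of the rule above: drop the frontmatter,
treat everything before `::instruction::` as prose, and split the
instruction block on `### Q` / `### A`:

```python
# Illustrative only: split a .dlm body into pretraining prose and
# SFT (question, answer) pairs. The real dlm parser may differ.
import re
from pathlib import Path

text = Path("tutor.dlm").read_text()
body = text.split("---", 2)[2]                  # drop YAML frontmatter
prose, _, instruction = body.partition("::instruction::")

pairs = []
for block in re.split(r"^### Q\s*$", instruction, flags=re.M)[1:]:
    question, _, answer = block.partition("### A")
    pairs.append((question.strip(), answer.strip()))

print("prose chars for continued pretraining:", len(prose.strip()))
print("Q/A pairs for SFT:", len(pairs))
```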
## 2. Run the training loop
```sh
$ uv run dlm train tutor.dlm
```
DLM runs the hardware doctor, resolves the plan (precision,
batch size, grad accumulation), downloads the base model (cached on
re-runs), and kicks off the SFTTrainer. On a Mac M-series with MPS,
20 steps of SmolLM2-135M take about two minutes.
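For orientation, this is roughly the shape of LoRA SFT run that
`dlm train` drives. It is a sketch, not dlm's actual code: the rank,
alpha, and output paths are placeholders, and TRL/PEFT argument names
shift between releases.

```python
# Sketch of a 20-step LoRA SFT run on SmolLM2-135M; not dlm's
# implementation. Adjust for your installed TRL/PEFT versions.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

train_dataset = Dataset.from_list([
    {"text": "### Q\nWhat is a Python decorator?\n### A\nA decorator is ..."},
])

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # downloaded and cached by transformers
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="scratch", max_steps=20, seed=42),
)
trainer.train()
trainer.save_model("scratch/adapter")             # writes the adapter safetensors
```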
Output — the CLI prints the summary lines; per-step metrics go to a JSONL log for programmatic consumption (Sprint 09's StepLogger):
```
trained: v0001 (20 steps, seed=42, determinism=best-effort)
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
log: ~/.dlm/store/01KC…/logs/train-000001-…jsonl
```
Tail the JSONL log to see per-step loss; each record has this shape:
{"type": "banner", "run_id": 1, "seed": 42, "determinism_class": "best-effort", ...}
{"type": "step", "step": 5, "loss": 3.421, "lr": 0.0005, "grad_norm": 2.14, "timestamp": "..."}
{"type": "step", "step": 10, "loss": 2.887, "lr": 0.000447, ...}
...
A pretty-print `dlm metrics` command lands in Phase 6 (Sprint 26).
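Until it does, a few lines of Python give you the same view. This
sketch relies only on the record shape shown above; substitute the
log path that `dlm train` printed for you.

```python
# Print per-step loss from a dlm training log, assuming only the
# JSONL record shape shown above.
import json
from pathlib import Path

log_path = Path("~/.dlm/store/<dlm_id>/logs/train-000001-<run>.jsonl").expanduser()
for line in log_path.read_text().splitlines():
    record = json.loads(line)
    if record.get("type") == "step":
        print(f"step {record['step']:>4}  loss {record['loss']:.3f}")
```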
## 3. Inspect the store
```sh
$ uv run dlm show tutor.dlm
dlm_id: 01KC…
base_model: smollm2-135m
training_runs: 1
run 1 → v0001, 20 steps, seed=42, loss 2.30
adapter: v0001
manifest: ~/.dlm/store/01KC…/manifest.json
lock: ~/.dlm/store/01KC…/dlm.lock
```
Under the hood, each run produced:
- `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights
- `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume)
- `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta
- `logs/train-000001-*.jsonl` — per-step metrics
- `dlm.lock` — pinned versions + hardware tier + determinism contract
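The `.sha256` sidecar makes the resume state checkable by hand. A
minimal sketch, assuming the sidecar sits next to the `.pt` file as
`training_state.pt.sha256` and holds the hex digest as its first
token (the exact naming and format are assumptions, not documented
behavior):

```python
# Verify training_state.pt against its .sha256 sidecar (assumed
# layout; adjust the paths to what your store actually contains).
import hashlib
from pathlib import Path

version_dir = Path("~/.dlm/store/<dlm_id>/adapter/versions/v0001").expanduser()
state = version_dir / "training_state.pt"
expected = (version_dir / "training_state.pt.sha256").read_text().split()[0]

actual = hashlib.sha256(state.read_bytes()).hexdigest()
print("match" if actual == expected else "MISMATCH", actual)
```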
## 4. Retrain after edits
Edit the document, add more Q&A pairs, then:
```sh
$ uv run dlm train tutor.dlm
```
The delta system (audit-04 M1/M2) compares `content_hashes` in the
manifest against the current sections, so only new content drives the
new training signal — everything from v0001 is still in the replay
corpus and gets sampled into the v0002 training mix.
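The idea is easy to picture: hash each section's text, compare with
what the manifest recorded at v0001, and only the new or changed
hashes feed the delta. A conceptual sketch, not dlm's actual hashing
scheme or manifest schema (the section names and `@property` question
are made up for illustration):

```python
# Conceptual content-hash delta; dlm's real hash inputs and manifest
# layout may differ.
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Sections as they looked when v0001 trained, and as they look now
# (contents shortened; a real run hashes the full section text).
v0001 = {
    "decorators": "What is a Python decorator? ...",
    "wraps": "When should I use functools.wraps? ...",
}
current = {**v0001, "properties": "What does @property do? ..."}

recorded = {name: content_hash(text) for name, text in v0001.items()}
delta = [name for name, text in current.items()
         if recorded.get(name) != content_hash(text)]
print("new training signal for v0002 comes from:", delta)  # ['properties']
```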
Want to force a clean restart instead?
```sh
$ uv run dlm train tutor.dlm --fresh
```
## Next
You have a trained adapter. [Prompt it](first-prompt.md) next.