tenseleyflow/documentlanguagemodel / 467e4c5

docs: getting-started (install, first-train, first-prompt, first-export) (sprint 16)

Authored by espadonne
SHA: 467e4c5d54f21b3a706f548c5495ecc1f35b2e22
Parents: 7f4d9a7
Tree: c64b392

4 changed files

| Status | File | + | - |
|---|---|---|---|
| A | docs/getting-started/first-export.md | 87 | 0 |
| A | docs/getting-started/first-prompt.md | 66 | 0 |
| A | docs/getting-started/first-train.md | 121 | 0 |
| A | docs/getting-started/install.md | 65 | 0 |

docs/getting-started/first-export.md (added)
@@ -0,0 +1,87 @@
# First export

`dlm export` converts the base + adapter into GGUF files, writes a
Modelfile with an explicit Go `text/template` (no fuzzy matching),
registers the model with `ollama create`, and runs a smoke prompt.

## Prerequisites

- `vendor/llama.cpp` submodule is built:
  ```sh
  $ scripts/bump-llama-cpp.sh build
  ```
  This compiles `llama-quantize` and `llama-imatrix` under
  `vendor/llama.cpp/build/bin/`.

- [Ollama](https://ollama.com/) is installed and its daemon is running.
  `dlm doctor` reports the minimum version.

## Export

```sh
$ uv run dlm export tutor.dlm --quant Q4_K_M --name my-tutor
export: preflight ok
export: base.Q4_K_M.gguf (47 MiB)
export: adapter.gguf (3 MiB)
export: Modelfile written; ollama create my-tutor:latest
export: smoke: "Hi!" → "Hello! How can I help?"
manifest: exports[-1] recorded at ~/.dlm/store/01KC…/
```

Under the hood:

1. The export **preflight** (Sprint 11) checks that the adapter config
   matches the base architecture, asserts that the tokenizer vocab
   agrees with the base, validates the chat template, and confirms the
   adapter wasn't QLoRA-trained (pitfall #3 — QLoRA merge needs
   `--dequantize`).
2. The base model is converted to GGUF and quantized via
   `llama-quantize`. The GGUF is cached under
   `~/.dlm/store/<id>/exports/Q4_K_M/base.Q4_K_M.gguf` — subsequent
   exports at the same quant reuse the file.
3. The LoRA adapter is converted to `adapter.gguf`.
4. A `Modelfile` is emitted with `FROM`, `ADAPTER`, and an explicit
   `TEMPLATE "..."` directive (Sprint 12); see the sketch after this
   list. Ollama will **not** fuzzy-match the template — the exact Go
   template for the base's dialect is committed.
5. `ollama create <name>:latest` registers the model under the Ollama
   daemon's control.
6. A smoke prompt runs; the first line of output is recorded in
   `manifest.exports[-1].smoke_output_first_line`.
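
Steps 2 and 4 boil down to one tool invocation and one small text file.
A rough sketch (not the actual exporter code; the paths, the cache
check, and the template string are placeholders for this example):

```python
import subprocess
from pathlib import Path

BIN = Path("vendor/llama.cpp/build/bin")       # built by bump-llama-cpp.sh
export_dir = Path("exports/Q4_K_M")            # inside the document's store

def quantize_base(base_f16: Path, quant: str = "Q4_K_M") -> Path:
    """Step 2: quantize the converted base GGUF, reusing the cached file if present."""
    out = export_dir / f"base.{quant}.gguf"
    if not out.exists():  # later exports at the same quant skip this step
        subprocess.run([BIN / "llama-quantize", base_f16, out, quant], check=True)
    return out

def write_modelfile(base_gguf: Path, adapter_gguf: Path, chat_template: str) -> Path:
    """Step 4: explicit Modelfile; the TEMPLATE is spelled out, never fuzzy-matched."""
    modelfile = export_dir / "Modelfile"
    modelfile.write_text(
        f'FROM {base_gguf}\n'
        f'ADAPTER {adapter_gguf}\n'
        f'TEMPLATE """{chat_template}"""\n'
    )
    return modelfile

# Step 5 then hands the Modelfile to Ollama:
#   subprocess.run(["ollama", "create", "my-tutor:latest", "-f", modelfile], check=True)
```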

## Quant levels

| Quant | Size | Quality | When to use |
|---|---|---|---|
| `Q4_K_M` | ~50% of fp16 | Great default | General-purpose; recommended starting point. |
| `Q5_K_M` | ~60% | Higher quality | Willing to trade more disk for fidelity. |
| `Q8_0` | ~100% of int8 | Near-lossless | Baseline for quality comparisons. |
| `F16` | 100% | No quantization | Debugging a quant-caused regression. |

See [Quantization tradeoffs](../cookbook/quantization-tradeoffs.md) for
a deeper dive.

## imatrix-calibrated quantization

If your store has a replay corpus with enough signal (Sprint 11.6),
the export runner automatically builds an imatrix from it and passes
`--imatrix` to `llama-quantize`. This gives noticeable quality
improvements on `Q4_K_M` and below without changing the API.

Opt out with `--no-imatrix` if you'd rather have a static quant for
comparison.
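
In llama.cpp terms, the calibrated path is two extra tool invocations.
A sketch (the file names and paths are assumptions, not the runner's
actual layout):

```python
import subprocess
from pathlib import Path

BIN = Path("vendor/llama.cpp/build/bin")

# 1. Build an importance matrix from the replay corpus (calibration text).
subprocess.run(
    [BIN / "llama-imatrix",
     "-m", "base.F16.gguf",        # unquantized base
     "-f", "replay_corpus.txt",    # calibration text drawn from the store
     "-o", "imatrix.dat"],
    check=True,
)

# 2. Quantize with the imatrix guiding which tensors keep more precision.
subprocess.run(
    [BIN / "llama-quantize",
     "--imatrix", "imatrix.dat",
     "base.F16.gguf", "base.Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```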

## Just produce GGUFs, skip Ollama

```sh
$ uv run dlm export tutor.dlm --quant Q4_K_M --skip-ollama
```

Useful on CI runners without the Ollama daemon installed. The GGUFs
land in `exports/Q4_K_M/`; wire them into your own runtime.
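
For example, if that runtime is [llama-cpp-python](https://github.com/abetlen/llama-cpp-python),
loading the pair directly might look like this (a sketch under the store
layout above, not a supported DLM interface):

```python
from llama_cpp import Llama

# Load the quantized base and apply the exported LoRA adapter.
llm = Llama(
    model_path="exports/Q4_K_M/base.Q4_K_M.gguf",
    lora_path="exports/Q4_K_M/adapter.gguf",
    n_ctx=2048,
)

out = llm("What is a Python decorator?", max_tokens=128)
print(out["choices"][0]["text"])
```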

## Next

Want to send the whole training history to a friend? The
[Sharing with pack](../cookbook/sharing-with-pack.md) cookbook shows
the `dlm pack` / `dlm unpack` round trip.

docs/getting-started/first-prompt.md (added)
@@ -0,0 +1,66 @@
# First prompt

`dlm prompt` runs inference against the current adapter using the base
model. It's the fastest way to check "did the training actually stick?"
without involving Ollama or GGUF conversion.

## The happy path

```sh
$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function as input…
```

Behind the scenes:

1. `dlm prompt` parses the `.dlm`, resolves the base model, and
   checks the hardware doctor's capability report.
2. It loads the base model plus the LoRA weights that
   `adapter/current.txt` points at, via PEFT (see the sketch after
   this list).
3. It calls `generate()` with your prompt; the defaults are
   `--max-tokens 256` and `--temp 0.7`.
4. The response is streamed to stdout; the Rich reporter writes
   progress / plan info to stderr so you can pipe stdout cleanly.
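
A minimal sketch of steps 2 and 3 in plain `transformers`/`peft` terms
(illustrative only; the real loader follows the doctor's plan for device
and dtype, and the adapter path is whatever `adapter/current.txt` names):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
adapter_dir = "<store>/adapter/versions/v0001"   # resolved from adapter/current.txt

tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)  # LoRA weights layered on top

inputs = tok("What is a Python decorator?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```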

## Deterministic generation

For reproducible output (useful for comparing adapters), pin
temperature to 0:

```sh
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi"
```

Greedy decoding is deterministic when the weights are byte-identical —
which is the whole point of the [determinism contract](../determinism.md).
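
In `generate()` terms, `--temp 0` corresponds to greedy decoding.
Continuing the sketch above:

```python
# Greedy decoding: sampling is off, so the argmax token is taken at every step.
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
```

With byte-identical weights, repeated runs then produce identical
completions, which is what makes `--temp 0` useful for A/B comparisons.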

## Verbose plan

Pass `--verbose` to surface the inference plan before generation:

```sh
$ uv run dlm prompt tutor.dlm --verbose "Hello"
plan: {'device': 'mps', 'dtype': 'fp16', 'adapter_path': '...', 'quantization': 'none'}
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
Hello! How can I help you today?
```

The `plan` dict is the same object written into `manifest.json` on
training, so you can cross-reference what the model was doing the
last time it trained.
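
For example, to diff the inference plan against the plan recorded for
the last training run (a sketch; the manifest key names used here are
assumptions beyond the `plan` dict shown above):

```python
import json
from pathlib import Path

store = Path("~/.dlm/store/<id>").expanduser()      # placeholder store path
manifest = json.loads((store / "manifest.json").read_text())

train_plan = manifest["training_runs"][-1]["plan"]   # key names assumed
infer_plan = {"device": "mps", "dtype": "fp16", "quantization": "none"}

for key in ("device", "dtype", "quantization"):
    if train_plan.get(key) != infer_plan.get(key):
        print(f"{key}: trained with {train_plan.get(key)!r}, "
              f"prompting with {infer_plan.get(key)!r}")
```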

## Piping and stdin

Prompt via stdin for long inputs:

```sh
$ cat long-prompt.txt | uv run dlm prompt tutor.dlm
```

If stdin is empty and no query argument is given, `dlm prompt` exits
with a non-zero code and a clear error rather than hanging.
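
The stdin handling follows the usual CLI pattern, roughly (a sketch, not
the actual implementation):

```python
import sys

def read_query(arg: str | None) -> str:
    # Prefer an explicit argument; otherwise accept piped stdin, but never block on a TTY.
    if arg:
        return arg
    if not sys.stdin.isatty():
        text = sys.stdin.read().strip()
        if text:
            return text
    sys.exit("error: no prompt given (pass a query argument or pipe text on stdin)")
```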

## Next

Happy with inference? [Export to Ollama](first-export.md) for a real
standalone model.

docs/getting-started/first-train.md (added)
@@ -0,0 +1,121 @@
# First training cycle

This walks you through creating a `.dlm` document, training a LoRA
adapter against `smollm2-135m`, and confirming the artifacts on disk.

## 1. Create a document

```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
created: tutor.dlm
dlm_id: 01KC…                (26-character ULID)
base:   smollm2-135m         (HuggingFaceTB/SmolLM2-135M-Instruct)
store:  ~/.dlm/store/01KC…/
```

`dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter
and provisions the store directory.

Open `tutor.dlm` in your editor and add some training signal:

```dlm
---
dlm_id: 01KC...
dlm_version: 1
base_model: smollm2-135m
training:
  seed: 42
---

# Python decorators primer

::instruction::
### Q
What is a Python decorator?

### A
A decorator is a function that takes another function as input and
returns a new function that wraps extra behavior around the original.
The `@decorator_name` syntax above a `def` is equivalent to
`name = decorator_name(name)`.

### Q
When should I use `functools.wraps`?

### A
Always use `@functools.wraps(func)` inside a decorator so the wrapped
function keeps its `__name__`, `__doc__`, and `__wrapped__` attributes.
Without it, debugging and introspection get confused.
```

Prose outside section fences trains via continued pretraining;
instruction blocks (`### Q` / `### A`) train via SFT. A sketch of that
split follows below.
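
Conceptually, the parse separates the two kinds of signal along these
lines (an illustration of the idea only; the real parser, its fence
handling, and its data shapes are not shown here):

```python
import re

def split_training_signal(body: str):
    """Illustrative split: ::instruction:: blocks yield SFT Q/A pairs, the rest is CPT prose."""
    prose_chunks, qa_pairs = [], []
    # Assumed for this sketch: a single ::instruction:: fence opening a Q/A block.
    before, _, instruction_block = body.partition("::instruction::")
    if before.strip():
        prose_chunks.append(before.strip())
    for match in re.finditer(r"### Q\s*\n(.+?)\n+### A\s*\n(.+?)(?=\n### Q|\Z)",
                             instruction_block, flags=re.S):
        qa_pairs.append({"question": match.group(1).strip(),
                         "answer": match.group(2).strip()})
    return prose_chunks, qa_pairs
```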

## 2. Run the training loop

```sh
$ uv run dlm train tutor.dlm
```

DLM runs the hardware doctor, resolves the plan (precision,
batch size, grad accumulation), downloads the base model (cached on
re-runs), and kicks off the SFTTrainer. On an M-series Mac with MPS,
20 steps of SmolLM2-135M take about two minutes.

Output (abbreviated):

```
preflight: 9.6 GB free under ~/.dlm/store/01KC…/
banner:    seed=42 determinism=best-effort plan=fp16/sdpa/bs=1×8
step 5:    loss=3.421  lr=5.00e-04
step 10:   loss=2.887  lr=4.47e-04
step 15:   loss=2.541  lr=3.45e-04
step 20:   loss=2.298  lr=2.08e-04
trained:   v0001 (20 steps, seed=42, determinism=best-effort)
adapter:   ~/.dlm/store/01KC…/adapter/versions/v0001
log:       ~/.dlm/store/01KC…/logs/train-000001-…jsonl
```

## 3. Inspect the store

```sh
$ uv run dlm show tutor.dlm
dlm_id:        01KC…
base_model:    smollm2-135m
training_runs: 1
    run 1 → v0001, 20 steps, seed=42, loss 2.30
adapter:       v0001
manifest:      ~/.dlm/store/01KC…/manifest.json
lock:          ~/.dlm/store/01KC…/dlm.lock
```

Under the hood, each run produced:

- `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights
- `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume; see the check below)
- `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta
- `logs/train-000001-*.jsonl` — per-step metrics
- `dlm.lock` — pinned versions + hardware tier + determinism contract
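
The `.sha256` sidecar makes the resume state cheap to verify by hand.
A sketch, assuming the sidecar holds the hex digest of
`training_state.pt` (its exact format is not documented here):

```python
import hashlib
from pathlib import Path

state = Path("adapter/versions/v0001/training_state.pt")
expected = Path(str(state) + ".sha256").read_text().split()[0]

digest = hashlib.sha256(state.read_bytes()).hexdigest()
assert digest == expected, "training_state.pt does not match its recorded checksum"
```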

## 4. Retrain after edits

Edit the document, add more Q&A pairs, then:

```sh
$ uv run dlm train tutor.dlm
```

The delta system (audit-04 M1/M2) compares `content_hashes` in the
manifest against the current sections, so only new content drives the
new training signal — everything from v0001 is still in the replay
corpus and gets sampled into the v0002 training mix.
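
The idea behind the delta is a straightforward set difference over
content hashes. A sketch of the concept (hashing granularity and the
manifest layout are assumptions, not the actual implementation):

```python
import hashlib

def section_hash(text: str) -> str:
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def new_sections(current_sections: list[str], manifest_hashes: set[str]) -> list[str]:
    """Only sections whose hash is absent from the manifest drive new training signal."""
    return [s for s in current_sections if section_hash(s) not in manifest_hashes]
```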

Want to force a clean restart instead?

```sh
$ uv run dlm train tutor.dlm --fresh
```

## Next

You have a trained adapter. [Prompt it](first-prompt.md) next.

docs/getting-started/install.md (added)
@@ -0,0 +1,65 @@
# Install

DocumentLanguageModel is a Python package. It depends on `torch` (GPU or
CPU build), `transformers`, `peft`, `trl`, and — optionally for export —
the `ollama` binary on your PATH.

## Prerequisites

| Requirement | Minimum | Notes |
|---|---|---|
| Python | 3.11 | `pyproject.toml` pins `python >= 3.11`. |
| [uv](https://github.com/astral-sh/uv) | any recent | Used for dependency resolution and running scripts. |
| PyTorch | 2.4+ | Installed automatically by `uv sync`. |
| Ollama | as reported by `dlm doctor` | Only needed for `dlm export` smoke runs. |
| `vendor/llama.cpp` submodule | built | Only needed for `dlm export`. `scripts/bump-llama-cpp.sh build` compiles `llama-quantize` + `llama-imatrix`. |

On Apple Silicon, MPS acceleration is detected automatically and DLM
plans for fp16 LoRA. On CUDA, compute capability ≥ 8.0 (Ampere and
newer) unlocks bf16 + QLoRA 4-bit. See [Architecture](../architecture.md)
for the full refusal matrix.
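
The precision decision boils down to a capability probe along these
lines (a sketch of the logic described above, not the doctor's actual
code; the CPU fallback is an assumption):

```python
import torch

def pick_precision() -> str:
    """Rough version of the precision choice described above."""
    if torch.cuda.is_available():
        major, _ = torch.cuda.get_device_capability()
        # Ampere (compute capability 8.0) and newer: bf16, and QLoRA 4-bit becomes an option.
        return "bf16" if major >= 8 else "fp16"
    if torch.backends.mps.is_available():
        return "fp16"   # Apple Silicon: fp16 LoRA
    return "fp32"       # CPU-only fallback (assumed for this sketch)
```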

## Install from source

```sh
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync
uv run dlm --help
```

`uv sync` resolves the dependency tree into `.venv/` and pulls the
pinned versions from `uv.lock`. Use `uv run dlm <command>` (not
`dlm <command>` — the CLI isn't on your shell PATH unless you activate
the venv).

## Install from PyPI

```sh
# Coming with v1.0 — the tagged release workflow publishes to PyPI via
# trusted-publisher OIDC. Until then, install from source.
pip install dlm
```

## Verify

```sh
$ uv run dlm --version
dlm 0.1.0

$ uv run dlm doctor
backend: mps
precision: fp16
attn:     sdpa
...
```

`dlm doctor` is the first command to run on a new machine. It probes
the GPU, reports the memory budget, picks a training plan, and warns
about anything missing (e.g. FlashAttention unavailable, bitsandbytes
not importable on CPU-only hosts).
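
Those missing-dependency warnings come down to import probes, roughly
(a sketch, not the doctor's actual checks; the package list and messages
are illustrative):

```python
import importlib.util

def probe_optional_deps() -> list[str]:
    """Report optional packages that are absent so the plan can route around them."""
    warnings = []
    for pkg, consequence in [("flash_attn", "falling back to sdpa attention"),
                             ("bitsandbytes", "QLoRA 4-bit unavailable")]:
        if importlib.util.find_spec(pkg) is None:
            warnings.append(f"{pkg} not importable ({consequence})")
    return warnings
```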

## Next

Got `dlm doctor` output that looks healthy? Move on to the
[first training cycle](first-train.md).