tenseleyflow/documentlanguagemodel / 467e4c5

docs: getting-started (install, first-train, first-prompt, first-export) (sprint 16)

Authored by espadonne
SHA: 467e4c5d54f21b3a706f548c5495ecc1f35b2e22
Parents: 7f4d9a7
Tree: c64b392

4 changed files

| Status | File | + | - |
|---|---|---|---|
| A | docs/getting-started/first-export.md | 87 | 0 |
| A | docs/getting-started/first-prompt.md | 66 | 0 |
| A | docs/getting-started/first-train.md | 121 | 0 |
| A | docs/getting-started/install.md | 65 | 0 |

docs/getting-started/first-export.md (added)
@@ -0,0 +1,87 @@
# First export

`dlm export` converts the base + adapter into GGUF files, writes a
Modelfile with an explicit Go `text/template` (no fuzzy matching),
registers the model with `ollama create`, and runs a smoke prompt.

## Prerequisites

- `vendor/llama.cpp` submodule is built:
  ```sh
  $ scripts/bump-llama-cpp.sh build
  ```
  This compiles `llama-quantize` and `llama-imatrix` under
  `vendor/llama.cpp/build/bin/`.

- [Ollama](https://ollama.com/) is installed and its daemon is running.
  `dlm doctor` reports the minimum version.

## Export

```sh
$ uv run dlm export tutor.dlm --quant Q4_K_M --name my-tutor
export: preflight ok
export: base.Q4_K_M.gguf (47 MiB)
export: adapter.gguf (3 MiB)
export: Modelfile written; ollama create my-tutor:latest
export: smoke: "Hi!" → "Hello! How can I help?"
manifest: exports[-1] recorded at ~/.dlm/store/01KC…/
```

Under the hood:

1. The export **preflight** (Sprint 11) checks that the adapter config
   matches the base architecture, asserts that the tokenizer vocab
   agrees with the base, validates the chat template, and confirms the
   adapter wasn't QLoRA-trained (pitfall #3 — QLoRA merge needs
   `--dequantize`).
2. The base model is converted to GGUF and quantized via
   `llama-quantize`. The GGUF is cached under
   `~/.dlm/store/<id>/exports/Q4_K_M/base.Q4_K_M.gguf` — subsequent
   exports at the same quant reuse the file.
3. The LoRA adapter is converted to `adapter.gguf`.
4. A `Modelfile` is emitted with `FROM`, `ADAPTER`, and an explicit
   `TEMPLATE "..."` directive (Sprint 12); see the sketch after this
   list. Ollama will **not** fuzzy-match the template — the exact Go
   template for the base's dialect is committed.
5. `ollama create <name>:latest` registers the model under the Ollama
   daemon's control.
6. A smoke prompt runs; the first line of output is recorded in
   `manifest.exports[-1].smoke_output_first_line`.
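
Steps 2 and 4 boil down to one tool invocation and one small text file.
A rough sketch (not the actual exporter code; the paths, the cache
check, and the template string are placeholders for this example):

```python
import subprocess
from pathlib import Path

BIN = Path("vendor/llama.cpp/build/bin")       # built by bump-llama-cpp.sh
export_dir = Path("exports/Q4_K_M")            # inside the document's store

def quantize_base(base_f16: Path, quant: str = "Q4_K_M") -> Path:
    """Step 2: quantize the converted base GGUF, reusing the cached file if present."""
    out = export_dir / f"base.{quant}.gguf"
    if not out.exists():  # later exports at the same quant skip this step
        subprocess.run([BIN / "llama-quantize", base_f16, out, quant], check=True)
    return out

def write_modelfile(base_gguf: Path, adapter_gguf: Path, chat_template: str) -> Path:
    """Step 4: explicit Modelfile; the TEMPLATE is spelled out, never fuzzy-matched."""
    modelfile = export_dir / "Modelfile"
    modelfile.write_text(
        f'FROM {base_gguf}\n'
        f'ADAPTER {adapter_gguf}\n'
        f'TEMPLATE """{chat_template}"""\n'
    )
    return modelfile

# Step 5 then hands the Modelfile to Ollama:
#   subprocess.run(["ollama", "create", "my-tutor:latest", "-f", modelfile], check=True)
```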

## Quant levels

| Quant | Size | Quality | When to use |
|---|---|---|---|
| `Q4_K_M` | ~50% of fp16 | Great default | General-purpose; recommended starting point. |
| `Q5_K_M` | ~60% | Higher quality | Willing to trade more disk for fidelity. |
| `Q8_0` | ~100% of int8 | Near-lossless | Baseline for quality comparisons. |
| `F16` | 100% | No quantization | Debugging a quant-caused regression. |

See [Quantization tradeoffs](../cookbook/quantization-tradeoffs.md) for
a deeper dive.

## imatrix-calibrated quantization

If your store has a replay corpus with enough signal (Sprint 11.6),
the export runner automatically builds an imatrix from it and passes
`--imatrix` to `llama-quantize`. This gives noticeable quality
improvements on `Q4_K_M` and below without changing the API.

Opt out with `--no-imatrix` if you'd rather have a static quant for
comparison.
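
In llama.cpp terms, the calibrated path is two extra tool invocations.
A sketch (the file names and paths are assumptions, not the runner's
actual layout):

```python
import subprocess
from pathlib import Path

BIN = Path("vendor/llama.cpp/build/bin")

# 1. Build an importance matrix from the replay corpus (calibration text).
subprocess.run(
    [BIN / "llama-imatrix",
     "-m", "base.F16.gguf",        # unquantized base
     "-f", "replay_corpus.txt",    # calibration text drawn from the store
     "-o", "imatrix.dat"],
    check=True,
)

# 2. Quantize with the imatrix guiding which tensors keep more precision.
subprocess.run(
    [BIN / "llama-quantize",
     "--imatrix", "imatrix.dat",
     "base.F16.gguf", "base.Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```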

## Just produce GGUFs, skip Ollama

```sh
$ uv run dlm export tutor.dlm --quant Q4_K_M --skip-ollama
```

Useful on CI runners without the Ollama daemon installed. The GGUFs
land in `exports/Q4_K_M/`; wire them into your own runtime.
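
For example, if that runtime is [llama-cpp-python](https://github.com/abetlen/llama-cpp-python),
loading the pair directly might look like this (a sketch under the store
layout above, not a supported DLM interface):

```python
from llama_cpp import Llama

# Load the quantized base and apply the exported LoRA adapter.
llm = Llama(
    model_path="exports/Q4_K_M/base.Q4_K_M.gguf",
    lora_path="exports/Q4_K_M/adapter.gguf",
    n_ctx=2048,
)

out = llm("What is a Python decorator?", max_tokens=128)
print(out["choices"][0]["text"])
```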

## Next

Want to send the whole training history to a friend? The
[Sharing with pack](../cookbook/sharing-with-pack.md) cookbook shows
the `dlm pack` / `dlm unpack` round trip.

docs/getting-started/first-prompt.md (added)
@@ -0,0 +1,66 @@
# First prompt

`dlm prompt` runs inference against the current adapter using the base
model. It's the fastest way to check "did the training actually stick?"
without involving Ollama or GGUF conversion.

## The happy path

```sh
$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function as input…
```

Behind the scenes:

1. `dlm prompt` parses the `.dlm`, resolves the base model, and
   checks the hardware doctor's capability report.
2. It loads the base model plus the LoRA weights that
   `adapter/current.txt` points at, via PEFT (see the sketch after
   this list).
3. It calls `generate()` with your prompt; the defaults are
   `--max-tokens 256` and `--temp 0.7`.
4. The response is streamed to stdout; the Rich reporter writes
   progress / plan info to stderr so you can pipe stdout cleanly.
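
A minimal sketch of steps 2 and 3 in plain `transformers`/`peft` terms
(illustrative only; the real loader follows the doctor's plan for device
and dtype, and the adapter path is whatever `adapter/current.txt` names):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
adapter_dir = "<store>/adapter/versions/v0001"   # resolved from adapter/current.txt

tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)  # LoRA weights layered on top

inputs = tok("What is a Python decorator?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```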

## Deterministic generation

For reproducible output (useful for comparing adapters), pin
temperature to 0:

```sh
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi"
```

Greedy decoding is deterministic when the weights are byte-identical —
which is the whole point of the [determinism contract](../determinism.md).
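
In `generate()` terms, `--temp 0` corresponds to greedy decoding.
Continuing the sketch above:

```python
# Greedy decoding: sampling is off, so the argmax token is taken at every step.
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
```

With byte-identical weights, repeated runs then produce identical
completions, which is what makes `--temp 0` useful for A/B comparisons.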

## Verbose plan

Pass `--verbose` to surface the inference plan before generation:

```sh
$ uv run dlm prompt tutor.dlm --verbose "Hello"
plan: {'device': 'mps', 'dtype': 'fp16', 'adapter_path': '...', 'quantization': 'none'}
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
Hello! How can I help you today?
```

The `plan` dict is the same object written into `manifest.json` on
training, so you can cross-reference what the model was doing the
last time it trained.
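
For example, to diff the inference plan against the plan recorded for
the last training run (a sketch; the manifest key names used here are
assumptions beyond the `plan` dict shown above):

```python
import json
from pathlib import Path

store = Path("~/.dlm/store/<id>").expanduser()      # placeholder store path
manifest = json.loads((store / "manifest.json").read_text())

train_plan = manifest["training_runs"][-1]["plan"]   # key names assumed
infer_plan = {"device": "mps", "dtype": "fp16", "quantization": "none"}

for key in ("device", "dtype", "quantization"):
    if train_plan.get(key) != infer_plan.get(key):
        print(f"{key}: trained with {train_plan.get(key)!r}, "
              f"prompting with {infer_plan.get(key)!r}")
```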

## Piping and stdin

Prompt via stdin for long inputs:

```sh
$ cat long-prompt.txt | uv run dlm prompt tutor.dlm
```

If stdin is empty and no query argument is given, `dlm prompt` exits
with a non-zero code and a clear error rather than hanging.
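
The stdin handling follows the usual CLI pattern, roughly (a sketch, not
the actual implementation):

```python
import sys

def read_query(arg: str | None) -> str:
    # Prefer an explicit argument; otherwise accept piped stdin, but never block on a TTY.
    if arg:
        return arg
    if not sys.stdin.isatty():
        text = sys.stdin.read().strip()
        if text:
            return text
    sys.exit("error: no prompt given (pass a query argument or pipe text on stdin)")
```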

## Next

Happy with inference? [Export to Ollama](first-export.md) for a real
standalone model.

docs/getting-started/first-train.md (added)
@@ -0,0 +1,121 @@
# First training cycle

This walks you through creating a `.dlm` document, training a LoRA
adapter against `smollm2-135m`, and confirming the artifacts on disk.

## 1. Create a document

```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
created: tutor.dlm
dlm_id: 01KC…                (26-character ULID)
base:   smollm2-135m         (HuggingFaceTB/SmolLM2-135M-Instruct)
store:  ~/.dlm/store/01KC…/
```

`dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter
and provisions the store directory.

Open `tutor.dlm` in your editor and add some training signal:

```dlm
---
dlm_id: 01KC...
dlm_version: 1
base_model: smollm2-135m
training:
  seed: 42
---

# Python decorators primer

::instruction::
### Q
What is a Python decorator?

### A
A decorator is a function that takes another function as input and
returns a new function that wraps extra behavior around the original.
The `@decorator_name` syntax above a `def` is equivalent to
`name = decorator_name(name)`.

### Q
When should I use `functools.wraps`?

### A
Always use `@functools.wraps(func)` inside a decorator so the wrapped
function keeps its `__name__`, `__doc__`, and `__wrapped__` attributes.
Without it, debugging and introspection get confused.
```

Prose outside section fences trains via continued pretraining;
instruction blocks (`### Q` / `### A`) train via SFT. A sketch of that
split follows below.
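
Conceptually, the parse separates the two kinds of signal along these
lines (an illustration of the idea only; the real parser, its fence
handling, and its data shapes are not shown here):

```python
import re

def split_training_signal(body: str):
    """Illustrative split: ::instruction:: blocks yield SFT Q/A pairs, the rest is CPT prose."""
    prose_chunks, qa_pairs = [], []
    # Assumed for this sketch: a single ::instruction:: fence opening a Q/A block.
    before, _, instruction_block = body.partition("::instruction::")
    if before.strip():
        prose_chunks.append(before.strip())
    for match in re.finditer(r"### Q\s*\n(.+?)\n+### A\s*\n(.+?)(?=\n### Q|\Z)",
                             instruction_block, flags=re.S):
        qa_pairs.append({"question": match.group(1).strip(),
                         "answer": match.group(2).strip()})
    return prose_chunks, qa_pairs
```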

## 2. Run the training loop

```sh
$ uv run dlm train tutor.dlm
```

DLM runs the hardware doctor, resolves the plan (precision,
batch size, grad accumulation), downloads the base model (cached on
re-runs), and kicks off the SFTTrainer. On an M-series Mac with MPS,
20 steps of SmolLM2-135M take about two minutes.

Output (abbreviated):

```
preflight: 9.6 GB free under ~/.dlm/store/01KC…/
banner:    seed=42 determinism=best-effort plan=fp16/sdpa/bs=1×8
step 5:    loss=3.421  lr=5.00e-04
step 10:   loss=2.887  lr=4.47e-04
step 15:   loss=2.541  lr=3.45e-04
step 20:   loss=2.298  lr=2.08e-04
trained:   v0001 (20 steps, seed=42, determinism=best-effort)
adapter:   ~/.dlm/store/01KC…/adapter/versions/v0001
log:       ~/.dlm/store/01KC…/logs/train-000001-…jsonl
```

## 3. Inspect the store

```sh
$ uv run dlm show tutor.dlm
dlm_id:        01KC…
base_model:    smollm2-135m
training_runs: 1
    run 1 → v0001, 20 steps, seed=42, loss 2.30
adapter:       v0001
manifest:      ~/.dlm/store/01KC…/manifest.json
lock:          ~/.dlm/store/01KC…/dlm.lock
```

Under the hood, each run produced:

- `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights
- `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume; see the check below)
- `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta
- `logs/train-000001-*.jsonl` — per-step metrics
- `dlm.lock` — pinned versions + hardware tier + determinism contract
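
The `.sha256` sidecar makes the resume state cheap to verify by hand.
A sketch, assuming the sidecar holds the hex digest of
`training_state.pt` (its exact format is not documented here):

```python
import hashlib
from pathlib import Path

state = Path("adapter/versions/v0001/training_state.pt")
expected = Path(str(state) + ".sha256").read_text().split()[0]

digest = hashlib.sha256(state.read_bytes()).hexdigest()
assert digest == expected, "training_state.pt does not match its recorded checksum"
```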

## 4. Retrain after edits

Edit the document, add more Q&A pairs, then:

```sh
$ uv run dlm train tutor.dlm
```

The delta system (audit-04 M1/M2) compares `content_hashes` in the
manifest against the current sections, so only new content drives the
new training signal — everything from v0001 is still in the replay
corpus and gets sampled into the v0002 training mix.
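
The idea behind the delta is a straightforward set difference over
content hashes. A sketch of the concept (hashing granularity and the
manifest layout are assumptions, not the actual implementation):

```python
import hashlib

def section_hash(text: str) -> str:
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

def new_sections(current_sections: list[str], manifest_hashes: set[str]) -> list[str]:
    """Only sections whose hash is absent from the manifest drive new training signal."""
    return [s for s in current_sections if section_hash(s) not in manifest_hashes]
```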

Want to force a clean restart instead?

```sh
$ uv run dlm train tutor.dlm --fresh
```

## Next

You have a trained adapter. [Prompt it](first-prompt.md) next.

docs/getting-started/install.md (added)
@@ -0,0 +1,65 @@
# Install

DocumentLanguageModel is a Python package. It depends on `torch` (GPU or
CPU build), `transformers`, `peft`, `trl`, and — optionally for export —
the `ollama` binary on your PATH.

## Prerequisites

| Requirement | Minimum | Notes |
|---|---|---|
| Python | 3.11 | `pyproject.toml` pins `python >= 3.11`. |
| [uv](https://github.com/astral-sh/uv) | any recent | Used for dependency resolution and running scripts. |
| PyTorch | 2.4+ | Installed automatically by `uv sync`. |
| Ollama | as reported by `dlm doctor` | Only needed for `dlm export` smoke runs. |
| `vendor/llama.cpp` submodule | built | Only needed for `dlm export`. `scripts/bump-llama-cpp.sh build` compiles `llama-quantize` + `llama-imatrix`. |

On Apple Silicon, MPS acceleration is detected automatically and DLM
plans for fp16 LoRA. On CUDA, compute capability ≥ 8.0 (Ampere and
newer) unlocks bf16 + QLoRA 4-bit. See [Architecture](../architecture.md)
for the full refusal matrix.
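
The precision decision boils down to a capability probe along these
lines (a sketch of the logic described above, not the doctor's actual
code; the CPU fallback is an assumption):

```python
import torch

def pick_precision() -> str:
    """Rough version of the precision choice described above."""
    if torch.cuda.is_available():
        major, _ = torch.cuda.get_device_capability()
        # Ampere (compute capability 8.0) and newer: bf16, and QLoRA 4-bit becomes an option.
        return "bf16" if major >= 8 else "fp16"
    if torch.backends.mps.is_available():
        return "fp16"   # Apple Silicon: fp16 LoRA
    return "fp32"       # CPU-only fallback (assumed for this sketch)
```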

## Install from source

```sh
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync
uv run dlm --help
```

`uv sync` resolves the dependency tree into `.venv/` and pulls the
pinned versions from `uv.lock`. Use `uv run dlm <command>` (not
`dlm <command>` — the CLI isn't on your shell PATH unless you activate
the venv).

## Install from PyPI

```sh
# Coming with v1.0 — the tagged release workflow publishes to PyPI via
# trusted-publisher OIDC. Until then, install from source.
pip install dlm
```

## Verify

```sh
$ uv run dlm --version
dlm 0.1.0

$ uv run dlm doctor
backend: mps
precision: fp16
attn:     sdpa
...
```

`dlm doctor` is the first command to run on a new machine. It probes
the GPU, reports the memory budget, picks a training plan, and warns
about anything missing (e.g. FlashAttention unavailable, bitsandbytes
not importable on CPU-only hosts).
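
Those missing-dependency warnings come down to import probes, roughly
(a sketch, not the doctor's actual checks; the package list and messages
are illustrative):

```python
import importlib.util

def probe_optional_deps() -> list[str]:
    """Report optional packages that are absent so the plan can route around them."""
    warnings = []
    for pkg, consequence in [("flash_attn", "falling back to sdpa attention"),
                             ("bitsandbytes", "QLoRA 4-bit unavailable")]:
        if importlib.util.find_spec(pkg) is None:
            warnings.append(f"{pkg} not importable ({consequence})")
    return warnings
```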

## Next

Got `dlm doctor` output that looks healthy? Move on to the
[first training cycle](first-train.md).