docs: getting-started (install, first-train, first-prompt, first-export) (sprint 16)

- SHA: 467e4c5d54f21b3a706f548c5495ecc1f35b2e22
- Parents: 7f4d9a7
- Tree: c64b392

| Status | File | + | - |
|---|---|---|---|
| A | docs/getting-started/first-export.md | 87 | 0 |
| A | docs/getting-started/first-prompt.md | 66 | 0 |
| A | docs/getting-started/first-train.md | 121 | 0 |
| A | docs/getting-started/install.md | 65 | 0 |

docs/getting-started/first-export.md (added) @@ -0,0 +1,87 @@
# First export

`dlm export` converts the base + adapter into GGUF files, writes a
Modelfile with an explicit Go `text/template` (no fuzzy matching),
registers the model with `ollama create`, and runs a smoke prompt.

## Prerequisites

- `vendor/llama.cpp` submodule is built:
  ```sh
  $ scripts/bump-llama-cpp.sh build
  ```
  This compiles `llama-quantize` and `llama-imatrix` under
  `vendor/llama.cpp/build/bin/`.

- [Ollama](https://ollama.com/) is installed and its daemon is running.
  `dlm doctor` reports the minimum version.

## Export

```sh
$ uv run dlm export tutor.dlm --quant Q4_K_M --name my-tutor
export: preflight ok
export: base.Q4_K_M.gguf (47 MiB)
export: adapter.gguf (3 MiB)
export: Modelfile written; ollama create my-tutor:latest
export: smoke: "Hi!" → "Hello! How can I help?"
manifest: exports[-1] recorded at ~/.dlm/store/01KC…/
```
| 30 | + | |
Under the hood:

1. The export **preflight** (Sprint 11) checks that the adapter config
   matches the base architecture, asserts the tokenizer vocab agrees
   with the base, validates the chat template, and confirms the
   adapter wasn't QLoRA-trained (pitfall #3 — QLoRA merge needs
   `--dequantize`).
2. The base model is converted to GGUF and quantized via
   `llama-quantize`. The GGUF is cached under
   `~/.dlm/store/<id>/exports/Q4_K_M/base.Q4_K_M.gguf` — subsequent
   exports at the same quant reuse the file.
3. The LoRA adapter is converted to `adapter.gguf`.
4. A `Modelfile` is emitted with `FROM`, `ADAPTER`, and an explicit
   `TEMPLATE "..."` directive (Sprint 12); a sketch follows this list.
   Ollama will **not** fuzzy-match the template — the exact Go
   template for the base's dialect is committed.
5. `ollama create <name>:latest` registers the model under the Ollama
   daemon's control.
6. A smoke prompt runs; the first line of output is recorded in
   `manifest.exports[-1].smoke_output_first_line`.
| 51 | + | |
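To make step 4 concrete, here is a minimal sketch of how a Modelfile like the
one the exporter writes could be assembled. The ChatML-style template, the
stop parameter, and the file paths are illustrative assumptions; dlm commits
the exact template for your base's dialect.

```python
from pathlib import Path

# Illustrative ChatML-style Go template; the real exporter commits the
# template that matches the base model's chat dialect.
chatml_template = (
    "{{ if .System }}<|im_start|>system\n{{ .System }}<|im_end|>\n{{ end }}"
    "{{ if .Prompt }}<|im_start|>user\n{{ .Prompt }}<|im_end|>\n{{ end }}"
    "<|im_start|>assistant\n{{ .Response }}<|im_end|>\n"
)

modelfile = "\n".join([
    "FROM ./base.Q4_K_M.gguf",            # quantized base from step 2
    "ADAPTER ./adapter.gguf",             # converted LoRA adapter from step 3
    f'TEMPLATE """{chatml_template}"""',  # explicit template, no fuzzy matching
    "PARAMETER stop <|im_end|>",          # assumed stop token for a ChatML base
])

Path("Modelfile").write_text(modelfile + "\n")
```

Running `ollama create my-tutor -f Modelfile` against such a file is the
manual version of step 5.
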
## Quant levels

| Quant | Size | Quality | When to use |
|---|---|---|---|
| `Q4_K_M` | ~50% of fp16 | Great default | General-purpose; recommended starting point. |
| `Q5_K_M` | ~60% | Higher quality | Willing to trade more disk for fidelity. |
| `Q8_0` | ~100% of int8 | Near-lossless | Baseline for quality comparisons. |
| `F16` | 100% | No quantization | Debugging a quant-caused regression. |

See [Quantization tradeoffs](../cookbook/quantization-tradeoffs.md) for
a deeper dive.
| 63 | + | |
## imatrix-calibrated quantization

If your store has a replay corpus with enough signal (Sprint 11.6),
the export runner automatically builds an imatrix from it and passes
`--imatrix` to `llama-quantize`. This gives noticeable quality
improvements on `Q4_K_M` and below without changing the API.

Opt out with `--no-imatrix` if you'd rather have a static quant for
comparison.
| 73 | + | |
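For reference, the underlying llama.cpp invocations look roughly like the
sketch below. The GGUF file names and the calibration-corpus path are
placeholders, not dlm's real store layout.

```python
import subprocess

BIN = "vendor/llama.cpp/build/bin"
corpus = "replay_corpus.txt"  # placeholder: calibration text drawn from the replay corpus

# 1. Build an importance matrix from the calibration text.
subprocess.run(
    [f"{BIN}/llama-imatrix", "-m", "base.F16.gguf", "-f", corpus, "-o", "imatrix.dat"],
    check=True,
)

# 2. Quantize with the imatrix. Dropping the --imatrix pair gives the
#    static quant you would get with --no-imatrix.
subprocess.run(
    [f"{BIN}/llama-quantize", "--imatrix", "imatrix.dat",
     "base.F16.gguf", "base.Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```
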
## Just produce GGUFs, skip Ollama

```sh
$ uv run dlm export tutor.dlm --quant Q4_K_M --skip-ollama
```

Useful on CI runners without the Ollama daemon installed. The GGUFs
land in `exports/Q4_K_M/`; wire them into your own runtime.

## Next

Want to send the whole training history to a friend? The
[Sharing with pack](../cookbook/sharing-with-pack.md) cookbook shows
the `dlm pack` / `dlm unpack` round trip.
docs/getting-started/first-prompt.md (added) @@ -0,0 +1,66 @@
# First prompt

`dlm prompt` runs inference against the current adapter using the base
model. It's the fastest way to check "did the training actually stick?"
without involving Ollama or GGUF conversion.

## The happy path

```sh
$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function as input…
```
| 13 | + | |
Behind the scenes (a rough hand-rolled equivalent follows this list):

1. `dlm prompt` parses the `.dlm`, resolves the base model, and
   checks the hardware doctor's capability report.
2. It loads the base model plus the LoRA weights pointed at by
   `adapter/current.txt`, via PEFT.
3. It calls `generate()` with your prompt, using `--max-tokens 256`
   and `--temp 0.7` by default.
4. The response is streamed to stdout; the Rich reporter writes
   progress / plan info to stderr so you can pipe stdout cleanly.
| 24 | + | |
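The sketch below is a rough hand-rolled equivalent of steps 2–3 using
`transformers` and `peft`. The adapter path and the sampling settings are
illustrative, and dlm's plan resolution adds device and dtype handling on top
of this.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
adapter_dir = "path/to/store/adapter/versions/v0001"  # whatever adapter/current.txt points at

tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_dir)  # apply the LoRA weights

# Build a chat-formatted prompt and sample with the documented defaults.
messages = [{"role": "user", "content": "What is a Python decorator?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(input_ids=inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tok.decode(out[0], skip_special_tokens=True))
```
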
## Deterministic generation

For reproducible output (useful for comparing adapters), pin
temperature to 0:

```sh
$ uv run dlm prompt tutor.dlm --temp 0 --max-tokens 32 "Say hi"
```

Greedy decoding is deterministic when the weights are byte-identical —
which is the whole point of the [determinism contract](../determinism.md).
| 36 | + | |
## Verbose plan

Pass `--verbose` to surface the inference plan before generation:

```sh
$ uv run dlm prompt tutor.dlm --verbose "Hello"
plan: {'device': 'mps', 'dtype': 'fp16', 'adapter_path': '...', 'quantization': 'none'}
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
Hello! How can I help you today?
```

The `plan` dict is the same object written into `manifest.json` on
training, so you can cross-reference what the model was doing the
last time it trained.
| 51 | + | |
## Piping and stdin

Prompt via stdin for long inputs:

```sh
$ cat long-prompt.txt | uv run dlm prompt tutor.dlm
```

An empty stdin (no query argument either) exits with a non-zero code
and a clear error, rather than hanging.
| 62 | + | |
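That no-hang behaviour comes down to a check along these lines; this is a
simplified sketch, and the exact messages and exit code are illustrative.

```python
import sys

def read_query(arg: str | None) -> str:
    """Take the prompt from the CLI argument, else from piped stdin; never block."""
    if arg:
        return arg
    if sys.stdin.isatty():
        # Interactive terminal with nothing piped in: fail fast instead of hanging.
        sys.exit("error: no prompt given (pass an argument or pipe text on stdin)")
    text = sys.stdin.read().strip()
    if not text:
        sys.exit("error: stdin was empty")
    return text
```
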
## Next

Happy with inference? [Export to Ollama](first-export.md) for a real
standalone model.
docs/getting-started/first-train.md (added) @@ -0,0 +1,121 @@
# First training cycle

This walks you through creating a `.dlm` document, training a LoRA
adapter against `smollm2-135m`, and confirming the artifacts on disk.

## 1. Create a document

```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
created: tutor.dlm
dlm_id: 01KC… (26-character ULID)
base: smollm2-135m (HuggingFaceTB/SmolLM2-135M-Instruct)
store: ~/.dlm/store/01KC…/
```

`dlm init` writes a minimal `.dlm` with a fresh ULID in the frontmatter
and provisions the store directory.

Open `tutor.dlm` in your editor and add some training signal:
| 20 | + | |
```dlm
---
dlm_id: 01KC...
dlm_version: 1
base_model: smollm2-135m
training:
  seed: 42
---

# Python decorators primer

::instruction::
### Q
What is a Python decorator?

### A
A decorator is a function that takes another function as input and
returns a new function that wraps extra behavior around the original.
The `@decorator_name` syntax above a `def` is equivalent to
`name = decorator_name(name)`.

### Q
When should I use `functools.wraps`?

### A
Always use `@functools.wraps(func)` inside a decorator so the wrapped
function keeps its `__name__`, `__doc__`, and `__wrapped__` attributes.
Without it, debugging and introspection get confused.
```

Prose outside section fences trains via continued pretraining;
instruction blocks (`### Q` / `### A`) train via SFT.
| 53 | + | |
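As a toy illustration of that split (not dlm's actual parser), the routing is
conceptually:

```python
import re

def split_signal(body: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Toy split of a .dlm body into CPT prose chunks and SFT (question, answer) pairs."""
    cpt_chunks: list[str] = []
    sft_pairs: list[tuple[str, str]] = []
    # Assumed shape: an ::instruction:: fence introduces ### Q / ### A headings.
    for block in body.split("::instruction::"):
        qa = re.findall(r"### Q\n(.*?)\n+### A\n(.*?)(?=\n### Q|\Z)", block, re.S)
        if qa:
            sft_pairs.extend((q.strip(), a.strip()) for q, a in qa)
        elif block.strip():
            cpt_chunks.append(block.strip())  # plain prose feeds continued pretraining
    return cpt_chunks, sft_pairs
```
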
## 2. Run the training loop

```sh
$ uv run dlm train tutor.dlm
```

DLM runs the hardware doctor, resolves the plan (precision,
batch size, grad accumulation), downloads the base model (cached on
re-runs), and kicks off the SFTTrainer. On a Mac M-series with MPS,
20 steps of SmolLM2-135M take about two minutes.

Output (abbreviated):

```
preflight: 9.6 GB free under ~/.dlm/store/01KC…/
banner: seed=42 determinism=best-effort plan=fp16/sdpa/bs=1×8
step 5: loss=3.421 lr=5.00e-04
step 10: loss=2.887 lr=4.47e-04
step 15: loss=2.541 lr=3.45e-04
step 20: loss=2.298 lr=2.08e-04
trained: v0001 (20 steps, seed=42, determinism=best-effort)
adapter: ~/.dlm/store/01KC…/adapter/versions/v0001
log: ~/.dlm/store/01KC…/logs/train-000001-…jsonl
```
| 78 | + | |
## 3. Inspect the store

```sh
$ uv run dlm show tutor.dlm
dlm_id: 01KC…
base_model: smollm2-135m
training_runs: 1
  run 1 → v0001, 20 steps, seed=42, loss 2.30
adapter: v0001
manifest: ~/.dlm/store/01KC…/manifest.json
lock: ~/.dlm/store/01KC…/dlm.lock
```

Under the hood, each run produced (a verification sketch follows the list):

- `adapter/versions/v0001/adapter_config.json` + `adapter_model.safetensors` — the LoRA weights
- `adapter/versions/v0001/training_state.pt` + `.sha256` — optimizer/scheduler/RNG sidecar (for bit-exact resume)
- `manifest.json` — one `TrainingRunSummary` + the `content_hashes` delta
- `logs/train-000001-*.jsonl` — per-step metrics
- `dlm.lock` — pinned versions + hardware tier + determinism contract
| 99 | + | |
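If you want to poke at those artifacts yourself, the sketch below reads the
manifest and re-checks the sidecar hash. The `training_runs` key and the
`.sha256` sidecar name/format are assumptions based on the list above; adjust
against your own `manifest.json`.

```python
import hashlib
import json
from pathlib import Path

store = Path.home() / ".dlm" / "store" / "<dlm_id>"  # fill in your document's ULID

# Count recorded runs (assumed key name).
manifest = json.loads((store / "manifest.json").read_text())
print("training runs:", len(manifest.get("training_runs", [])))

# Re-hash the optimizer/scheduler/RNG sidecar and compare with its .sha256 file.
state = store / "adapter" / "versions" / "v0001" / "training_state.pt"
expected = (state.parent / (state.name + ".sha256")).read_text().split()[0]
actual = hashlib.sha256(state.read_bytes()).hexdigest()
print("sidecar ok" if actual == expected else "sidecar hash mismatch!")
```
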
## 4. Retrain after edits

Edit the document, add more Q&A pairs, then:

```sh
$ uv run dlm train tutor.dlm
```

The delta system (audit-04 M1/M2) compares `content_hashes` in the
manifest against the current sections, so only new content drives the
new training signal — everything from v0001 is still in the replay
corpus and gets sampled into the v0002 training mix.
| 112 | + | |
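Conceptually the delta check is just a hash comparison; here is a toy sketch
(not dlm's real schema or hashing granularity):

```python
import hashlib

def section_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_sections(current: dict[str, str], recorded: dict[str, str]) -> list[str]:
    """Section ids whose content hash differs from what the manifest recorded."""
    return [
        sid for sid, text in current.items()
        if recorded.get(sid) != section_hash(text)
    ]
```
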
Want to force a clean restart instead?

```sh
$ uv run dlm train tutor.dlm --fresh
```

## Next

You have a trained adapter. [Prompt it](first-prompt.md) next.
docs/getting-started/install.md (added) @@ -0,0 +1,65 @@
# Install

DocumentLanguageModel is a Python package. It depends on `torch` (GPU or
CPU build), `transformers`, `peft`, `trl`, and — optionally for export —
the `ollama` binary on your PATH.

## Prerequisites

| Requirement | Minimum | Notes |
|---|---|---|
| Python | 3.11 | `pyproject.toml` pins `python >= 3.11`. |
| [uv](https://github.com/astral-sh/uv) | any recent | Used for dependency resolution and running scripts. |
| PyTorch | 2.4+ | Installed automatically by `uv sync`. |
| Ollama | as reported by `dlm doctor` | Only needed for `dlm export` smoke runs. |
| `vendor/llama.cpp` submodule | built | Only needed for `dlm export`. `scripts/bump-llama-cpp.sh build` compiles `llama-quantize` + `llama-imatrix`. |

On Apple Silicon, MPS acceleration is detected automatically and DLM
plans for fp16 LoRA. On CUDA, compute capability ≥ 8.0 (Ampere and
newer) unlocks bf16 + QLoRA 4-bit. See [Architecture](../architecture.md)
for the full refusal matrix.
| 21 | + | |
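As a rough illustration of the kind of probe `dlm doctor` performs (not its
actual code), the MPS / compute-capability decision looks like:

```python
import torch

def pick_plan() -> dict:
    """Toy version of the precision/quantization choice described above."""
    if torch.backends.mps.is_available():
        return {"device": "mps", "dtype": "fp16", "qlora": False}
    if torch.cuda.is_available():
        major, _ = torch.cuda.get_device_capability()
        if major >= 8:  # Ampere or newer: bf16 + 4-bit QLoRA unlocked
            return {"device": "cuda", "dtype": "bf16", "qlora": True}
        return {"device": "cuda", "dtype": "fp16", "qlora": False}
    return {"device": "cpu", "dtype": "fp32", "qlora": False}
```
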
## Install from source

```sh
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync
uv run dlm --help
```

`uv sync` resolves the dependency tree into `.venv/` and pulls the
pinned versions from `uv.lock`. Use `uv run dlm <command>` (not
`dlm <command>` — the CLI isn't on your shell PATH unless you activate
the venv).

## Install from PyPI

```sh
# Coming with v1.0 — the tagged release workflow publishes to PyPI via
# trusted-publisher OIDC. Until then, install from source.
pip install dlm
```
| 43 | + | |
## Verify

```sh
$ uv run dlm --version
dlm 0.1.0

$ uv run dlm doctor
backend: mps
precision: fp16
attn: sdpa
...
```

`dlm doctor` is the first command to run on a new machine. It probes
the GPU, reports the memory budget, picks a training plan, and warns
about anything missing (e.g. FlashAttention unavailable, bitsandbytes
not importable on CPU-only hosts).

## Next

Got `dlm doctor` output that looks healthy? Move on to the
[first training cycle](first-train.md).