# DocumentLanguageModel

> `.dlm` is a trainable local AI document format: typed sections, directives,
> replay-backed retraining, and export.

DocumentLanguageModel (DLM) is a local-first training, inference, and export toolchain built around authored documents instead of hosted dashboards.

A `.dlm` can be:

- a hand-written training document with prose, instruction, and preference data
- a directive-driven entrypoint into a codebase or notes tree
- a multi-adapter project with learned routing
- a selected multimodal or audio-language document

DLM trains LoRA / QLoRA / DoRA adapters on real pretrained bases, keeps a replay
history so retrains do not silently forget, and exports to local runtimes such
as Ollama, `llama-server`, `vllm`, and `mlx-serve`.

**Status:** pre-v1.0, but far beyond the original MVP framing. The core
author/train/prompt/export/pack/share loop is real, and newer runtime-target
work is landing incrementally. Current export targets are `ollama`,
`llama-server`, `vllm`, and `mlx-serve`.

## What A `.dlm` Actually Is

A `.dlm` is not just “a text file with a special extension.”

It is a trainable project surface with:

- **frontmatter** for base-model choice, training config, export defaults,
  sources, cache policy, and multi-adapter gate settings
- **typed body sections** such as prose, `::instruction::`, `::preference::`,
  `::image::`, and `::audio::`
- **adapter routing** via fences like `::instruction#knowledge::`
- **directive-driven ingestion** from files and directories through
  `training.sources`
- **repo-local subtree control** through `.dlm/training.yaml` and `.dlm/ignore`
- a stable **`dlm_id`** that binds the document to a local store under
  `~/.dlm/store/<dlm_id>/`

That combination is what makes DLM more like a local AI authoring format than a single prompt file.
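
That binding is easy to see on disk. A minimal check, assuming the document has
been trained at least once (the `dlm_id` below is the one from the minimal
example later in this README, and the directory's internal layout is an
implementation detail):

```sh
# The store directory is keyed by the document's frontmatter dlm_id.
ls ~/.dlm/store/01KPM5CXB51GRX86Q25AKERN6E/
```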

## Why DLM

Most “personal AI” tooling still pushes you toward one of two bad choices:
- upload your data to someone else’s cloud
- run an oversized model with weak authoring and retraining ergonomics
DLM sits in the gap:

- **The document is the interface.** You author the thing you care about instead
  of wiring together a hidden dataset pipeline.
- **Training is real.** LoRA / QLoRA / DoRA on pretrained bases, not a toy
  from-scratch transformer.
- **Retraining is additive.** Previous document versions flow into a replay
  corpus so the model does not forget last week’s state by default.
- **Everything stays local.** Training, inference, store state, exports, and
  packs all live on your machine unless you explicitly push them somewhere.
- **Determinism is a contract.** Locks, pinned versions, and golden checks are
  first-class design constraints, not “best effort.”

## Core Capabilities

- **Author structured training data in one place.** Mix prose, SFT examples,
  preferences, image sections, and audio sections in one document.
- **Ingest whole trees, not just one file.** `training.sources` can walk a
  repo, and subtree-local `.dlm/training.yaml` / `.dlm/ignore` let the corpus
  carry its own curation rules.
- **Train on modern base families.** Text, reasoning-tuned, sparse-MoE,
  vision-language, and audio-language registry rows ship today, plus
  `hf:org/name` escape hatches (see the sketch after this list).
- **Compose multiple adapters in one document.** Named adapters, weighted export
  mixes, and learned adapter gates let one `.dlm` separate knowledge, tone, or
  persona lanes.
- **Mine preference pairs from a live adapter.** `dlm preference mine` can use
  `sway`, HF reward models, or external CLI judges to write auto-mined
  `::preference::` sections back into the document.
- **Stay in a local iteration loop.** `dlm prompt`, `dlm repl`,
  `dlm train --watch`, `dlm metrics`, and `dlm doctor` are all part of the
  normal workflow now.
- **Export beyond the original Ollama-only story.** DLM still does explicit
  Ollama exports with pinned templates, and now also emits `llama-server`,
  `vllm`, and `mlx-serve` launch artifacts for local runtime targets.
- **Close the eval loop.** `dlm harvest` can pull failing `sway`-style probe
  reports back into the document as new training examples.
- **Pack and share reproducibly.** `.dlm.pack`, verification, push/pull, and
  local serve flows are all built around the same store contracts.
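
For example, the `hf:org/name` escape hatch points a new document at an
arbitrary Hub base instead of a registry row. The model id below is purely
illustrative, not a tested recommendation:

```sh
# Any hf:org/name id follows the same pattern; this one is a placeholder.
uv run dlm init scratch.dlm --base hf:Qwen/Qwen2.5-0.5B-Instruct
```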

## Supported Platforms

| Tier | Training | Inference / export |
|---|---|---|
| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama, GGUF export, `llama-server`, `vllm` |
| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama, GGUF export, `llama-server`, `vllm` |
| Apple Silicon (MPS) | fp16 or fp32 LoRA depending on doctor plan | Ollama, selected MLX inference paths, GGUF export, `vllm` (conservative Metal defaults), `mlx-serve` |
| CPU | inference-first; training refused above small bases unless forced | GGUF export, Ollama, `llama-server` |
| AMD ROCm | experimental | ROCm-oriented llama.cpp flows |

See [docs/hardware](./docs/hardware/memory-estimates.md) and
[docs/hardware/vl-memory.md](./docs/hardware/vl-memory.md) for the real support
matrix and current caveats.
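
The table is a summary, not a guarantee; on any given machine, `dlm doctor`
reports which tier and precision plan actually applies before you commit to a
train:

```sh
# Prints the detected hardware tier and the training plan DLM would use.
uv run dlm doctor
```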

## Install

### From the Homebrew tap

```sh
brew tap tenseleyFlow/tap
brew install dlm

# Optional, only if you want `--target ollama` registration/smoke:
brew install ollama
```

`brew install dlm` pulls in the Python environment and the vendored
`llama.cpp` source tree DLM uses for GGUF conversion. CUDA users unlock QLoRA
after install:

```sh
$(brew --prefix dlm)/libexec/venv/bin/pip install 'dlm[cuda]'
```

### From source

```sh
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync

# Build GGUF tooling:
scripts/bump-llama-cpp.sh build

# If you want the llama.cpp HTTP target too:
scripts/bump-llama-cpp.sh build --with-server

# If you want the Apple Silicon MLX HTTP target:
uv sync --extra mlx

# If you want the vLLM HTTP target:
# install a compatible vllm runtime separately; DLM writes launch artifacts
# but does not bundle the server runtime itself.

uv run dlm --help
```

We deliberately do not publish to PyPI yet. See
[CONTRIBUTING.md](./CONTRIBUTING.md) for the release flow.

## 30-Second Start

```sh
uv run dlm init tutor.dlm --base smollm2-135m
$EDITOR tutor.dlm
uv run dlm train tutor.dlm
uv run dlm prompt tutor.dlm "What is a Python decorator?"
uv run dlm export tutor.dlm --target ollama --name my-tutor
```
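
Because the export registers the model with Ollama under the `--name` you gave,
the ordinary Ollama CLI can then run it like any other local model (assuming
`ollama` is installed and its server is running):

```sh
# Runs the freshly exported model through the regular Ollama runtime.
ollama run my-tutor "What is a Python decorator?"
```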

A minimal `.dlm` still works:

```dlm
---
dlm_id: 01KPM5CXB51GRX86Q25AKERN6E
dlm_version: 1
base_model: smollm2-135m
---

# Your document title

Write prose here.

::instruction::
### Q
What is a decorator?

### A
A function that takes a function and returns a wrapped function.
```
That path is still important. It is just no longer the whole story.

## Authoring Beyond The Toy Example

A more representative `.dlm` can mix directives, named adapters, and export
defaults in one place:

```dlm
---
dlm_id: 01KTESTEXAMPLE000000000000
dlm_version: 1
base_model: qwen3-1.7b
system_prompt: |
  You are a concise engineering assistant.
training:
  adapter: lora
  sequence_len: 4096
  sources_policy: strict
  sources:
    - path: ./src
      include: ["**/*.py", "**/*.md"]
      exclude: ["tests/**", "**/__pycache__/**"]
  adapters:
    knowledge:
      adapter: lora
      lora_r: 8
    tone:
      adapter: lora
      lora_r: 4
  gate:
    enabled: true
export:
  default_quant: Q4_K_M
---

# Project notes

Shared prose trains all declared adapters by default.

::instruction#knowledge::
### Q
What does the cache layer do?

### A
It avoids re-tokenizing unchanged directive-sourced files.

::preference#tone::
### Prompt
Explain a failure mode.

### Chosen
Explain it directly, then give the fix.

### Rejected
Over-explain the background before naming the problem.
```
Two important upgrades over the older README story:

- `training.sources` can turn a repo or notes tree into synthetic training
  sections.
- `training.adapters` + `training.gate` let one document route prompts across
  multiple adapters instead of pretending one flat adapter is the only mode.

If you need deeper subtree-specific curation, drop `.dlm/training.yaml` and
`.dlm/ignore` into nested directories and let the corpus carry its own rules.
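
The exact schema for those files lives in the training-across-codebases docs.
As a rough sketch only, assuming the subtree file reuses the same
include/exclude glob keys as the frontmatter `training.sources` entries, and
that `.dlm/ignore` takes gitignore-style patterns:

```yaml
# docs/.dlm/training.yaml (hypothetical contents for one subtree)
include: ["**/*.md"]
exclude: ["drafts/**"]
```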

## Common Workflows

### 1. Hand-authored document

```sh
uv run dlm init tutor.dlm --base smollm2-135m
uv run dlm train tutor.dlm
uv run dlm prompt tutor.dlm "Explain decorators"
```

### 2. Train across a codebase

```sh
uv run dlm train ./my-repo --base qwen3-1.7b --include '**/*.py' --name corpus
```

That auto-scaffolds a `.dlm` under `./my-repo/.dlm/` and lets the repo become
its own training surface.
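
From there the scaffolded document behaves like any hand-authored one. Assuming
`--name corpus` determines the scaffolded filename (check `./my-repo/.dlm/` for
what was actually written), prompting it would look like:

```sh
# Path is a guess based on --name above; adjust to the real scaffolded file.
uv run dlm prompt ./my-repo/.dlm/corpus.dlm "Where is the retry logic implemented?"
```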

### 3. Multi-adapter composition

```sh
uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter knowledge
uv run dlm export mydoc.dlm --adapter-mix knowledge:1.0,tone:0.5
```

### 4. Local iteration loop

```sh
uv run dlm train mydoc.dlm --watch
uv run dlm repl mydoc.dlm
uv run dlm metrics mydoc.dlm
```

### 5. Export and ship

```sh
uv run dlm export mydoc.dlm --target ollama --name mydoc
uv run dlm export mydoc.dlm --target llama-server
uv run dlm export mydoc.dlm --target vllm
uv run dlm export mydoc.dlm --target mlx-serve
uv run dlm pack mydoc.dlm --include-exports
uv run dlm verify mydoc.dlm.pack

# Also emit a ready-to-run sway.yaml next to the GGUF for downstream
# evaluation via `sway run` (requires the [sway] extra).
uv run dlm export mydoc.dlm --target ollama --emit-sway-json
uv run sway run <export-dir>/sway.yaml
```

On Apple Silicon, `--target vllm` now emits conservative `vllm-metal`
defaults in the launch script: it pins the server to the MLX KV path
(`VLLM_METAL_USE_PAGED_ATTENTION=0`, `VLLM_METAL_MEMORY_FRACTION=auto`)
and caps `--max-model-len` to the document's `training.sequence_len`
instead of blindly asking `vllm` for the base model's full context.
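
In other words, the generated launch script ends up with roughly this shape.
This is a reconstruction from the description above, not the literal artifact;
the real script's model argument and any extra flags come from the export:

```sh
# Illustrative shape of the vllm-metal launch defaults (placeholder model path).
export VLLM_METAL_USE_PAGED_ATTENTION=0   # pin to the MLX KV path
export VLLM_METAL_MEMORY_FRACTION=auto
vllm serve <exported-model> --max-model-len 4096   # capped to training.sequence_len
```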

### 6. Mine preference pairs and retrain

```sh
uv run dlm preference mine mydoc.dlm --samples 4 --max-pairs 8
uv run dlm preference list mydoc.dlm
uv run dlm preference apply mydoc.dlm
uv run dlm train mydoc.dlm --phase preference

# A/B check against hand-authored pairs only:
uv run dlm train mydoc.dlm --phase preference --no-mined

# Use a different judge when bootstrap self-judging is not enough:
uv run dlm preference mine mydoc.dlm --judge hf:YourOrg/reward-model --apply
```

### 7. Scaffold multimodal or audio docs

```sh
uv run dlm init diagrams.dlm --multimodal --base qwen2-vl-2b-instruct
uv run dlm train diagrams.dlm
uv run dlm prompt diagrams.dlm --image figures/system.png "What is happening here?"

uv run dlm init calls.dlm --audio
uv run dlm train calls.dlm
uv run dlm prompt calls.dlm --audio clips/example.wav "Summarize the clip"
```
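
On the authoring side, image and audio examples live in `::image::` and
`::audio::` body sections. Their real grammar is specified in the multimodal
and audio docs; the sketch below only guesses at the shape by analogy with the
`### Q` / `### A` layout of `::instruction::` sections, so treat every heading
in it as hypothetical:

```dlm
::image::
### Image
figures/system.png

### Q
What is happening here?

### A
The diagram shows the request path from ingest to the tokenizer cache.
```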

### 8. Pull eval failures back into training

```sh
uv run dlm harvest mydoc.dlm --sway-json sway-report.json --apply
```
That is the probe-driven loop: evaluation finds a miss, DLM turns it into document-level training data, and the next train closes the gap.
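
Stitched together with the sway emission from workflow 5, one full iteration of
that loop is only a few commands. The report filename below assumes your `sway`
run wrote its report to `sway-report.json`; substitute whatever path it actually
produced:

```sh
# One probe-driven iteration: export with probes, evaluate, harvest, retrain.
uv run dlm export mydoc.dlm --target ollama --emit-sway-json
uv run sway run <export-dir>/sway.yaml
uv run dlm harvest mydoc.dlm --sway-json sway-report.json --apply
uv run dlm train mydoc.dlm
```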

### 9. Inspect store state and reproducibility

```sh
uv run dlm doctor
uv run dlm show mydoc.dlm --json
uv run dlm metrics mydoc.dlm --run-id 7 --json
uv run dlm pack mydoc.dlm --include-exports
uv run dlm verify mydoc.dlm.pack
```

## Command Surface

The CLI is broader than the original MVP now. A useful mental map:

| Area | Commands | What they cover |
|---|---|---|
| Author | `init`, `templates`, `show`, `migrate`, `cache` | Create docs, inspect them, migrate schema, manage cache state |
| Train | `train`, `doctor`, `metrics`, `harvest` | Run training, inspect plans, observe runs, pull eval misses back in |
| Align | `preference` | Mine, stage, apply, revert, and inspect auto-mined preference sections |
| Infer | `prompt`, `repl` | Local interactive and one-shot inference |
| Ship | `export`, `pack`, `unpack`, `verify`, `push`, `pull`, `serve` | Export to runtimes, bundle, verify, and move artifacts |

See the [CLI reference](./docs/cli/reference.md) for the full flag surface.

## Documentation

- [Getting started](./docs/getting-started/install.md)
- [Frontmatter reference](./docs/format/frontmatter.md)
- [Section grammar](./docs/format/sections.md)
- [Preference section reference](./docs/format/preference-section.md)
- [Training across codebases](./docs/cookbook/training-across-codebases.md)
- [Train from a folder](./docs/cookbook/train-from-folder.md)
- [Multi-source training](./docs/cookbook/multi-source-training.md)
- [Tokenized-section cache](./docs/cookbook/directive-cache.md)
- [Multi-adapter composition](./docs/cookbook/multi-adapter.md)
- [Learned adapter gate](./docs/cookbook/learned-adapter-gate.md)
- [Self-improving loop / preference mining](./docs/cookbook/self-improving-loop.md)
- [Reward-model integration](./docs/cookbook/reward-model-integration.md)
- [Multimodal training](./docs/cookbook/multimodal-training.md)
- [Audio training](./docs/cookbook/audio-training.md)
- [Probe-driven training / sway harvest](./docs/cookbook/probe-driven-training.md)
- [Multi-target export](./docs/cookbook/multi-target-export.md)
- [Sharing adapters and packs](./docs/cookbook/sharing.md)
- [CLI reference](./docs/cli/reference.md)
- [Architecture](./docs/architecture.md)
- [Determinism](./docs/determinism.md)

## Principles

1. **The document is the interface.**
   But the document is structured: frontmatter, typed sections, directives, and
   store contracts all matter.
2. **Training is real.**
   LoRA / QLoRA / DoRA on pretrained bases, not a toy transformer.
3. **Retraining should not silently forget.**
   Replay-backed accumulation is part of the product.
4. **Local-first is load-bearing.**
   Your training data, adapters, exports, and packs stay on your machine unless
   you explicitly move them.
5. **Determinism is a contract.**
   If a change breaks the reproducibility story, that is a product regression.

## Tech Stack

Python 3.11+ · PyTorch · HuggingFace `transformers` / `peft` / `trl` /
`accelerate` / `datasets` · `watchfiles` · `prompt-toolkit` · `safetensors` ·
vendored `llama.cpp` for GGUF export · Ollama (optional runtime target) ·
Typer · Pydantic · `uv`

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md). Testing conventions live in
[docs-internal/README-testing.md](./docs-internal/README-testing.md).

```sh
uv run pre-commit install
```

## License

MIT. Base-model licenses are separate and enforced where DLM needs them:
`dlm init`, `dlm train`, `dlm export`, and `dlm pack` all keep the gated-base
acceptance path explicit.