# DocumentLanguageModel
A text file becomes your personal, locally-trained LLM.
Edit a `.dlm` file, train a LoRA adapter on it, export to Ollama, all on your machine. No telemetry, no uploads, no cloud. Built on PyTorch and HuggingFace, with a hardware-aware planner that picks precision, attention, and batching for your box.
**Status: pre-1.0.** The Phase 3 CLI surface (`init`, `train`, `prompt`, `export`, `pack`, `unpack`, `doctor`, `show`, `migrate`) is wired end-to-end but hasn't been battle-tested by a human running a full train-export-ollama-run cycle. Ship target is v0.9.0 via the Homebrew tap below; v1.0 waits on a real end-to-end train.
## Why
Most "personal AI" tooling either wants your data in their cloud or asks you to run a 70B model you can't afford. DLM sits in the gap: plain-text input, real pretrained bases (SmolLM2 for iteration, Qwen or Llama for production), deterministic retraining, Ollama export.
- **Edit a document, get a model.** A `.dlm` is plain UTF-8 with YAML frontmatter and section fences (`::instruction::`, `::preference::`; unfenced text defaults to prose). Prose trains via continued pretraining; instruction blocks train via SFT; preference blocks via DPO/ORPO (Phase 4).
- **LoRA / QLoRA on a real base.** Curated registry of SmolLM2 135M–1.7B, Qwen 2.5 0.5B–3B, Llama-3.2 1B/3B, Phi-3.5-mini. Any HuggingFace model via an `hf:org/name` escape hatch (example after this list).
- **Retrain, don't forget.** Prior document versions stay in a zstd-compressed replay corpus and get sampled into each training run. Edits are additive by default.
- **Deterministic by contract.** Same doc + same hardware tier + pinned versions → bit-identical adapter. `dlm.lock` records the tuple; `--strict-lock` upgrades every warning to an error. See the determinism guide.
- **Explicit Ollama export.** `dlm export` emits a base GGUF + adapter GGUF + a Modelfile with a pinned Go `text/template` (no fuzzy matching), then registers it via `ollama create`.
- **Hardware-aware.** `dlm doctor` probes the GPU, picks precision (bf16 on Ampere+, fp16 on MPS), attention (FlashAttention when available, SDPA otherwise), batching, and gradient checkpointing.
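A minimal sketch of both base-selection paths, assuming a source checkout run via `uv` (the repo ID `Qwen/Qwen2.5-0.5B` is just an illustrative HuggingFace model, not a recommendation):

```bash
# Curated registry alias:
uv run dlm init notes.dlm --base smollm2-135m

# Arbitrary HuggingFace model via the hf: escape hatch (illustrative repo ID):
uv run dlm init notes.dlm --base hf:Qwen/Qwen2.5-0.5B
```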
## Supported platforms
| Tier | Training | Inference |
|---|---|---|
| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama (GGUF CUDA) |
| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama (GGUF CUDA) |
| Apple Silicon (MPS) | fp16 LoRA | Ollama (GGUF Metal) |
| CPU | inference-only by default (training refused above 200M params) | Ollama (GGUF CPU) |
| AMD ROCm | experimental (Phase 5) | llama.cpp ROCm |
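To see which row your machine resolves to, run `dlm doctor` (described under Commands below); the `--json` flag comes straight from the flag table there:

```bash
dlm doctor          # prints the resolved training plan for this box
dlm doctor --json   # same plan, machine-readable
```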
## Install
### From the Homebrew tap (recommended)
```bash
brew tap tenseleyFlow/tap
brew install dlm
# Ollama is required for `dlm export` smoke runs:
brew install ollama
```
`brew install dlm` pulls in a vendored llama.cpp source tree for GGUF conversion and declares `depends_on "llama.cpp"` for the compiled `llama-quantize` / `llama-imatrix` binaries. On NVIDIA hardware, unlock QLoRA 4-bit after install:
```bash
$(brew --prefix dlm)/libexec/venv/bin/pip install 'dlm[cuda]'
```
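To confirm the extra landed, a quick sanity check; this assumes the venv's `python` sits next to the `pip` above, and that `bitsandbytes` (the 4-bit backend from the CUDA extra) imports cleanly:

```bash
$(brew --prefix dlm)/libexec/venv/bin/python -c \
  "import bitsandbytes, torch; print('cuda available:', torch.cuda.is_available())"
```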
### From source (contributors)
```bash
# Python 3.11+ and uv (https://github.com/astral-sh/uv).
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync
# One-time: build the vendored llama.cpp binaries for `dlm export`.
scripts/bump-llama-cpp.sh build
uv run dlm --help
```
We deliberately don't publish to PyPI: it's too easy to ship unfinished work to a permanent archive, along with 5 GB of transitive deps. See CONTRIBUTING.md for the release flow.
## First run
```console
$ uv run dlm init tutor.dlm --base smollm2-135m
init: wrote tutor.dlm
```
The scaffold:
```
---
dlm_id: 01KPM5CXB51GRX86Q25AKERN6E
dlm_version: 1
base_model: smollm2-135m
---

# Your document title

Write prose here. It will train via continued pretraining (CPT) loss.

::instruction::
### Q
Your example question.
### A
Your example answer.
```
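Filled in, it might look like this (the prose and Q/A content below are purely illustrative; only the frontmatter, the `::instruction::` fence, and the `### Q` / `### A` headings are structural):

```
---
dlm_id: 01KPM5CXB51GRX86Q25AKERN6E
dlm_version: 1
base_model: smollm2-135m
---

# Python decorators tutor

A decorator wraps a function to extend its behavior without editing its body.

::instruction::
### Q
What does functools.wraps do?
### A
It copies the wrapped function's name, docstring, and other metadata onto the wrapper.
```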
Open `tutor.dlm` in your editor, replace the placeholder content with real prose + Q/A pairs (as in the example above), then:
```console
$ uv run dlm train tutor.dlm
trained: v0001 (20 steps, seed=42, determinism=best-effort)
adapter: ~/.dlm/store/01KPM5…/adapter/versions/v0001
log: ~/.dlm/store/01KPM5…/logs/train-000001-…jsonl

$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function…

$ uv run dlm show tutor.dlm
/tmp/dlm-readme-demo/tutor.dlm
dlm_id: 01KPM5CXB51GRX86Q25AKERN6E
base_model: smollm2-135m (revision 12fd25f)
store: ~/.dlm/store/01KPM5CXB51GRX86Q25AKERN6E (537 B)
adapter: v0001
training runs: 1
exports: 0

$ uv run dlm export tutor.dlm --name my-tutor --quant Q4_K_M
export: base.Q4_K_M.gguf (47 MiB)
export: adapter.gguf (3 MiB)
export: Modelfile written; ollama create my-tutor:latest
export: smoke: "hello" → "Hi! How can I help?"

$ ollama run my-tutor "When should I use functools.wraps?"
Always, inside decorators. …
```
The cookbook has walkthroughs for five starter scenarios (coding tutor, domain KB, writing partner, personal assistant, changelog).
## Commands
Every command has `--help` for the full flag surface. Global flags (`--home`, `-v`, `-q`, `--version`) apply to all subcommands.
| Command | Purpose | Key flags |
|---|---|---|
| `dlm init <path>` | Scaffold a new `.dlm` + create the store + record license acceptance. | `--base`, `--force`, `--i-accept-license` |
| `dlm train <path>` | Train / retrain the adapter. Replay-weighted by default. | `--resume`, `--fresh`, `--seed`, `--max-steps`, `--strict-lock`, `--update-lock`, `--ignore-lock` |
| `dlm prompt <path>` | Inference via HF (bypasses Ollama). Great for `--temp 0` determinism checks. | `--temp`, `--top-p`, `--max-tokens`, `--verbose` |
| `dlm export <path>` | Convert to GGUF, emit Modelfile, register with Ollama, smoke-run. | `--quant`, `--merged`, `--dequantize`, `--skip-ollama`, `--no-smoke`, `--no-imatrix`, `--draft` |
| `dlm pack <path>` | Bundle a `.dlm` + store into a portable `.dlm.pack`. | `--out`, `--include-exports`, `--include-base`, `--include-logs`, `--i-am-the-licensee` |
| `dlm unpack <pack>` | Restore a `.dlm.pack` into the local store. | `--force`, `--out` |
| `dlm doctor` | Probe hardware, print the resolved training plan. | `--json` |
| `dlm show <path>` | Training history + exports + adapter state. | `--json` |
| `dlm migrate <path>` | Upgrade a `.dlm` frontmatter to the current schema version. | `--dry-run`, `--no-backup` |
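As one concrete combination (all flags from the table above), a pinned, lock-audited retrain looks like:

```bash
uv run dlm train tutor.dlm --seed 1234 --max-steps 50 --strict-lock
```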
See the CLI reference for every flag + the exit-code policy.
## Typical workflows
**Iterate on one document.** Edit, train, prompt, repeat:

```bash
$EDITOR tutor.dlm
uv run dlm train tutor.dlm        # additive retrain
uv run dlm prompt tutor.dlm "…"   # smoke
```
**Ship to Ollama.** Export; quant-level choice is documented in the cookbook:

```bash
uv run dlm export tutor.dlm --quant Q4_K_M --name my-tutor
ollama run my-tutor
```
**Archive or share.** One-file bundle:

```bash
uv run dlm pack tutor.dlm --out tutor.dlm.pack   # ~100 MB (minimal)
uv run dlm pack tutor.dlm --include-exports --out tutor-full.dlm.pack
# …elsewhere:
uv run dlm unpack tutor-full.dlm.pack
```
**Start fresh.** Discard optimizer state + replay corpus:

```bash
uv run dlm train tutor.dlm --fresh
```
**Audit reproducibility.** Fail on any lock drift:

```bash
uv run dlm train tutor.dlm --strict-lock
```
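For a quick check at the inference layer, the `--temp 0` trick from the Commands table works too; a sketch, assuming `dlm prompt` writes the reply to stdout:

```bash
uv run dlm prompt tutor.dlm --temp 0 "What is a Python decorator?" > a.txt
uv run dlm prompt tutor.dlm --temp 0 "What is a Python decorator?" > b.txt
diff a.txt b.txt   # identical replies expected at temperature 0
```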
## Documentation

- Getting started — install → first train → first prompt → first export
- The `.dlm` format — frontmatter reference + section grammar
- CLI reference — every command, every flag
- Cookbook — 6 end-to-end recipes
- Architecture — module map + storage layout + contract boundaries
- Determinism — the reproducibility contract, severity table, regen-golden flow
- Troubleshooting — symptom → cause → fix, seeded from the pitfall inventory
## Principles

- **The document is the interface.** Not a config file. Not a framework. Plain text with a special extension.
- **Training is real.** LoRA / QLoRA on a pretrained base, not a toy from-scratch transformer.
- **Retrain is additive.** Replay prior versions; never silently forget.
- **Local-first, always.** Training, inference, and store all live on your disk. No network calls outside of model download.
- **Deterministic by default.** Reproducibility is a contract, not a wish. `dlm.lock` records the version tuple; drift fails loud.
## Tech stack
Python 3.11+ · PyTorch · HuggingFace transformers / peft / trl /
accelerate / datasets · safetensors · bitsandbytes (CUDA
extra) · vendored llama.cpp for GGUF export · Ollama (user-installed) ·
Typer · Pydantic · packaging · uv.
## Contributing
See CONTRIBUTING.md. Testing conventions live at docs-internal/README-testing.md. Install the pre-commit hooks to match CI:

```bash
uv run pre-commit install
```
## License

MIT. Base-model licenses are separate and enforced at `dlm init` / `dlm pack` time; Llama-family bases require explicit acceptance (see `--i-accept-license`).