# DocumentLanguageModel
A text file becomes your personal, locally-trained LLM.
Edit a .dlm file, train a LoRA on it, export to Ollama — all on your machine.
No telemetry, no uploads, no cloud. Built on PyTorch + HuggingFace with a
hardware-aware planner that picks precision, attention, and batching for your
box.
**Status:** pre-release. The full v1.0 command surface is wired (`init`, `train`, `prompt`, `export`, `pack`, `unpack`, `doctor`, `show`, `migrate`). The reproducibility lock and docs polish are the remaining Phase 3 work before a tagged release.
## What it does
- **Edit a document, get a model.** A `.dlm` is plain UTF-8 text with YAML frontmatter and section fences (`::instruction::`, `::preference::`, default prose); see the sketch after this list. Prose trains via continued pretraining; instruction blocks train via SFT; preference blocks via DPO/ORPO (coming).
- **LoRA / QLoRA on a real base.** Curated registry of small pretrained bases (Qwen 2.5 0.5B–3B, Llama-3.2 1B/3B, SmolLM2 135M–1.7B, Phi-3.5-mini). Any HuggingFace model via an `hf:org/name` escape hatch.
- **Retrain, don't forget.** Prior document versions are stored in a zstd-compressed replay corpus and sampled back into each training run; retrains are additive, not destructive.
- **Deterministic by default.** Same document + same hardware tier + pinned versions → bit-identical adapter.
- **Export to Ollama.** `dlm export` produces a base GGUF + adapter GGUF + Modelfile with an explicit Go `text/template` (no fuzzy matching), then registers it locally with `ollama create`.
- **Hardware-aware.** `dlm doctor` picks precision (bf16 on Ampere+, fp16 on MPS), attention (FlashAttention when available, SDPA otherwise), batching, and gradient checkpointing.
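As a rough illustration, a `.dlm` might look like the following. The fence marker comes from the list above; the frontmatter keys, how a fence is closed, and the `### Q` / `### A` layout (borrowed from the Quickstart) are assumptions, not a spec:

```text
---
base: qwen2.5-0.5b   # hypothetical key; the real frontmatter schema may differ
---
Plain prose like this trains via continued pretraining.

::instruction::
### Q
What does this repository do?
### A
It trains a LoRA adapter on this document and exports it to Ollama.
```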
## Supported platforms
| Tier | Training | Inference |
|---|---|---|
| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama (GGUF CUDA) |
| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama (GGUF CUDA) |
| Apple Silicon (MPS) | fp16 LoRA | Ollama (GGUF Metal) |
| CPU | inference-only by default (training refused above 200M params) | Ollama (GGUF CPU) |
| AMD ROCm | experimental (later) | llama.cpp ROCm |
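To make the table concrete, here is a minimal Python sketch of the kind of plan `dlm doctor` resolves. It is illustrative only: the function name and returned keys are hypothetical, and the real planner also probes FlashAttention availability and memory before choosing batching and gradient checkpointing.

```python
import torch

def resolve_plan() -> dict:
    """Illustrative only: mirrors the tier table above, not dlm's real planner."""
    if torch.cuda.is_available():
        major, _minor = torch.cuda.get_device_capability()
        if major >= 8:  # Ampere or newer (SM >= 8.0)
            return {"precision": "bf16", "quant": "qlora-4bit",
                    "attention": "flash-attention if available, else sdpa"}
        return {"precision": "fp16", "quant": None, "attention": "sdpa"}
    if torch.backends.mps.is_available():  # Apple Silicon
        return {"precision": "fp16", "quant": None, "attention": "sdpa"}
    # CPU tier: inference-only by default; training refused above 200M params
    return {"precision": "fp32", "quant": None, "attention": "sdpa"}
```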
## Installation
```sh
# Requires Python 3.11+ and uv (https://github.com/astral-sh/uv)
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync
uv run dlm --help
```
For export, install Ollama separately (the minimum version is pinned in the CLI; `dlm doctor` reports it).
## Quickstart
```sh
uv run dlm init mydoc.dlm                 # scaffold a new .dlm
# edit mydoc.dlm — write prose, add ### Q / ### A pairs, etc.
uv run dlm train mydoc.dlm                # train a LoRA
uv run dlm prompt mydoc.dlm "question?"   # query the trained adapter
uv run dlm export mydoc.dlm --name mydoc  # register with Ollama
ollama run mydoc                          # use it
```
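The export step produces a base GGUF, an adapter GGUF, and a Modelfile, then runs `ollama create`. As a sketch (file paths and the template body are illustrative, not what dlm actually emits), the Modelfile uses standard Ollama directives with an explicit Go `text/template`:

```
FROM ./mydoc-base.gguf
ADAPTER ./mydoc-adapter.gguf
TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}{{ .Prompt }}"""
```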
`dlm pack mydoc.dlm` produces a portable `.dlm.pack` bundle you can hand off to another machine; `dlm unpack` installs it on the other end.
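A minimal hand-off, assuming the bundle name follows the `.dlm.pack` convention above (exact output naming and flags may differ):

```sh
uv run dlm pack mydoc.dlm         # produces mydoc.dlm.pack
# copy mydoc.dlm.pack to the other machine, then:
uv run dlm unpack mydoc.dlm.pack
```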
`dlm show mydoc.dlm` prints training history, exports, and adapter state; `dlm doctor` reports the resolved hardware plan.
## Principles
- **The document is the interface.** Not a config file. Not a framework. Plain text with a special extension.
- **Training is real.** LoRA/QLoRA on a pretrained base, not a toy from-scratch transformer.
- **Retrain is additive.** Replay prior versions; never forget silently.
- **Local-first, always.** Training, inference, and the store all live on your disk. No network calls outside of model download.
- **Deterministic by default.** Reproducibility is a contract, not a wish.
## Tech stack
Python 3.11+ · PyTorch · HuggingFace transformers/peft/trl/accelerate ·
bitsandbytes (CUDA-gated) · llama.cpp (vendored, for GGUF export) · Typer ·
Pydantic · uv.
## Contributing
See `CONTRIBUTING.md`. Testing conventions live in `docs-internal/README-testing.md`.
## License
MIT. Base-model licenses are separate and enforced at `dlm init` / `dlm pack` time; Llama-family bases require explicit acceptance.