DocumentLanguageModel

A text file becomes your personal, locally-trained LLM.

Edit a .dlm file, train a LoRA adapter on it, export to Ollama, all on your machine. No telemetry, no uploads, no cloud. Built on PyTorch + HuggingFace, with a hardware-aware planner that picks precision, attention, and batching for your box.

Status: pre-1.0 — the Phase 3 CLI surface (init, train, prompt, export, pack, unpack, doctor, show, migrate) is wired end-to-end but hasn't been battle-tested by a human running a full train-export-ollama-run cycle. Ship target is v0.9.0 via the Homebrew tap below; v1.0 waits on a real end-to-end train.

Why

Most "personal AI" tooling either wants your data in their cloud or asks you to run a 70B model you can't afford. DLM sits in the gap: plain-text input, real pretrained bases (SmolLM2 for iteration, Qwen or Llama for production), deterministic retraining, Ollama export.

  • Edit a document, get a model. A .dlm is plain UTF-8 with a YAML frontmatter and section fences (::instruction::, ::preference::, default-prose). Prose trains via continued pretraining; instruction blocks train via SFT; preference blocks via DPO/ORPO (Phase 4).
  • LoRA / QLoRA on a real base. Curated registry of SmolLM2 135M–1.7B, Qwen 2.5 0.5B–3B, Llama-3.2 1B/3B, Phi-3.5-mini. Any HuggingFace model via an hf:org/name escape hatch.
  • Retrain, don't forget. Prior document versions stay in a zstd-compressed replay corpus and get sampled into each training run. Edits are additive by default.
  • Deterministic by contract. Same doc + same hardware tier + pinned versions → bit-identical adapter. dlm.lock records the tuple; --strict-lock upgrades every warn to an error. See the determinism guide.
  • Explicit Ollama export. dlm export emits a base GGUF + adapter GGUF + Modelfile with a pinned Go text/template (no fuzzy matching), then registers it via ollama create. A sketch of the Modelfile follows this list.
  • Hardware-aware. dlm doctor probes the GPU, picks precision (bf16 on Ampere+, fp16 on MPS), attention (FlashAttention when available, SDPA otherwise), batching, and gradient checkpointing.
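
To make the export shape concrete, here is a minimal sketch of the kind of Modelfile dlm export emits. FROM, ADAPTER, and TEMPLATE are standard Ollama Modelfile directives, and the file names match the export transcript under First run; the template body is illustrative, not dlm's pinned one.

# Sketch only: dlm pins the exact Go text/template for the base model;
# this body is a stand-in.
FROM ./base.Q4_K_M.gguf
ADAPTER ./adapter.gguf
TEMPLATE """{{ .Prompt }}"""

Ollama resolves FROM and ADAPTER relative to the Modelfile, so ollama create my-tutor -f Modelfile from the export directory is enough to register the model.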

Supported platforms

Tier | Training | Inference
NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama (GGUF CUDA)
NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama (GGUF CUDA)
Apple Silicon (MPS) | fp16 LoRA | Ollama (GGUF Metal)
CPU | inference-only by default (training refused above 200M params) | Ollama (GGUF CPU)
AMD ROCm | experimental (Phase 5) | llama.cpp ROCm

Install

brew tap tenseleyFlow/tap
brew install dlm

# Ollama is required for `dlm export` smoke runs:
brew install ollama

brew install dlm pulls in a vendored llama.cpp source tree for GGUF conversion and declares depends_on "llama.cpp" for the compiled llama-quantize / llama-imatrix binaries. On NVIDIA hardware, unlock QLoRA 4-bit after install:

$(brew --prefix dlm)/libexec/venv/bin/pip install 'dlm[cuda]'
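
To confirm the extra landed, a quick import check (this assumes the cuda extra maps to bitsandbytes, as the tech-stack list below indicates):

# Should print a version, not an ImportError.
$(brew --prefix dlm)/libexec/venv/bin/python -c 'import bitsandbytes; print(bitsandbytes.__version__)'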

From source (contributors)

# Python 3.11+ and uv (https://github.com/astral-sh/uv).
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync
# One-time: build the vendored llama.cpp binaries for `dlm export`.
scripts/bump-llama-cpp.sh build
uv run dlm --help

We deliberately don't publish to PyPI: it's too easy to ship unfinished work, plus 5 GB of transitive deps, to an archive that keeps files forever. See CONTRIBUTING.md for the release flow.

First run

$ uv run dlm init tutor.dlm --base smollm2-135m
init: wrote tutor.dlm

The scaffold:

---
dlm_id: 01KPM5CXB51GRX86Q25AKERN6E
dlm_version: 1
base_model: smollm2-135m
---

# Your document title

Write prose here. It will train via continued pretraining (CPT) loss.

::instruction::

### Q
Your example question.

### A
Your example answer.
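
A filled-in pair keeps the same fence and Q/A markers; the content below is only an example:

::instruction::

### Q
What does functools.wraps do?

### A
It copies the wrapped function's metadata (name, docstring, module) onto the wrapper, so introspection and help() keep working after decoration.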

Open tutor.dlm in your editor, replace the placeholder content with real prose + Q/A pairs, then:

$ uv run dlm train tutor.dlm
trained: v0001 (20 steps, seed=42, determinism=best-effort)
adapter: ~/.dlm/store/01KPM5…/adapter/versions/v0001
log:     ~/.dlm/store/01KPM5…/logs/train-000001-…jsonl

$ uv run dlm prompt tutor.dlm "What is a Python decorator?"
A decorator is a function that takes another function…

$ uv run dlm show tutor.dlm
/tmp/dlm-readme-demo/tutor.dlm
  dlm_id:         01KPM5CXB51GRX86Q25AKERN6E
  base_model:     smollm2-135m (revision 12fd25f)
  store:          ~/.dlm/store/01KPM5CXB51GRX86Q25AKERN6E  (537 B)
  adapter:        v0001
  training runs:  1
  exports:        0

$ uv run dlm export tutor.dlm --name my-tutor --quant Q4_K_M
export: base.Q4_K_M.gguf (47 MiB)
export: adapter.gguf (3 MiB)
export: Modelfile written; ollama create my-tutor:latest
export: smoke: "hello" → "Hi! How can I help?"

$ ollama run my-tutor "When should I use functools.wraps?"
Always, inside decorators. …

The cookbook walks through five starter scenarios (coding tutor, domain KB, writing partner, personal assistant, changelog).

Commands

Every command has --help for the full flag surface. Global flags (--home, -v, -q, --version) apply to all subcommands.

Command | Purpose | Key flags
dlm init <path> | Scaffold a new .dlm + create the store + record license acceptance. | --base, --force, --i-accept-license
dlm train <path> | Train / retrain the adapter. Replay-weighted by default. | --resume, --fresh, --seed, --max-steps, --strict-lock, --update-lock, --ignore-lock
dlm prompt <path> | Inference via HF (bypasses Ollama). Great for --temp 0 determinism checks. | --temp, --top-p, --max-tokens, --verbose
dlm export <path> | Convert to GGUF, emit Modelfile, register with Ollama, smoke-run. | --quant, --merged, --dequantize, --skip-ollama, --no-smoke, --no-imatrix, --draft
dlm pack <path> | Bundle a .dlm + store into a portable .dlm.pack. | --out, --include-exports, --include-base, --include-logs, --i-am-the-licensee
dlm unpack <pack> | Restore a .dlm.pack into the local store. | --force, --out
dlm doctor | Probe hardware, print the resolved training plan. | --json
dlm show <path> | Training history + exports + adapter state. | --json
dlm migrate <path> | Upgrade a .dlm frontmatter to the current schema version. | --dry-run, --no-backup
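
Flag placement follows the usual Typer convention: global flags before the subcommand, command flags after it. A routine invocation combining flags from the table above:

uv run dlm -v --home /tmp/dlm-home train tutor.dlm --max-steps 50 --seed 7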

See the CLI reference for every flag + the exit-code policy.

Typical workflows

Iterate on one document. Edit, train, prompt, repeat:

$EDITOR tutor.dlm
uv run dlm train tutor.dlm          # additive retrain
uv run dlm prompt tutor.dlm "…"     # smoke

Ship to Ollama. Export; the quant-level choice is documented in the cookbook:

uv run dlm export tutor.dlm --quant Q4_K_M --name my-tutor
ollama run my-tutor

Archive or share. One-file bundle:

uv run dlm pack tutor.dlm --out tutor.dlm.pack           # ~100 MB (minimal)
uv run dlm pack tutor.dlm --include-exports --out tutor-full.dlm.pack
# …elsewhere:
uv run dlm unpack tutor-full.dlm.pack

Start fresh. Discard optimizer state + replay corpus:

uv run dlm train tutor.dlm --fresh

Audit reproducibility. Fail on any lock drift:

uv run dlm train tutor.dlm --strict-lock
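
A minimal sketch of auditing the bit-identical claim end to end, assuming --fresh resets the version counter so both runs write v0001 (the store path is the one dlm show prints):

# Train twice from scratch with the same seed, hash the adapter files,
# and diff the hashes. Identical output means a reproducible run.
STORE=~/.dlm/store/01KPM5CXB51GRX86Q25AKERN6E
uv run dlm train tutor.dlm --fresh --seed 42
(cd "$STORE/adapter/versions/v0001" && find . -type f -exec sha256sum {} + | sort) > /tmp/run-a.sha
uv run dlm train tutor.dlm --fresh --seed 42
(cd "$STORE/adapter/versions/v0001" && find . -type f -exec sha256sum {} + | sort) > /tmp/run-b.sha
diff /tmp/run-a.sha /tmp/run-b.sha && echo bit-identical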

Documentation

The cookbook, determinism guide, and CLI reference cited throughout this README live under docs/ as an mkdocs site.

Principles

  1. The document is the interface. Not a config file. Not a framework. Plain text with a special extension.
  2. Training is real. LoRA / QLoRA on a pretrained base, not a toy from-scratch transformer.
  3. Retrain is additive. Replay prior versions; never silently forget.
  4. Local-first, always. Training, inference, and store all live on your disk. No network calls outside of model download.
  5. Deterministic by default. Reproducibility is a contract, not a wish. dlm.lock records the version tuple; drift fails loud.

Tech stack

Python 3.11+ · PyTorch · HuggingFace transformers / peft / trl / accelerate / datasets · safetensors · bitsandbytes (CUDA extra) · vendored llama.cpp for GGUF export · Ollama (user-installed) · Typer · Pydantic · packaging · uv.

Contributing

See CONTRIBUTING.md. Testing conventions live at docs-internal/README-testing.md. Install the pre-commit hooks to match CI:

uv run pre-commit install

License

MIT. Base-model licenses are separate and enforced at dlm init / dlm pack time; Llama-family bases require explicit acceptance (see --i-accept-license).