# DocumentLanguageModel

> `.dlm` is a trainable local AI document format: typed sections, directives,
> replay-backed retraining, and export.

DocumentLanguageModel (DLM) is a local-first training, inference, and export toolchain built around authored documents instead of hosted dashboards.

A `.dlm` can be:

- a hand-written training document with prose, instruction, and preference data
- a directive-driven entrypoint into a codebase or notes tree
- a multi-adapter project with learned routing
- a selected multimodal or audio-language document

DLM trains LoRA / QLoRA / DoRA adapters on real pretrained bases, keeps a replay
history so retrains do not silently forget, and exports to local runtimes such
as Ollama, `llama-server`, `vllm`, and `mlx-serve`.

**Status:** pre-v1.0, but far beyond the original MVP framing. The core
author/train/prompt/export/pack/share loop is real, and newer runtime-target
work is landing incrementally. Current export targets are `ollama`,
`llama-server`, `vllm`, and `mlx-serve`.

## What A `.dlm` Actually Is

A `.dlm` is not just “a text file with a special extension.”

It is a trainable project surface with:

- **frontmatter** for base-model choice, training config, export defaults,
  sources, cache policy, and multi-adapter gate settings
- **typed body sections** such as prose, `::instruction::`, `::preference::`,
  `::image::`, and `::audio::`
- **adapter routing** via fences like `::instruction#knowledge::`
- **directive-driven ingestion** from files and directories through
  `training.sources`
- **repo-local subtree control** through `.dlm/training.yaml` and `.dlm/ignore`
- a stable **`dlm_id`** that binds the document to a local store under
  `~/.dlm/store/<dlm_id>/`

That combination is what makes DLM more like a local AI authoring format than a single prompt file.
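
That binding is easy to see on disk. A minimal check, assuming the document has
been trained at least once (the `dlm_id` below is the one from the minimal
example later in this README, and the directory's internal layout is an
implementation detail):

```sh
# The store directory is keyed by the document's frontmatter dlm_id.
ls ~/.dlm/store/01KPM5CXB51GRX86Q25AKERN6E/
```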

## Why DLM

Most “personal AI” tooling still pushes you toward one of two bad choices:
- upload your data to someone else’s cloud
- run an oversized model with weak authoring and retraining ergonomics
DLM sits in the gap:

- **The document is the interface.** You author the thing you care about instead
  of wiring together a hidden dataset pipeline.
- **Training is real.** LoRA / QLoRA / DoRA on pretrained bases, not a toy
  from-scratch transformer.
- **Retraining is additive.** Previous document versions flow into a replay
  corpus so the model does not forget last week’s state by default.
- **Everything stays local.** Training, inference, store state, exports, and
  packs all live on your machine unless you explicitly push them somewhere.
- **Determinism is a contract.** Locks, pinned versions, and golden checks are
  first-class design constraints, not “best effort.”

## Core Capabilities

- **Author structured training data in one place.** Mix prose, SFT examples,
  preferences, image sections, and audio sections in one document.
- **Ingest whole trees, not just one file.** `training.sources` can walk a
  repo, and subtree-local `.dlm/training.yaml` / `.dlm/ignore` let the corpus
  carry its own curation rules.
- **Train on modern base families.** Text, reasoning-tuned, sparse-MoE,
  vision-language, and audio-language registry rows ship today, plus
  `hf:org/name` escape hatches (see the sketch after this list).
- **Compose multiple adapters in one document.** Named adapters, weighted export
  mixes, and learned adapter gates let one `.dlm` separate knowledge, tone, or
  persona lanes.
- **Mine preference pairs from a live adapter.** `dlm preference mine` can use
  `sway`, HF reward models, or external CLI judges to write auto-mined
  `::preference::` sections back into the document.
- **Stay in a local iteration loop.** `dlm prompt`, `dlm repl`,
  `dlm train --watch`, `dlm metrics`, and `dlm doctor` are all part of the
  normal workflow now.
- **Export beyond the original Ollama-only story.** DLM still does explicit
  Ollama exports with pinned templates, and now also emits `llama-server`,
  `vllm`, and `mlx-serve` launch artifacts for local runtime targets.
- **Close the eval loop.** `dlm harvest` can pull failing `sway`-style probe
  reports back into the document as new training examples.
- **Pack and share reproducibly.** `.dlm.pack`, verification, push/pull, and
  local serve flows are all built around the same store contracts.
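
For example, the `hf:org/name` escape hatch points a new document at an
arbitrary Hub base instead of a registry row. The model id below is purely
illustrative, not a tested recommendation:

```sh
# Any hf:org/name id follows the same pattern; this one is a placeholder.
uv run dlm init scratch.dlm --base hf:Qwen/Qwen2.5-0.5B-Instruct
```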

## Supported Platforms

| Tier | Training | Inference / export |
|---|---|---|
| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama, GGUF export, `llama-server`, `vllm` |
| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama, GGUF export, `llama-server`, `vllm` |
| Apple Silicon (MPS) | fp16 or fp32 LoRA depending on doctor plan | Ollama, selected MLX inference paths, GGUF export, `vllm` (conservative Metal defaults), `mlx-serve` |
| CPU | inference-first; training refused above small bases unless forced | GGUF export, Ollama, `llama-server` |
| AMD ROCm | experimental | ROCm-oriented llama.cpp flows |

See [docs/hardware](./docs/hardware/memory-estimates.md) and
[docs/hardware/vl-memory.md](./docs/hardware/vl-memory.md) for the real support
matrix and current caveats.
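
The table is a summary, not a guarantee; on any given machine, `dlm doctor`
reports which tier and precision plan actually applies before you commit to a
train:

```sh
# Prints the detected hardware tier and the training plan DLM would use.
uv run dlm doctor
```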

## Install

### From the Homebrew tap

```sh
brew tap tenseleyFlow/tap
brew install dlm

# Optional, only if you want `--target ollama` registration/smoke:
brew install ollama
```

`brew install dlm` pulls in the Python environment and the vendored
`llama.cpp` source tree DLM uses for GGUF conversion. CUDA users unlock QLoRA
after install:

```sh
$(brew --prefix dlm)/libexec/venv/bin/pip install 'dlm[cuda]'
```

### From source

```sh
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync

# Build GGUF tooling:
scripts/bump-llama-cpp.sh build

# If you want the llama.cpp HTTP target too:
scripts/bump-llama-cpp.sh build --with-server

# If you want the Apple Silicon MLX HTTP target:
uv sync --extra mlx

# If you want the vLLM HTTP target:
# install a compatible vllm runtime separately; DLM writes launch artifacts
# but does not bundle the server runtime itself.

uv run dlm --help
```

We deliberately do not publish to PyPI yet. See
[CONTRIBUTING.md](./CONTRIBUTING.md) for the release flow.

## 30-Second Start

```sh
uv run dlm init tutor.dlm --base smollm2-135m
$EDITOR tutor.dlm
uv run dlm train tutor.dlm
uv run dlm prompt tutor.dlm "What is a Python decorator?"
uv run dlm export tutor.dlm --target ollama --name my-tutor
```
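
Because the export registers the model with Ollama under the `--name` you gave,
the ordinary Ollama CLI can then run it like any other local model (assuming
`ollama` is installed and its server is running):

```sh
# Runs the freshly exported model through the regular Ollama runtime.
ollama run my-tutor "What is a Python decorator?"
```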

A minimal `.dlm` still works:

```dlm
---
dlm_id: 01KPM5CXB51GRX86Q25AKERN6E
dlm_version: 1
base_model: smollm2-135m
---

# Your document title

Write prose here.

::instruction::
### Q
What is a decorator?

### A
A function that takes a function and returns a wrapped function.
```
That path is still important. It is just no longer the whole story.

## Authoring Beyond The Toy Example

A more representative `.dlm` can mix directives, named adapters, and export
defaults in one place:

```dlm
---
dlm_id: 01KTESTEXAMPLE000000000000
dlm_version: 1
base_model: qwen3-1.7b
system_prompt: |
  You are a concise engineering assistant.
training:
  adapter: lora
  sequence_len: 4096
  sources_policy: strict
  sources:
    - path: ./src
      include: ["**/*.py", "**/*.md"]
      exclude: ["tests/**", "**/__pycache__/**"]
  adapters:
    knowledge:
      adapter: lora
      lora_r: 8
    tone:
      adapter: lora
      lora_r: 4
  gate:
    enabled: true
export:
  default_quant: Q4_K_M
---

# Project notes

Shared prose trains all declared adapters by default.

::instruction#knowledge::
### Q
What does the cache layer do?

### A
It avoids re-tokenizing unchanged directive-sourced files.

::preference#tone::
### Prompt
Explain a failure mode.

### Chosen
Explain it directly, then give the fix.

### Rejected
Over-explain the background before naming the problem.
```
Two important upgrades over the older README story:

- `training.sources` can turn a repo or notes tree into synthetic training
  sections.
- `training.adapters` + `training.gate` let one document route prompts across
  multiple adapters instead of pretending one flat adapter is the only mode.

If you need deeper subtree-specific curation, drop `.dlm/training.yaml` and
`.dlm/ignore` into nested directories and let the corpus carry its own rules.
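
The exact schema for those files lives in the training-across-codebases docs.
As a rough sketch only, assuming the subtree file reuses the same
include/exclude glob keys as the frontmatter `training.sources` entries, and
that `.dlm/ignore` takes gitignore-style patterns:

```yaml
# docs/.dlm/training.yaml (hypothetical contents for one subtree)
include: ["**/*.md"]
exclude: ["drafts/**"]
```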

## Common Workflows

### 1. Hand-authored document

```sh
uv run dlm init tutor.dlm --base smollm2-135m
uv run dlm train tutor.dlm
uv run dlm prompt tutor.dlm "Explain decorators"
```

### 2. Train across a codebase

```sh
uv run dlm train ./my-repo --base qwen3-1.7b --include '**/*.py' --name corpus
```

That auto-scaffolds a `.dlm` under `./my-repo/.dlm/` and lets the repo become
its own training surface.
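
From there the scaffolded document behaves like any hand-authored one. Assuming
`--name corpus` determines the scaffolded filename (check `./my-repo/.dlm/` for
what was actually written), prompting it would look like:

```sh
# Path is a guess based on --name above; adjust to the real scaffolded file.
uv run dlm prompt ./my-repo/.dlm/corpus.dlm "Where is the retry logic implemented?"
```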

### 3. Multi-adapter composition

```sh
uv run dlm prompt mydoc.dlm "Explain the runbook" --adapter knowledge
uv run dlm export mydoc.dlm --adapter-mix knowledge:1.0,tone:0.5
```

### 4. Local iteration loop

```sh
uv run dlm train mydoc.dlm --watch
uv run dlm repl mydoc.dlm
uv run dlm metrics mydoc.dlm
```

### 5. Export and ship

```sh
uv run dlm export mydoc.dlm --target ollama --name mydoc
uv run dlm export mydoc.dlm --target llama-server
uv run dlm export mydoc.dlm --target vllm
uv run dlm export mydoc.dlm --target mlx-serve
uv run dlm pack mydoc.dlm --include-exports
uv run dlm verify mydoc.dlm.pack

# Also emit a ready-to-run sway.yaml next to the GGUF for downstream
# evaluation via `sway run` (requires the [sway] extra).
uv run dlm export mydoc.dlm --target ollama --emit-sway-json
uv run sway run <export-dir>/sway.yaml
```

On Apple Silicon, `--target vllm` now emits conservative `vllm-metal`
defaults in the launch script: it pins the server to the MLX KV path
(`VLLM_METAL_USE_PAGED_ATTENTION=0`, `VLLM_METAL_MEMORY_FRACTION=auto`)
and caps `--max-model-len` to the document's `training.sequence_len`
instead of blindly asking `vllm` for the base model's full context.
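
In other words, the generated launch script ends up with roughly this shape.
This is a reconstruction from the description above, not the literal artifact;
the real script's model argument and any extra flags come from the export:

```sh
# Illustrative shape of the vllm-metal launch defaults (placeholder model path).
export VLLM_METAL_USE_PAGED_ATTENTION=0   # pin to the MLX KV path
export VLLM_METAL_MEMORY_FRACTION=auto
vllm serve <exported-model> --max-model-len 4096   # capped to training.sequence_len
```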

### 6. Mine preference pairs and retrain

```sh
uv run dlm preference mine mydoc.dlm --samples 4 --max-pairs 8
uv run dlm preference list mydoc.dlm
uv run dlm preference apply mydoc.dlm
uv run dlm train mydoc.dlm --phase preference

# A/B check against hand-authored pairs only:
uv run dlm train mydoc.dlm --phase preference --no-mined

# Use a different judge when bootstrap self-judging is not enough:
uv run dlm preference mine mydoc.dlm --judge hf:YourOrg/reward-model --apply
```

### 7. Scaffold multimodal or audio docs

```sh
uv run dlm init diagrams.dlm --multimodal --base qwen2-vl-2b-instruct
uv run dlm train diagrams.dlm
uv run dlm prompt diagrams.dlm --image figures/system.png "What is happening here?"

uv run dlm init calls.dlm --audio
uv run dlm train calls.dlm
uv run dlm prompt calls.dlm --audio clips/example.wav "Summarize the clip"
```
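
On the authoring side, image and audio examples live in `::image::` and
`::audio::` body sections. Their real grammar is specified in the multimodal
and audio docs; the sketch below only guesses at the shape by analogy with the
`### Q` / `### A` layout of `::instruction::` sections, so treat every heading
in it as hypothetical:

```dlm
::image::
### Image
figures/system.png

### Q
What is happening here?

### A
The diagram shows the request path from ingest to the tokenizer cache.
```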

### 8. Pull eval failures back into training

```sh
uv run dlm harvest mydoc.dlm --sway-json sway-report.json --apply
```
That is the probe-driven loop: evaluation finds a miss, DLM turns it into document-level training data, and the next train closes the gap.
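
Stitched together with the sway emission from workflow 5, one full iteration of
that loop is only a few commands. The report filename below assumes your `sway`
run wrote its report to `sway-report.json`; substitute whatever path it actually
produced:

```sh
# One probe-driven iteration: export with probes, evaluate, harvest, retrain.
uv run dlm export mydoc.dlm --target ollama --emit-sway-json
uv run sway run <export-dir>/sway.yaml
uv run dlm harvest mydoc.dlm --sway-json sway-report.json --apply
uv run dlm train mydoc.dlm
```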

### 9. Inspect store state and reproducibility

```sh
uv run dlm doctor
uv run dlm show mydoc.dlm --json
uv run dlm metrics mydoc.dlm --run-id 7 --json
uv run dlm pack mydoc.dlm --include-exports
uv run dlm verify mydoc.dlm.pack
```

## Command Surface

The CLI is broader than the original MVP now. A useful mental map:

| Area | Commands | What they cover |
|---|---|---|
| Author | `init`, `templates`, `show`, `migrate`, `cache` | Create docs, inspect them, migrate schema, manage cache state |
| Train | `train`, `doctor`, `metrics`, `harvest` | Run training, inspect plans, observe runs, pull eval misses back in |
| Align | `preference` | Mine, stage, apply, revert, and inspect auto-mined preference sections |
| Infer | `prompt`, `repl` | Local interactive and one-shot inference |
| Ship | `export`, `pack`, `unpack`, `verify`, `push`, `pull`, `serve` | Export to runtimes, bundle, verify, and move artifacts |

See the [CLI reference](./docs/cli/reference.md) for the full flag surface.

## Documentation

- [Getting started](./docs/getting-started/install.md)
- [Frontmatter reference](./docs/format/frontmatter.md)
- [Section grammar](./docs/format/sections.md)
- [Preference section reference](./docs/format/preference-section.md)
- [Training across codebases](./docs/cookbook/training-across-codebases.md)
- [Train from a folder](./docs/cookbook/train-from-folder.md)
- [Multi-source training](./docs/cookbook/multi-source-training.md)
- [Tokenized-section cache](./docs/cookbook/directive-cache.md)
- [Multi-adapter composition](./docs/cookbook/multi-adapter.md)
- [Learned adapter gate](./docs/cookbook/learned-adapter-gate.md)
- [Self-improving loop / preference mining](./docs/cookbook/self-improving-loop.md)
- [Reward-model integration](./docs/cookbook/reward-model-integration.md)
- [Multimodal training](./docs/cookbook/multimodal-training.md)
- [Audio training](./docs/cookbook/audio-training.md)
- [Probe-driven training / sway harvest](./docs/cookbook/probe-driven-training.md)
- [Multi-target export](./docs/cookbook/multi-target-export.md)
- [Sharing adapters and packs](./docs/cookbook/sharing.md)
- [CLI reference](./docs/cli/reference.md)
- [Architecture](./docs/architecture.md)
- [Determinism](./docs/determinism.md)

## Principles

1. **The document is the interface.**
   But the document is structured: frontmatter, typed sections, directives, and
   store contracts all matter.
2. **Training is real.**
   LoRA / QLoRA / DoRA on pretrained bases, not a toy transformer.
3. **Retraining should not silently forget.**
   Replay-backed accumulation is part of the product.
4. **Local-first is load-bearing.**
   Your training data, adapters, exports, and packs stay on your machine unless
   you explicitly move them.
5. **Determinism is a contract.**
   If a change breaks the reproducibility story, that is a product regression.

## Tech Stack

Python 3.11+ · PyTorch · HuggingFace `transformers` / `peft` / `trl` /
`accelerate` / `datasets` · `watchfiles` · `prompt-toolkit` · `safetensors` ·
vendored `llama.cpp` for GGUF export · Ollama (optional runtime target) ·
Typer · Pydantic · `uv`

## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md). Testing conventions live in
[docs-internal/README-testing.md](./docs-internal/README-testing.md).

```sh
uv run pre-commit install
```

## License

MIT. Base-model licenses are separate and enforced where DLM needs them:
`dlm init`, `dlm train`, `dlm export`, and `dlm pack` all keep the gated-base
acceptance path explicit.