# DocumentLanguageModel

> `.dlm` is a trainable local AI document format: typed sections, directives,
> replay-backed retraining, and export.

DocumentLanguageModel (DLM) is a local-first training and inference toolkit
built around authored documents instead of hosted dashboards.

A `.dlm` can be a hand-authored training doc, a directive-driven entrypoint
into a codebase, a multi-adapter project with learned routing, or a
multimodal / audio-language document. DLM trains LoRA / QLoRA / DoRA adapters
on real pretrained bases, keeps replay history, and exports to local runtimes
such as Ollama, `llama-server`, `vllm`, and `mlx-serve`.

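To make that concrete, here is a minimal sketch of what a `.dlm` might look
like. The `::instruction::` / `::preference::` markers are the documented
section types, but the frontmatter keys and section payload shapes below are
illustrative assumptions, not the canonical grammar; see
[Frontmatter](format/frontmatter.md) and
[Section grammar](format/sections.md) for the actual rules.

```
---
base: smollm2-135m   # base model id, as in the demo below; key name assumed
---

Plain prose sections contribute raw training text.

::instruction::
Q: What does a Python decorator do?
A: It wraps a function to extend its behavior without editing its body.

::preference::
prompt: Explain decorators in one sentence.
chosen: A decorator wraps a function to add behavior around its calls.
rejected: Decorators are just a Python thing.
```
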
## What DLM Ships Today

- **Structured `.dlm` authoring** with frontmatter plus typed body sections
  like prose, `::instruction::`, `::preference::`, `::image::`, and
  `::audio::`
- **Directive-driven corpus building** via `training.sources`, plus nested
  `.dlm/training.yaml` / `.dlm/ignore` for repo-local curation (see the
  sketch after this list)
- **Modern base-model registry** across text, reasoning, sparse-MoE,
  vision-language, and audio-language rows
- **Replay-backed retraining** so edits accumulate instead of silently wiping
  prior state
- **Synthetic data loops** through `dlm synth instructions` and
  `dlm synth preferences`
- **Multi-adapter docs + learned gating** for separating knowledge, tone, or
  persona lanes inside one project
- **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
  and `doctor`
- **Runtime export** to `ollama`, `llama-server`, `vllm`, and `mlx-serve`
- **Probe-driven improvement** through `sway`-style harvest flows

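For repo-local curation, a hedged sketch of the nested `.dlm/training.yaml` /
`.dlm/ignore` pair mentioned above; the field names here are assumptions, not
the documented schema (see
[Training across codebases](cookbook/training-across-codebases.md) for the
real one):

```yaml
# .dlm/training.yaml — hypothetical field names, for illustration only
sources:
  - path: "src/**/*.py"   # files folded into the training corpus
    kind: code
  - path: "docs/**/*.md"
    kind: prose
```

```
# .dlm/ignore — assumed to behave like .gitignore: exclude paths from curation
vendor/
**/*.lock
```
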
## 30-Second Demo

```sh
$ uv run dlm init tutor.dlm --base smollm2-135m
$ $EDITOR tutor.dlm
$ uv run dlm train tutor.dlm
$ uv run dlm prompt tutor.dlm "Explain Python decorators"
$ uv run dlm export tutor.dlm --target ollama --name my-tutor
```

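Once the export completes, the model should be runnable under the target
runtime itself. Assuming the `ollama` target registers the model under the
`--name` you passed, that looks like:

```sh
$ ollama run my-tutor "What does @staticmethod do?"
```
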
## Where To Start

| If you want to… | Start here |
|---|---|
| Install DLM and run the first cycle | [Getting started → Install](getting-started/install.md) |
| Understand the `.dlm` file format | [Frontmatter](format/frontmatter.md) and [Section grammar](format/sections.md) |
| Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
| Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
| Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
| Turn prose into instruction data | [Synthesize training data](cookbook/synthesize-training-data.md) and [Bootstrap self-improving](cookbook/bootstrap-self-improving.md) |
| Mine preference pairs from a live adapter | [Self-improving loop](cookbook/self-improving-loop.md) and [Reward-model integration](cookbook/reward-model-integration.md) |
| Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) |
| Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |

## Status

DLM is pre-v1.0 but substantially broader than the original MVP framing.
Core author/train/prompt/export/pack/share flows are in place, and current
runtime-target work is extending export beyond the original Ollama-only path.