tenseleyflow/documentlanguagemodel / fc1bd69

Browse files

Add mlx-serve export target

Authored by espadonne
SHA
fc1bd697666b2f6d760ea21062e9b46ad2bac3ae
Parents
787628d
Tree
cb96c6e

13 changed files

StatusFile+-
M README.md 18 8
M docs/cli/reference.md 1 1
A docs/cookbook/multi-target-export.md 175 0
A docs/format/export-manifest.md 95 0
M docs/getting-started/first-export.md 17 0
M docs/index.md 3 3
M mkdocs.yml 2 0
M src/dlm/cli/commands.py 79 1
M src/dlm/export/targets/__init__.py 9 0
A src/dlm/export/targets/mlx_serve.py 272 0
M tests/unit/cli/test_export_target_flag.py 20 0
A tests/unit/export/targets/test_mlx_serve_argv.py 173 0
M tests/unit/export/targets/test_registry.py 3 2
README.mdmodified
@@ -15,12 +15,12 @@ A `.dlm` can be:
15
 
15
 
16
 DLM trains LoRA / QLoRA / DoRA adapters on real pretrained bases, keeps a replay
16
 DLM trains LoRA / QLoRA / DoRA adapters on real pretrained bases, keeps a replay
17
 history so retrains do not silently forget, and exports local runtimes such as
17
 history so retrains do not silently forget, and exports local runtimes such as
18
-Ollama and `llama-server`.
18
+Ollama, `llama-server`, `vllm`, and `mlx-serve`.
19
 
19
 
20
 **Status:** pre-v1.0, but far beyond the original MVP framing. The core
20
 **Status:** pre-v1.0, but far beyond the original MVP framing. The core
21
 author/train/prompt/export/pack/share loop is real, and newer runtime-target
21
 author/train/prompt/export/pack/share loop is real, and newer runtime-target
22
 work is landing incrementally. Current export targets are `ollama`,
22
 work is landing incrementally. Current export targets are `ollama`,
23
-`llama-server`, and `vllm`.
23
+`llama-server`, `vllm`, and `mlx-serve`.
24
 
24
 
25
 ## What A `.dlm` Actually Is
25
 ## What A `.dlm` Actually Is
26
 
26
 
@@ -79,8 +79,8 @@ DLM sits in the gap:
79
   `dlm train --watch`, `dlm metrics`, and `dlm doctor` are all part of the
79
   `dlm train --watch`, `dlm metrics`, and `dlm doctor` are all part of the
80
   normal workflow now.
80
   normal workflow now.
81
 - **Export beyond the original Ollama-only story.** DLM still does explicit
81
 - **Export beyond the original Ollama-only story.** DLM still does explicit
82
-  Ollama exports with pinned templates, and now also emits `llama-server`
82
+  Ollama exports with pinned templates, and now also emits `llama-server`,
83
-  launch artifacts against the same GGUF path.
83
+  `vllm`, and `mlx-serve` launch artifacts for local runtime targets.
84
 - **Close the eval loop.** `dlm harvest` can pull failing `sway`-style probe
84
 - **Close the eval loop.** `dlm harvest` can pull failing `sway`-style probe
85
   reports back into the document as new training examples.
85
   reports back into the document as new training examples.
86
 - **Pack and share reproducibly.** `.dlm.pack`, verification, push/pull, and
86
 - **Pack and share reproducibly.** `.dlm.pack`, verification, push/pull, and
@@ -90,10 +90,10 @@ DLM sits in the gap:
90
 
90
 
91
 | Tier | Training | Inference / export |
91
 | Tier | Training | Inference / export |
92
 |---|---|---|
92
 |---|---|---|
93
-| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama, GGUF export, `llama-server` launch artifacts |
93
+| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama, GGUF export, `llama-server`, `vllm` |
94
-| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama, GGUF export, `llama-server` launch artifacts |
94
+| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama, GGUF export, `llama-server`, `vllm` |
95
-| Apple Silicon (MPS) | fp16 or fp32 LoRA depending on doctor plan | Ollama, selected MLX inference paths, GGUF export |
95
+| Apple Silicon (MPS) | fp16 or fp32 LoRA depending on doctor plan | Ollama, selected MLX inference paths, GGUF export, `vllm` (conservative Metal defaults), `mlx-serve` |
96
-| CPU | inference-first; training refused above small bases unless forced | GGUF export, Ollama, `llama-server` launch artifacts |
96
+| CPU | inference-first; training refused above small bases unless forced | GGUF export, Ollama, `llama-server` |
97
 | AMD ROCm | experimental | ROCm-oriented llama.cpp flows |
97
 | AMD ROCm | experimental | ROCm-oriented llama.cpp flows |
98
 
98
 
99
 See [docs/hardware](./docs/hardware/memory-estimates.md) and
99
 See [docs/hardware](./docs/hardware/memory-estimates.md) and
@@ -133,6 +133,13 @@ scripts/bump-llama-cpp.sh build
133
 # If you want the llama.cpp HTTP target too:
133
 # If you want the llama.cpp HTTP target too:
134
 scripts/bump-llama-cpp.sh build --with-server
134
 scripts/bump-llama-cpp.sh build --with-server
135
 
135
 
136
+# If you want the Apple Silicon MLX HTTP target:
137
+uv sync --extra mlx
138
+
139
+# If you want the vLLM HTTP target:
140
+# install a compatible vllm runtime separately; DLM writes launch artifacts
141
+# but does not bundle the server runtime itself.
142
+
136
 uv run dlm --help
143
 uv run dlm --help
137
 ```
144
 ```
138
 
145
 
@@ -276,6 +283,8 @@ uv run dlm metrics mydoc.dlm
276
 ```sh
283
 ```sh
277
 uv run dlm export mydoc.dlm --target ollama --name mydoc
284
 uv run dlm export mydoc.dlm --target ollama --name mydoc
278
 uv run dlm export mydoc.dlm --target llama-server --no-smoke
285
 uv run dlm export mydoc.dlm --target llama-server --no-smoke
286
+uv run dlm export mydoc.dlm --target vllm --no-smoke
287
+uv run dlm export mydoc.dlm --target mlx-serve --no-smoke
279
 uv run dlm pack mydoc.dlm --include-exports
288
 uv run dlm pack mydoc.dlm --include-exports
280
 uv run dlm verify mydoc.dlm.pack
289
 uv run dlm verify mydoc.dlm.pack
281
 ```
290
 ```
@@ -319,6 +328,7 @@ See the [CLI reference](./docs/cli/reference.md) for the full flag surface.
319
 - [Multimodal training](./docs/cookbook/multimodal-training.md)
328
 - [Multimodal training](./docs/cookbook/multimodal-training.md)
320
 - [Audio training](./docs/cookbook/audio-training.md)
329
 - [Audio training](./docs/cookbook/audio-training.md)
321
 - [Probe-driven training / sway harvest](./docs/cookbook/probe-driven-training.md)
330
 - [Probe-driven training / sway harvest](./docs/cookbook/probe-driven-training.md)
331
+- [Multi-target export](./docs/cookbook/multi-target-export.md)
322
 - [CLI reference](./docs/cli/reference.md)
332
 - [CLI reference](./docs/cli/reference.md)
323
 - [Architecture](./docs/architecture.md)
333
 - [Architecture](./docs/architecture.md)
324
 - [Determinism](./docs/determinism.md)
334
 - [Determinism](./docs/determinism.md)
docs/cli/reference.mdmodified
@@ -203,7 +203,7 @@ dlm export <path> [--target NAME] [--quant Q] [--merged [--dequantize]]
203
 
203
 
204
 | Option | Default | Notes |
204
 | Option | Default | Notes |
205
 |---|---|---|
205
 |---|---|---|
206
-| `--target NAME` | `ollama` | Export destination. Sprint 41 currently supports `ollama`, `llama-server`, and `vllm`. The `llama-server` path writes launch artifacts against the existing GGUF export and uses the shared OpenAI-compatible HTTP smoke harness; the `vllm` path writes `vllm_launch.sh` + `vllm_config.json` against the local adapter layout and ignores GGUF-only flags. On Apple Silicon, the generated `vllm` launch path forces the documented low-risk `vllm-metal` settings (`VLLM_METAL_USE_PAGED_ATTENTION=0`, `VLLM_METAL_MEMORY_FRACTION=auto`) and caps `--max-model-len` to the document's `training.sequence_len`. |
206
+| `--target NAME` | `ollama` | Export destination. Sprint 41 currently supports `ollama`, `llama-server`, `vllm`, and `mlx-serve`. The `llama-server` path writes launch artifacts against the existing GGUF export and uses the shared OpenAI-compatible HTTP smoke harness. The `vllm` path writes `vllm_launch.sh` + `vllm_config.json` against the local adapter layout and ignores GGUF-only flags. On Apple Silicon, the generated `vllm` launch path forces the documented low-risk `vllm-metal` settings (`VLLM_METAL_USE_PAGED_ATTENTION=0`, `VLLM_METAL_MEMORY_FRACTION=auto`) and caps `--max-model-len` to the document's `training.sequence_len`. The `mlx-serve` path is Apple Silicon only, writes `mlx_serve_launch.sh` plus a staged MLX adapter directory, and currently supports text bases only. |
207
 | `--quant Q` | frontmatter.export.default_quant | `Q4_K_M` / `Q5_K_M` / `Q6_K` / `Q8_0` / `F16`. |
207
 | `--quant Q` | frontmatter.export.default_quant | `Q4_K_M` / `Q5_K_M` / `Q6_K` / `Q8_0` / `F16`. |
208
 | `--merged` | false | Merge LoRA into base before quantizing. |
208
 | `--merged` | false | Merge LoRA into base before quantizing. |
209
 | `--dequantize` | false | Required with `--merged` on a QLoRA adapter (pitfall #3). |
209
 | `--dequantize` | false | Required with `--merged` on a QLoRA adapter (pitfall #3). |
docs/cookbook/multi-target-export.mdadded
@@ -0,0 +1,175 @@
1
+# Multi-target export
2
+
3
+`dlm export` is no longer just an Ollama registration path. The same
4
+trained store can now emit local runtime artifacts for four targets:
5
+
6
+- `ollama` for managed local registration plus the existing Modelfile flow
7
+- `llama-server` for GGUF-backed OpenAI-compatible HTTP serving via vendored
8
+  `llama.cpp`
9
+- `vllm` for HF-snapshot plus LoRA-module serving on machines that can run
10
+  `vllm`
11
+- `mlx-serve` for Apple Silicon text serving through `mlx_lm.server`
12
+
13
+Use this when you want one training loop but different local runtimes for
14
+prompting, evaluation harnesses, agents, or deployment experiments.
15
+
16
+## Quick map
17
+
18
+| Target | Best for | Artifact shape | Smoke path |
19
+|---|---|---|---|
20
+| `ollama` | Easiest local chat loop | GGUF + `Modelfile` + local registration | existing Ollama smoke |
21
+| `llama-server` | GGUF-backed OpenAI-compatible server | `base.<quant>.gguf` + `adapter.gguf` + `chat-template.jinja` + `llama-server_launch.sh` | shared HTTP smoke |
22
+| `vllm` | HF-snapshot + LoRA serving on supported hosts | `vllm_launch.sh` + `vllm_config.json` + staged adapters | shared HTTP smoke |
23
+| `mlx-serve` | Apple Silicon text serving without GGUF conversion | `mlx_serve_launch.sh` + staged MLX adapter dir | shared HTTP smoke |
24
+
25
+## Prerequisites
26
+
27
+### Ollama
28
+
29
+```sh
30
+brew install ollama
31
+```
32
+
33
+### llama-server
34
+
35
+```sh
36
+scripts/bump-llama-cpp.sh build --with-server
37
+```
38
+
39
+That compiles the vendored `llama-server` binary alongside the GGUF tooling.
40
+
41
+### vLLM
42
+
43
+Install a compatible `vllm` runtime in the environment you plan to launch
44
+from. DLM writes the launch/config artifacts, but it does not bundle the
45
+server runtime.
46
+
47
+On Apple Silicon, the generated `vllm` launch path is deliberately cautious:
48
+
49
+- `VLLM_METAL_USE_PAGED_ATTENTION=0`
50
+- `VLLM_METAL_MEMORY_FRACTION=auto`
51
+- `--max-model-len` capped to the document's `training.sequence_len`
52
+
53
+Those defaults exist to avoid the Metal OOM / hang pattern that shows up when
54
+`vllm-metal` blindly asks for the base model's full context window.
55
+
56
+### MLX-serve
57
+
58
+```sh
59
+uv sync --extra mlx
60
+```
61
+
62
+`mlx-serve` is Apple Silicon only. DLM refuses it on CUDA, ROCm, and CPU-only
63
+hosts, and this Sprint 41 slice only supports text bases on that target.
64
+
65
+## Common exports
66
+
67
+### Ollama
68
+
69
+```sh
70
+uv run dlm export tutor.dlm --target ollama --name my-tutor
71
+```
72
+
73
+This is the classic DLM path: GGUF conversion, explicit Go-template
74
+`Modelfile`, optional registration, and an Ollama smoke prompt.
75
+
76
+### llama-server
77
+
78
+```sh
79
+uv run dlm export tutor.dlm --target llama-server
80
+bash ~/.dlm/store/<dlm_id>/exports/Q4_K_M/llama-server_launch.sh
81
+```
82
+
83
+This reuses the GGUF export artifacts and adds:
84
+
85
+- `chat-template.jinja`
86
+- `llama-server_launch.sh`
87
+- `target: "llama-server"` in `export_manifest.json`
88
+
89
+The launch script binds `127.0.0.1` and speaks `/v1/chat/completions`.
90
+
91
+### vLLM
92
+
93
+```sh
94
+uv run dlm export tutor.dlm --target vllm
95
+bash ~/.dlm/store/<dlm_id>/exports/vllm/vllm_launch.sh
96
+```
97
+
98
+This path stages local LoRA modules and writes:
99
+
100
+- `vllm_launch.sh`
101
+- `vllm_config.json`
102
+- `exports/vllm/adapters/...`
103
+
104
+Flags that only matter to GGUF or Ollama are ignored with a banner:
105
+`--quant`, `--merged`, `--dequantize`, `--no-template`, `--skip-ollama`,
106
+`--no-imatrix`, `--draft`, `--no-draft`.
107
+
108
+### MLX-serve
109
+
110
+```sh
111
+uv run dlm export tutor.dlm --target mlx-serve
112
+bash ~/.dlm/store/<dlm_id>/exports/mlx-serve/mlx_serve_launch.sh
113
+```
114
+
115
+This path stages an MLX-loadable adapter directory and writes:
116
+
117
+- `mlx_serve_launch.sh`
118
+- `exports/mlx-serve/adapter/` or one named adapter directory
119
+- `target: "mlx-serve"` in `export_manifest.json`
120
+
121
+`mlx-serve` also ignores the GGUF/Ollama-only flags above, plus `--name`.
122
+
123
+## Multi-adapter behavior
124
+
125
+The runtime targets split into two families:
126
+
127
+- `ollama` and `llama-server` can reuse the GGUF weighted-merge path for
128
+  `--adapter-mix`
129
+- `vllm` and `mlx-serve` work from local adapter directories
130
+
131
+For `vllm`:
132
+
133
+- single-adapter docs export one staged module
134
+- multi-adapter docs without `--adapter` export every named adapter as a
135
+  `--lora-modules` list
136
+- `--adapter-mix` exports the staged composite adapter instead
137
+
138
+For `mlx-serve`:
139
+
140
+- single-adapter docs export the current flat adapter
141
+- multi-adapter docs must choose one adapter with `--adapter`, or pass
142
+  `--adapter-mix` to export the staged composite adapter
143
+
144
+That "one adapter at a time" rule is intentional: this target is a simple
145
+local-serving path, not a dynamic multi-LoRA router.
146
+
147
+## Smoke behavior
148
+
149
+The three OpenAI-compatible HTTP targets — `llama-server`, `vllm`, and `mlx-serve` — all use the shared smoke harness:
150
+
151
+1. reserve a loopback port
152
+2. launch the target-specific server command
153
+3. poll `/v1/models`
154
+4. POST `/v1/chat/completions`
155
+5. record the first non-empty line in the store manifest
156
+
157
+Skip it with `--no-smoke` when the runtime is not installed or you want the
158
+artifacts only.
159
+
160
+## Inspecting what got written
161
+
162
+Every export writes `export_manifest.json` under its target directory. The
163
+important fields are:
164
+
165
+- `target`
166
+- `quant`
167
+- `artifacts`
168
+- `adapter_version`
169
+- `base_model_hf_id`
170
+- `base_model_revision`
171
+
172
+The per-store `manifest.json` also gets a new row appended to `exports[]` with the
173
+same `target` plus the smoke first line when a smoke test ran.
174
+
175
+See [Export manifest](../format/export-manifest.md) for the exact schema.
docs/format/export-manifest.mdadded
@@ -0,0 +1,95 @@
1
+# Export manifest
2
+
3
+Every `dlm export` writes an `export_manifest.json` inside the export directory.
4
+It is the target-local record of what DLM emitted, separate from the broader
5
+per-store `manifest.json`.
6
+
7
+Examples:
8
+
9
+- `~/.dlm/store/<dlm_id>/exports/Q4_K_M/export_manifest.json`
10
+- `~/.dlm/store/<dlm_id>/exports/vllm/export_manifest.json`
11
+- `~/.dlm/store/<dlm_id>/exports/mlx-serve/export_manifest.json`
12
+
13
+## What it records
14
+
15
+The manifest captures:
16
+
17
+- `target`: which runtime this export was prepared for
18
+- `quant`: the export family (`Q4_K_M`, `Q8_0`, `hf`, ...)
19
+- `merged` / `dequantized`: whether LoRA weights were merged into the base
20
+- `created_at` and `created_by`
21
+- `llama_cpp_tag` when the target depends on vendored `llama.cpp`
22
+- `base_model_hf_id` and `base_model_revision`
23
+- `adapter_version`
24
+- `artifacts`: every emitted file with relative path, sha256, and size
25
+
26
+The schema is strict and round-trips through the Pydantic model in
27
+`src/dlm/export/manifest.py`.
28
+
29
+## Example
30
+
31
+```json
32
+{
33
+  "target": "llama-server",
34
+  "quant": "Q4_K_M",
35
+  "merged": false,
36
+  "dequantized": false,
37
+  "ollama_name": null,
38
+  "created_at": "2026-04-23T18:42:00",
39
+  "created_by": "dlm-0.1.0",
40
+  "llama_cpp_tag": "b4281",
41
+  "base_model_hf_id": "HuggingFaceTB/SmolLM2-135M-Instruct",
42
+  "base_model_revision": "4c0d2...",
43
+  "adapter_version": 3,
44
+  "artifacts": [
45
+    {
46
+      "path": "base.Q4_K_M.gguf",
47
+      "sha256": "…",
48
+      "size_bytes": 47211904
49
+    },
50
+    {
51
+      "path": "adapter.gguf",
52
+      "sha256": "…",
53
+      "size_bytes": 3145728
54
+    },
55
+    {
56
+      "path": "llama-server_launch.sh",
57
+      "sha256": "…",
58
+      "size_bytes": 312
59
+    }
60
+  ]
61
+}
62
+```
63
+
64
+## `target`
65
+
66
+`target` is now the load-bearing field for Sprint 41’s runtime split.
67
+
68
+Current values:
69
+
70
+- `ollama`
71
+- `llama-server`
72
+- `vllm`
73
+- `mlx-serve`
74
+
75
+That lets downstream tooling distinguish:
76
+
77
+- a GGUF + Modelfile export meant for Ollama
78
+- a GGUF-backed OpenAI-compatible launch artifact set
79
+- an HF-snapshot + LoRA-module export for `vllm`
80
+- an MLX adapter export for Apple Silicon serving
81
+
82
+## Relationship to the store manifest
83
+
84
+`export_manifest.json` is per-export and artifact-focused.
85
+
86
+The store-level `manifest.json` keeps the running narrative in `exports[]`:
87
+
88
+- when the export happened
89
+- which `target` it used
90
+- GGUF checksums when present
91
+- `ollama_name` when relevant
92
+- the first smoke output line when a smoke test ran
93
+
94
+Use `export_manifest.json` when you need exact artifact provenance for one
95
+export directory. Use `manifest.json` when you want the store’s full history.
docs/getting-started/first-export.mdmodified
@@ -4,6 +4,11 @@
4
 Modelfile with an explicit Go `text/template` (no fuzzy matching),
4
 Modelfile with an explicit Go `text/template` (no fuzzy matching),
5
 registers the model with `ollama create`, and runs a smoke prompt.
5
 registers the model with `ollama create`, and runs a smoke prompt.
6
 
6
 
7
+That is still the default path, but it is no longer the only one. Sprint 41
8
+also adds local runtime targets such as `llama-server`, `vllm`, and
9
+`mlx-serve`; see the [multi-target export cookbook](../cookbook/multi-target-export.md)
10
+once you want an OpenAI-compatible local server instead of an Ollama model.
11
+
7
 ## Prerequisites
12
 ## Prerequisites
8
 
13
 
9
 - `vendor/llama.cpp` submodule is built:
14
 - `vendor/llama.cpp` submodule is built:
@@ -80,6 +85,18 @@ $ uv run dlm export tutor.dlm --quant Q4_K_M --skip-ollama
80
 Useful on CI runners without the Ollama daemon installed. The GGUFs
85
 Useful on CI runners without the Ollama daemon installed. The GGUFs
81
 land in `exports/Q4_K_M/`; wire them into your own runtime.
86
 land in `exports/Q4_K_M/`; wire them into your own runtime.
82
 
87
 
88
+## Other runtime targets
89
+
90
+Once the basic GGUF/Ollama flow is familiar, the same store can export to:
91
+
92
+- `--target llama-server` for a vendored `llama.cpp` HTTP server
93
+- `--target vllm` for HF-snapshot + LoRA-module serving
94
+- `--target mlx-serve` for Apple Silicon text serving through `mlx_lm.server`
95
+
96
+Those targets have different prerequisites and artifact layouts, so they live
97
+in the [multi-target export cookbook](../cookbook/multi-target-export.md)
98
+instead of this first-run page.
99
+
83
 ## Next
100
 ## Next
84
 
101
 
85
 Want to send the whole training history to a friend? The
102
 Want to send the whole training history to a friend? The
docs/index.mdmodified
@@ -10,7 +10,7 @@ A `.dlm` can be a hand-authored training doc, a directive-driven entrypoint
10
 into a codebase, a multi-adapter project with learned routing, or a selected
10
 into a codebase, a multi-adapter project with learned routing, or a selected
11
 multimodal / audio-language document. DLM trains LoRA / QLoRA / DoRA adapters
11
 multimodal / audio-language document. DLM trains LoRA / QLoRA / DoRA adapters
12
 on real pretrained bases, keeps replay history, and exports local runtimes such
12
 on real pretrained bases, keeps replay history, and exports local runtimes such
13
-as Ollama and `llama-server`.
13
+as Ollama, `llama-server`, `vllm`, and `mlx-serve`.
14
 
14
 
15
 ## What DLM Ships Today
15
 ## What DLM Ships Today
16
 
16
 
@@ -27,7 +27,7 @@ as Ollama and `llama-server`.
27
   persona lanes inside one project
27
   persona lanes inside one project
28
 - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
28
 - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`,
29
   and `doctor`
29
   and `doctor`
30
-- **Runtime export** to `ollama` and `llama-server`
30
+- **Runtime export** to `ollama`, `llama-server`, `vllm`, and `mlx-serve`
31
 - **Probe-driven improvement** through `sway`-style harvest flows
31
 - **Probe-driven improvement** through `sway`-style harvest flows
32
 
32
 
33
 ## 30-Second Demo
33
 ## 30-Second Demo
@@ -49,7 +49,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
49
 | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
49
 | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) |
50
 | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
50
 | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) |
51
 | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
51
 | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) |
52
-| Export or ship a model | [CLI reference](cli/reference.md) and [Determinism](determinism.md) |
52
+| Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) |
53
 | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |
53
 | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) |
54
 
54
 
55
 ## Status
55
 ## Status
mkdocs.ymlmodified
@@ -58,6 +58,7 @@ nav:
58
   - The .dlm format:
58
   - The .dlm format:
59
       - Frontmatter: format/frontmatter.md
59
       - Frontmatter: format/frontmatter.md
60
       - Sections: format/sections.md
60
       - Sections: format/sections.md
61
+      - Export manifest: format/export-manifest.md
61
       - .dlm/training.yaml: format/dlm-training-yaml.md
62
       - .dlm/training.yaml: format/dlm-training-yaml.md
62
       - .dlm/ignore: format/dlm-ignore.md
63
       - .dlm/ignore: format/dlm-ignore.md
63
   - CLI reference: cli/reference.md
64
   - CLI reference: cli/reference.md
@@ -77,6 +78,7 @@ nav:
77
       - Template gallery: cookbook/template-gallery.md
78
       - Template gallery: cookbook/template-gallery.md
78
       - Sharing adapters: cookbook/sharing.md
79
       - Sharing adapters: cookbook/sharing.md
79
       - Multi-source training: cookbook/multi-source-training.md
80
       - Multi-source training: cookbook/multi-source-training.md
81
+      - Multi-target export: cookbook/multi-target-export.md
80
       - Train from a folder: cookbook/train-from-folder.md
82
       - Train from a folder: cookbook/train-from-folder.md
81
       - Training across codebases: cookbook/training-across-codebases.md
83
       - Training across codebases: cookbook/training-across-codebases.md
82
       - Tokenized-section cache: cookbook/directive-cache.md
84
       - Tokenized-section cache: cookbook/directive-cache.md
src/dlm/cli/commands.pymodified
@@ -1551,7 +1551,7 @@ def export_cmd(
1551
         str,
1551
         str,
1552
         typer.Option(
1552
         typer.Option(
1553
             "--target",
1553
             "--target",
1554
-            help="Export destination. Currently supported: ollama, llama-server, vllm.",
1554
+            help="Export destination. Currently supported: ollama, llama-server, vllm, mlx-serve.",
1555
         ),
1555
         ),
1556
     ] = "ollama",
1556
     ] = "ollama",
1557
     quant: Annotated[
1557
     quant: Annotated[
@@ -1679,8 +1679,10 @@ def export_cmd(
1679
     )
1679
     )
1680
     from dlm.export.quantize import run_checked
1680
     from dlm.export.quantize import run_checked
1681
     from dlm.export.targets import (
1681
     from dlm.export.targets import (
1682
+        finalize_mlx_serve_export,
1682
         finalize_vllm_export,
1683
         finalize_vllm_export,
1683
         prepare_llama_server_export,
1684
         prepare_llama_server_export,
1685
+        prepare_mlx_serve_export,
1684
         prepare_vllm_export,
1686
         prepare_vllm_export,
1685
         resolve_target,
1687
         resolve_target,
1686
     )
1688
     )
@@ -1785,6 +1787,12 @@ def export_cmd(
1785
             "documents yet; this Sprint 41 slice only supports text bases."
1787
             "documents yet; this Sprint 41 slice only supports text bases."
1786
         )
1788
         )
1787
         raise typer.Exit(code=2)
1789
         raise typer.Exit(code=2)
1790
+    if resolved_target.name == "mlx-serve" and export_dispatch.accepts_audio:
1791
+        console.print(
1792
+            "[red]export:[/red] --target mlx-serve is not wired for audio-language "
1793
+            "documents yet; this Sprint 41 slice only supports text bases."
1794
+        )
1795
+        raise typer.Exit(code=2)
1788
     if export_dispatch.accepts_audio:
1796
     if export_dispatch.accepts_audio:
1789
         try:
1797
         try:
1790
             dispatch_result = export_dispatch.dispatch_export(
1798
             dispatch_result = export_dispatch.dispatch_export(
@@ -1830,6 +1838,12 @@ def export_cmd(
1830
             "documents yet; this Sprint 41 slice only supports text bases."
1838
             "documents yet; this Sprint 41 slice only supports text bases."
1831
         )
1839
         )
1832
         raise typer.Exit(code=2)
1840
         raise typer.Exit(code=2)
1841
+    if resolved_target.name == "mlx-serve" and export_dispatch.accepts_images:
1842
+        console.print(
1843
+            "[red]export:[/red] --target mlx-serve is not wired for vision-language "
1844
+            "documents yet; this Sprint 41 slice only supports text bases."
1845
+        )
1846
+        raise typer.Exit(code=2)
1833
     if export_dispatch.accepts_images:
1847
     if export_dispatch.accepts_images:
1834
         gguf_emission_context = None
1848
         gguf_emission_context = None
1835
         try:
1849
         try:
@@ -1957,6 +1971,70 @@ def export_cmd(
1957
             console.print(f"smoke:   {vllm_smoke.detail}")
1971
             console.print(f"smoke:   {vllm_smoke.detail}")
1958
         return
1972
         return
1959
 
1973
 
1974
+    if resolved_target.name == "mlx-serve":
1975
+        mlx_ignored_flags: list[str] = []
1976
+        if quant is not None:
1977
+            mlx_ignored_flags.append("--quant")
1978
+        if merged:
1979
+            mlx_ignored_flags.append("--merged")
1980
+        if dequantize:
1981
+            mlx_ignored_flags.append("--dequantize")
1982
+        if name is not None:
1983
+            mlx_ignored_flags.append("--name")
1984
+        if no_template:
1985
+            mlx_ignored_flags.append("--no-template")
1986
+        if skip_ollama:
1987
+            mlx_ignored_flags.append("--skip-ollama")
1988
+        if no_imatrix:
1989
+            mlx_ignored_flags.append("--no-imatrix")
1990
+        if draft is not None:
1991
+            mlx_ignored_flags.append("--draft")
1992
+        if no_draft:
1993
+            mlx_ignored_flags.append("--no-draft")
1994
+        if mlx_ignored_flags:
1995
+            console.print(
1996
+                "[yellow]export:[/yellow] ignoring flags not applicable to "
1997
+                f"`--target mlx-serve`: {', '.join(mlx_ignored_flags)}"
1998
+            )
1999
+
2000
+        declared_adapter_names = tuple(adapters_declared.keys()) if adapters_declared else None
2001
+        try:
2002
+            mlx_serve_result = prepare_mlx_serve_export(
2003
+                store=store,
2004
+                spec=spec,
2005
+                adapter_name=adapter,
2006
+                adapter_path_override=adapter_path_override,
2007
+                declared_adapter_names=declared_adapter_names,
2008
+            )
2009
+        except ExportError as exc:
2010
+            console.print(f"[red]export:[/red] {exc}")
2011
+            raise typer.Exit(code=1) from exc
2012
+
2013
+        mlx_serve_smoke = None if no_smoke else resolved_target.smoke_test(mlx_serve_result)
2014
+        if mlx_serve_smoke is not None and not mlx_serve_smoke.ok:
2015
+            console.print(
2016
+                f"[red]smoke:[/red] {mlx_serve_smoke.detail}\n"
2017
+                "  re-run with `--no-smoke` to skip the smoke test."
2018
+            )
2019
+            raise typer.Exit(code=1)
2020
+
2021
+        manifest_path = finalize_mlx_serve_export(
2022
+            store=store,
2023
+            spec=spec,
2024
+            prepared=mlx_serve_result,
2025
+            smoke_output_first_line=None if mlx_serve_smoke is None else mlx_serve_smoke.detail,
2026
+            adapter_name=adapter,
2027
+            adapter_mix=mix_entries,
2028
+        )
2029
+        console.print(f"[green]exported:[/green] {mlx_serve_result.export_dir}")
2030
+        console.print("target:  mlx-serve")
2031
+        assert mlx_serve_result.launch_script_path is not None
2032
+        console.print(f"launch:  {mlx_serve_result.launch_script_path.name}")
2033
+        console.print(f"manifest: {manifest_path.name}")
2034
+        if mlx_serve_smoke is not None and mlx_serve_smoke.detail:
2035
+            console.print(f"smoke:   {mlx_serve_smoke.detail}")
2036
+        return
2037
+
1960
     try:
2038
     try:
1961
         result = run_export(
2039
         result = run_export(
1962
             store,
2040
             store,
src/dlm/export/targets/__init__.pymodified
@@ -5,6 +5,11 @@ from __future__ import annotations
5
 from dlm.export.errors import UnknownExportTargetError
5
 from dlm.export.errors import UnknownExportTargetError
6
 from dlm.export.targets.base import ExportTarget, SmokeResult, TargetResult
6
 from dlm.export.targets.base import ExportTarget, SmokeResult, TargetResult
7
 from dlm.export.targets.llama_server import LLAMA_SERVER_TARGET, prepare_llama_server_export
7
 from dlm.export.targets.llama_server import LLAMA_SERVER_TARGET, prepare_llama_server_export
8
+from dlm.export.targets.mlx_serve import (
9
+    MLX_SERVE_TARGET,
10
+    finalize_mlx_serve_export,
11
+    prepare_mlx_serve_export,
12
+)
8
 from dlm.export.targets.ollama import OLLAMA_TARGET
13
 from dlm.export.targets.ollama import OLLAMA_TARGET
9
 from dlm.export.targets.vllm import VLLM_TARGET, finalize_vllm_export, prepare_vllm_export
14
 from dlm.export.targets.vllm import VLLM_TARGET, finalize_vllm_export, prepare_vllm_export
10
 
15
 
@@ -12,6 +17,7 @@ TARGETS: dict[str, ExportTarget] = {
12
     OLLAMA_TARGET.name: OLLAMA_TARGET,
17
     OLLAMA_TARGET.name: OLLAMA_TARGET,
13
     LLAMA_SERVER_TARGET.name: LLAMA_SERVER_TARGET,
18
     LLAMA_SERVER_TARGET.name: LLAMA_SERVER_TARGET,
14
     VLLM_TARGET.name: VLLM_TARGET,
19
     VLLM_TARGET.name: VLLM_TARGET,
20
+    MLX_SERVE_TARGET.name: MLX_SERVE_TARGET,
15
 }
21
 }
16
 
22
 
17
 
23
 
@@ -31,12 +37,15 @@ def resolve_target(name: str) -> ExportTarget:
31
 __all__ = [
37
 __all__ = [
32
     "ExportTarget",
38
     "ExportTarget",
33
     "LLAMA_SERVER_TARGET",
39
     "LLAMA_SERVER_TARGET",
40
+    "MLX_SERVE_TARGET",
34
     "SmokeResult",
41
     "SmokeResult",
35
     "TARGETS",
42
     "TARGETS",
36
     "TargetResult",
43
     "TargetResult",
37
     "VLLM_TARGET",
44
     "VLLM_TARGET",
38
     "available_targets",
45
     "available_targets",
46
+    "finalize_mlx_serve_export",
39
     "finalize_vllm_export",
47
     "finalize_vllm_export",
48
+    "prepare_mlx_serve_export",
40
     "prepare_llama_server_export",
49
     "prepare_llama_server_export",
41
     "prepare_vllm_export",
50
     "prepare_vllm_export",
42
     "resolve_target",
51
     "resolve_target",
src/dlm/export/targets/mlx_serve.pyadded
@@ -0,0 +1,272 @@
1
+"""MLX HTTP server target helpers."""
2
+
3
+from __future__ import annotations
4
+
5
+import shlex
6
+import shutil
7
+from pathlib import Path
8
+
9
+from dlm.base_models import BaseModelSpec
10
+from dlm.export.errors import ExportError, TargetSmokeError
11
+from dlm.export.manifest import ExportManifest, build_artifact, save_export_manifest, utc_now
12
+from dlm.export.record import append_export_summary
13
+from dlm.export.smoke import smoke_openai_compat_server
14
+from dlm.export.targets.base import ExportTarget, SmokeResult, TargetResult
15
+from dlm.inference.backends.mlx_backend import stage_mlx_adapter_dir
16
+from dlm.inference.backends.select import is_apple_silicon, mlx_available
17
+from dlm.io.atomic import write_text
18
+from dlm.store.paths import StorePath
19
+
20
+MLX_SERVE_EXPORT_SUBDIR = "mlx-serve"
21
+LAUNCH_SCRIPT_FILENAME = "mlx_serve_launch.sh"
22
+_HF_QUANT = "hf"
23
+_DEFAULT_ADAPTER_DIRNAME = "adapter"
24
+_MIXED_ADAPTER_DIRNAME = "mixed"
25
+
26
+
27
+class MlxServeTarget:
28
+    """Registered export target for MLX HTTP server launch artifacts."""
29
+
30
+    name = "mlx-serve"
31
+
32
+    def prepare(self, ctx: object) -> TargetResult:
33
+        raise NotImplementedError("mlx-serve exports are prepared via prepare_mlx_serve_export()")
34
+
35
+    def launch_command(self, prepared: TargetResult) -> list[str]:
36
+        return _build_command(prepared, use_script_dir=True)
37
+
38
+    def smoke_test(self, prepared: TargetResult) -> SmokeResult:
39
+        try:
40
+            first_line = smoke_openai_compat_server(_build_command(prepared, use_script_dir=False))
41
+        except (OSError, TargetSmokeError, ExportError) as exc:
42
+            return SmokeResult(attempted=True, ok=False, detail=str(exc))
43
+        return SmokeResult(attempted=True, ok=True, detail=first_line)
44
+
45
+
46
+def prepare_mlx_serve_export(
47
+    *,
48
+    store: StorePath,
49
+    spec: BaseModelSpec,
50
+    adapter_name: str | None,
51
+    adapter_path_override: Path | None,
52
+    declared_adapter_names: tuple[str, ...] | None,
53
+) -> TargetResult:
54
+    """Stage an MLX-loadable adapter dir plus launch script."""
55
+
56
+    _require_mlx_runtime()
57
+    source_adapter_dir, staged_dirname, adapter_version = _resolve_source_adapter(
58
+        store=store,
59
+        adapter_name=adapter_name,
60
+        adapter_path_override=adapter_path_override,
61
+        declared_adapter_names=declared_adapter_names,
62
+    )
63
+
64
+    export_dir = store.exports / MLX_SERVE_EXPORT_SUBDIR
65
+    export_dir.mkdir(parents=True, exist_ok=True)
66
+
67
+    staged_adapter_dir = export_dir / staged_dirname
68
+    if staged_adapter_dir.exists():
69
+        shutil.rmtree(staged_adapter_dir)
70
+    stage_mlx_adapter_dir(source_adapter_dir, staged_adapter_dir, base_hf_id=spec.hf_id)
71
+
72
+    launch_script_path = export_dir / LAUNCH_SCRIPT_FILENAME
73
+    draft = TargetResult(
74
+        name=MLX_SERVE_TARGET.name,
75
+        export_dir=export_dir,
76
+        manifest_path=export_dir / "export_manifest.json",
77
+        artifacts=(),
78
+        launch_script_path=launch_script_path,
79
+        extras={
80
+            "model": spec.hf_id,
81
+            "adapter_dir": staged_adapter_dir,
82
+            "adapter_version": adapter_version,
83
+        },
84
+    )
85
+    write_text(launch_script_path, _render_launch_script(MLX_SERVE_TARGET.launch_command(draft)))
86
+    launch_script_path.chmod(0o755)
87
+    return TargetResult(
88
+        name=draft.name,
89
+        export_dir=draft.export_dir,
90
+        manifest_path=draft.manifest_path,
91
+        artifacts=tuple(_artifact_paths(export_dir)),
92
+        launch_script_path=draft.launch_script_path,
93
+        config_path=None,
94
+        extras=draft.extras,
95
+    )
96
+
97
+
98
+def finalize_mlx_serve_export(
99
+    *,
100
+    store: StorePath,
101
+    spec: BaseModelSpec,
102
+    prepared: TargetResult,
103
+    smoke_output_first_line: str | None,
104
+    adapter_name: str | None,
105
+    adapter_mix: list[tuple[str, float]] | None,
106
+) -> Path:
107
+    """Write export_manifest.json and append the store export summary."""
108
+
109
+    from dlm import __version__ as dlm_version
110
+
111
+    artifacts = [
112
+        build_artifact(prepared.export_dir, path) for path in _artifact_paths(prepared.export_dir)
113
+    ]
114
+    adapter_version = _require_prepared_int(prepared, "adapter_version")
115
+    manifest = ExportManifest(
116
+        target=MLX_SERVE_TARGET.name,
117
+        quant=_HF_QUANT,
118
+        merged=False,
119
+        dequantized=False,
120
+        ollama_name=None,
121
+        created_at=utc_now(),
122
+        created_by=f"dlm-{dlm_version}",
123
+        llama_cpp_tag=None,
124
+        base_model_hf_id=spec.hf_id,
125
+        base_model_revision=spec.revision,
126
+        adapter_version=adapter_version,
127
+        artifacts=artifacts,
128
+    )
129
+    manifest_path = save_export_manifest(prepared.export_dir, manifest)
130
+    append_export_summary(
131
+        store=store,
132
+        quant=_HF_QUANT,
133
+        merged=False,
134
+        target=MLX_SERVE_TARGET.name,
135
+        llama_cpp_tag=None,
136
+        artifacts=artifacts,
137
+        ollama_name=None,
138
+        ollama_version_str=None,
139
+        smoke_first_line=smoke_output_first_line,
140
+        adapter_name=adapter_name,
141
+        adapter_mix=adapter_mix,
142
+    )
143
+    return manifest_path
144
+
145
+
146
+def _resolve_source_adapter(
147
+    *,
148
+    store: StorePath,
149
+    adapter_name: str | None,
150
+    adapter_path_override: Path | None,
151
+    declared_adapter_names: tuple[str, ...] | None,
152
+) -> tuple[Path, str, int]:
153
+    if adapter_path_override is not None:
154
+        if not adapter_path_override.exists():
155
+            raise ExportError(f"adapter_path_override {adapter_path_override} does not exist")
156
+        return (
157
+            adapter_path_override,
158
+            _MIXED_ADAPTER_DIRNAME,
159
+            _version_from_dir_name(adapter_path_override),
160
+        )
161
+
162
+    if declared_adapter_names and adapter_name is None:
163
+        raise ExportError(
164
+            "mlx-serve exports one adapter at a time; pass `--adapter <name>` "
165
+            "or `--adapter-mix` for multi-adapter documents."
166
+        )
167
+
168
+    if adapter_name is not None:
169
+        path = store.resolve_current_adapter_for(adapter_name)
170
+        pointer = store.adapter_current_pointer_for(adapter_name)
171
+        if path is None or not path.exists():
172
+            raise ExportError(
173
+                f"no current adapter under {pointer}; run `dlm train` before exporting."
174
+            )
175
+        return path, adapter_name, _version_from_dir_name(path)
176
+
177
+    path = store.resolve_current_adapter()
178
+    pointer = store.adapter_current_pointer
179
+    if path is None or not path.exists():
180
+        raise ExportError(f"no current adapter under {pointer}; run `dlm train` before exporting.")
181
+    return path, _DEFAULT_ADAPTER_DIRNAME, _version_from_dir_name(path)
182
+
183
+
184
+def _require_mlx_runtime() -> None:
185
+    if not is_apple_silicon():
186
+        raise ExportError(
187
+            "mlx-serve export requires Apple Silicon (darwin-arm64); "
188
+            "this target is not available on CUDA, ROCm, or CPU-only hosts."
189
+        )
190
+    if not mlx_available():
191
+        raise ExportError(
192
+            "mlx-serve export requires the mlx extra to be installed; "
193
+            "run `uv sync --extra mlx` and re-try."
194
+        )
195
+
196
+
197
+def _artifact_paths(export_dir: Path) -> list[Path]:
198
+    artifacts: list[Path] = []
199
+    for path in sorted(export_dir.rglob("*")):
200
+        if path.is_file() and path.name != "export_manifest.json":
201
+            artifacts.append(path)
202
+    return artifacts
203
+
204
+
205
+def _build_command(prepared: TargetResult, *, use_script_dir: bool) -> list[str]:
206
+    model = _require_prepared_str(prepared, "model")
207
+    adapter_dir = _require_prepared_path(prepared, "adapter_dir")
208
+    return [
209
+        "python",
210
+        "-m",
211
+        "mlx_lm.server",
212
+        "--model",
213
+        model,
214
+        "--adapter-path",
215
+        _script_dir_arg(adapter_dir) if use_script_dir else str(adapter_dir),
216
+        "--host",
217
+        "127.0.0.1",
218
+        "--port",
219
+        "8000",
220
+    ]
221
+
222
+
223
+def _script_dir_arg(path: Path) -> str:
224
+    return f"$SCRIPT_DIR/{path.name}"
225
+
226
+
227
+def _render_launch_script(command: list[str]) -> str:
228
+    rendered = " ".join(_quote_script_arg(arg) for arg in command)
229
+    return (
230
+        "#!/usr/bin/env bash\n"
231
+        "set -euo pipefail\n"
232
+        'SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"\n'
233
+        f'exec {rendered} "$@"\n'
234
+    )
235
+
236
+
237
+def _quote_script_arg(arg: str) -> str:
238
+    if arg.startswith("$SCRIPT_DIR/"):
239
+        return f'"{arg}"'
240
+    return shlex.quote(arg)
241
+
242
+
243
+def _version_from_dir_name(path: Path) -> int:
244
+    stem = path.name
245
+    if not stem.startswith("v") or not stem[1:].isdigit():
246
+        return 1
247
+    return int(stem[1:])
248
+
249
+
250
+def _require_prepared_str(prepared: TargetResult, key: str) -> str:
251
+    value = prepared.extras.get(key)
252
+    if not isinstance(value, str) or not value:
253
+        raise ExportError(f"mlx-serve prepared target missing string extra {key!r}")
254
+    return value
255
+
256
+
257
+def _require_prepared_path(prepared: TargetResult, key: str) -> Path:
258
+    value = prepared.extras.get(key)
259
+    if not isinstance(value, Path):
260
+        raise ExportError(f"mlx-serve prepared target missing Path extra {key!r}")
261
+    return value
262
+
263
+
264
+def _require_prepared_int(prepared: TargetResult, key: str) -> int:
265
+    value = prepared.extras.get(key)
266
+    if not isinstance(value, int):
267
+        raise ExportError(f"mlx-serve prepared target missing int extra {key!r}")
268
+    return value
269
+
270
+
271
+MLX_SERVE_TARGET = MlxServeTarget()
272
+assert isinstance(MLX_SERVE_TARGET, ExportTarget)
tests/unit/cli/test_export_target_flag.pymodified
@@ -52,6 +52,7 @@ class TestExportTargetFlag:
52
         assert "ollama" in text
52
         assert "ollama" in text
53
         assert "llama-server" in text
53
         assert "llama-server" in text
54
         assert "vllm" in text
54
         assert "vllm" in text
55
+        assert "mlx-serve" in text
55
 
56
 
56
     def test_ollama_target_reaches_existing_mutex_validation(self, tmp_path: Path) -> None:
57
     def test_ollama_target_reaches_existing_mutex_validation(self, tmp_path: Path) -> None:
57
         doc = _scaffold_doc(tmp_path)
58
         doc = _scaffold_doc(tmp_path)
@@ -112,3 +113,22 @@ class TestExportTargetFlag:
112
         )
113
         )
113
         assert result.exit_code == 2
114
         assert result.exit_code == 2
114
         assert "mutually exclusive" in _joined(result)
115
         assert "mutually exclusive" in _joined(result)
116
+
117
+    def test_mlx_serve_target_reaches_existing_mutex_validation(self, tmp_path: Path) -> None:
118
+        runner = CliRunner()
119
+        result = runner.invoke(
120
+            app,
121
+            [
122
+                "--home",
123
+                str(tmp_path / "home"),
124
+                "export",
125
+                str(tmp_path / "ghost.dlm"),
126
+                "--target",
127
+                "mlx-serve",
128
+                "--draft",
129
+                "qwen2.5:0.5b",
130
+                "--no-draft",
131
+            ],
132
+        )
133
+        assert result.exit_code == 2
134
+        assert "mutually exclusive" in _joined(result)
tests/unit/export/targets/test_mlx_serve_argv.pyadded
@@ -0,0 +1,173 @@
1
+"""MLX serve launch artifact generation."""
2
+
3
+from __future__ import annotations
4
+
5
+from pathlib import Path
6
+
7
+import pytest
8
+
9
+from dlm.base_models import BASE_MODELS
10
+from dlm.export.errors import ExportError
11
+from dlm.export.manifest import load_export_manifest
12
+from dlm.export.targets.mlx_serve import (
13
+    LAUNCH_SCRIPT_FILENAME,
14
+    MLX_SERVE_TARGET,
15
+    finalize_mlx_serve_export,
16
+    prepare_mlx_serve_export,
17
+)
18
+from dlm.store.manifest import Manifest, load_manifest, save_manifest
19
+from dlm.store.paths import for_dlm
20
+
21
+_SPEC = BASE_MODELS["smollm2-135m"]
22
+
23
+
24
+def _write_adapter(path: Path) -> None:
25
+    path.mkdir(parents=True)
26
+    (path / "adapter_config.json").write_text("{}", encoding="utf-8")
27
+    (path / "adapter_model.safetensors").write_bytes(b"adapter")
28
+
29
+
30
+def _fake_stage_mlx(src: Path, dst: Path, *, base_hf_id: str) -> Path:
31
+    assert src.exists()
32
+    assert base_hf_id == _SPEC.hf_id
33
+    dst.mkdir(parents=True, exist_ok=True)
34
+    (dst / "adapter_config.json").write_text("{}", encoding="utf-8")
35
+    (dst / "adapters.safetensors").write_bytes(b"mlx-adapter")
36
+    return dst
37
+
38
+
39
+def _setup_flat_store(tmp_path: Path) -> object:
40
+    store = for_dlm("01MLXTEST", home=tmp_path)
41
+    store.ensure_layout()
42
+    save_manifest(store.manifest, Manifest(dlm_id="01MLXTEST", base_model=_SPEC.key))
43
+    adapter = store.adapter_version(3)
44
+    _write_adapter(adapter)
45
+    store.set_current_adapter(adapter)
46
+    return store
47
+
48
+
49
+def _setup_named_store(tmp_path: Path) -> object:
50
+    store = for_dlm("01MLXMULTI", home=tmp_path)
51
+    store.ensure_layout()
52
+    save_manifest(store.manifest, Manifest(dlm_id="01MLXMULTI", base_model=_SPEC.key))
53
+    knowledge = store.adapter_version_for("knowledge", 2)
54
+    tone = store.adapter_version_for("tone", 4)
55
+    _write_adapter(knowledge)
56
+    _write_adapter(tone)
57
+    store.set_current_adapter_for("knowledge", knowledge)
58
+    store.set_current_adapter_for("tone", tone)
59
+    return store
60
+
61
+
62
+class TestPrepareMlxServeExport:
63
+    def test_prepare_writes_launch_script_and_manifest(
64
+        self, tmp_path: Path, monkeypatch: object
65
+    ) -> None:
66
+        store = _setup_flat_store(tmp_path)
67
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: True)
68
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.mlx_available", lambda: True)
69
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.stage_mlx_adapter_dir", _fake_stage_mlx)
70
+
71
+        prepared = prepare_mlx_serve_export(
72
+            store=store,
73
+            spec=_SPEC,
74
+            adapter_name=None,
75
+            adapter_path_override=None,
76
+            declared_adapter_names=None,
77
+        )
78
+        manifest_path = finalize_mlx_serve_export(
79
+            store=store,
80
+            spec=_SPEC,
81
+            prepared=prepared,
82
+            smoke_output_first_line="hello from mlx",
83
+            adapter_name=None,
84
+            adapter_mix=None,
85
+        )
86
+
87
+        assert prepared.launch_script_path is not None
88
+        assert prepared.launch_script_path.name == LAUNCH_SCRIPT_FILENAME
89
+        script = prepared.launch_script_path.read_text(encoding="utf-8")
90
+        assert script.startswith("#!/usr/bin/env bash\nset -euo pipefail\n")
91
+        assert "python -m mlx_lm.server" in script
92
+        assert f"--model {_SPEC.hf_id}" in script
93
+        assert '--adapter-path "$SCRIPT_DIR/adapter"' in script
94
+
95
+        export_manifest = load_export_manifest(prepared.export_dir)
96
+        assert manifest_path == prepared.manifest_path
97
+        assert export_manifest.target == "mlx-serve"
98
+        assert export_manifest.quant == "hf"
99
+        assert export_manifest.adapter_version == 3
100
+        assert any(artifact.path == "mlx_serve_launch.sh" for artifact in export_manifest.artifacts)
101
+        assert any(
102
+            artifact.path == "adapter/adapters.safetensors"
103
+            for artifact in export_manifest.artifacts
104
+        )
105
+
106
+        store_manifest = load_manifest(store.manifest)
107
+        assert store_manifest.exports[-1].target == "mlx-serve"
108
+        assert store_manifest.exports[-1].quant == "hf"
109
+        assert store_manifest.exports[-1].smoke_output_first_line == "hello from mlx"
110
+
111
+    def test_multi_adapter_export_requires_explicit_selection(
112
+        self, tmp_path: Path, monkeypatch: object
113
+    ) -> None:
114
+        store = _setup_named_store(tmp_path)
115
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: True)
116
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.mlx_available", lambda: True)
117
+
118
+        with pytest.raises(ExportError, match="one adapter at a time"):
119
+            prepare_mlx_serve_export(
120
+                store=store,
121
+                spec=_SPEC,
122
+                adapter_name=None,
123
+                adapter_path_override=None,
124
+                declared_adapter_names=("knowledge", "tone"),
125
+            )
126
+
127
+    def test_refuses_without_apple_silicon_runtime(
128
+        self, tmp_path: Path, monkeypatch: object
129
+    ) -> None:
130
+        store = _setup_flat_store(tmp_path)
131
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: False)
132
+
133
+        with pytest.raises(ExportError, match="Apple Silicon"):
134
+            prepare_mlx_serve_export(
135
+                store=store,
136
+                spec=_SPEC,
137
+                adapter_name=None,
138
+                adapter_path_override=None,
139
+                declared_adapter_names=None,
140
+            )
141
+
142
+
143
+class TestMlxServeSmoke:
144
+    def test_smoke_uses_absolute_runtime_paths(self, tmp_path: Path, monkeypatch: object) -> None:
145
+        store = _setup_flat_store(tmp_path)
146
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: True)
147
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.mlx_available", lambda: True)
148
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.stage_mlx_adapter_dir", _fake_stage_mlx)
149
+        prepared = prepare_mlx_serve_export(
150
+            store=store,
151
+            spec=_SPEC,
152
+            adapter_name=None,
153
+            adapter_path_override=None,
154
+            declared_adapter_names=None,
155
+        )
156
+        seen: list[list[str]] = []
157
+
158
+        def _fake_smoke(argv: list[str], **_: object) -> str:
159
+            seen.append(list(argv))
160
+            return "mlx replied"
161
+
162
+        monkeypatch.setattr("dlm.export.targets.mlx_serve.smoke_openai_compat_server", _fake_smoke)
163
+
164
+        result = MLX_SERVE_TARGET.smoke_test(prepared)
165
+
166
+        assert result.attempted is True
167
+        assert result.ok is True
168
+        assert result.detail == "mlx replied"
169
+        argv = seen[0]
170
+        assert argv[:3] == ["python", "-m", "mlx_lm.server"]
171
+        assert "$SCRIPT_DIR" not in " ".join(argv)
172
+        assert _SPEC.hf_id in argv
173
+        assert str(prepared.export_dir / "adapter") in argv
tests/unit/export/targets/test_registry.pymodified
@@ -19,12 +19,13 @@ class TestRegistry:
19
         assert TARGETS["ollama"] is target
19
         assert TARGETS["ollama"] is target
20
         assert "llama-server" in TARGETS
20
         assert "llama-server" in TARGETS
21
         assert "vllm" in TARGETS
21
         assert "vllm" in TARGETS
22
-        assert available_targets() == ("ollama", "llama-server", "vllm")
22
+        assert "mlx-serve" in TARGETS
23
+        assert available_targets() == ("ollama", "llama-server", "vllm", "mlx-serve")
23
 
24
 
24
     def test_unknown_target_lists_available_targets(self) -> None:
25
     def test_unknown_target_lists_available_targets(self) -> None:
25
         with pytest.raises(
26
         with pytest.raises(
26
             UnknownExportTargetError,
27
             UnknownExportTargetError,
27
-            match="available targets: ollama, llama-server, vllm",
28
+            match="available targets: ollama, llama-server, vllm, mlx-serve",
28
         ):
29
         ):
29
             resolve_target("sglang")
30
             resolve_target("sglang")
30
 
31