Add mlx-serve export target

- SHA: fc1bd697666b2f6d760ea21062e9b46ad2bac3ae
- Parents: 787628d
- Tree: cb96c6e
| Status | File | + | - |
|---|---|---|---|
| M | README.md | 18 | 8 |
| M | docs/cli/reference.md | 1 | 1 |
| A | docs/cookbook/multi-target-export.md | 175 | 0 |
| A | docs/format/export-manifest.md | 95 | 0 |
| M | docs/getting-started/first-export.md | 17 | 0 |
| M | docs/index.md | 3 | 3 |
| M | mkdocs.yml | 2 | 0 |
| M | src/dlm/cli/commands.py | 79 | 1 |
| M | src/dlm/export/targets/__init__.py | 9 | 0 |
| A | src/dlm/export/targets/mlx_serve.py | 272 | 0 |
| M | tests/unit/cli/test_export_target_flag.py | 20 | 0 |
| A | tests/unit/export/targets/test_mlx_serve_argv.py | 173 | 0 |
| M | tests/unit/export/targets/test_registry.py | 3 | 2 |
README.md (modified)

@@ -15,12 +15,12 @@ A `.dlm` can be:
| 15 | 15 | |
| 16 | 16 | DLM trains LoRA / QLoRA / DoRA adapters on real pretrained bases, keeps a replay |
| 17 | 17 | history so retrains do not silently forget, and exports local runtimes such as |
| 18 | -Ollama and `llama-server`. | |
| 18 | +Ollama, `llama-server`, `vllm`, and `mlx-serve`. | |
| 19 | 19 | |
| 20 | 20 | **Status:** pre-v1.0, but far beyond the original MVP framing. The core |
| 21 | 21 | author/train/prompt/export/pack/share loop is real, and newer runtime-target |
| 22 | 22 | work is landing incrementally. Current export targets are `ollama`, |
| 23 | -`llama-server`, and `vllm`. | |
| 23 | +`llama-server`, `vllm`, and `mlx-serve`. | |
| 24 | 24 | |
| 25 | 25 | ## What A `.dlm` Actually Is |
| 26 | 26 | |
@@ -79,8 +79,8 @@ DLM sits in the gap:
| 79 | 79 | `dlm train --watch`, `dlm metrics`, and `dlm doctor` are all part of the |
| 80 | 80 | normal workflow now. |
| 81 | 81 | - **Export beyond the original Ollama-only story.** DLM still does explicit |
| 82 | - Ollama exports with pinned templates, and now also emits `llama-server` | |
| 83 | - launch artifacts against the same GGUF path. | |
| 82 | + Ollama exports with pinned templates, and now also emits `llama-server`, | |
| 83 | + `vllm`, and `mlx-serve` launch artifacts for local runtime targets. | |
| 84 | 84 | - **Close the eval loop.** `dlm harvest` can pull failing `sway`-style probe |
| 85 | 85 | reports back into the document as new training examples. |
| 86 | 86 | - **Pack and share reproducibly.** `.dlm.pack`, verification, push/pull, and |
@@ -90,10 +90,10 @@ DLM sits in the gap:
| 90 | 90 | |
| 91 | 91 | | Tier | Training | Inference / export | |
| 92 | 92 | |---|---|---| |
| 93 | -| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama, GGUF export, `llama-server` launch artifacts | | |
| 94 | -| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama, GGUF export, `llama-server` launch artifacts | | |
| 95 | -| Apple Silicon (MPS) | fp16 or fp32 LoRA depending on doctor plan | Ollama, selected MLX inference paths, GGUF export | | |
| 96 | -| CPU | inference-first; training refused above small bases unless forced | GGUF export, Ollama, `llama-server` launch artifacts | | |
| 93 | +| NVIDIA CUDA (SM ≥ 8.0) | bf16 + QLoRA 4-bit + FlashAttention | Ollama, GGUF export, `llama-server`, `vllm` | | |
| 94 | +| NVIDIA CUDA (SM < 8.0) | fp16 LoRA | Ollama, GGUF export, `llama-server`, `vllm` | | |
| 95 | +| Apple Silicon (MPS) | fp16 or fp32 LoRA depending on doctor plan | Ollama, selected MLX inference paths, GGUF export, `vllm` (conservative Metal defaults), `mlx-serve` | | |
| 96 | +| CPU | inference-first; training refused above small bases unless forced | GGUF export, Ollama, `llama-server` | | |
| 97 | 97 | | AMD ROCm | experimental | ROCm-oriented llama.cpp flows | |
| 98 | 98 | |
| 99 | 99 | See [docs/hardware](./docs/hardware/memory-estimates.md) and |
@@ -133,6 +133,13 @@ scripts/bump-llama-cpp.sh build
| 133 | 133 | # If you want the llama.cpp HTTP target too: |
| 134 | 134 | scripts/bump-llama-cpp.sh build --with-server |
| 135 | 135 | |
| 136 | +# If you want the Apple Silicon MLX HTTP target: | |
| 137 | +uv sync --extra mlx | |
| 138 | + | |
| 139 | +# If you want the vLLM HTTP target: | |
| 140 | +# install a compatible vllm runtime separately; DLM writes launch artifacts | |
| 141 | +# but does not bundle the server runtime itself. | |
| 142 | + | |
| 136 | 143 | uv run dlm --help |
| 137 | 144 | ``` |
| 138 | 145 | |
@@ -276,6 +283,8 @@ uv run dlm metrics mydoc.dlm
| 276 | 283 | ```sh |
| 277 | 284 | uv run dlm export mydoc.dlm --target ollama --name mydoc |
| 278 | 285 | uv run dlm export mydoc.dlm --target llama-server --no-smoke |
| 286 | +uv run dlm export mydoc.dlm --target vllm --no-smoke | |
| 287 | +uv run dlm export mydoc.dlm --target mlx-serve --no-smoke | |
| 279 | 288 | uv run dlm pack mydoc.dlm --include-exports |
| 280 | 289 | uv run dlm verify mydoc.dlm.pack |
| 281 | 290 | ``` |
@@ -319,6 +328,7 @@ See the [CLI reference](./docs/cli/reference.md) for the full flag surface.
| 319 | 328 | - [Multimodal training](./docs/cookbook/multimodal-training.md) |
| 320 | 329 | - [Audio training](./docs/cookbook/audio-training.md) |
| 321 | 330 | - [Probe-driven training / sway harvest](./docs/cookbook/probe-driven-training.md) |
| 331 | +- [Multi-target export](./docs/cookbook/multi-target-export.md) | |
| 322 | 332 | - [CLI reference](./docs/cli/reference.md) |
| 323 | 333 | - [Architecture](./docs/architecture.md) |
| 324 | 334 | - [Determinism](./docs/determinism.md) |
docs/cli/reference.md (modified)

@@ -203,7 +203,7 @@ dlm export <path> [--target NAME] [--quant Q] [--merged [--dequantize]]
| 203 | 203 | |
| 204 | 204 | | Option | Default | Notes | |
| 205 | 205 | |---|---|---| |
| 206 | -| `--target NAME` | `ollama` | Export destination. Sprint 41 currently supports `ollama`, `llama-server`, and `vllm`. The `llama-server` path writes launch artifacts against the existing GGUF export and uses the shared OpenAI-compatible HTTP smoke harness; the `vllm` path writes `vllm_launch.sh` + `vllm_config.json` against the local adapter layout and ignores GGUF-only flags. On Apple Silicon, the generated `vllm` launch path forces the documented low-risk `vllm-metal` settings (`VLLM_METAL_USE_PAGED_ATTENTION=0`, `VLLM_METAL_MEMORY_FRACTION=auto`) and caps `--max-model-len` to the document's `training.sequence_len`. | | |
| 206 | +| `--target NAME` | `ollama` | Export destination. Sprint 41 currently supports `ollama`, `llama-server`, `vllm`, and `mlx-serve`. The `llama-server` path writes launch artifacts against the existing GGUF export and uses the shared OpenAI-compatible HTTP smoke harness. The `vllm` path writes `vllm_launch.sh` + `vllm_config.json` against the local adapter layout and ignores GGUF-only flags. On Apple Silicon, the generated `vllm` launch path forces the documented low-risk `vllm-metal` settings (`VLLM_METAL_USE_PAGED_ATTENTION=0`, `VLLM_METAL_MEMORY_FRACTION=auto`) and caps `--max-model-len` to the document's `training.sequence_len`. The `mlx-serve` path is Apple Silicon only, writes `mlx_serve_launch.sh` plus a staged MLX adapter directory, and currently supports text bases only. | | |
| 207 | 207 | | `--quant Q` | frontmatter.export.default_quant | `Q4_K_M` / `Q5_K_M` / `Q6_K` / `Q8_0` / `F16`. | |
| 208 | 208 | | `--merged` | false | Merge LoRA into base before quantizing. | |
| 209 | 209 | | `--dequantize` | false | Required with `--merged` on a QLoRA adapter (pitfall #3). | |
docs/cookbook/multi-target-export.md (added)

@@ -0,0 +1,175 @@
| 1 | +# Multi-target export | |
| 2 | + | |
| 3 | +`dlm export` is no longer just an Ollama registration path. The same | |
| 4 | +trained store can now emit local runtime artifacts for four targets: | |
| 5 | + | |
| 6 | +- `ollama` for managed local registration plus the existing Modelfile flow | |
| 7 | +- `llama-server` for GGUF-backed OpenAI-compatible HTTP serving via vendored | |
| 8 | + `llama.cpp` | |
| 9 | +- `vllm` for HF-snapshot plus LoRA-module serving on machines that can run | |
| 10 | + `vllm` | |
| 11 | +- `mlx-serve` for Apple Silicon text serving through `mlx_lm.server` | |
| 12 | + | |
| 13 | +Use this when you want one training loop but different local runtimes for | |
| 14 | +prompting, evaluation harnesses, agents, or deployment experiments. | |
| 15 | + | |
| 16 | +## Quick map | |
| 17 | + | |
| 18 | +| Target | Best for | Artifact shape | Smoke path | | |
| 19 | +|---|---|---|---| | |
| 20 | +| `ollama` | Easiest local chat loop | GGUF + `Modelfile` + local registration | existing Ollama smoke | | |
| 21 | +| `llama-server` | GGUF-backed OpenAI-compatible server | `base.<quant>.gguf` + `adapter.gguf` + `chat-template.jinja` + `llama-server_launch.sh` | shared HTTP smoke | | |
| 22 | +| `vllm` | HF-snapshot + LoRA serving on supported hosts | `vllm_launch.sh` + `vllm_config.json` + staged adapters | shared HTTP smoke | | |
| 23 | +| `mlx-serve` | Apple Silicon text serving without GGUF conversion | `mlx_serve_launch.sh` + staged MLX adapter dir | shared HTTP smoke | | |
| 24 | + | |
| 25 | +## Prerequisites | |
| 26 | + | |
| 27 | +### Ollama | |
| 28 | + | |
| 29 | +```sh | |
| 30 | +brew install ollama | |
| 31 | +``` | |
| 32 | + | |
| 33 | +### llama-server | |
| 34 | + | |
| 35 | +```sh | |
| 36 | +scripts/bump-llama-cpp.sh build --with-server | |
| 37 | +``` | |
| 38 | + | |
| 39 | +That compiles the vendored `llama-server` binary alongside the GGUF tooling. | |
| 40 | + | |
| 41 | +### vLLM | |
| 42 | + | |
| 43 | +Install a compatible `vllm` runtime in the environment you plan to launch | |
| 44 | +from. DLM writes the launch/config artifacts, but it does not bundle the | |
| 45 | +server runtime. | |
| 46 | + | |
| 47 | +On Apple Silicon, the generated `vllm` launch path is deliberately cautious: | |
| 48 | + | |
| 49 | +- `VLLM_METAL_USE_PAGED_ATTENTION=0` | |
| 50 | +- `VLLM_METAL_MEMORY_FRACTION=auto` | |
| 51 | +- `--max-model-len` capped to the document's `training.sequence_len` | |
| 52 | + | |
| 53 | +Those defaults exist to avoid the Metal OOM / hang pattern that shows up when | |
| 54 | +`vllm-metal` blindly asks for the base model's full context window. | |
| 55 | + | |
| 56 | +### MLX-serve | |
| 57 | + | |
| 58 | +```sh | |
| 59 | +uv sync --extra mlx | |
| 60 | +``` | |
| 61 | + | |
| 62 | +`mlx-serve` is Apple Silicon only. DLM refuses it on CUDA, ROCm, and CPU-only | |
| 63 | +hosts, and this Sprint 41 slice only supports text bases on that target. | |
| 64 | + | |
| 65 | +## Common exports | |
| 66 | + | |
| 67 | +### Ollama | |
| 68 | + | |
| 69 | +```sh | |
| 70 | +uv run dlm export tutor.dlm --target ollama --name my-tutor | |
| 71 | +``` | |
| 72 | + | |
| 73 | +This is the classic DLM path: GGUF conversion, explicit Go-template | |
| 74 | +`Modelfile`, optional registration, and an Ollama smoke prompt. | |
| 75 | + | |
| 76 | +### llama-server | |
| 77 | + | |
| 78 | +```sh | |
| 79 | +uv run dlm export tutor.dlm --target llama-server | |
| 80 | +bash ~/.dlm/store/<dlm_id>/exports/Q4_K_M/llama-server_launch.sh | |
| 81 | +``` | |
| 82 | + | |
| 83 | +This reuses the GGUF export artifacts and adds: | |
| 84 | + | |
| 85 | +- `chat-template.jinja` | |
| 86 | +- `llama-server_launch.sh` | |
| 87 | +- `target: "llama-server"` in `export_manifest.json` | |
| 88 | + | |
| 89 | +The launch script binds `127.0.0.1` and speaks `/v1/chat/completions`. | |
| 90 | + | |
| 91 | +### vLLM | |
| 92 | + | |
| 93 | +```sh | |
| 94 | +uv run dlm export tutor.dlm --target vllm | |
| 95 | +bash ~/.dlm/store/<dlm_id>/exports/vllm/vllm_launch.sh | |
| 96 | +``` | |
| 97 | + | |
| 98 | +This path stages local LoRA modules and writes: | |
| 99 | + | |
| 100 | +- `vllm_launch.sh` | |
| 101 | +- `vllm_config.json` | |
| 102 | +- `exports/vllm/adapters/...` | |
| 103 | + | |
| 104 | +Flags that only matter to GGUF or Ollama are ignored with a banner: | |
| 105 | +`--quant`, `--merged`, `--dequantize`, `--no-template`, `--skip-ollama`, | |
| 106 | +`--no-imatrix`, `--draft`, `--no-draft`. | |
| 107 | + | |
| 108 | +### MLX-serve | |
| 109 | + | |
| 110 | +```sh | |
| 111 | +uv run dlm export tutor.dlm --target mlx-serve | |
| 112 | +bash ~/.dlm/store/<dlm_id>/exports/mlx-serve/mlx_serve_launch.sh | |
| 113 | +``` | |
| 114 | + | |
| 115 | +This path stages an MLX-loadable adapter directory and writes: | |
| 116 | + | |
| 117 | +- `mlx_serve_launch.sh` | |
| 118 | +- `exports/mlx-serve/adapter/` or one named adapter directory | |
| 119 | +- `target: "mlx-serve"` in `export_manifest.json` | |
| 120 | + | |
| 121 | +`mlx-serve` also ignores the GGUF/Ollama-only flags above, plus `--name`. | |
| 122 | + | |
| 123 | +## Multi-adapter behavior | |
| 124 | + | |
| 125 | +The runtime targets split into two families: | |
| 126 | + | |
| 127 | +- `ollama` and `llama-server` can reuse the GGUF weighted-merge path for | |
| 128 | + `--adapter-mix` | |
| 129 | +- `vllm` and `mlx-serve` work from local adapter directories | |
| 130 | + | |
| 131 | +For `vllm`: | |
| 132 | + | |
| 133 | +- single-adapter docs export one staged module | |
| 134 | +- multi-adapter docs without `--adapter` export every named adapter as a | |
| 135 | + `--lora-modules` list | |
| 136 | +- `--adapter-mix` exports the staged composite adapter instead | |
| 137 | + | |
| 138 | +For `mlx-serve`: | |
| 139 | + | |
| 140 | +- single-adapter docs export the current flat adapter | |
| 141 | +- multi-adapter docs must choose one adapter with `--adapter`, or pass | |
| 142 | + `--adapter-mix` to export the staged composite adapter | |
| 143 | + | |
| 144 | +That "one adapter at a time" rule is intentional: this target is a simple | |
| 145 | +local-serving path, not a dynamic multi-LoRA router. | |
| 146 | + | |
| 147 | +## Smoke behavior | |
| 148 | + | |
| 149 | +All three HTTP targets use the shared OpenAI-compatible smoke harness: | |
| 150 | + | |
| 151 | +1. reserve a loopback port | |
| 152 | +2. launch the target-specific server command | |
| 153 | +3. poll `/v1/models` | |
| 154 | +4. POST `/v1/chat/completions` | |
| 155 | +5. record the first non-empty line in the store manifest | |
| 156 | + | |
| 157 | +Skip it with `--no-smoke` when the runtime is not installed or you want the | |
| 158 | +artifacts only. | |
| 159 | + | |
| 160 | +## Inspecting what got written | |
| 161 | + | |
| 162 | +Every export writes `export_manifest.json` under its target directory. The | |
| 163 | +important fields are: | |
| 164 | + | |
| 165 | +- `target` | |
| 166 | +- `quant` | |
| 167 | +- `artifacts` | |
| 168 | +- `adapter_version` | |
| 169 | +- `base_model_hf_id` | |
| 170 | +- `base_model_revision` | |
| 171 | + | |
| 172 | +The per-store `manifest.json` also gets an appended `exports[-1]` row with the | |
| 173 | +same `target` plus the smoke first line when a smoke test ran. | |
| 174 | + | |
| 175 | +See [Export manifest](../format/export-manifest.md) for the exact schema. | |
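
For orientation, below is a minimal Python sketch of the five-step smoke flow the new cookbook page describes. It is not the shipped `smoke_openai_compat_server` from `dlm/export/smoke.py`: the `--port` injection, request payload, and timeouts are illustrative assumptions layered on the documented steps.

```python
import json
import socket
import subprocess
import time
import urllib.request


def smoke_openai_compat(command: list[str], timeout: float = 60.0) -> str:
    """Launch an OpenAI-compatible server, probe it, return the first reply line."""
    # 1. Reserve a loopback port by binding port 0, then releasing it.
    with socket.socket() as sock:
        sock.bind(("127.0.0.1", 0))
        port = sock.getsockname()[1]
    # 2. Launch the target-specific server command. Appending --port here is
    #    an assumption; the real harness builds the argv per target.
    proc = subprocess.Popen([*command, "--port", str(port)])
    base = f"http://127.0.0.1:{port}/v1"
    try:
        # 3. Poll /v1/models until the server answers or the deadline passes.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                urllib.request.urlopen(f"{base}/models", timeout=2).close()
                break
            except OSError:
                time.sleep(0.5)
        else:
            raise RuntimeError("server never answered /v1/models")
        # 4. POST one chat completion.
        payload = json.dumps(
            {"model": "exported", "messages": [{"role": "user", "content": "ping"}]}
        ).encode()
        request = urllib.request.Request(
            f"{base}/chat/completions",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            reply = json.load(response)["choices"][0]["message"]["content"]
        # 5. Keep the first non-empty line for the store manifest.
        lines = [line for line in reply.splitlines() if line.strip()]
        return lines[0] if lines else ""
    finally:
        proc.terminate()
        proc.wait(timeout=10)
```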
docs/format/export-manifest.md (added)

@@ -0,0 +1,95 @@
| 1 | +# Export manifest | |
| 2 | + | |
| 3 | +Every `dlm export` writes an `export_manifest.json` inside the export directory. | |
| 4 | +It is the target-local record of what DLM emitted, separate from the broader | |
| 5 | +per-store `manifest.json`. | |
| 6 | + | |
| 7 | +Examples: | |
| 8 | + | |
| 9 | +- `~/.dlm/store/<dlm_id>/exports/Q4_K_M/export_manifest.json` | |
| 10 | +- `~/.dlm/store/<dlm_id>/exports/vllm/export_manifest.json` | |
| 11 | +- `~/.dlm/store/<dlm_id>/exports/mlx-serve/export_manifest.json` | |
| 12 | + | |
| 13 | +## What it records | |
| 14 | + | |
| 15 | +The manifest captures: | |
| 16 | + | |
| 17 | +- `target`: which runtime this export was prepared for | |
| 18 | +- `quant`: the export family (`Q4_K_M`, `Q8_0`, `hf`, ...) | |
| 19 | +- `merged` / `dequantized`: whether LoRA weights were merged into the base | |
| 20 | +- `created_at` and `created_by` | |
| 21 | +- `llama_cpp_tag` when the target depends on vendored `llama.cpp` | |
| 22 | +- `base_model_hf_id` and `base_model_revision` | |
| 23 | +- `adapter_version` | |
| 24 | +- `artifacts`: every emitted file with relative path, sha256, and size | |
| 25 | + | |
| 26 | +The schema is strict and round-trips through the Pydantic model in | |
| 27 | +`src/dlm/export/manifest.py`. | |
| 28 | + | |
| 29 | +## Example | |
| 30 | + | |
| 31 | +```json | |
| 32 | +{ | |
| 33 | + "target": "llama-server", | |
| 34 | + "quant": "Q4_K_M", | |
| 35 | + "merged": false, | |
| 36 | + "dequantized": false, | |
| 37 | + "ollama_name": null, | |
| 38 | + "created_at": "2026-04-23T18:42:00", | |
| 39 | + "created_by": "dlm-0.1.0", | |
| 40 | + "llama_cpp_tag": "b4281", | |
| 41 | + "base_model_hf_id": "HuggingFaceTB/SmolLM2-135M-Instruct", | |
| 42 | + "base_model_revision": "4c0d2...", | |
| 43 | + "adapter_version": 3, | |
| 44 | + "artifacts": [ | |
| 45 | + { | |
| 46 | + "path": "base.Q4_K_M.gguf", | |
| 47 | + "sha256": "…", | |
| 48 | + "size_bytes": 47211904 | |
| 49 | + }, | |
| 50 | + { | |
| 51 | + "path": "adapter.gguf", | |
| 52 | + "sha256": "…", | |
| 53 | + "size_bytes": 3145728 | |
| 54 | + }, | |
| 55 | + { | |
| 56 | + "path": "llama-server_launch.sh", | |
| 57 | + "sha256": "…", | |
| 58 | + "size_bytes": 312 | |
| 59 | + } | |
| 60 | + ] | |
| 61 | +} | |
| 62 | +``` | |
| 63 | + | |
| 64 | +## `target` | |
| 65 | + | |
| 66 | +`target` is now the load-bearing field for Sprint 41’s runtime split. | |
| 67 | + | |
| 68 | +Current values: | |
| 69 | + | |
| 70 | +- `ollama` | |
| 71 | +- `llama-server` | |
| 72 | +- `vllm` | |
| 73 | +- `mlx-serve` | |
| 74 | + | |
| 75 | +That lets downstream tooling distinguish: | |
| 76 | + | |
| 77 | +- a GGUF + Modelfile export meant for Ollama | |
| 78 | +- a GGUF-backed OpenAI-compatible launch artifact set | |
| 79 | +- an HF-snapshot + LoRA-module export for `vllm` | |
| 80 | +- an MLX adapter export for Apple Silicon serving | |
| 81 | + | |
| 82 | +## Relationship to the store manifest | |
| 83 | + | |
| 84 | +`export_manifest.json` is per-export and artifact-focused. | |
| 85 | + | |
| 86 | +The store-level `manifest.json` keeps the running narrative in `exports[]`: | |
| 87 | + | |
| 88 | +- when the export happened | |
| 89 | +- which `target` it used | |
| 90 | +- GGUF checksums when present | |
| 91 | +- `ollama_name` when relevant | |
| 92 | +- the first smoke output line when a smoke test ran | |
| 93 | + | |
| 94 | +Use `export_manifest.json` when you need exact artifact provenance for one | |
| 95 | +export directory. Use `manifest.json` when you want the store’s full history. | |
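
The "strict and round-trips" claim in this new page is easy to make concrete. Here is a sketch of a reader that mirrors the documented field set with Pydantic; the shipped model in `src/dlm/export/manifest.py` is the source of truth, and the types below are inferred from the example and field list rather than copied from it.

```python
from __future__ import annotations

import json
from pathlib import Path

from pydantic import BaseModel


class ArtifactSketch(BaseModel):
    # Every emitted file: relative path, sha256, and size, per the field list.
    path: str
    sha256: str
    size_bytes: int


class ExportManifestSketch(BaseModel):
    # Field set mirrors the documented schema; `created_at` is kept as a
    # string here rather than guessing the shipped datetime handling.
    target: str
    quant: str
    merged: bool
    dequantized: bool
    ollama_name: str | None
    created_at: str
    created_by: str
    llama_cpp_tag: str | None
    base_model_hf_id: str
    base_model_revision: str
    adapter_version: int
    artifacts: list[ArtifactSketch]


def load_export_manifest_sketch(export_dir: Path) -> ExportManifestSketch:
    raw = json.loads((export_dir / "export_manifest.json").read_text(encoding="utf-8"))
    return ExportManifestSketch.model_validate(raw)
```

Downstream tooling can then branch on `manifest.target` to tell a GGUF + Modelfile export from a `vllm` or `mlx-serve` layout.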
docs/getting-started/first-export.md (modified)

@@ -4,6 +4,11 @@
| 4 | 4 | Modelfile with an explicit Go `text/template` (no fuzzy matching), |
| 5 | 5 | registers the model with `ollama create`, and runs a smoke prompt. |
| 6 | 6 | |
| 7 | +That is still the default path, but it is no longer the only one. Sprint 41 | |
| 8 | +also adds local runtime targets such as `llama-server`, `vllm`, and | |
| 9 | +`mlx-serve`; see the [multi-target export cookbook](../cookbook/multi-target-export.md) | |
| 10 | +once you want an OpenAI-compatible local server instead of an Ollama model. | |
| 11 | + | |
| 7 | 12 | ## Prerequisites |
| 8 | 13 | |
| 9 | 14 | - `vendor/llama.cpp` submodule is built: |
@@ -80,6 +85,18 @@ $ uv run dlm export tutor.dlm --quant Q4_K_M --skip-ollama
| 80 | 85 | Useful on CI runners without the Ollama daemon installed. The GGUFs |
| 81 | 86 | land in `exports/Q4_K_M/`; wire them into your own runtime. |
| 82 | 87 | |
| 88 | +## Other runtime targets | |
| 89 | + | |
| 90 | +Once the basic GGUF/Ollama flow is familiar, the same store can export to: | |
| 91 | + | |
| 92 | +- `--target llama-server` for a vendored `llama.cpp` HTTP server | |
| 93 | +- `--target vllm` for HF-snapshot + LoRA-module serving | |
| 94 | +- `--target mlx-serve` for Apple Silicon text serving through `mlx_lm.server` | |
| 95 | + | |
| 96 | +Those targets have different prerequisites and artifact layouts, so they live | |
| 97 | +in the [multi-target export cookbook](../cookbook/multi-target-export.md) | |
| 98 | +instead of this first-run page. | |
| 99 | + | |
| 83 | 100 | ## Next |
| 84 | 101 | |
| 85 | 102 | Want to send the whole training history to a friend? The |
docs/index.md (modified)

@@ -10,7 +10,7 @@ A `.dlm` can be a hand-authored training doc, a directive-driven entrypoint
| 10 | 10 | into a codebase, a multi-adapter project with learned routing, or a selected |
| 11 | 11 | multimodal / audio-language document. DLM trains LoRA / QLoRA / DoRA adapters |
| 12 | 12 | on real pretrained bases, keeps replay history, and exports local runtimes such |
| 13 | -as Ollama and `llama-server`. | |
| 13 | +as Ollama, `llama-server`, `vllm`, and `mlx-serve`. | |
| 14 | 14 | |
| 15 | 15 | ## What DLM Ships Today |
| 16 | 16 | |
@@ -27,7 +27,7 @@ as Ollama and `llama-server`.
| 27 | 27 | persona lanes inside one project |
| 28 | 28 | - **Local iteration UX** with `prompt`, `repl`, `train --watch`, `metrics`, |
| 29 | 29 | and `doctor` |
| 30 | -- **Runtime export** to `ollama` and `llama-server` | |
| 30 | +- **Runtime export** to `ollama`, `llama-server`, `vllm`, and `mlx-serve` | |
| 31 | 31 | - **Probe-driven improvement** through `sway`-style harvest flows |
| 32 | 32 | |
| 33 | 33 | ## 30-Second Demo |
@@ -49,7 +49,7 @@ $ uv run dlm export tutor.dlm --target ollama --name my-tutor
| 49 | 49 | | Train across a real repo | [Training across codebases](cookbook/training-across-codebases.md) | |
| 50 | 50 | | Use named adapters and routing | [Multi-adapter](cookbook/multi-adapter.md) and [Learned adapter gate](cookbook/learned-adapter-gate.md) | |
| 51 | 51 | | Work with images or audio | [Multimodal training](cookbook/multimodal-training.md) and [Audio training](cookbook/audio-training.md) | |
| 52 | -| Export or ship a model | [CLI reference](cli/reference.md) and [Determinism](determinism.md) | | |
| 52 | +| Export or ship a model | [Multi-target export](cookbook/multi-target-export.md), [CLI reference](cli/reference.md), and [Determinism](determinism.md) | | |
| 53 | 53 | | Pull eval failures back into training | [Probe-driven training](cookbook/probe-driven-training.md) | |
| 54 | 54 | |
| 55 | 55 | ## Status |
mkdocs.yml (modified)

@@ -58,6 +58,7 @@ nav:
| 58 | 58 | - The .dlm format: |
| 59 | 59 | - Frontmatter: format/frontmatter.md |
| 60 | 60 | - Sections: format/sections.md |
| 61 | + - Export manifest: format/export-manifest.md | |
| 61 | 62 | - .dlm/training.yaml: format/dlm-training-yaml.md |
| 62 | 63 | - .dlm/ignore: format/dlm-ignore.md |
| 63 | 64 | - CLI reference: cli/reference.md |
@@ -77,6 +78,7 @@ nav:
| 77 | 78 | - Template gallery: cookbook/template-gallery.md |
| 78 | 79 | - Sharing adapters: cookbook/sharing.md |
| 79 | 80 | - Multi-source training: cookbook/multi-source-training.md |
| 81 | + - Multi-target export: cookbook/multi-target-export.md | |
| 80 | 82 | - Train from a folder: cookbook/train-from-folder.md |
| 81 | 83 | - Training across codebases: cookbook/training-across-codebases.md |
| 82 | 84 | - Tokenized-section cache: cookbook/directive-cache.md |
src/dlm/cli/commands.py (modified)

@@ -1551,7 +1551,7 @@ def export_cmd(
| 1551 | 1551 | str, |
| 1552 | 1552 | typer.Option( |
| 1553 | 1553 | "--target", |
| 1554 | - help="Export destination. Currently supported: ollama, llama-server, vllm.", | |
| 1554 | + help="Export destination. Currently supported: ollama, llama-server, vllm, mlx-serve.", | |
| 1555 | 1555 | ), |
| 1556 | 1556 | ] = "ollama", |
| 1557 | 1557 | quant: Annotated[ |
@@ -1679,8 +1679,10 @@ def export_cmd(
| 1679 | 1679 | ) |
| 1680 | 1680 | from dlm.export.quantize import run_checked |
| 1681 | 1681 | from dlm.export.targets import ( |
| 1682 | + finalize_mlx_serve_export, | |
| 1682 | 1683 | finalize_vllm_export, |
| 1683 | 1684 | prepare_llama_server_export, |
| 1685 | + prepare_mlx_serve_export, | |
| 1684 | 1686 | prepare_vllm_export, |
| 1685 | 1687 | resolve_target, |
| 1686 | 1688 | ) |
@@ -1785,6 +1787,12 @@ def export_cmd(
| 1785 | 1787 | "documents yet; this Sprint 41 slice only supports text bases." |
| 1786 | 1788 | ) |
| 1787 | 1789 | raise typer.Exit(code=2) |
| 1790 | + if resolved_target.name == "mlx-serve" and export_dispatch.accepts_audio: | |
| 1791 | + console.print( | |
| 1792 | + "[red]export:[/red] --target mlx-serve is not wired for audio-language " | |
| 1793 | + "documents yet; this Sprint 41 slice only supports text bases." | |
| 1794 | + ) | |
| 1795 | + raise typer.Exit(code=2) | |
| 1788 | 1796 | if export_dispatch.accepts_audio: |
| 1789 | 1797 | try: |
| 1790 | 1798 | dispatch_result = export_dispatch.dispatch_export( |
@@ -1830,6 +1838,12 @@ def export_cmd(
| 1830 | 1838 | "documents yet; this Sprint 41 slice only supports text bases." |
| 1831 | 1839 | ) |
| 1832 | 1840 | raise typer.Exit(code=2) |
| 1841 | + if resolved_target.name == "mlx-serve" and export_dispatch.accepts_images: | |
| 1842 | + console.print( | |
| 1843 | + "[red]export:[/red] --target mlx-serve is not wired for vision-language " | |
| 1844 | + "documents yet; this Sprint 41 slice only supports text bases." | |
| 1845 | + ) | |
| 1846 | + raise typer.Exit(code=2) | |
| 1833 | 1847 | if export_dispatch.accepts_images: |
| 1834 | 1848 | gguf_emission_context = None |
| 1835 | 1849 | try: |
@@ -1957,6 +1971,70 @@ def export_cmd(
| 1957 | 1971 | console.print(f"smoke: {vllm_smoke.detail}") |
| 1958 | 1972 | return |
| 1959 | 1973 | |
| 1974 | + if resolved_target.name == "mlx-serve": | |
| 1975 | + mlx_ignored_flags: list[str] = [] | |
| 1976 | + if quant is not None: | |
| 1977 | + mlx_ignored_flags.append("--quant") | |
| 1978 | + if merged: | |
| 1979 | + mlx_ignored_flags.append("--merged") | |
| 1980 | + if dequantize: | |
| 1981 | + mlx_ignored_flags.append("--dequantize") | |
| 1982 | + if name is not None: | |
| 1983 | + mlx_ignored_flags.append("--name") | |
| 1984 | + if no_template: | |
| 1985 | + mlx_ignored_flags.append("--no-template") | |
| 1986 | + if skip_ollama: | |
| 1987 | + mlx_ignored_flags.append("--skip-ollama") | |
| 1988 | + if no_imatrix: | |
| 1989 | + mlx_ignored_flags.append("--no-imatrix") | |
| 1990 | + if draft is not None: | |
| 1991 | + mlx_ignored_flags.append("--draft") | |
| 1992 | + if no_draft: | |
| 1993 | + mlx_ignored_flags.append("--no-draft") | |
| 1994 | + if mlx_ignored_flags: | |
| 1995 | + console.print( | |
| 1996 | + "[yellow]export:[/yellow] ignoring flags not applicable to " | |
| 1997 | + f"`--target mlx-serve`: {', '.join(mlx_ignored_flags)}" | |
| 1998 | + ) | |
| 1999 | + | |
| 2000 | + declared_adapter_names = tuple(adapters_declared.keys()) if adapters_declared else None | |
| 2001 | + try: | |
| 2002 | + mlx_serve_result = prepare_mlx_serve_export( | |
| 2003 | + store=store, | |
| 2004 | + spec=spec, | |
| 2005 | + adapter_name=adapter, | |
| 2006 | + adapter_path_override=adapter_path_override, | |
| 2007 | + declared_adapter_names=declared_adapter_names, | |
| 2008 | + ) | |
| 2009 | + except ExportError as exc: | |
| 2010 | + console.print(f"[red]export:[/red] {exc}") | |
| 2011 | + raise typer.Exit(code=1) from exc | |
| 2012 | + | |
| 2013 | + mlx_serve_smoke = None if no_smoke else resolved_target.smoke_test(mlx_serve_result) | |
| 2014 | + if mlx_serve_smoke is not None and not mlx_serve_smoke.ok: | |
| 2015 | + console.print( | |
| 2016 | + f"[red]smoke:[/red] {mlx_serve_smoke.detail}\n" | |
| 2017 | + " re-run with `--no-smoke` to skip the smoke test." | |
| 2018 | + ) | |
| 2019 | + raise typer.Exit(code=1) | |
| 2020 | + | |
| 2021 | + manifest_path = finalize_mlx_serve_export( | |
| 2022 | + store=store, | |
| 2023 | + spec=spec, | |
| 2024 | + prepared=mlx_serve_result, | |
| 2025 | + smoke_output_first_line=None if mlx_serve_smoke is None else mlx_serve_smoke.detail, | |
| 2026 | + adapter_name=adapter, | |
| 2027 | + adapter_mix=mix_entries, | |
| 2028 | + ) | |
| 2029 | + console.print(f"[green]exported:[/green] {mlx_serve_result.export_dir}") | |
| 2030 | + console.print("target: mlx-serve") | |
| 2031 | + assert mlx_serve_result.launch_script_path is not None | |
| 2032 | + console.print(f"launch: {mlx_serve_result.launch_script_path.name}") | |
| 2033 | + console.print(f"manifest: {manifest_path.name}") | |
| 2034 | + if mlx_serve_smoke is not None and mlx_serve_smoke.detail: | |
| 2035 | + console.print(f"smoke: {mlx_serve_smoke.detail}") | |
| 2036 | + return | |
| 2037 | + | |
| 1960 | 2038 | try: |
| 1961 | 2039 | result = run_export( |
| 1962 | 2040 | store, |
src/dlm/export/targets/__init__.py (modified)

@@ -5,6 +5,11 @@ from __future__ import annotations
| 5 | 5 | from dlm.export.errors import UnknownExportTargetError |
| 6 | 6 | from dlm.export.targets.base import ExportTarget, SmokeResult, TargetResult |
| 7 | 7 | from dlm.export.targets.llama_server import LLAMA_SERVER_TARGET, prepare_llama_server_export |
| 8 | +from dlm.export.targets.mlx_serve import ( | |
| 9 | + MLX_SERVE_TARGET, | |
| 10 | + finalize_mlx_serve_export, | |
| 11 | + prepare_mlx_serve_export, | |
| 12 | +) | |
| 8 | 13 | from dlm.export.targets.ollama import OLLAMA_TARGET |
| 9 | 14 | from dlm.export.targets.vllm import VLLM_TARGET, finalize_vllm_export, prepare_vllm_export |
| 10 | 15 | |
@@ -12,6 +17,7 @@ TARGETS: dict[str, ExportTarget] = {
| 12 | 17 | OLLAMA_TARGET.name: OLLAMA_TARGET, |
| 13 | 18 | LLAMA_SERVER_TARGET.name: LLAMA_SERVER_TARGET, |
| 14 | 19 | VLLM_TARGET.name: VLLM_TARGET, |
| 20 | + MLX_SERVE_TARGET.name: MLX_SERVE_TARGET, | |
| 15 | 21 | } |
| 16 | 22 | |
| 17 | 23 | |
@@ -31,12 +37,15 @@ def resolve_target(name: str) -> ExportTarget:
| 31 | 37 | __all__ = [ |
| 32 | 38 | "ExportTarget", |
| 33 | 39 | "LLAMA_SERVER_TARGET", |
| 40 | + "MLX_SERVE_TARGET", | |
| 34 | 41 | "SmokeResult", |
| 35 | 42 | "TARGETS", |
| 36 | 43 | "TargetResult", |
| 37 | 44 | "VLLM_TARGET", |
| 38 | 45 | "available_targets", |
| 46 | + "finalize_mlx_serve_export", | |
| 39 | 47 | "finalize_vllm_export", |
| 48 | + "prepare_mlx_serve_export", | |
| 40 | 49 | "prepare_llama_server_export", |
| 41 | 50 | "prepare_vllm_export", |
| 42 | 51 | "resolve_target", |
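
The registry change keeps dict insertion order as the public ordering, which is exactly what the updated registry tests pin down. A short usage sketch, assuming the package is importable:

```python
from dlm.export.errors import UnknownExportTargetError
from dlm.export.targets import available_targets, resolve_target

# Insertion order of TARGETS is the public ordering, so CLI help text,
# error messages, and the registry tests all agree on the same tuple.
assert available_targets() == ("ollama", "llama-server", "vllm", "mlx-serve")

target = resolve_target("mlx-serve")
print(target.name)  # -> "mlx-serve"

try:
    resolve_target("sglang")
except UnknownExportTargetError as exc:
    print(exc)  # names the available targets: ollama, llama-server, vllm, mlx-serve
```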
src/dlm/export/targets/mlx_serve.py (added)

@@ -0,0 +1,272 @@
| 1 | +"""MLX HTTP server target helpers.""" | |
| 2 | + | |
| 3 | +from __future__ import annotations | |
| 4 | + | |
| 5 | +import shlex | |
| 6 | +import shutil | |
| 7 | +from pathlib import Path | |
| 8 | + | |
| 9 | +from dlm.base_models import BaseModelSpec | |
| 10 | +from dlm.export.errors import ExportError, TargetSmokeError | |
| 11 | +from dlm.export.manifest import ExportManifest, build_artifact, save_export_manifest, utc_now | |
| 12 | +from dlm.export.record import append_export_summary | |
| 13 | +from dlm.export.smoke import smoke_openai_compat_server | |
| 14 | +from dlm.export.targets.base import ExportTarget, SmokeResult, TargetResult | |
| 15 | +from dlm.inference.backends.mlx_backend import stage_mlx_adapter_dir | |
| 16 | +from dlm.inference.backends.select import is_apple_silicon, mlx_available | |
| 17 | +from dlm.io.atomic import write_text | |
| 18 | +from dlm.store.paths import StorePath | |
| 19 | + | |
| 20 | +MLX_SERVE_EXPORT_SUBDIR = "mlx-serve" | |
| 21 | +LAUNCH_SCRIPT_FILENAME = "mlx_serve_launch.sh" | |
| 22 | +_HF_QUANT = "hf" | |
| 23 | +_DEFAULT_ADAPTER_DIRNAME = "adapter" | |
| 24 | +_MIXED_ADAPTER_DIRNAME = "mixed" | |
| 25 | + | |
| 26 | + | |
| 27 | +class MlxServeTarget: | |
| 28 | + """Registered export target for MLX HTTP server launch artifacts.""" | |
| 29 | + | |
| 30 | + name = "mlx-serve" | |
| 31 | + | |
| 32 | + def prepare(self, ctx: object) -> TargetResult: | |
| 33 | + raise NotImplementedError("mlx-serve exports are prepared via prepare_mlx_serve_export()") | |
| 34 | + | |
| 35 | + def launch_command(self, prepared: TargetResult) -> list[str]: | |
| 36 | + return _build_command(prepared, use_script_dir=True) | |
| 37 | + | |
| 38 | + def smoke_test(self, prepared: TargetResult) -> SmokeResult: | |
| 39 | + try: | |
| 40 | + first_line = smoke_openai_compat_server(_build_command(prepared, use_script_dir=False)) | |
| 41 | + except (OSError, TargetSmokeError, ExportError) as exc: | |
| 42 | + return SmokeResult(attempted=True, ok=False, detail=str(exc)) | |
| 43 | + return SmokeResult(attempted=True, ok=True, detail=first_line) | |
| 44 | + | |
| 45 | + | |
| 46 | +def prepare_mlx_serve_export( | |
| 47 | + *, | |
| 48 | + store: StorePath, | |
| 49 | + spec: BaseModelSpec, | |
| 50 | + adapter_name: str | None, | |
| 51 | + adapter_path_override: Path | None, | |
| 52 | + declared_adapter_names: tuple[str, ...] | None, | |
| 53 | +) -> TargetResult: | |
| 54 | + """Stage an MLX-loadable adapter dir plus launch script.""" | |
| 55 | + | |
| 56 | + _require_mlx_runtime() | |
| 57 | + source_adapter_dir, staged_dirname, adapter_version = _resolve_source_adapter( | |
| 58 | + store=store, | |
| 59 | + adapter_name=adapter_name, | |
| 60 | + adapter_path_override=adapter_path_override, | |
| 61 | + declared_adapter_names=declared_adapter_names, | |
| 62 | + ) | |
| 63 | + | |
| 64 | + export_dir = store.exports / MLX_SERVE_EXPORT_SUBDIR | |
| 65 | + export_dir.mkdir(parents=True, exist_ok=True) | |
| 66 | + | |
| 67 | + staged_adapter_dir = export_dir / staged_dirname | |
| 68 | + if staged_adapter_dir.exists(): | |
| 69 | + shutil.rmtree(staged_adapter_dir) | |
| 70 | + stage_mlx_adapter_dir(source_adapter_dir, staged_adapter_dir, base_hf_id=spec.hf_id) | |
| 71 | + | |
| 72 | + launch_script_path = export_dir / LAUNCH_SCRIPT_FILENAME | |
| 73 | + draft = TargetResult( | |
| 74 | + name=MLX_SERVE_TARGET.name, | |
| 75 | + export_dir=export_dir, | |
| 76 | + manifest_path=export_dir / "export_manifest.json", | |
| 77 | + artifacts=(), | |
| 78 | + launch_script_path=launch_script_path, | |
| 79 | + extras={ | |
| 80 | + "model": spec.hf_id, | |
| 81 | + "adapter_dir": staged_adapter_dir, | |
| 82 | + "adapter_version": adapter_version, | |
| 83 | + }, | |
| 84 | + ) | |
| 85 | + write_text(launch_script_path, _render_launch_script(MLX_SERVE_TARGET.launch_command(draft))) | |
| 86 | + launch_script_path.chmod(0o755) | |
| 87 | + return TargetResult( | |
| 88 | + name=draft.name, | |
| 89 | + export_dir=draft.export_dir, | |
| 90 | + manifest_path=draft.manifest_path, | |
| 91 | + artifacts=tuple(_artifact_paths(export_dir)), | |
| 92 | + launch_script_path=draft.launch_script_path, | |
| 93 | + config_path=None, | |
| 94 | + extras=draft.extras, | |
| 95 | + ) | |
| 96 | + | |
| 97 | + | |
| 98 | +def finalize_mlx_serve_export( | |
| 99 | + *, | |
| 100 | + store: StorePath, | |
| 101 | + spec: BaseModelSpec, | |
| 102 | + prepared: TargetResult, | |
| 103 | + smoke_output_first_line: str | None, | |
| 104 | + adapter_name: str | None, | |
| 105 | + adapter_mix: list[tuple[str, float]] | None, | |
| 106 | +) -> Path: | |
| 107 | + """Write export_manifest.json and append the store export summary.""" | |
| 108 | + | |
| 109 | + from dlm import __version__ as dlm_version | |
| 110 | + | |
| 111 | + artifacts = [ | |
| 112 | + build_artifact(prepared.export_dir, path) for path in _artifact_paths(prepared.export_dir) | |
| 113 | + ] | |
| 114 | + adapter_version = _require_prepared_int(prepared, "adapter_version") | |
| 115 | + manifest = ExportManifest( | |
| 116 | + target=MLX_SERVE_TARGET.name, | |
| 117 | + quant=_HF_QUANT, | |
| 118 | + merged=False, | |
| 119 | + dequantized=False, | |
| 120 | + ollama_name=None, | |
| 121 | + created_at=utc_now(), | |
| 122 | + created_by=f"dlm-{dlm_version}", | |
| 123 | + llama_cpp_tag=None, | |
| 124 | + base_model_hf_id=spec.hf_id, | |
| 125 | + base_model_revision=spec.revision, | |
| 126 | + adapter_version=adapter_version, | |
| 127 | + artifacts=artifacts, | |
| 128 | + ) | |
| 129 | + manifest_path = save_export_manifest(prepared.export_dir, manifest) | |
| 130 | + append_export_summary( | |
| 131 | + store=store, | |
| 132 | + quant=_HF_QUANT, | |
| 133 | + merged=False, | |
| 134 | + target=MLX_SERVE_TARGET.name, | |
| 135 | + llama_cpp_tag=None, | |
| 136 | + artifacts=artifacts, | |
| 137 | + ollama_name=None, | |
| 138 | + ollama_version_str=None, | |
| 139 | + smoke_first_line=smoke_output_first_line, | |
| 140 | + adapter_name=adapter_name, | |
| 141 | + adapter_mix=adapter_mix, | |
| 142 | + ) | |
| 143 | + return manifest_path | |
| 144 | + | |
| 145 | + | |
| 146 | +def _resolve_source_adapter( | |
| 147 | + *, | |
| 148 | + store: StorePath, | |
| 149 | + adapter_name: str | None, | |
| 150 | + adapter_path_override: Path | None, | |
| 151 | + declared_adapter_names: tuple[str, ...] | None, | |
| 152 | +) -> tuple[Path, str, int]: | |
| 153 | + if adapter_path_override is not None: | |
| 154 | + if not adapter_path_override.exists(): | |
| 155 | + raise ExportError(f"adapter_path_override {adapter_path_override} does not exist") | |
| 156 | + return ( | |
| 157 | + adapter_path_override, | |
| 158 | + _MIXED_ADAPTER_DIRNAME, | |
| 159 | + _version_from_dir_name(adapter_path_override), | |
| 160 | + ) | |
| 161 | + | |
| 162 | + if declared_adapter_names and adapter_name is None: | |
| 163 | + raise ExportError( | |
| 164 | + "mlx-serve exports one adapter at a time; pass `--adapter <name>` " | |
| 165 | + "or `--adapter-mix` for multi-adapter documents." | |
| 166 | + ) | |
| 167 | + | |
| 168 | + if adapter_name is not None: | |
| 169 | + path = store.resolve_current_adapter_for(adapter_name) | |
| 170 | + pointer = store.adapter_current_pointer_for(adapter_name) | |
| 171 | + if path is None or not path.exists(): | |
| 172 | + raise ExportError( | |
| 173 | + f"no current adapter under {pointer}; run `dlm train` before exporting." | |
| 174 | + ) | |
| 175 | + return path, adapter_name, _version_from_dir_name(path) | |
| 176 | + | |
| 177 | + path = store.resolve_current_adapter() | |
| 178 | + pointer = store.adapter_current_pointer | |
| 179 | + if path is None or not path.exists(): | |
| 180 | + raise ExportError(f"no current adapter under {pointer}; run `dlm train` before exporting.") | |
| 181 | + return path, _DEFAULT_ADAPTER_DIRNAME, _version_from_dir_name(path) | |
| 182 | + | |
| 183 | + | |
| 184 | +def _require_mlx_runtime() -> None: | |
| 185 | + if not is_apple_silicon(): | |
| 186 | + raise ExportError( | |
| 187 | + "mlx-serve export requires Apple Silicon (darwin-arm64); " | |
| 188 | + "this target is not available on CUDA, ROCm, or CPU-only hosts." | |
| 189 | + ) | |
| 190 | + if not mlx_available(): | |
| 191 | + raise ExportError( | |
| 192 | + "mlx-serve export requires the mlx extra to be installed; " | |
| 193 | + "run `uv sync --extra mlx` and re-try." | |
| 194 | + ) | |
| 195 | + | |
| 196 | + | |
| 197 | +def _artifact_paths(export_dir: Path) -> list[Path]: | |
| 198 | + artifacts: list[Path] = [] | |
| 199 | + for path in sorted(export_dir.rglob("*")): | |
| 200 | + if path.is_file() and path.name != "export_manifest.json": | |
| 201 | + artifacts.append(path) | |
| 202 | + return artifacts | |
| 203 | + | |
| 204 | + | |
| 205 | +def _build_command(prepared: TargetResult, *, use_script_dir: bool) -> list[str]: | |
| 206 | + model = _require_prepared_str(prepared, "model") | |
| 207 | + adapter_dir = _require_prepared_path(prepared, "adapter_dir") | |
| 208 | + return [ | |
| 209 | + "python", | |
| 210 | + "-m", | |
| 211 | + "mlx_lm.server", | |
| 212 | + "--model", | |
| 213 | + model, | |
| 214 | + "--adapter-path", | |
| 215 | + _script_dir_arg(adapter_dir) if use_script_dir else str(adapter_dir), | |
| 216 | + "--host", | |
| 217 | + "127.0.0.1", | |
| 218 | + "--port", | |
| 219 | + "8000", | |
| 220 | + ] | |
| 221 | + | |
| 222 | + | |
| 223 | +def _script_dir_arg(path: Path) -> str: | |
| 224 | + return f"$SCRIPT_DIR/{path.name}" | |
| 225 | + | |
| 226 | + | |
| 227 | +def _render_launch_script(command: list[str]) -> str: | |
| 228 | + rendered = " ".join(_quote_script_arg(arg) for arg in command) | |
| 229 | + return ( | |
| 230 | + "#!/usr/bin/env bash\n" | |
| 231 | + "set -euo pipefail\n" | |
| 232 | + 'SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"\n' | |
| 233 | + f'exec {rendered} "$@"\n' | |
| 234 | + ) | |
| 235 | + | |
| 236 | + | |
| 237 | +def _quote_script_arg(arg: str) -> str: | |
| 238 | + if arg.startswith("$SCRIPT_DIR/"): | |
| 239 | + return f'"{arg}"' | |
| 240 | + return shlex.quote(arg) | |
| 241 | + | |
| 242 | + | |
| 243 | +def _version_from_dir_name(path: Path) -> int: | |
| 244 | + stem = path.name | |
| 245 | + if not stem.startswith("v") or not stem[1:].isdigit(): | |
| 246 | + return 1 | |
| 247 | + return int(stem[1:]) | |
| 248 | + | |
| 249 | + | |
| 250 | +def _require_prepared_str(prepared: TargetResult, key: str) -> str: | |
| 251 | + value = prepared.extras.get(key) | |
| 252 | + if not isinstance(value, str) or not value: | |
| 253 | + raise ExportError(f"mlx-serve prepared target missing string extra {key!r}") | |
| 254 | + return value | |
| 255 | + | |
| 256 | + | |
| 257 | +def _require_prepared_path(prepared: TargetResult, key: str) -> Path: | |
| 258 | + value = prepared.extras.get(key) | |
| 259 | + if not isinstance(value, Path): | |
| 260 | + raise ExportError(f"mlx-serve prepared target missing Path extra {key!r}") | |
| 261 | + return value | |
| 262 | + | |
| 263 | + | |
| 264 | +def _require_prepared_int(prepared: TargetResult, key: str) -> int: | |
| 265 | + value = prepared.extras.get(key) | |
| 266 | + if not isinstance(value, int): | |
| 267 | + raise ExportError(f"mlx-serve prepared target missing int extra {key!r}") | |
| 268 | + return value | |
| 269 | + | |
| 270 | + | |
| 271 | +MLX_SERVE_TARGET = MlxServeTarget() | |
| 272 | +assert isinstance(MLX_SERVE_TARGET, ExportTarget) | |
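
Tracing `_build_command` and `_render_launch_script` through the flat single-adapter case gives a concrete picture of the emitted artifact. The expected text below is derived by hand from the functions above for the smollm2 test base; treat it as a worked example, not a captured file.

```python
# What mlx_serve_launch.sh comes out to for a flat store on the smollm2 base:
# every argv element passes shlex.quote unchanged, except the staged adapter
# path, which _quote_script_arg wraps in double quotes so $SCRIPT_DIR expands.
EXPECTED_LAUNCH_SCRIPT = (
    "#!/usr/bin/env bash\n"
    "set -euo pipefail\n"
    'SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"\n'
    "exec python -m mlx_lm.server "
    "--model HuggingFaceTB/SmolLM2-135M-Instruct "
    '--adapter-path "$SCRIPT_DIR/adapter" '
    '--host 127.0.0.1 --port 8000 "$@"\n'
)
```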
tests/unit/cli/test_export_target_flag.py (modified)

@@ -52,6 +52,7 @@ class TestExportTargetFlag:
| 52 | 52 | assert "ollama" in text |
| 53 | 53 | assert "llama-server" in text |
| 54 | 54 | assert "vllm" in text |
| 55 | + assert "mlx-serve" in text | |
| 55 | 56 | |
| 56 | 57 | def test_ollama_target_reaches_existing_mutex_validation(self, tmp_path: Path) -> None: |
| 57 | 58 | doc = _scaffold_doc(tmp_path) |
@@ -112,3 +113,22 @@ class TestExportTargetFlag:
| 112 | 113 | ) |
| 113 | 114 | assert result.exit_code == 2 |
| 114 | 115 | assert "mutually exclusive" in _joined(result) |
| 116 | + | |
| 117 | + def test_mlx_serve_target_reaches_existing_mutex_validation(self, tmp_path: Path) -> None: | |
| 118 | + runner = CliRunner() | |
| 119 | + result = runner.invoke( | |
| 120 | + app, | |
| 121 | + [ | |
| 122 | + "--home", | |
| 123 | + str(tmp_path / "home"), | |
| 124 | + "export", | |
| 125 | + str(tmp_path / "ghost.dlm"), | |
| 126 | + "--target", | |
| 127 | + "mlx-serve", | |
| 128 | + "--draft", | |
| 129 | + "qwen2.5:0.5b", | |
| 130 | + "--no-draft", | |
| 131 | + ], | |
| 132 | + ) | |
| 133 | + assert result.exit_code == 2 | |
| 134 | + assert "mutually exclusive" in _joined(result) | |
tests/unit/export/targets/test_mlx_serve_argv.py (added)

@@ -0,0 +1,173 @@
| 1 | +"""MLX serve launch artifact generation.""" | |
| 2 | + | |
| 3 | +from __future__ import annotations | |
| 4 | + | |
| 5 | +from pathlib import Path | |
| 6 | + | |
| 7 | +import pytest | |
| 8 | + | |
| 9 | +from dlm.base_models import BASE_MODELS | |
| 10 | +from dlm.export.errors import ExportError | |
| 11 | +from dlm.export.manifest import load_export_manifest | |
| 12 | +from dlm.export.targets.mlx_serve import ( | |
| 13 | + LAUNCH_SCRIPT_FILENAME, | |
| 14 | + MLX_SERVE_TARGET, | |
| 15 | + finalize_mlx_serve_export, | |
| 16 | + prepare_mlx_serve_export, | |
| 17 | +) | |
| 18 | +from dlm.store.manifest import Manifest, load_manifest, save_manifest | |
| 19 | +from dlm.store.paths import for_dlm | |
| 20 | + | |
| 21 | +_SPEC = BASE_MODELS["smollm2-135m"] | |
| 22 | + | |
| 23 | + | |
| 24 | +def _write_adapter(path: Path) -> None: | |
| 25 | + path.mkdir(parents=True) | |
| 26 | + (path / "adapter_config.json").write_text("{}", encoding="utf-8") | |
| 27 | + (path / "adapter_model.safetensors").write_bytes(b"adapter") | |
| 28 | + | |
| 29 | + | |
| 30 | +def _fake_stage_mlx(src: Path, dst: Path, *, base_hf_id: str) -> Path: | |
| 31 | + assert src.exists() | |
| 32 | + assert base_hf_id == _SPEC.hf_id | |
| 33 | + dst.mkdir(parents=True, exist_ok=True) | |
| 34 | + (dst / "adapter_config.json").write_text("{}", encoding="utf-8") | |
| 35 | + (dst / "adapters.safetensors").write_bytes(b"mlx-adapter") | |
| 36 | + return dst | |
| 37 | + | |
| 38 | + | |
| 39 | +def _setup_flat_store(tmp_path: Path) -> object: | |
| 40 | + store = for_dlm("01MLXTEST", home=tmp_path) | |
| 41 | + store.ensure_layout() | |
| 42 | + save_manifest(store.manifest, Manifest(dlm_id="01MLXTEST", base_model=_SPEC.key)) | |
| 43 | + adapter = store.adapter_version(3) | |
| 44 | + _write_adapter(adapter) | |
| 45 | + store.set_current_adapter(adapter) | |
| 46 | + return store | |
| 47 | + | |
| 48 | + | |
| 49 | +def _setup_named_store(tmp_path: Path) -> object: | |
| 50 | + store = for_dlm("01MLXMULTI", home=tmp_path) | |
| 51 | + store.ensure_layout() | |
| 52 | + save_manifest(store.manifest, Manifest(dlm_id="01MLXMULTI", base_model=_SPEC.key)) | |
| 53 | + knowledge = store.adapter_version_for("knowledge", 2) | |
| 54 | + tone = store.adapter_version_for("tone", 4) | |
| 55 | + _write_adapter(knowledge) | |
| 56 | + _write_adapter(tone) | |
| 57 | + store.set_current_adapter_for("knowledge", knowledge) | |
| 58 | + store.set_current_adapter_for("tone", tone) | |
| 59 | + return store | |
| 60 | + | |
| 61 | + | |
| 62 | +class TestPrepareMlxServeExport: | |
| 63 | + def test_prepare_writes_launch_script_and_manifest( | |
| 64 | + self, tmp_path: Path, monkeypatch: object | |
| 65 | + ) -> None: | |
| 66 | + store = _setup_flat_store(tmp_path) | |
| 67 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: True) | |
| 68 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.mlx_available", lambda: True) | |
| 69 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.stage_mlx_adapter_dir", _fake_stage_mlx) | |
| 70 | + | |
| 71 | + prepared = prepare_mlx_serve_export( | |
| 72 | + store=store, | |
| 73 | + spec=_SPEC, | |
| 74 | + adapter_name=None, | |
| 75 | + adapter_path_override=None, | |
| 76 | + declared_adapter_names=None, | |
| 77 | + ) | |
| 78 | + manifest_path = finalize_mlx_serve_export( | |
| 79 | + store=store, | |
| 80 | + spec=_SPEC, | |
| 81 | + prepared=prepared, | |
| 82 | + smoke_output_first_line="hello from mlx", | |
| 83 | + adapter_name=None, | |
| 84 | + adapter_mix=None, | |
| 85 | + ) | |
| 86 | + | |
| 87 | + assert prepared.launch_script_path is not None | |
| 88 | + assert prepared.launch_script_path.name == LAUNCH_SCRIPT_FILENAME | |
| 89 | + script = prepared.launch_script_path.read_text(encoding="utf-8") | |
| 90 | + assert script.startswith("#!/usr/bin/env bash\nset -euo pipefail\n") | |
| 91 | + assert "python -m mlx_lm.server" in script | |
| 92 | + assert f"--model {_SPEC.hf_id}" in script | |
| 93 | + assert '--adapter-path "$SCRIPT_DIR/adapter"' in script | |
| 94 | + | |
| 95 | + export_manifest = load_export_manifest(prepared.export_dir) | |
| 96 | + assert manifest_path == prepared.manifest_path | |
| 97 | + assert export_manifest.target == "mlx-serve" | |
| 98 | + assert export_manifest.quant == "hf" | |
| 99 | + assert export_manifest.adapter_version == 3 | |
| 100 | + assert any(artifact.path == "mlx_serve_launch.sh" for artifact in export_manifest.artifacts) | |
| 101 | + assert any( | |
| 102 | + artifact.path == "adapter/adapters.safetensors" | |
| 103 | + for artifact in export_manifest.artifacts | |
| 104 | + ) | |
| 105 | + | |
| 106 | + store_manifest = load_manifest(store.manifest) | |
| 107 | + assert store_manifest.exports[-1].target == "mlx-serve" | |
| 108 | + assert store_manifest.exports[-1].quant == "hf" | |
| 109 | + assert store_manifest.exports[-1].smoke_output_first_line == "hello from mlx" | |
| 110 | + | |
| 111 | + def test_multi_adapter_export_requires_explicit_selection( | |
| 112 | + self, tmp_path: Path, monkeypatch: object | |
| 113 | + ) -> None: | |
| 114 | + store = _setup_named_store(tmp_path) | |
| 115 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: True) | |
| 116 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.mlx_available", lambda: True) | |
| 117 | + | |
| 118 | + with pytest.raises(ExportError, match="one adapter at a time"): | |
| 119 | + prepare_mlx_serve_export( | |
| 120 | + store=store, | |
| 121 | + spec=_SPEC, | |
| 122 | + adapter_name=None, | |
| 123 | + adapter_path_override=None, | |
| 124 | + declared_adapter_names=("knowledge", "tone"), | |
| 125 | + ) | |
| 126 | + | |
| 127 | + def test_refuses_without_apple_silicon_runtime( | |
| 128 | + self, tmp_path: Path, monkeypatch: object | |
| 129 | + ) -> None: | |
| 130 | + store = _setup_flat_store(tmp_path) | |
| 131 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: False) | |
| 132 | + | |
| 133 | + with pytest.raises(ExportError, match="Apple Silicon"): | |
| 134 | + prepare_mlx_serve_export( | |
| 135 | + store=store, | |
| 136 | + spec=_SPEC, | |
| 137 | + adapter_name=None, | |
| 138 | + adapter_path_override=None, | |
| 139 | + declared_adapter_names=None, | |
| 140 | + ) | |
| 141 | + | |
| 142 | + | |
| 143 | +class TestMlxServeSmoke: | |
| 144 | + def test_smoke_uses_absolute_runtime_paths(self, tmp_path: Path, monkeypatch: object) -> None: | |
| 145 | + store = _setup_flat_store(tmp_path) | |
| 146 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.is_apple_silicon", lambda: True) | |
| 147 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.mlx_available", lambda: True) | |
| 148 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.stage_mlx_adapter_dir", _fake_stage_mlx) | |
| 149 | + prepared = prepare_mlx_serve_export( | |
| 150 | + store=store, | |
| 151 | + spec=_SPEC, | |
| 152 | + adapter_name=None, | |
| 153 | + adapter_path_override=None, | |
| 154 | + declared_adapter_names=None, | |
| 155 | + ) | |
| 156 | + seen: list[list[str]] = [] | |
| 157 | + | |
| 158 | + def _fake_smoke(argv: list[str], **_: object) -> str: | |
| 159 | + seen.append(list(argv)) | |
| 160 | + return "mlx replied" | |
| 161 | + | |
| 162 | + monkeypatch.setattr("dlm.export.targets.mlx_serve.smoke_openai_compat_server", _fake_smoke) | |
| 163 | + | |
| 164 | + result = MLX_SERVE_TARGET.smoke_test(prepared) | |
| 165 | + | |
| 166 | + assert result.attempted is True | |
| 167 | + assert result.ok is True | |
| 168 | + assert result.detail == "mlx replied" | |
| 169 | + argv = seen[0] | |
| 170 | + assert argv[:3] == ["python", "-m", "mlx_lm.server"] | |
| 171 | + assert "$SCRIPT_DIR" not in " ".join(argv) | |
| 172 | + assert _SPEC.hf_id in argv | |
| 173 | + assert str(prepared.export_dir / "adapter") in argv | |
tests/unit/export/targets/test_registry.py (modified)

@@ -19,12 +19,13 @@ class TestRegistry:
| 19 | 19 | assert TARGETS["ollama"] is target |
| 20 | 20 | assert "llama-server" in TARGETS |
| 21 | 21 | assert "vllm" in TARGETS |
| 22 | - assert available_targets() == ("ollama", "llama-server", "vllm") | |
| 22 | + assert "mlx-serve" in TARGETS | |
| 23 | + assert available_targets() == ("ollama", "llama-server", "vllm", "mlx-serve") | |
| 23 | 24 | |
| 24 | 25 | def test_unknown_target_lists_available_targets(self) -> None: |
| 25 | 26 | with pytest.raises( |
| 26 | 27 | UnknownExportTargetError, |
| 27 | - match="available targets: ollama, llama-server, vllm", | |
| 28 | + match="available targets: ollama, llama-server, vllm, mlx-serve", | |
| 28 | 29 | ): |
| 29 | 30 | resolve_target("sglang") |
| 30 | 31 | |