
## Unreleased

### Sprint 24 — F01 PEFT→MLX adapter converter

Closes the audit's #1 major finding: the README pitched MLX as a
co-equal backend, but the path required a pre-converted `.npz`
adapter that nothing in the toolchain produced. With this sprint,
`dlm train` → `sway run` on the MLX backend works end-to-end on any
PEFT-trained LoRA adapter.

**Converter (pure I/O, no torch dep).**

- **`backends/_mlx_convert.convert_peft_to_mlx`** — reads PEFT's
  `adapter_model.safetensors` + `adapter_config.json`, transposes
  the LoRA matrices to MLX's layout, and writes `adapters.safetensors` +
  an mlx-lm-shaped `adapter_config.json`. Verified against PEFT >= 0.13
  and mlx-lm 0.31.
- **Key remap.** PEFT's
  `base_model.model.<dotted>.lora_<A|B>.weight` becomes MLX's
  `<dotted>.lora_<a|b>`. Modern PEFT keys (no `.default` adapter-name
  segment) and legacy `.default.weight` keys are both supported.
- **Shape transpose.** PEFT `lora_A=(r, in)` → MLX `lora_a=(in, r)`;
  PEFT `lora_B=(out, r)` → MLX `lora_b=(r, out)`.
- **Config remap.** Writes `fine_tune_type=lora`; `num_layers`
  inferred from the maximum layer index in the keys; and
  `lora_parameters` carrying `rank`, `scale` (computed as `alpha / r`),
  `dropout`, and `keys` (per-layer-relative attribute paths like
  `self_attn.q_proj`).
- **Errors.** Missing files, a non-LORA `peft_type`, an invalid rank,
  unexpected key prefixes, and a non-empty destination all surface as a
  typed `MlxConvertError` with an actionable message. `modules_to_save`
  tensors (e.g. `embed_tokens`, `lm_head` overrides) are skipped
  with a per-key warning rather than crashing.
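
The key remap and shape transpose can be sketched as follows. This is an
illustrative reimplementation, not the shipped converter — `peft_key_to_mlx`
and `transpose_for_mlx` are hypothetical names — and the final check confirms
that the transposed layout preserves the LoRA delta:

```python
import re

import numpy as np

def peft_key_to_mlx(key):
    """Map a PEFT LoRA key to the MLX naming described above, accepting
    both modern keys and legacy `.default.weight` keys."""
    m = re.fullmatch(
        r"base_model\.model\.(?P<path>.+)\.lora_(?P<ab>[AB])"
        r"(?:\.default)?\.weight",
        key,
    )
    if m is None:
        raise ValueError(f"unexpected key prefix: {key}")
    return f"{m['path']}.lora_{m['ab'].lower()}"

def transpose_for_mlx(tensor):
    # PEFT stores lora_A=(r, in) and lora_B=(out, r); MLX expects
    # lora_a=(in, r) and lora_b=(r, out) -- a plain transpose either way.
    return tensor.T

# The transpose is lossless for inference: PEFT applies x @ (B @ A).T,
# MLX applies (x @ lora_a) @ lora_b, and the two deltas agree.
rng = np.random.default_rng(0)
r, d_in, d_out = 4, 8, 16
A = rng.normal(size=(r, d_in))    # PEFT lora_A
B = rng.normal(size=(d_out, r))   # PEFT lora_B
x = rng.normal(size=(3, d_in))
assert np.allclose(x @ (B @ A).T,
                   (x @ transpose_for_mlx(A)) @ transpose_for_mlx(B))
```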

**CLI surface.**

- **`sway convert-adapter [--target mlx] SRC DST [--overwrite]`** — a
  thin wrapper over the converter. Prints a before/after size and
  rank/scale report; surfaces `MlxConvertError` with a non-zero
  exit code; warns on stderr about skipped `modules_to_save` keys.

**MLX backend integration.**

- **`backends/mlx._ensure_mlx_adapter`** — auto-detect: if the
  adapter dir contains `adapter_model.safetensors`, run the
  converter into a content-hashed cache at
  `${XDG_CACHE_HOME:-$HOME/.cache}/dlm-sway/mlx-converted/<blake2b>/`
  and point mlx-lm at the cache. If the dir already contains
  `adapters.safetensors`, pass it through unchanged. Uses a 16-byte
  blake2b digest of the source safetensors bytes, so repeat loads of
  the same adapter version short-circuit (~10 ms hash + dir lookup).
- **`backends/mlx._MLXView._forward_logits`** — adjacent fix: cast via
  `out[0].astype(mx.float32)` before `np.asarray` so unquantized
  bf16/fp16 model outputs round-trip correctly. A pre-existing bug
  surfaced by the new e2e test against `mlx-community/SmolLM2-135M-Instruct`.
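
The cache short-circuit above can be sketched as a pure path computation
(`cache_dir_for` is a hypothetical helper; the real `_ensure_mlx_adapter`
also performs the conversion and the pass-through detection):

```python
import hashlib
import os
from pathlib import Path

def cache_dir_for(adapter_safetensors):
    """Content-hashed cache slot: identical source bytes always map to
    the same directory, so a repeat load can skip reconversion."""
    digest = hashlib.blake2b(
        Path(adapter_safetensors).read_bytes(), digest_size=16
    ).hexdigest()
    xdg = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg) / "dlm-sway" / "mlx-converted" / digest
```

Because the digest is taken over the safetensors bytes themselves, a retrained
adapter (new bytes) naturally misses the cache, while re-running on an
unchanged adapter hits the existing directory.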

**Tests.**

- **`tests/unit/test_mlx_convert`** — 20 tests across:
  - Helper functions (`_strip_layer_prefix`, `_extract_layer_index`).
  - Happy path: synthetic PEFT adapter → MLX adapter; expected file
    layout, config shape, rank/scale math, key transpose, value
    preservation.
  - Error paths: missing safetensors / config, non-LORA peft_type,
    invalid rank, dst-not-empty without `--overwrite`, unexpected
    key prefix, `modules_to_save` skip-and-report.
  - Auto-convert detection: pass-through on an already-MLX dir, fresh
    convert on a PEFT dir, cache short-circuit, unrecognized-dir
    pass-through.
- **`tests/integration/test_mlx_converter_e2e`** — 4 darwin-arm64
  slow+online tests against the real `mlx-community/SmolLM2-135M-Instruct`:
  XDG cache populated by backend init, `next_token_dist` returns a
  finite top-k via the converted adapter, `logprob_of` works, and a
  repeat load skips reconversion (mtime check). Skipped on non-darwin
  or with a missing `[mlx]` extra.
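
The two helpers those unit tests pin down can be sketched like this (assumed
reimplementations of what `_strip_layer_prefix` and `_extract_layer_index`
do, plus the `num_layers` inference mentioned under the config remap; names
and regexes here are illustrative, not the shipped code):

```python
import re

def extract_layer_index(key):
    """Pull the transformer layer number out of a dotted key, e.g.
    'model.layers.11.self_attn.q_proj.lora_a' -> 11; None if absent."""
    m = re.search(r"\.layers\.(\d+)\.", key)
    return int(m.group(1)) if m else None

def strip_layer_prefix(key):
    """Reduce a dotted key to the per-layer-relative attribute path,
    e.g. 'model.layers.11.self_attn.q_proj.lora_a' -> 'self_attn.q_proj'."""
    tail = re.sub(r"^.*\.layers\.\d+\.", "", key)
    return re.sub(r"\.lora_[ab]$", "", tail)

keys = [
    "model.layers.0.self_attn.q_proj.lora_a",
    "model.layers.11.mlp.down_proj.lora_b",
]
# num_layers is inferred as max layer index + 1 (all keys here have one).
num_layers = max(extract_layer_index(k) for k in keys) + 1
assert num_layers == 12
assert strip_layer_prefix(keys[0]) == "self_attn.q_proj"
```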

**README.**

- New "MLX backend (Apple Silicon)" section with the two workflows
  (auto-convert via `sway run` vs. explicit `sway convert-adapter`).
  Documents the cache location and the out-of-scope items (QLoRA,
  `modules_to_save`).

### Sprint 23 — H1 batched backend execution

Opens the door to 3-5× wall-time reduction on HF-backend suites by