tenseleyflow/sway / 2c9df2f

CHANGELOG: Sprint 24 — F01 PEFT→MLX adapter converter

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 2c9df2f48940b7c64f989ff29321ee1637462d19
Parents: 727e9f3
Tree: 9f15125

1 changed file

Status  File          +   -
M       CHANGELOG.md  80  0

CHANGELOG.md (modified)
@@ -2,6 +2,86 @@
 
 ## Unreleased
 
+### Sprint 24 — F01 PEFT→MLX adapter converter
+
+Closes the audit's #1 major finding: the README pitched MLX as a
+co-equal backend, but the path required a pre-converted `.npz`
+adapter that nothing in the toolchain produced. With this sprint,
+`dlm train` → `sway run` on the MLX backend works end-to-end on any
+PEFT-trained LoRA adapter.
+
+**Converter (pure I/O, no torch dep).**
+
+- **`backends/_mlx_convert.convert_peft_to_mlx`** — reads PEFT's
+  `adapter_model.safetensors` + `adapter_config.json`, transposes
+  LoRA matrices to MLX's layout, writes `adapters.safetensors` +
+  mlx-lm-shaped `adapter_config.json`. Verified against PEFT >= 0.13
+  + mlx-lm 0.31.
+- **Key remap.** PEFT's
+  `base_model.model.<dotted>.lora_<A|B>.weight` becomes MLX's
+  `<dotted>.lora_<a|b>`. Modern PEFT keys (no `.default` adapter-name
+  segment) and legacy `.default.weight` keys both supported.
+- **Shape transpose.** PEFT `lora_A=(r, in)` → MLX `lora_a=(in, r)`;
+  PEFT `lora_B=(out, r)` → MLX `lora_b=(r, out)`.
+- **Config remap.** Writes `fine_tune_type=lora`, `num_layers`
+  inferred from max layer index in the keys, `lora_parameters` with
+  `rank/scale=alpha÷r/dropout/keys` (per-layer-relative attribute
+  paths like `self_attn.q_proj`).
+- **Errors.** Missing files / non-LORA peft_type / invalid rank /
+  unexpected key prefixes / dst-not-empty all surface as typed
+  `MlxConvertError` with actionable messages. `modules_to_save`
+  tensors (e.g. `embed_tokens`, `lm_head` overrides) are skipped
+  with a per-key warning rather than crashing.
+
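The key remap, shape transpose, and scale rule described in the converter notes above can be illustrated with a minimal sketch. This is not the code in `backends/_mlx_convert`; the helper names (`remap_key`, `transpose`, `lora_scale`, `convert`) are hypothetical, and plain nested lists stand in for tensors so the example runs with the standard library only.

```python
import re

PEFT_PREFIX = "base_model.model."

# Matches modern "<dotted>.lora_A.weight" and legacy "<dotted>.lora_A.default.weight".
_LORA_KEY = re.compile(r"^(?P<path>.+)\.lora_(?P<ab>[AB])(?:\.default)?\.weight$")


def remap_key(peft_key: str) -> str:
    """Map a PEFT LoRA key to the MLX-style key (hypothetical helper)."""
    if not peft_key.startswith(PEFT_PREFIX):
        raise ValueError(f"unexpected key prefix: {peft_key!r}")
    match = _LORA_KEY.match(peft_key[len(PEFT_PREFIX):])
    if match is None:
        raise ValueError(f"not a LoRA weight key: {peft_key!r}")
    return f"{match['path']}.lora_{match['ab'].lower()}"


def transpose(matrix: list[list[float]]) -> list[list[float]]:
    """Swap rows and columns: (r, in) -> (in, r), (out, r) -> (r, out)."""
    return [list(column) for column in zip(*matrix)]


def lora_scale(alpha: float, rank: int) -> float:
    """scale = alpha / r, rejecting a non-positive rank."""
    if rank <= 0:
        raise ValueError(f"invalid rank: {rank}")
    return alpha / rank


def convert(peft_tensors: dict[str, list[list[float]]]) -> dict[str, list[list[float]]]:
    """Remap every key and transpose every matrix."""
    return {remap_key(key): transpose(value) for key, value in peft_tensors.items()}
```

With a rank-2 adapter on a layer with 3 inputs, a `lora_A` of shape `(2, 3)` comes out as a `lora_a` of shape `(3, 2)` under this sketch. A real converter would operate on array types and use their transpose, and would also handle the `modules_to_save` skip path, which is omitted here.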
+**CLI surface.**
+
+- **`sway convert-adapter [--target mlx] SRC DST [--overwrite]`** —
+  thin wrapper over the converter. Prints a before/after size +
+  rank/scale report; surfaces `MlxConvertError` with a non-zero
+  exit code; warns on skipped `modules_to_save` keys via stderr.
+
+**MLX backend integration.**
+
+- **`backends/mlx._ensure_mlx_adapter`** — auto-detect: if the
+  adapter dir contains `adapter_model.safetensors`, run the
+  converter into a content-hashed cache at
+  `${XDG_CACHE_HOME:-$HOME/.cache}/dlm-sway/mlx-converted/<blake2b>/`,
+  point mlx-lm at the cache. If the dir already contains
+  `adapters.safetensors`, pass through unchanged. Uses 16-byte
+  blake2b on the source safetensors bytes — repeat loads on the
+  same adapter version short-circuit (~10 ms hash + dir lookup).
+- **`backends/mlx._MLXView._forward_logits`** — adjacent fix:
+  `out[0].astype(mx.float32)` before `np.asarray` so unquantized
+  bf16/fp16 model outputs round-trip correctly. Pre-existing bug
+  surfaced by the new e2e test against `mlx-community/SmolLM2-135M-Instruct`.
+
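The content-hashed cache location described above could be derived roughly as follows. This is a sketch, not the body of `_ensure_mlx_adapter`: the function name `cache_dir_for` is hypothetical, and rendering the 16-byte digest as hex for the directory name is an assumption (the changelog only shows `<blake2b>`).

```python
import hashlib
import os
from pathlib import Path


def cache_dir_for(source_safetensors: Path) -> Path:
    """Return the cache directory for one adapter file (hypothetical helper)."""
    # 16-byte blake2b over the raw file bytes, as the changelog states.
    digest = hashlib.blake2b(
        source_safetensors.read_bytes(), digest_size=16
    ).hexdigest()  # hex rendering is an assumption
    # Mirror ${XDG_CACHE_HOME:-$HOME/.cache}: fall back when unset or empty.
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "dlm-sway" / "mlx-converted" / digest
```

Because the directory name depends only on the file contents, a caller can check whether that directory already holds `adapters.safetensors` and skip conversion on repeat loads, while a retrained adapter hashes to a different directory.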
+**Tests.**
+
+- **`tests/unit/test_mlx_convert`** — 20 tests across:
+  - Helper functions (`_strip_layer_prefix`, `_extract_layer_index`).
+  - Happy path: synthetic PEFT adapter → MLX adapter, expected file
+    layout, config shape, rank/scale math, key transpose, value
+    preservation.
+  - Error paths: missing safetensors / config, non-LORA peft_type,
+    invalid rank, dst-not-empty without `--overwrite`, unexpected
+    key prefix, `modules_to_save` skip-and-report.
+  - Auto-convert detection: pass-through on already-MLX dir, fresh
+    convert on PEFT dir, cache short-circuit, unrecognized dir
+    pass-through.
+- **`tests/integration/test_mlx_converter_e2e`** — 4 darwin-arm64
+  slow+online tests on real `mlx-community/SmolLM2-135M-Instruct`:
+  XDG cache populated by backend init, `next_token_dist` returns
+  finite top-k via converted adapter, `logprob_of` works, repeat
+  load skips reconvert (mtime check). Skipped on non-darwin /
+  missing `[mlx]` extra.
+
+**README.**
+
+- New "MLX backend (Apple Silicon)" section with the two install
+  paths (auto-convert via `sway run` vs. explicit `sway convert-adapter`).
+  Documents the cache location and out-of-scope items (QLoRA,
+  `modules_to_save`).
+
 ### Sprint 23 — H1 batched backend execution
 
 Opens the door to 3-5× wall-time reduction on HF-backend suites by