@@ -2,6 +2,48 @@ |
| 2 | 2 | |
| 3 | 3 | ## Unreleased |
| 4 | 4 | |
| 5 | +### Sprint 36 — `sway serve` warm-backend daemon |
| 6 | + |
| 7 | +Audit 03 H4. `sway run` cold-loads the HF backend on every invocation |
| 8 | +(~15s on a 1.5B model: weights, KV cache, deterministic-mode setup). |
| 9 | +For interactive flows — notebooks, the `sway watch` retrain loop, |
| 10 | +the live HTML report — that startup dwarfs the actual scoring |
| 11 | +cost. `sway serve` is a long-running FastAPI daemon that loads the |
| 12 | +backend once and keeps it warm across requests. |
| 13 | + |
| 14 | +**New CLI (`sway serve`).** Defaults: `--host 127.0.0.1 --port 8787 |
| 15 | +--max-loaded-models 2`. The daemon refuses to bind a non-loopback |
| 16 | +interface unless `--api-key <token>` is passed; every non-`/health` |
| 17 | +request must then carry `Authorization: Bearer <token>`. |
| 18 | + |
| 19 | +**Endpoints.** |
| 20 | + |
| 21 | +- `GET /health` — uptime + the list of currently-warm models. |
| 22 | +- `GET /stats` — request count + mean latency + cache size. |
| 23 | +- `POST /run` — body `{spec: SwaySpec}`; returns the same JSON |
| 24 | + shape `sway run --json-out` would write, plus `request_seconds` |
| 25 | + for the daemon's measured execution time. |
| 26 | +- `POST /score` — same as `/run` with an optional `probe_names` |
| 27 | + filter; returns just the per-probe entries with no folded |
| 28 | + `SwayScore`. |
| 29 | + |
| 30 | +**Backend cache.** LRU keyed on `(kind, base, adapter, dtype, |
| 31 | +device)`. Capped at `--max-loaded-models`; loading one model past |
| 32 | +the cap evicts the least recently used entry and calls |
| 33 | +`backend.close()` on it to release GPU memory. Single-flight: |
| 34 | +concurrent requests for the same key wait on one load instead of building twice. |
| 35 | + |
| 36 | +**Python SDK.** `dlm_sway.serve.client.ServeClient(url)` exposes |
| 37 | +`health()`, `stats()`, `run(spec)`, `score(spec, probe_names=...)`. |
| 38 | +Stateless (no persistent connection pool); raises |
| 39 | +`ServeClientError` (subclass of `SwayError`) on transport failure |
| 40 | +or non-2xx responses. |
| 41 | + |
| 42 | +**Auth posture (v1).** Defaults to no auth on loopback — the |
| 43 | +threat model on a single-user dev box is "did I bind 0.0.0.0 by |
| 44 | +accident". The CLI hard-refuses a non-loopback `--host` (such as |
| 45 | +`0.0.0.0`) without `--api-key`. Full OAuth is deferred. |
| 46 | + |
| 5 | 47 | ### Sprint 33 — `training_drift` probe (cross-repo, reads dlm loss curves) |
| 6 | 48 | |
| 7 | 49 | Closes the X2 "training_drift probe" backlog item. Sister to S25 |