@@ -2,6 +2,48 @@ |
| 2 | | 2 | |
| 3 | ## Unreleased | 3 | ## Unreleased |
| 4 | | 4 | |
| | 5 | +### Sprint 36 — `sway serve` warm-backend daemon |
| | 6 | + |
| | 7 | +Audit 03 H4. `sway run` cold-loads the HF backend on each invocation |
| | 8 | +(~15s on a 1.5B model: weights, KV cache, deterministic-mode setup). |
| | 9 | +For interactive flows — notebooks, the `sway watch` retrain loop, |
| | 10 | +the live HTML report — that startup dwarfs the actual scoring |
| | 11 | +cost. `sway serve` is a long-running FastAPI daemon that loads the |
| | 12 | +backend once and keeps it warm across requests. |
| | 13 | + |
| | 14 | +**New CLI (`sway serve`).** Defaults: `--host 127.0.0.1 --port 8787 |
| | 15 | +--max-loaded-models 2`. The daemon refuses to bind a non-loopback |
| | 16 | +interface unless `--api-key <token>` is passed; every non-`/health` |
| | 17 | +request must then carry `Authorization: Bearer <token>`. |
| | 18 | + |
| | 19 | +**Endpoints.** |
| | 20 | + |
| | 21 | +- `GET /health` — uptime + the list of currently-warm models. |
| | 22 | +- `GET /stats` — request count + mean latency + cache size. |
| | 23 | +- `POST /run` — body `{spec: SwaySpec}`; returns the same JSON |
| | 24 | + shape `sway run --json-out` would write, plus `request_seconds` |
| | 25 | + for the daemon's measured execution time. |
| | 26 | +- `POST /score` — same as `/run` with an optional `probe_names` |
| | 27 | + filter; returns just the per-probe entries with no folded |
| | 28 | + `SwayScore`. |
| | 29 | + |
| | 30 | +**Backend cache.** LRU keyed on `(kind, base, adapter, dtype, |
| | 31 | +device)`, capped at `--max-loaded-models`. With the default cap of 2, |
| | 32 | +loading a third distinct model evicts the least-recently-used one and |
| | 33 | +calls `backend.close()` to release GPU memory. Single-flight: concurrent |
| | 34 | +requests for the same key serialize at the loader instead of building twice. |
| | 35 | + |
| | 36 | +**Python SDK.** `dlm_sway.serve.client.ServeClient(url)` exposes |
| | 37 | +`health()`, `stats()`, `run(spec)`, `score(spec, probe_names=...)`. |
| | 38 | +Stateless (no persistent connection pool); raises |
| | 39 | +`ServeClientError` (subclass of `SwayError`) on transport failure |
| | 40 | +or non-2xx responses. |
| | 41 | + |
| | 42 | +**Auth posture (v1).** Defaults to no auth on loopback — the |
| | 43 | +threat model on a single-user dev box is "did I bind 0.0.0.0 by |
| | 44 | +accident". The CLI hard-refuses `--host 0.0.0.0` without an API |
| | 45 | +key. Full OAuth is deferred. |
| | 46 | + |
| 5 | ### Sprint 33 — `training_drift` probe (cross-repo, reads dlm loss curves) | 47 | ### Sprint 33 — `training_drift` probe (cross-repo, reads dlm loss curves) |
| 6 | | 48 | |
| 7 | Closes the X2 "training_drift probe" backlog item. Sister to S25 | 49 | Closes the X2 "training_drift probe" backlog item. Sister to S25 |