tenseleyflow/sway / 1a9290f

CHANGELOG: Sprint 36 sway serve daemon

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1a9290f59f7ed8b48175a15ce0adbb3846faefe4
Parents: 54b27e2
Tree: 6d43237

1 changed file

M CHANGELOG.md (+42, -0)
@@ -2,6 +2,48 @@
 
 ## Unreleased
 
+### Sprint 36 — `sway serve` warm-backend daemon
+
+Audit 03 H4. `sway run` cold-loads the HF backend each invocation
+(~15s on a 1.5B model: weights, KV cache, deterministic-mode setup).
+For interactive flows (notebooks, the `sway watch` retrain loop,
+the live HTML report), that startup dwarfs the actual scoring
+cost. `sway serve` is a long-running FastAPI daemon that loads the
+backend once and keeps it warm across requests.
+
+**New CLI (`sway serve`).** Defaults:
+`--host 127.0.0.1 --port 8787 --max-loaded-models 2`. The daemon
+refuses to bind a non-loopback interface unless `--api-key <token>`
+is passed; once a key is set, every request except `GET /health`
+must carry `Authorization: Bearer <token>`.
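The startup bind guard and the per-request bearer check can be sketched without any web framework. This is a minimal sketch, not the daemon's code: `BindRefused`, `check_bind`, and `is_authorized` are hypothetical names, a literal `localhost` is assumed to count as loopback, and any other hostname is conservatively treated as non-loopback. Loopback detection uses the standard-library `ipaddress` module, and the token comparison uses `hmac.compare_digest` so response timing does not reveal how much of a guessed key matched.

```python
import hmac
import ipaddress
from typing import Optional


class BindRefused(ValueError):
    """Hypothetical error: non-loopback bind requested without an API key."""


def check_bind(host: str, api_key: Optional[str]) -> None:
    """Startup check: loopback is always allowed, anything else only with a key."""
    if host == "localhost":
        loopback = True
    else:
        try:
            loopback = ipaddress.ip_address(host).is_loopback
        except ValueError:
            # Not an IP literal (a hostname): treat as non-loopback to be safe.
            loopback = False
    if not loopback and not api_key:
        raise BindRefused(f"refusing to bind {host!r} without --api-key")


def is_authorized(path: str, authorization: Optional[str], api_key: Optional[str]) -> bool:
    """Per-request check: /health stays open; with a key set, the rest need it."""
    if not api_key or path == "/health":
        return True
    scheme, _, token = (authorization or "").partition(" ")
    # Auth scheme names are case-insensitive (RFC 7235), so "bearer" is accepted too.
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison of the presented token against the configured key.
    return hmac.compare_digest(token.encode("utf-8"), api_key.encode("utf-8"))
```

For example, `check_bind("0.0.0.0", None)` raises, while `check_bind("127.0.0.1", None)` and `check_bind("0.0.0.0", "s3cret")` pass. Wiring these into FastAPI (a middleware or dependency that answers 401 when `is_authorized` returns `False`) is assumed and not shown.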
+
+**Endpoints.**
+
+- `GET /health`: uptime plus the list of currently warm models.
+- `GET /stats`: request count, mean latency, and cache size.
+- `POST /run`: body `{spec: SwaySpec}`; returns the same JSON
+  shape `sway run --json-out` would write, plus `request_seconds`
+  (the daemon's measured execution time for the request).
+- `POST /score`: same as `/run` with an optional `probe_names`
+  filter; returns only the per-probe entries, without the folded
+  `SwayScore`.
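The `/stats` counters and the per-request `request_seconds` value can both come from one lock-protected accumulator. A minimal sketch, assuming the mean is taken over every request since startup and that the caller passes in the current cache size; `RequestStats` and the snapshot keys are hypothetical names, not the daemon's response schema.

```python
import threading
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class RequestStats:
    """Thread-safe request count and running mean latency (hypothetical names)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0
        self._total_seconds = 0.0

    def record(self, seconds: float) -> None:
        with self._lock:
            self._count += 1
            self._total_seconds += seconds

    @contextmanager
    def timed(self) -> Iterator[Dict[str, float]]:
        """Time one request with a monotonic clock; record it even if it raises."""
        timing = {"seconds": 0.0}
        start = time.perf_counter()
        try:
            yield timing
        finally:
            timing["seconds"] = time.perf_counter() - start
            self.record(timing["seconds"])

    def snapshot(self, cache_size: int) -> dict:
        """Point-in-time view for GET /stats; the mean is 0.0 before any request."""
        with self._lock:
            mean = self._total_seconds / self._count if self._count else 0.0
            return {"requests": self._count, "mean_latency_s": mean, "cache_size": cache_size}
```

In a handler, `with stats.timed() as t:` would wrap the scoring call, and `t["seconds"]` is filled in once the block exits, ready to echo back as `request_seconds`.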
29
+
30
+**Backend cache.** LRU keyed on `(kind, base, adapter, dtype,
31
+device)`. Capped at `--max-loaded-models`; loading a third distinct
32
+model LRU-evicts the oldest, calling `backend.close()` to release
33
+GPU memory. Single-flight: concurrent requests for the same key
34
+serialize at the loader instead of building twice.
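The eviction and single-flight behavior can be sketched with an `OrderedDict` for recency plus one lock per cache key. This is a minimal sketch under stated assumptions, not the daemon's implementation: `BackendCache` and its `loader` callback are hypothetical, keys are assumed to be the hashable `(kind, base, adapter, dtype, device)` tuple, and backends are only assumed to expose `close()`. The slow load runs outside the global lock, so a hit on one warm model never waits behind a cold load of another.

```python
import threading
from collections import OrderedDict
from typing import Callable, Hashable, List, Optional, Protocol


class Backend(Protocol):
    def close(self) -> None: ...


class BackendCache:
    """LRU of warm backends; concurrent misses on one key share a single load."""

    def __init__(self, loader: Callable[[Hashable], Backend], max_loaded: int = 2) -> None:
        self._loader = loader
        self._max_loaded = max_loaded
        self._entries: "OrderedDict[Hashable, Backend]" = OrderedDict()
        self._lock = threading.Lock()  # guards _entries and _key_locks
        self._key_locks: "dict[Hashable, threading.Lock]" = {}  # never pruned in this sketch

    def _lookup(self, key: Hashable) -> Optional[Backend]:
        # Caller holds self._lock; a hit also marks the key most recently used.
        backend = self._entries.get(key)
        if backend is not None:
            self._entries.move_to_end(key)
        return backend

    def get(self, key: Hashable) -> Backend:
        with self._lock:
            backend = self._lookup(key)
            if backend is not None:
                return backend
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        evicted: List[Backend] = []
        with key_lock:  # single-flight: at most one load per key at a time
            with self._lock:
                backend = self._lookup(key)  # waiters find the first loader's result here
                if backend is not None:
                    return backend
            backend = self._loader(key)  # slow cold load, outside the global lock
            with self._lock:
                self._entries[key] = backend
                while len(self._entries) > self._max_loaded:
                    _, old = self._entries.popitem(last=False)  # least recently used
                    evicted.append(old)
        for old in evicted:
            old.close()  # release the evicted backend's resources (e.g. GPU memory)
        return backend

    def __len__(self) -> int:
        with self._lock:
            return len(self._entries)
```

One simplification: `close()` runs as soon as a backend is evicted, so a request still using it could be cut off; a production cache might reference-count in-flight users before closing.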
35
+
36
+**Python SDK.** `dlm_sway.serve.client.ServeClient(url)` exposes
37
+`health()`, `stats()`, `run(spec)`, `score(spec, probe_names=...)`.
38
+Stateless (no persistent connection pool); raises
39
+`ServeClientError` (subclass of `SwayError`) on transport failure
40
+or non-2xx responses.
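For callers limited to the standard library, the same four calls reduce to JSON over HTTP. The sketch mirrors the documented method names and error contract but is not `dlm_sway.serve.client`: the local `SwayError`/`ServeClientError` stand-ins, the `api_key` and `timeout` parameters (timeout defaulting to 300 s), passing the spec as a plain JSON-serializable dict, and sending `probe_names` as a top-level body field are all assumptions.

```python
import json
import urllib.error
import urllib.request
from typing import Any, List, Optional


class SwayError(Exception):
    """Local stand-in for dlm_sway's base error (assumed, not imported)."""


class ServeClientError(SwayError):
    """Transport failure or non-2xx response from the daemon."""


class ServeClient:
    """Stdlib-only sketch; stateless, so every call opens a fresh connection."""

    def __init__(self, url: str, api_key: Optional[str] = None, timeout: float = 300.0) -> None:
        self.url = url.rstrip("/")
        self.api_key = api_key
        self.timeout = timeout

    def _call(self, method: str, path: str, body: Optional[dict] = None) -> Any:
        data = None if body is None else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(self.url + path, data=data, method=method)
        if data is not None:
            req.add_header("Content-Type", "application/json")
        if self.api_key:
            req.add_header("Authorization", f"Bearer {self.api_key}")
        try:
            with urllib.request.urlopen(req, timeout=self.timeout) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as exc:
            # urlopen raises HTTPError for 4xx/5xx; it must be caught before OSError.
            raise ServeClientError(f"{method} {path}: HTTP {exc.code}") from exc
        except (OSError, ValueError) as exc:
            # URLError and timeouts are OSError subclasses; ValueError covers a non-JSON body.
            raise ServeClientError(f"{method} {path}: {exc}") from exc

    def health(self) -> Any:
        return self._call("GET", "/health")

    def stats(self) -> Any:
        return self._call("GET", "/stats")

    def run(self, spec: dict) -> Any:
        return self._call("POST", "/run", {"spec": spec})

    def score(self, spec: dict, probe_names: Optional[List[str]] = None) -> Any:
        body: dict = {"spec": spec}
        if probe_names is not None:
            body["probe_names"] = probe_names
        return self._call("POST", "/score", body)
```

For example, `ServeClient("http://127.0.0.1:8787").health()` returns the decoded `/health` JSON; against a daemon started with `--api-key`, passing the same token as `api_key=` attaches the bearer header.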
41
+
42
+**Auth posture (v1).** Defaults to no auth on loopback — the
43
+threat model on a single-user dev box is "did I bind 0.0.0.0 by
44
+accident". The CLI hard-refuses `--host 0.0.0.0` without an API
45
+key. Full OAuth is deferred.
+
 ### Sprint 33 — `training_drift` probe (cross-repo, reads dlm loss curves)
 
 Closes the X2 "training_drift probe" backlog item. Sister to S25