"""``sway serve`` daemon: warm-backend HTTP API for iterative workflows.

Loading the HF backend takes ~15s cold (model + adapter weights, KV-cache
allocation, deterministic-mode setup). For interactive flows — notebook
exploration, the S34 ``sway watch`` loop, the S29 live HTML report —
that 15s startup is the dominant cost on every run.

This package exposes ``sway serve`` as a long-running daemon that loads
the backend once and serves a small HTTP API. The first call pays the
~15s cold start; every subsequent call against the same model returns
warm in ~2s, a five-to-ten-fold developer-experience win for users who
iterate.

The package is gated behind the ``[serve]`` extra (FastAPI + uvicorn),
so users who only run one-shot ``sway run`` invocations don't pull in
the daemon dependencies.

Public surface:

- :class:`dlm_sway.serve.client.ServeClient` — Python SDK for
  notebooks; one-liner ``ServeClient(url).run(spec)``.
- :func:`dlm_sway.serve.app.create_app` — FastAPI app factory used by
  the CLI's uvicorn launcher and by unit tests via ``TestClient``.
- :class:`dlm_sway.serve.cache.BackendCache` — LRU backend cache the
  app uses to keep multiple loaded models warm; capped via the
  ``--max-loaded-models`` CLI flag.
"""

from __future__ import annotations