tenseleyflow/loader / d9a710a

Audit Sprint 23 runtime-first rollout

Authored by espadonne
SHA: d9a710ab2e3781440f73c21849ba9b3196ea851f
Parents: c906221
Tree: 0ee2563

2 changed files

Status  File                        +   -
M       .docs/PARITY.md             9   3
M       .docs/sprints/sprint23.md   25  0
.docs/PARITY.md (modified)
@@ -2,7 +2,7 @@
 
 Date: 2026-04-09
 
-Deterministic baseline: `uv run pytest -q` → `388 passed`
+Deterministic baseline: `uv run pytest -q` → `397 passed`
 
 This file tracks the current deterministic runtime baseline for Loader. It stays intentionally narrow and operational: what the runtime can do today, what remains weak, and what scenarios we measure with repeatable tests.
 
@@ -87,10 +87,13 @@ This file tracks the current deterministic runtime baseline for Loader. It stays
 - completion stop/continue policy now cites observed verification facts when available, and exhausted continuation failures preserve those observed verification results in the canonical accountability story instead of only reporting generic missing evidence
 - `loader status`, `loader session show`, and `loader workflow show` now surface observed verification directly, and `Recent Verification` is unified from canonical policy observations first with DoD evidence only as a fallback
 - Loader now has a runtime-owned internal execution handle in `src/loader/runtime/runtime_handle.py`, and runtime-oriented launcher/bootstrap/public-shell tests no longer need to treat `Agent` as the only valid runtime owner
+- non-TUI CLI paths, `loader explore`, and the scripted runtime harness now default to the runtime-first owner seam below `Agent`, so Loader uses `RuntimeHandle` in real internal integrations instead of reserving it for tests
+- persisted session state now records the active runtime-owner path, and `loader status`, `loader session list/show`, and `loader workflow show` surface that runtime-owner provenance directly
+- verification now emits per-command `verify_observation` events into the canonical workflow timeline while the verification loop is running, and workflow/policy read models project those entries as first-class accountability state
 
 ## Known weak spots
 
-- the public runtime boundary is now explicit and runtime-shaped, and Loader now also has a runtime-first internal owner, but `Agent` still constructs and supplies the main public boundary instead of collapsing to a narrower runtime-first external API; Sprint 22 also did not yet promote additional real integration paths to that runtime-first seam
+- the public runtime boundary is now explicit and runtime-shaped, and Loader now also has real runtime-first internal integrations through `RuntimeHandle`, but `Agent` still constructs and supplies the main public boundary and the TUI still routes through that public shell instead of a narrower runtime-first external API
 - [`src/loader/agent/loop.py`](../src/loader/agent/loop.py) is down to 267 lines and much closer to a public facade than the pre-Sprint-15 shell, but it still owns the compatibility shell and remaining launcher/UI glue instead of disappearing entirely
 - [`src/loader/agent/reasoning.py`](../src/loader/agent/reasoning.py) and [`src/loader/agent/safeguards.py`](../src/loader/agent/safeguards.py) are now compatibility shims rather than primary implementations, but they still remain as export layers until Loader narrows its external compatibility surface further
 - [`src/loader/runtime/tool_batches.py`](../src/loader/runtime/tool_batches.py) and parts of [`src/loader/runtime/workflow_lanes.py`](../src/loader/runtime/workflow_lanes.py) are narrower and more directly tested than before, but they still carry more heuristic policy than the tightest reference seams in `refs/claw-code`
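The owner-selection rule described in this hunk (non-TUI CLI paths and `loader explore` default to `RuntimeHandle`, while the TUI still goes through the public `Agent` shell) can be sketched as a small dispatch. This is a hypothetical sketch, not the code in `src/loader/cli/main.py`: `OwnerPath`, `select_owner_path`, the `"run"` command label (standing in for any non-explore invocation), and the choice to send explore down the runtime-first path even with `tui=True` are all illustrative assumptions.

```python
from enum import Enum


class OwnerPath(str, Enum):
    """Hypothetical labels for which object owns a run (illustrative only)."""

    RUNTIME_HANDLE = "runtime_handle"  # runtime-first seam below Agent
    AGENT_SHELL = "agent_shell"  # public compatibility facade


def select_owner_path(command: str, *, tui: bool) -> OwnerPath:
    """Choose the runtime owner for one CLI invocation (assumed rule)."""
    # Assumption: `loader explore` always takes the runtime-first path.
    if command == "explore":
        return OwnerPath.RUNTIME_HANDLE
    # The TUI still routes through the public Agent shell.
    if tui:
        return OwnerPath.AGENT_SHELL
    # Every other non-TUI CLI path defaults to RuntimeHandle.
    return OwnerPath.RUNTIME_HANDLE


for command, tui in [("run", False), ("explore", False), ("run", True)]:
    print(command, tui, select_owner_path(command, tui=tui).value)
# run False runtime_handle
# explore False runtime_handle
# run True agent_shell
```

Keeping the decision in one pure function makes the "which owner ran this?" question easy to pin in a test like `tests/test_cli_runtime_owner.py`, independent of how either owner is constructed.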
@@ -160,7 +163,7 @@ The auditable manifest lives at [`tests/fixtures/runtime_parity_manifest.json`](
 
 As of 2026-04-09:
 
-- `uv run pytest -q`: 380 passed
+- `uv run pytest -q`: 397 passed
 - `tests/test_runtime_harness.py` is fully green, including permission-mode parity, DoD verify/fix coverage, workflow routing parity, and the original contract regression
 - `tests/test_prompt_builder.py` covers section rendering, native-vs-ReAct formatting, and prompt metadata persistence
 - `tests/test_turn_state_machine.py` covers allowed/disallowed turn transitions and terminal transition metadata
@@ -197,11 +200,13 @@ As of 2026-04-09:
 - `tests/test_verification_observations.py` covers typed verification-observation serialization and normalization
 - `tests/test_workflow_timeline_read_model.py` covers grouped supporting/missing policy-evidence rollups, latest-policy derivation, and observed-verification read models from the canonical workflow timeline
 - `tests/test_runtime_handle.py` covers the runtime-owned internal handle below `Agent`, including direct launcher/context/runtime construction without depending on the public compatibility facade
+- `tests/test_cli_runtime_owner.py` covers runtime-first CLI owner selection for non-TUI and explore paths
 - `tests/test_explore_runtime.py` covers the direct explore lane contract, forced read-only behavior, persisted follow-up continuity, persisted `fresh` vs `continue` visibility, and `fresh` explore resets outside the parity harness
 - `tests/test_expanded_tools.py` covers structured patch application, read-only git helpers, `notepad_append`, and richer structured user questions
 - `tests/test_permissions.py` covers prompt/allow mode parsing, rule precedence, policy-backed prompting behavior, and hook lifecycle ordering
 - `tests/test_tool_safety.py` covers workspace boundaries, binary/oversize guards, patch metadata, and shell truncation/classification
 - `tests/test_status_surfaces.py` covers the CLI/TUI DoD, workflow-mode, permission-mode, capability-profile, and session-id formatting helpers
+- `tests/test_runtime_public_shell.py`, `tests/test_session_state.py`, and `tests/test_inspection.py` now also cover persisted runtime-owner metadata plus its status/session/workflow rendering
 - native and extracted tool calls now record the same executor trace events, with source-specific metadata
 - turn startup can refine backend capability profiles before the first request, `run_streaming()` delegates into the main runtime path, mutating tasks route through persisted evidence-backed completion, workflow artifacts and workflow-ledger state survive across turns, sessions compact safely, explore queries bypass DoD/router overhead safely, policy rules are enforced deterministically, operators can inspect/dry-run policy decisions without live turns, prompt construction is sectioned and persisted, prompt snapshots and artifact diffs are inspectable after the fact, explicit turn phases are visible while a turn runs, session inspection preserves effective policy state, typed workflow signals now feed routing directly, semantic invalidation can force targeted refresh vs full re-plan, brownfield clarify can ask evidence-backed questions from repo facts, and the turn runtime now avoids the older synthetic repair/no-tool puppeting while routing assistant outcomes through dedicated controllers instead of a single conversation-loop monolith
 
@@ -231,3 +236,4 @@ As of 2026-04-09:
 - Sprint 20 is complete: Loader now treats the workflow timeline as the canonical policy/accountability artifact even for live completion-trace projection, grounds more follow-through decisions in DoD verification state and tracked runtime evidence, exposes latest-policy rollups in the existing status/session surfaces, and explicitly settles the remaining `Agent` shell as a documented public facade guarded by boundary tests, but it still stops short of claw-code's fuller policy engine, a narrower runtime-first external API, and OMX's deeper verifier/interview rigor.
 - Sprint 21 is complete: Loader now carries typed evidence provenance through canonical policy events, derives grouped policy-evidence rollups from one shared workflow-timeline read model, exposes “needed” vs “satisfied” evidence in `loader status` / `loader session show` / `loader workflow show`, and provides a runtime-owned internal handle so runtime-oriented code and tests no longer need to treat `Agent` as the only valid execution owner, but it still stops short of claw-code's fuller policy engine, a narrower runtime-first external API, and OMX's deeper verifier/interview rigor.
 - Sprint 22 is complete on the verification-observation lane: Loader now captures typed verification observations closer to execution, carries those observations through canonical policy events and completion-stop decisions, and surfaces observed verification plus a unified `Recent Verification` view in `loader status` / `loader session show` / `loader workflow show`, but the planned runtime-first entry promotion beyond tests did not land and rolls forward as Sprint 23 debt alongside Loader's remaining gap to claw-code's fuller policy engine and OMX's deeper verifier/interview rigor.
+- Sprint 23 is complete: Loader now uses the runtime-first seam in real internal integrations through `RuntimeHandle`, emits per-command `verify_observation` events while the verification loop runs, and surfaces persisted runtime-owner provenance in the existing operator views, but it still stops short of a narrower runtime-first public API, TUI migration away from the public shell, claw-code's fuller policy engine, and OMX's deeper verifier/interview rigor.
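The persisted runtime-owner provenance these PARITY.md entries describe (recorded in session state, then surfaced by `loader status`, `loader session list/show`, and `loader workflow show`) can be pictured as a small record that round-trips through a session file and renders as one status line. A minimal sketch, assuming a JSON session record: `OwnerMetadata`, its `owner_path`/`entrypoint` fields, `dump_session`, `load_owner`, and `render_owner_line` are hypothetical names, not the contents of `src/loader/runtime/owner_metadata.py` or `src/loader/runtime/session.py`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class OwnerMetadata:
    """Hypothetical runtime-owner provenance stored with a session."""

    owner_path: str  # e.g. "runtime_handle" or "agent_shell"
    entrypoint: str  # e.g. "cli", "explore", or "tui"


def dump_session(session_id: str, owner: OwnerMetadata) -> str:
    """Serialize a minimal session record with its owner metadata as JSON."""
    return json.dumps({"session_id": session_id, "runtime_owner": asdict(owner)})


def load_owner(raw: str) -> Optional[OwnerMetadata]:
    """Read owner metadata back; sessions saved before the field existed lack it."""
    data = json.loads(raw).get("runtime_owner")
    return OwnerMetadata(**data) if data else None


def render_owner_line(owner: Optional[OwnerMetadata]) -> str:
    """Format the provenance as a single line for a status-style view."""
    if owner is None:
        return "Runtime owner: unknown"
    return f"Runtime owner: {owner.owner_path} (via {owner.entrypoint})"


raw = dump_session("s-1", OwnerMetadata("runtime_handle", "cli"))
print(render_owner_line(load_owner(raw)))  # Runtime owner: runtime_handle (via cli)
print(render_owner_line(load_owner('{"session_id": "old"}')))  # Runtime owner: unknown
```

Treating a missing field as "unknown" rather than an error is one way to keep older persisted sessions inspectable after the schema grows.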
.docs/sprints/sprint23.md (modified)
@@ -156,3 +156,28 @@ The goal is to make Loader easier to audit after the fact, not simply more verbo
 - AST-aware semantic diffs
 - a broad visual workflow UI
 - multi-agent or team orchestration
+
+## Audit
+
+### Status
+
+- Sprint 23 is complete, and the audit is green. Loader now uses the runtime-first seam in real internal integrations, captures verification observations closer to the moment verification runs, and exposes runtime-owner provenance in the same operator surfaces that already carry policy and workflow accountability.
+
+### Landed
+
+- runtime-first ownership is now materially real outside tests: `src/loader/runtime/runtime_handle.py` now owns direct `run` / `run_streaming` / `run_explore` entrypoints, `src/loader/cli/main.py` routes non-TUI CLI and `loader explore` through that runtime-first owner by default, and `tests/helpers/runtime_harness.py` now uses `RuntimeHandle` for scripted runtime scenarios instead of instantiating `Agent` by habit
+- verification observations now enter the canonical accountability story closer to execution: `src/loader/runtime/finalization.py`, `src/loader/runtime/workflow_policy.py`, `src/loader/runtime/policy_timeline.py`, and `src/loader/runtime/workflow_timeline_read_model.py` now persist and project per-command `verify_observation` entries, so Loader can explain what verification actually ran and what it observed instead of only summarizing that state later
+- runtime-owner provenance is now part of persisted session state and inspection: `src/loader/runtime/owner_metadata.py`, `src/loader/runtime/bootstrap.py`, `src/loader/runtime/public_shell.py`, and `src/loader/runtime/session.py` now persist owner-path metadata, while `src/loader/runtime/inspection.py` and `src/loader/cli/main.py` surface that metadata in `loader status`, `loader session list/show`, and `loader workflow show`
+
+### Verification
+
+- `uv run pytest -q` is green: `397 passed`
+- `tests/test_runtime_handle.py`, `tests/test_cli_runtime_owner.py`, and `tests/helpers/runtime_harness.py` now pin real runtime-first integration paths below `Agent`
+- `tests/test_finalization.py` and `tests/test_workflow_timeline_read_model.py` now pin per-command verification-observation entries and their projection into workflow/policy views
+- `tests/test_session_state.py`, `tests/test_runtime_public_shell.py`, `tests/test_runtime_bootstrap.py`, `tests/test_runtime_launcher.py`, and `tests/test_inspection.py` now cover persisted runtime-owner metadata plus its status/session/workflow rendering
+
+### Residual debt
+
+- Loader now has real runtime-first internal integrations, but the TUI still routes through the public `Agent` facade and the public shell remains the outermost construction contract for external integrations
+- verification observations are now closer to execution, but they are still strongest around the verification loop/finalization path; Loader does not yet emit a richer lifecycle story for planned, pending, or stale verification outside that bounded lane
+- the new owner-path visibility makes runtime-first adoption auditable, but Loader still stops short of a narrower public runtime API, claw-code's fuller policy engine, and OMX's deeper verifier/interview rigor
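The verification-observation change audited here (one `verify_observation` entry per command, written while the verification loop runs, then projected by the workflow-timeline read model into a `Recent Verification` style view) can be sketched as an append-only timeline plus a projection function. A minimal sketch under stated assumptions: `Timeline`, `run_verification`, `recent_verification`, the event fields, and the `lint-check` command are hypothetical, and the command runner is injected as a plain function so the example runs without spawning a shell.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Timeline:
    """Append-only event log standing in for the canonical workflow timeline."""

    events: list[dict] = field(default_factory=list)

    def append(self, kind: str, **payload: object) -> None:
        self.events.append({"kind": kind, **payload})


def run_verification(
    commands: list[str],
    runner: Callable[[str], int],
    timeline: Timeline,
) -> bool:
    """Run each verify command, emitting one observation per command as it finishes."""
    all_passed = True
    for command in commands:
        exit_code = runner(command)
        # Emit immediately, so the timeline records what actually ran
        # even if a later command fails or the loop is interrupted.
        timeline.append(
            "verify_observation",
            command=command,
            exit_code=exit_code,
            passed=exit_code == 0,
        )
        all_passed = all_passed and exit_code == 0
    return all_passed


def recent_verification(timeline: Timeline, limit: int = 5) -> list[str]:
    """Read-model projection: newest verify_observation entries first."""
    observations = [e for e in timeline.events if e["kind"] == "verify_observation"]
    return [
        f"{'PASS' if e['passed'] else 'FAIL'} {e['command']} (exit {e['exit_code']})"
        for e in reversed(observations[-limit:])
    ]


timeline = Timeline()
fake_exit_codes = {"uv run pytest -q": 0, "lint-check": 1}  # hypothetical results
ok = run_verification(list(fake_exit_codes), fake_exit_codes.__getitem__, timeline)
print(ok)  # False
print(recent_verification(timeline))
# ['FAIL lint-check (exit 1)', 'PASS uv run pytest -q (exit 0)']
```

Writing each observation at execution time, and deriving the summary view only by reading the timeline, is what lets a later reader see exactly which commands ran, rather than relying on a summary computed after the fact.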