@@ -0,0 +1,158 @@ |
| 1 | +# Sprint 23: Runtime-First Integrations, Verification Producers, and Facade Narrowing |
| 2 | + |
| 3 | +## Prerequisites |
| 4 | + |
| 5 | +Sprint 22 |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | +Take the next honest step after Sprint 22: stop treating the runtime-first owner as mostly a testing seam, move more real integrations onto that seam where `Agent` is only habit, and widen the new verification-observation contract from a finalization/result story into a more direct execution-time producer story. |
| 10 | + |
| 11 | +Sprint 22 improved the remaining debt in a useful way: |
| 12 | + |
| 13 | +- Loader now carries typed verification observations through canonical policy events and completion-stop decisions |
| 14 | +- the operator surfaces can now explain recent verification from canonical policy observations first instead of stitching together post-hoc summaries |
| 15 | +- the canonical workflow timeline is stronger as the accountability story |
| 16 | +- but the planned runtime-first entry promotion beyond tests did not land |
| 17 | +- `Agent` still remains the default construction seam for many real internal paths even though Loader now has a runtime-owned internal handle and explicit public facade boundaries |
| 18 | +- verification observations are still strongest at the DoD/finalization edge; Loader still captures less from the earlier verification lifecycle than it now knows how to represent |
| 19 | + |
| 20 | +Sprint 23 should keep using the references as architectural guardrails, not as a feature-copy list. |
| 21 | + |
| 22 | +The standard remains: |
| 23 | + |
| 24 | +- use claw-code to sharpen runtime-first bootstrap/session ownership, event capture, and narrower public boundaries |
| 25 | +- use OMX to sharpen direct verifier-observation capture and evidence-backed accountability |
| 26 | +- do not add work just because the refs have it |
| 27 | +- do add work when the refs show that Loader is still too shell-bound, too late in its evidence capture, or too fuzzy about which boundary owns what |
| 28 | + |
| 29 | +`audit.txt` remains a guardrail against wrapper-heavy drift and soft compatibility habits. It is not the factual roadmap. |
| 30 | + |
| 31 | +The references for this sprint are: |
| 32 | + |
| 33 | +- `refs/claw-code/rust/crates/runtime/src/bootstrap.rs` |
| 34 | +- `refs/claw-code/rust/crates/runtime/src/session_control.rs` |
| 35 | +- `refs/claw-code/rust/crates/runtime/src/conversation.rs` |
| 36 | +- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs` |
| 37 | +- `refs/claw-code/rust/crates/runtime/src/lane_events.rs` |
| 38 | +- `refs/claw-code/rust/crates/runtime/src/green_contract.rs` |
| 39 | +- `refs/claw-code/PARITY.md` |
| 40 | +- `refs/oh-my-codex/src/verification/verifier.ts` |
| 41 | +- `refs/oh-my-codex/src/autoresearch/contracts.ts` |
| 42 | +- `refs/oh-my-codex/src/autoresearch/runtime.ts` |
| 43 | +- `refs/oh-my-codex/src/hooks/session.ts` |
| 44 | +- `.docs/PARITY.md` |
| 45 | +- `.docs/audit.txt` |
| 46 | +- `.docs/audit_sprints/trunk_sitrep.md` |
| 47 | +- `.docs/sprints/sprint22.md` |
| 48 | + |
| 49 | +## Deliverables |
| 50 | + |
| 51 | +### 1. Promote the runtime-first entry contract into real internal integrations |
| 52 | + |
| 53 | +Sprint 22 left this as explicit debt. Sprint 23 should land it for real. |
| 54 | + |
| 55 | +Implementation targets: |
| 56 | + |
| 57 | +- inventory internal call sites that still instantiate or route through `Agent` by default even though they are consuming runtime-owned behavior, especially around: |
| 58 | + - launcher/bootstrap helpers |
| 59 | + - CLI/TUI integration seams that are not actually testing the public compatibility contract |
| 60 | + - harnesses, utilities, and inspection/session helpers that only need runtime ownership |
| 61 | +- define or refine a small runtime-first entry contract for internal consumers that need: |
| 62 | + - runtime bootstrap/session ownership |
| 63 | + - launcher/public-shell execution |
| 64 | + - inspection, continuity, or workflow/accountability hooks |
| 65 | +- migrate a bounded but real set of internal integrations onto that seam |
| 66 | +- explicitly document what remains intentionally public-shell-only versus what is now runtime-first by default |
| 67 | + |
| 68 | +The goal is not to delete `Agent`. The goal is to stop using it internally by reflex where a runtime-owned seam is cleaner and already exists. |
| 69 | + |
| 70 | +### 2. Expand verification observations to earlier execution-time producers |
| 71 | + |
| 72 | +Sprint 22 made verification observations real, but they still enter the story relatively late. |
| 73 | + |
| 74 | +Implementation targets: |
| 75 | + |
| 76 | +- inventory where verification-related facts are currently available earlier than finalization across: |
| 77 | + - `src/loader/runtime/dod.py` |
| 78 | + - `src/loader/runtime/finalization.py` |
| 79 | + - `src/loader/runtime/tool_batches.py` |
| 80 | + - `src/loader/runtime/executor.py` |
| 81 | + - `src/loader/runtime/workflow_lanes.py` |
| 82 | + - `src/loader/runtime/workflow_policy.py` |
| 83 | +- identify which observation kinds should be emitted closer to execution, such as: |
| 84 | + - verification command planned/requested |
| 85 | + - verification command actually executed |
| 86 | + - verification output observed and classified as passed/failed/contradictory |
| 87 | + - verification was intentionally skipped, stale, or still pending |
| 88 | + - observed artifact/touchpoint evidence materially backed or blocked completion before finalization |
| 89 | +- thread those earlier observations into the canonical policy story without creating a peer truth beside the workflow timeline |
| 90 | + |
| 91 | +The goal is to move Loader’s accountability closer to “this is what we observed while verification happened” instead of only “this is what the runtime concluded later.” |
| 92 | + |
| 93 | +### 3. Narrow the remaining public facade boundary on purpose |
| 94 | + |
| 95 | +Sprint 20 settled `Agent` as the public shell. Sprint 23 should keep making that shell smaller and more explicit. |
| 96 | + |
| 97 | +Implementation targets: |
| 98 | + |
| 99 | +- inventory what still lives in `src/loader/agent/loop.py` and nearby public-shell glue |
| 100 | +- identify which pieces are: |
| 101 | + - true public compatibility API |
| 102 | + - UI integration seam |
| 103 | + - leftover runtime ownership that can move below the shell now |
| 104 | +- move the still-obviously-runtime pieces below the public shell where that reduces ambiguity |
| 105 | +- add or extend direct boundary tests so internal code does not drift back toward `Agent` ownership once a runtime seam exists |
| 106 | + |
| 107 | +The goal is not “make the file smaller” for its own sake. The goal is that future work has a clearer answer to what the public shell is for. |
| 108 | + |
| 109 | +### 4. Sharpen operator visibility for runtime-first ownership and observed verification |
| 110 | + |
| 111 | +Once more of the real integration paths go runtime-first and more observations are captured earlier, the existing product surfaces should make that easier to audit. |
| 112 | + |
| 113 | +Implementation targets: |
| 114 | + |
| 115 | +- improve the existing surfaces so users can answer: |
| 116 | + - which runtime-owned path produced the current session/accountability state? |
| 117 | + - what verification was actually observed earlier in the turn versus only concluded at finalization? |
| 118 | + - what evidence is pending, contradicted, or already satisfied? |
| 119 | +- prefer improving: |
| 120 | + - `loader status` |
| 121 | + - `loader session show` |
| 122 | + - `loader workflow show` |
| 123 | + over inventing a new command unless a new surface is clearly cleaner |
| 124 | +- keep concise rollups first, and expose deeper ownership/observation detail only where it materially improves post-mortem debugging |
| 125 | + |
| 126 | +The goal is to make Loader easier to audit after the fact, not simply more verbose. |
| 127 | + |
| 128 | +## Testing strategy |
| 129 | + |
| 130 | +- unit coverage for: |
| 131 | + - runtime-first entry helpers adopted in real internal integration paths |
| 132 | + - earlier verification-observation producer normalization and persistence |
| 133 | + - public-shell boundary helpers and import/boundary guards |
| 134 | +- runtime coverage for: |
| 135 | + - successful completion with observed verification facts emitted before finalization |
| 136 | + - failed or pending verification that now leaves a clearer producer-backed policy trail |
| 137 | + - internal integration paths that no longer need `Agent` by default |
| 138 | +- regression coverage for: |
| 139 | + - no drift back toward `Agent` as the default internal seam when a runtime-owned seam exists |
| 140 | + - no duplicate truth beside the canonical policy/accountability story |
| 141 | + - no regression in Sprint 22’s observed-verification inspection and stop/continue honesty |
| 142 | + |
| 143 | +## Definition of done |
| 144 | + |
| 145 | +- Loader has at least one more real internal integration path using a runtime-first entry seam below `Agent` |
| 146 | +- verification observations are emitted from at least one earlier execution-time producer instead of only the finalization edge |
| 147 | +- the remaining public-shell boundary is smaller or more explicitly defended on purpose |
| 148 | +- existing status/session/workflow surfaces expose the stronger runtime-first and verification-observation story without multiplying commands |
| 149 | +- Sprint 22’s observed-verification and accountability gains remain green |
| 150 | + |
| 151 | +## Explicitly out of scope |
| 152 | + |
| 153 | +- deleting `Agent` as the public compatibility surface |
| 154 | +- full claw-code policy-engine parity |
| 155 | +- model-authored verifier narratives as a required runtime dependency |
| 156 | +- AST-aware semantic diffs |
| 157 | +- a broad visual workflow UI |
| 158 | +- multi-agent or team orchestration |