@@ -0,0 +1,163 @@ |
| 1 | +# Sprint 19: Facade Finalization, Continuation Hardening, and Unified Policy Timeline |
| 2 | + |
| 3 | +## Prerequisites |
| 4 | + |
| 5 | +Sprint 18 |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | +Take the next honest contraction after Sprint 18: finish thinning the public shell where it still reads like runtime glue, harden the last continuation heuristics that can still feel like soft rescue behavior, and unify the scattered policy/debug surfaces into one clearer operator story. |
| 10 | + |
| 11 | +Sprint 18 changed the shape of the remaining debt in a useful way: |
| 12 | + |
| 13 | +- completion policy is now explicit, persisted, and inspectable |
| 14 | +- public-shell helpers now own steering, session install/load, event-emitter normalization, and capability-refresh decisions |
| 15 | +- `src/loader/agent/loop.py` is materially thinner |
| 16 | +- but `Agent` still owns the public entrypoints and launcher glue |
| 17 | +- completion traces and workflow traces now both exist, but they are still separate operator surfaces |
| 18 | +- Loader still keeps bounded continuation nudges for some non-mutating tasks; those nudges are now visible but not yet backed by explicit runtime evidence |
| 19 | + |
| 20 | +Sprint 19 should stay reference-guided, not reference-submissive. |
| 21 | + |
| 22 | +The standard remains: |
| 23 | + |
| 24 | +- use claw-code to sharpen runtime seams, policy ownership, and explicit lifecycle contracts |
| 25 | +- use OMX to sharpen follow-through, verifier pressure, and operator-facing runtime accountability |
| 26 | +- do not add a feature just because the refs have it |
| 27 | +- do pursue changes when the refs reveal that Loader is still too soft, too implicit, or too hard to audit |
| 28 | + |
| 29 | +`audit.txt` is still not the roadmap. It is useful only as a guardrail against sliding back into wrapper-heavy cleanup and soft model rescue behavior. |
| 30 | + |
| 31 | +The references for this sprint are: |
| 32 | + |
| 33 | +- `refs/claw-code/rust/crates/runtime/src/conversation.rs` |
| 34 | +- `refs/claw-code/rust/crates/runtime/src/bootstrap.rs` |
| 35 | +- `refs/claw-code/rust/crates/runtime/src/session_control.rs` |
| 36 | +- `refs/claw-code/rust/crates/runtime/src/lane_events.rs` |
| 37 | +- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs` |
| 38 | +- `refs/claw-code/rust/crates/runtime/src/green_contract.rs` |
| 39 | +- `refs/claw-code/rust/crates/runtime/src/prompt.rs` |
| 40 | +- `refs/claw-code/PARITY.md` |
| 41 | +- `.docs/PARITY.md` |
| 42 | +- `.docs/audit.txt` |
| 43 | +- `.docs/audit_sprints/trunk_sitrep.md` |
| 44 | +- `.docs/sprints/sprint18.md` |
| 45 | +- `refs/oh-my-codex/src/autoresearch/contracts.ts` |
| 46 | +- `refs/oh-my-codex/src/autoresearch/runtime.ts` |
| 47 | +- `refs/oh-my-codex/src/verification/verifier.ts` |
| 48 | +- `refs/oh-my-codex/src/hooks/session.ts` |
| 49 | +- `refs/oh-my-codex/src/hooks/prompt-guidance-contract.ts` |
| 50 | + |
| 51 | +## Deliverables |
| 52 | + |
| 53 | +### 1. Finish the next public-shell contraction below `Agent` |
| 54 | + |
| 55 | +Sprint 18 moved more shell behavior into `src/loader/runtime/public_shell.py`, but `Agent` still owns the public entrypoint wrappers and some launch-time glue. |
| 56 | + |
| 57 | +Implementation targets: |
| 58 | + |
| 59 | +- inventory what remains in `src/loader/agent/loop.py` that still feels like runtime/public-shell plumbing instead of true public API ownership, especially: |
| 60 | + - run / run_streaming / run_explore event-wrapper glue |
| 61 | + - resume / clear lifecycle orchestration |
| 62 | + - launcher construction and runtime-source preparation |
| 63 | + - capability-refresh and prompt invalidation wiring |
| 64 | +- move what is reusable into runtime-owned helpers or a tighter launcher/public-shell seam |
| 65 | +- keep `Agent` focused on: |
| 66 | + - public API shape |
| 67 | + - compatibility-facing attributes |
| 68 | + - minimal UI-facing integration points |
| 69 | + |
| 70 | +The goal is not “delete `Agent`.” The goal is for `Agent` to read like an intentionally tiny facade instead of a convenient place for leftover runtime glue. |
| 71 | + |
| 72 | +### 2. Harden the remaining continuation contract |
| 73 | + |
| 74 | +Sprint 18 made continuation behavior visible. Sprint 19 should decide which parts of that behavior are still too soft. |
| 75 | + |
| 76 | +Implementation targets: |
| 77 | + |
| 78 | +- inventory the remaining continuation behavior across: |
| 79 | + - `src/loader/runtime/completion_policy.py` |
| 80 | + - `src/loader/runtime/turn_completion.py` |
| 81 | + - `src/loader/runtime/assistant_turns.py` |
| 82 | + - any nearby repair/finalization controller that can still nudge rather than stop |
| 83 | +- identify which continuation cases are justified by explicit runtime evidence and which are merely tolerated by textual heuristics |
| 84 | +- prefer: |
| 85 | + - deletion |
| 86 | + - a stricter typed stop/fail state |
| 87 | + - explicit follow-through requirements derived from runtime artifacts or session state |
| 88 | + over keeping broad “continue once more” behavior |
| 89 | +- where a continuation path remains, make the required evidence explicit and persisted |
| 90 | + |
| 91 | +The goal is to keep following the Sprint 13 / Sprint 17 / Sprint 18 line: the runtime should proceed for a clear typed reason or stop honestly, not continue because the model “probably meant well.” |
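
One way to picture "proceed for a clear typed reason or stop honestly" is a small decision record that can be persisted. This is a minimal sketch, not the contents of `completion_policy.py`: `TurnOutcome`, `ContinuationDecision`, `decide_continuation`, and the reason strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple


class TurnOutcome(Enum):
    # Hypothetical outcome set; the real policy may use different states.
    CONTINUE = "continue"
    STOP = "stop"
    FAIL = "fail"


@dataclass(frozen=True)
class ContinuationDecision:
    outcome: TurnOutcome
    reason: str
    evidence: Tuple[str, ...] = ()

    def to_record(self) -> dict:
        # Plain dict so the decision can be stored alongside session state.
        return {
            "outcome": self.outcome.value,
            "reason": self.reason,
            "evidence": list(self.evidence),
        }


def decide_continuation(
    pending_followups: Sequence[str],
    verifier_passed: Optional[bool],
) -> ContinuationDecision:
    """Continue only when runtime evidence requires it; otherwise stop or fail."""
    if pending_followups:
        # Continuation is justified by concrete, recorded follow-through items.
        return ContinuationDecision(
            TurnOutcome.CONTINUE, "pending_followups", tuple(pending_followups)
        )
    if verifier_passed is False:
        # Nothing left to do, but verification failed: fail instead of nudging.
        return ContinuationDecision(TurnOutcome.FAIL, "verifier_failed")
    # No evidence that more work is needed, so stop without a nudge.
    return ContinuationDecision(TurnOutcome.STOP, "no_pending_evidence")
```

In this sketch there is no branch that continues on the strength of the model's wording alone: every `CONTINUE` carries a non-empty `evidence` tuple, and that tuple is what would be persisted.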
| 92 | + |
| 93 | +### 3. Unify completion, workflow, and repair accountability into one operator-facing timeline |
| 94 | + |
| 95 | +Loader now has workflow timeline entries and a separate completion trace. That is better than hidden state, but still fragmented. |
| 96 | + |
| 97 | +Implementation targets: |
| 98 | + |
| 99 | +- define a compact unified policy timeline or policy event model that can carry: |
| 100 | + - workflow routing/handoff decisions |
| 101 | + - completion-policy outcomes |
| 102 | + - repair / retry / recovery decisions |
| 103 | + - terminal stop reasons |
| 104 | +- decide whether the existing workflow timeline should absorb completion/repair events or whether a sibling policy timeline is the cleaner contract |
| 105 | +- persist enough of that state to survive resume and make post-mortem inspection more useful |
| 106 | +- surface it through existing product seams, likely one of: |
| 107 | + - `loader workflow show` |
| 108 | + - `loader session show` |
| 109 | +  - a narrowly scoped new policy-focused surface, if and only if it is cleaner than overloading the workflow view |
| 110 | + |
| 111 | +The goal is that operators can answer “why did Loader keep going, stop, retry, or accept this result?” from one coherent surface instead of stitching together multiple tables by hand. |
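
As a rough illustration of the "sibling policy timeline" option, the sketch below models one event type that covers all four categories listed above and round-trips through JSON so it could survive resume. `PolicyEvent`, `PolicyTimeline`, the `kind` values, and the field names are assumptions made for this example, not an existing Loader API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Hypothetical event kinds mirroring the four categories in the list above.
EVENT_KINDS = ("workflow", "completion", "repair", "terminal")


@dataclass
class PolicyEvent:
    kind: str
    decision: str
    reason: str
    turn: int
    details: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown policy event kind: {self.kind!r}")


@dataclass
class PolicyTimeline:
    events: List[PolicyEvent] = field(default_factory=list)

    def record(self, event: PolicyEvent) -> None:
        self.events.append(event)

    def to_json(self) -> str:
        # Events are plain data, so persistence is a list of dicts.
        return json.dumps([asdict(e) for e in self.events])

    @classmethod
    def from_json(cls, payload: str) -> "PolicyTimeline":
        return cls([PolicyEvent(**raw) for raw in json.loads(payload)])

    def explain(self) -> List[str]:
        # One line per decision, in order, for an operator-facing view.
        return [
            f"turn {e.turn}: [{e.kind}] {e.decision} ({e.reason})"
            for e in self.events
        ]
```

A single ordered list keeps the "why did Loader keep going, stop, retry, or accept this result?" question answerable by reading top to bottom. Whether this lives inside the existing workflow timeline or beside it is the open design decision named above.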
| 112 | + |
| 113 | +### 4. Keep the ref relationship explicit and healthy |
| 114 | + |
| 115 | +Implementation targets: |
| 116 | + |
| 117 | +- use claw-code for: |
| 118 | + - lane-event shape |
| 119 | + - session/runtime control seams |
| 120 | + - explicit policy and green-contract ownership |
| 121 | +- use OMX for: |
| 122 | + - verifier/follow-through accountability |
| 123 | + - session/runtime operator clarity |
| 124 | + - stronger prompt/runtime contract thinking |
| 125 | +- do not add work just because the refs have it |
| 126 | +- do add work when the refs reveal a real Loader weakness in: |
| 127 | + - honesty |
| 128 | + - inspectability |
| 129 | + - shell minimalism |
| 130 | + - follow-through |
| 131 | + |
| 132 | +The goal is to keep Loader reference-guided and self-aware, not to drift into either blind feature copying or isolated local optimization. |
| 133 | + |
| 134 | +## Testing strategy |
| 135 | + |
| 136 | +- unit coverage for: |
| 137 | + - any new public-shell or launcher helper that further reduces `Agent` ownership |
| 138 | + - tightened continuation/terminal-stop decisions |
| 139 | + - unified policy timeline serialization and restoration |
| 140 | +- runtime coverage for: |
| 141 | + - no regression in normal follow-through on non-mutating and mutating tasks |
| 142 | + - honest terminal behavior where continuation heuristics were deleted or narrowed |
| 143 | + - session/workflow inspection of the unified policy/debug story |
| 144 | +- regression coverage for: |
| 145 | + - no drift back toward `agent/loop.py` owning extracted shell glue |
| 146 | + - no silent reintroduction of soft continuation phrasing after harder stop conditions |
| 147 | + - no loss of the current workflow/completion/explore inspection surfaces while timelines are unified |
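
The "no drift back" regression could be approximated with a structural guard rather than a line count. The sketch below uses the standard `ast` module to flag methods of a class that exceed a statement budget. The helper name `oversized_methods` and the budget value are hypothetical, and a real check would read the source from `src/loader/agent/loop.py` instead of an inline string.

```python
import ast
from typing import Dict


def oversized_methods(source: str, class_name: str, max_statements: int) -> Dict[str, int]:
    """Return {method name: statement count} for methods above the budget."""
    tree = ast.parse(source)
    offenders: Dict[str, int] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    # Count all nested statements; subtract the def node itself.
                    count = sum(isinstance(n, ast.stmt) for n in ast.walk(item)) - 1
                    if count > max_statements:
                        offenders[item.name] = count
    return offenders


# Usage with an inline stand-in for the facade module (illustrative only).
EXAMPLE = '''
class Agent:
    def run(self, prompt):
        return self._shell.run(prompt)

    def resume(self, session_id):
        state = self._load(session_id)
        if state is None:
            state = self._fresh()
        self._install(state)
        return state
'''

print(oversized_methods(EXAMPLE, "Agent", max_statements=3))
```

A budget like this is a blunt instrument, so it works best as a tripwire that prompts review, not as a hard definition of "facade-like".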
| 148 | + |
| 149 | +## Definition of done |
| 150 | + |
| 151 | +- `agent/loop.py` shrinks again or becomes materially more facade-like even if line count only drops modestly |
| 152 | +- Loader deletes or hardens more of the remaining continuation heuristics instead of merely explaining them better |
| 153 | +- operators can inspect a single, more coherent runtime policy story after the fact |
| 154 | +- Sprint 18’s completion-trace and public-shell gains remain green |
| 155 | +- the parity baseline remains green after the Sprint 19 shell and policy tightening |
| 156 | + |
| 157 | +## Explicitly out of scope |
| 158 | + |
| 159 | +- full claw-code policy-engine parity |
| 160 | +- multi-agent or team orchestration |
| 161 | +- AST-aware semantic diffs |
| 162 | +- a broad visual workflow UI |
| 163 | +- rich permission-rule editing UX |