@@ -0,0 +1,170 @@ |
| 1 | +# Sprint 13: Turn Policy Narrowing, Assumption Ledger, and Artifact Diffs |
| 2 | + |
| 3 | +## Prerequisites |
| 4 | + |
| 5 | +Sprint 12 |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | +Turn Loader's newly controllerized runtime into a more semantically explicit workflow system by shrinking the still-heavy `turn_iteration` seam, promoting assumptions and contradictions into first-class workflow state, and giving operators diff-oriented artifact visibility instead of only latest-state inspection. |
| 10 | + |
| 11 | +Sprint 12 was a real structural win. Loader now has pressure-pass clarify, codebase-backed grounding, structured recovery evidence, and a controller-shaped runtime shell. That meaningfully narrows the gap with claw-code and OMX. The audit is also honest about what still hurts: |
| 12 | + |
| 13 | +- `turn_iteration.py` is still carrying a lot of repair, tool-routing, and completion policy in one seam |
| 14 | +- contradiction and invalidation evidence are richer than before, but they are still mostly runtime-authored summaries rather than a reusable semantic ledger |
| 15 | +- operator surfaces can explain "why did this happen?" better than before, but they still cannot show "what changed?" across briefs, plans, verification, or prompt contracts |
| 16 | +- Loader now has better workflow discipline, but it still lacks some of the day-two operator ergonomics that make claw-code and OMX easier to trust during long tasks |
| 17 | + |
| 18 | +The next leverage point is to stop treating semantic drift and operator visibility as one-off summaries and start treating them as durable contracts: |
| 19 | + |
| 20 | +- the turn runtime should classify and route assistant output through narrower policy seams |
| 21 | +- assumptions, contradictions, and acceptance anchors should survive across workflow phases as explicit state |
| 22 | +- inspection should be able to show diffs between the artifacts and prompt contracts that drove behavior |
| 23 | + |
| 24 | +This sprint is about making Loader more inspectable and less accidental: |
| 25 | + |
| 26 | +- `turn_iteration` shrinks into narrower policy-oriented seams |
| 27 | +- workflow invalidation gains an explicit assumption/contradiction ledger |
| 28 | +- operator tooling gains artifact and prompt diff visibility |
| 29 | +- Loader gets closer to claw-code not just in structure, but in debuggability |
| 30 | + |
| 31 | +The references for this sprint are: |
| 32 | + |
| 33 | +- `refs/claw-code/rust/crates/runtime/src/conversation.rs` |
| 34 | +- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs` |
| 35 | +- `refs/claw-code/rust/crates/runtime/src/prompt.rs` |
| 36 | +- `refs/claw-code/PARITY.md` |
| 37 | +- `refs/oh-my-codex/src/ralplan/runtime.ts` |
| 38 | +- `refs/oh-my-codex/src/modes/base.ts` |
| 39 | +- `refs/oh-my-codex/src/verification/verifier.ts` |
| 40 | +- `refs/oh-my-codex/skills/deep-interview/SKILL.md` |
| 41 | +- `refs/oh-my-codex/skills/ralplan/SKILL.md` |
| 42 | + |
| 43 | +## Deliverables |
| 44 | + |
| 45 | +### 1. Split `turn_iteration` into narrower response-policy seams |
| 46 | + |
| 47 | +Sprint 12 made `conversation.py` coordinator-shaped. Sprint 13 should apply the same discipline to the still-heavy iteration seam. |
| 48 | + |
| 49 | +Implementation targets: |
| 50 | + |
| 51 | +- extract narrower helpers under `src/loader/runtime/`, likely around: |
| 52 | + - assistant-response classification |
| 53 | + - repair routing |
| 54 | + - final-answer routing |
| 55 | + - tool-batch routing |
| 56 | + - no-tool completion handoff |
| 57 | +- make `turn_iteration.py` read more like: |
| 58 | + - request assistant turn |
| 59 | + - classify response |
| 60 | + - delegate the winning route |
| 61 | + - return loop-state deltas |
| 62 | +- keep the main behavior unchanged while reducing policy density per module |
| 63 | +- add direct controller tests so future iteration changes do not depend only on broad runtime integration coverage |
| 64 | + |
| 65 | +The goal is not more files for their own sake. The goal is to make assistant-turn behavior easier to tune deliberately. |
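
As a rough illustration of the classify-then-delegate shape, here is a minimal Python sketch. Every name in it (`Route`, `AssistantResponse`, `LoopDelta`, `classify_response`, `ROUTE_HANDLERS`, `run_iteration`) is hypothetical rather than Loader's actual API, and the `FINAL:` prefix is a placeholder for whatever signal the real runtime uses to recognize a final answer. Requesting the assistant turn is left out; the sketch starts from an already-parsed response.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class Route(Enum):
    # Hypothetical route names; the real seams may be split differently.
    REPAIR = auto()
    TOOL_BATCH = auto()
    FINAL_ANSWER = auto()
    NO_TOOL_COMPLETION = auto()


@dataclass
class AssistantResponse:
    """Simplified, already-parsed assistant turn."""
    text: str
    tool_calls: List[dict] = field(default_factory=list)
    parse_errors: List[str] = field(default_factory=list)


@dataclass
class LoopDelta:
    """What one iteration hands back to the outer loop."""
    route: Route
    continue_loop: bool
    notes: List[str] = field(default_factory=list)


def classify_response(response: AssistantResponse) -> Route:
    # Order matters: malformed output goes to repair before any
    # tool call or answer inside it is trusted.
    if response.parse_errors:
        return Route.REPAIR
    if response.tool_calls:
        return Route.TOOL_BATCH
    # Placeholder convention for detecting a final answer (assumption).
    if response.text.strip().startswith("FINAL:"):
        return Route.FINAL_ANSWER
    return Route.NO_TOOL_COMPLETION


def handle_repair(response: AssistantResponse) -> LoopDelta:
    return LoopDelta(Route.REPAIR, True, [f"repair: {e}" for e in response.parse_errors])


def handle_tool_batch(response: AssistantResponse) -> LoopDelta:
    return LoopDelta(Route.TOOL_BATCH, True, [f"run {c['name']}" for c in response.tool_calls])


def handle_final_answer(response: AssistantResponse) -> LoopDelta:
    return LoopDelta(Route.FINAL_ANSWER, False)


def handle_no_tool_completion(response: AssistantResponse) -> LoopDelta:
    # Hand off to a completion check instead of silently ending the turn.
    return LoopDelta(Route.NO_TOOL_COMPLETION, False, ["completion check"])


# One handler per route; a test can assert that every Route has an entry.
ROUTE_HANDLERS: Dict[Route, Callable[[AssistantResponse], LoopDelta]] = {
    Route.REPAIR: handle_repair,
    Route.TOOL_BATCH: handle_tool_batch,
    Route.FINAL_ANSWER: handle_final_answer,
    Route.NO_TOOL_COMPLETION: handle_no_tool_completion,
}


def run_iteration(response: AssistantResponse) -> LoopDelta:
    """Classify the response, delegate the winning route, return the delta."""
    route = classify_response(response)
    return ROUTE_HANDLERS[route](response)
```

Because classification and each handler are plain functions, they can be unit tested directly. That is the kind of controller-level coverage the last target asks for, and it does not require standing up the full runtime.
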
| 66 | + |
| 67 | +### 2. Assumption and contradiction ledger instead of one-off evidence summaries |
| 68 | + |
| 69 | +Sprint 12 introduced richer drift evidence. Sprint 13 should make that evidence durable and reusable. |
| 70 | + |
| 71 | +Implementation targets: |
| 72 | + |
| 73 | +- define a typed workflow ledger contract under `src/loader/runtime/` for: |
| 74 | + - explicit assumptions |
| 75 | + - confirmed assumptions |
| 76 | + - contradicted assumptions |
| 77 | + - acceptance anchors |
| 78 | + - open decision boundaries |
| 79 | + - closed decision boundaries |
| 80 | +- thread that ledger through clarify, planning, verification, and recovery instead of only summarizing evidence at refresh time |
| 81 | +- persist enough structure to answer: |
| 82 | + - which assumption was invalidated? |
| 83 | + - which workflow phase introduced it? |
| 84 | + - what evidence contradicted it? |
| 85 | + - whether the contradiction forced refresh, reentry, or only inspection visibility |
| 86 | +- keep the first version pragmatic and text-first; do not try to build a symbolic reasoning engine |
| 87 | + |
| 88 | +This is how Loader gets from "richer summaries" to a more explicit semantic workflow contract. |
| 89 | + |
| 90 | +### 3. Artifact and prompt diff surfaces for operators |
| 91 | + |
| 92 | +Loader can now show the latest prompt and workflow timeline. Sprint 13 should help operators see what changed. |
| 93 | + |
| 94 | +Implementation targets: |
| 95 | + |
| 96 | +- add diff-oriented inspection surfaces, likely around: |
| 97 | + - clarify brief vs refreshed brief |
| 98 | + - old plan vs refreshed plan |
| 99 | + - workflow ledger changes across reentry |
| 100 | + - prompt metadata or prompt-body diffs across relevant turns |
| 101 | +- keep the product surface text-first and operator-friendly, for example via: |
| 102 | + - `loader workflow show --diff` |
| 103 | + - `loader prompt diff` |
| 104 | + - or an equivalent `loader artifact show` family if that is cleaner |
| 105 | +- include concise change summaries by default and fuller diffs when explicitly requested |
| 106 | +- avoid a visual UI in this sprint; prioritize fast CLI/TUI debugging value |
| 107 | + |
| 108 | +The goal is to make workflow changes legible, not just persisted. |
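
The "concise summary by default, fuller diff on request" behavior can be sketched with Python's standard `difflib` module. The function names are hypothetical, and the sketch makes no assumption about which of the candidate commands above would call it.

```python
import difflib
from typing import List, Tuple


def count_line_changes(old_lines: List[str], new_lines: List[str]) -> Tuple[int, int]:
    """Return (added, removed) line counts from difflib's opcodes."""
    added = removed = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" counts as both removed old lines and added new lines.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def render_artifact_diff(label: str, old: str, new: str, full: bool = False) -> str:
    """One-line summary by default; summary plus unified diff when full=True."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    added, removed = count_line_changes(old_lines, new_lines)
    if added == 0 and removed == 0:
        return f"{label}: unchanged"
    summary = f"{label}: +{added} -{removed} lines"
    if not full:
        return summary
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"{label} (before)",
        tofile=f"{label} (after)",
        lineterm="",
    )
    return "\n".join([summary, *diff])
```

For routine status output, `render_artifact_diff("plan", old_plan, new_plan)` yields a single `plan: +N -M lines` line; passing `full=True` appends the unified diff for deeper debugging.
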
| 109 | + |
| 110 | +### 4. Workflow/operator surfaces that explain semantic change, not only event history |
| 111 | + |
| 112 | +Sprint 12 improved evidence visibility. Sprint 13 should improve semantic visibility. |
| 113 | + |
| 114 | +Implementation targets: |
| 115 | + |
| 116 | +- extend inspection surfaces so they can show: |
| 117 | + - which assumptions remain open |
| 118 | + - which assumptions were contradicted |
| 119 | + - which acceptance anchors changed across clarify/plan/verify |
| 120 | + - whether a refresh was forced by contradiction, touchpoint drift, or acceptance drift |
| 121 | +- preserve concise defaults so everyday status remains readable |
| 122 | +- make session/workflow output useful for long-running or resumed tasks, not only single-turn debugging |
| 123 | + |
| 124 | +This brings Loader closer to claw-code's stronger operator trust model. |
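
Deciding why a refresh happened can stay a small, testable function. The sketch below assumes touchpoints are file paths and acceptance anchors are short strings; `RefreshCause`, `anchor_changes`, `refresh_causes`, and `status_line` are hypothetical names. It reports every cause that applies rather than picking one, since a single refresh can be driven by more than one kind of drift.

```python
from enum import Enum
from typing import Dict, Iterable, List, Sequence


class RefreshCause(str, Enum):
    CONTRADICTION = "contradiction"
    TOUCHPOINT_DRIFT = "touchpoint_drift"
    ACCEPTANCE_DRIFT = "acceptance_drift"


def anchor_changes(before: Sequence[str], after: Sequence[str]) -> Dict[str, List[str]]:
    """Anchors added or removed between two phases, in their original order."""
    before_set, after_set = set(before), set(after)
    return {
        "added": [a for a in after if a not in before_set],
        "removed": [a for a in before if a not in after_set],
    }


def refresh_causes(
    contradicted_ids: Iterable[str],
    old_touchpoints: Iterable[str],
    new_touchpoints: Iterable[str],
    old_anchors: Sequence[str],
    new_anchors: Sequence[str],
) -> List[RefreshCause]:
    """Every cause that applies, in a fixed order so output stays stable."""
    causes: List[RefreshCause] = []
    if list(contradicted_ids):
        causes.append(RefreshCause.CONTRADICTION)
    if set(old_touchpoints) != set(new_touchpoints):
        causes.append(RefreshCause.TOUCHPOINT_DRIFT)
    changes = anchor_changes(old_anchors, new_anchors)
    if changes["added"] or changes["removed"]:
        causes.append(RefreshCause.ACCEPTANCE_DRIFT)
    return causes


def status_line(
    open_ids: Sequence[str],
    contradicted_ids: Sequence[str],
    causes: Sequence[RefreshCause],
) -> str:
    """Concise default view; fuller detail can come from the ledger on request."""
    reason = ", ".join(c.value for c in causes) or "none"
    return (
        f"assumptions: {len(open_ids)} open, {len(contradicted_ids)} contradicted"
        f" | refresh cause: {reason}"
    )
```

Keeping `status_line` to one line preserves the concise default, while `anchor_changes` supplies the per-phase detail for a fuller view.
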
| 125 | + |
| 126 | +### 5. Keep the parity baseline honest while the runtime narrows again |
| 127 | + |
| 128 | +Sprint 12 closed a big structural loop. Sprint 13 should protect that gain. |
| 129 | + |
| 130 | +Implementation targets: |
| 131 | + |
| 132 | +- add direct tests for the newly split iteration policy seams |
| 133 | +- extend workflow/inspection coverage for diff and ledger behavior |
| 134 | +- keep existing parity scenarios green after the iteration split |
| 135 | +- update `PARITY.md` and the sprint audit only after the new surfaces and contracts are actually covered |
| 136 | + |
| 137 | +## Testing strategy |
| 138 | + |
| 139 | +- unit coverage for: |
| 140 | + - response classification and per-route delegation |
| 141 | + - assumption-ledger updates and contradiction recording |
| 142 | + - artifact/prompt diff formatting and summaries |
| 143 | + - workflow refresh decisions reading from the new ledger state |
| 144 | +- CLI coverage for: |
| 145 | + - prompt/artifact/workflow diff surfaces |
| 146 | + - workflow/session output for contradiction-led refreshes |
| 147 | +- deterministic/runtime coverage for: |
| 148 | + - a clarify answer that seeds assumptions later contradicted during verification |
| 149 | + - a plan refresh where the operator surface can show exactly what changed |
| 150 | + - a resumed session where workflow inspection still reflects semantic ledger state |
| 151 | + - Sprint 00-12 parity scenarios staying green after the deeper iteration split |
| 152 | +- regression coverage: |
| 153 | + - iteration refactors should not regress verify/fix, permission, or explore contracts |
| 154 | + - diff surfaces should read persisted artifacts/session state rather than reconstructing history heuristically |
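
One way to meet the last regression requirement is to persist every artifact revision when it is written, so diff surfaces only compare stored bodies. The sketch assumes a JSON Lines file per session; `ArtifactRevisions` and its methods are hypothetical, and the layout is illustrative rather than Loader's real session format.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional, Tuple


class ArtifactRevisions:
    """Append-only JSON Lines log of artifact bodies (hypothetical layout)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def history(self, kind: str) -> List[Dict]:
        """All stored revisions of one artifact kind, oldest first."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            entries = [json.loads(line) for line in fh if line.strip()]
        return [e for e in entries if e["kind"] == kind]

    def record(self, kind: str, body: str, phase: str) -> int:
        """Append a new revision and return its 1-based revision number."""
        revision = len(self.history(kind)) + 1
        entry = {"kind": kind, "revision": revision, "phase": phase, "body": body}
        with self.path.open("a", encoding="utf-8") as fh:
            # json.dumps escapes newlines, so each revision stays on one line.
            fh.write(json.dumps(entry) + "\n")
        return revision

    def latest_pair(self, kind: str) -> Optional[Tuple[str, str]]:
        """(previous, current) bodies for a diff, or None if fewer than two exist."""
        revisions = self.history(kind)
        if len(revisions) < 2:
            return None
        return revisions[-2]["body"], revisions[-1]["body"]
```

Because the log is written at the moment each brief, plan, or ledger is produced, a resumed session can still show what changed without reconstructing history heuristically.
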
| 155 | + |
| 156 | +## Definition of done |
| 157 | + |
| 158 | +- `turn_iteration.py` is slimmer and delegates through narrower response-policy seams |
| 159 | +- assumptions and contradictions are persisted as explicit workflow state |
| 160 | +- operators can inspect artifact or prompt diffs from the product surface |
| 161 | +- workflow inspection explains semantic change, not only route history |
| 162 | +- the full parity baseline remains green after the deeper iteration split |
| 163 | + |
| 164 | +## Explicitly out of scope |
| 165 | + |
| 166 | +- full OMX-style consensus planning |
| 167 | +- a visual workflow diff UI |
| 168 | +- AST-aware, LSP-aware, or symbol-aware editing |
| 169 | +- a first-class permission rule editor |
| 170 | +- multi-agent or team orchestration |