tenseleyflow/loader / 4effa19

Plan Sprint 13 semantic diff work

Authored by espadonne
SHA: 4effa1996f2d7b6e91cba64909b24651ddcf6356
Parents: 8e5d8e7
Tree: 0b19b70

2 changed files

| Status | File | + | - |
|---|---|---|---|
| M | .docs/sprints/index.md | 4 | 0 |
| A | .docs/sprints/sprint13.md | 170 | 0 |

.docs/sprints/index.md (modified)
@@ -47,6 +47,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 12](sprint12.md) — Interview Pressure, Semantic Evidence, and Turn Orchestration
 
+## Phase 11: Semantic Change and Operator Diffs
+
+- [Sprint 13](sprint13.md) — Turn Policy Narrowing, Assumption Ledger, and Artifact Diffs
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.
.docs/sprints/sprint13.md (added, 170 lines)
# Sprint 13: Turn Policy Narrowing, Assumption Ledger, and Artifact Diffs

## Prerequisites

Sprint 12

## Goals

Turn Loader's newly controllerized runtime into a more semantically explicit workflow system by shrinking the still-heavy `turn_iteration` seam, promoting assumptions and contradictions into first-class workflow state, and giving operators diff-oriented artifact visibility instead of latest-state inspection alone.

Sprint 12 was a real structural win. Loader now has pressure-pass clarify, codebase-backed grounding, structured recovery evidence, and a controller-shaped runtime shell. Together, these meaningfully narrow the gap with claw-code and OMX. The audit is also honest about what still hurts:

- `turn_iteration.py` still concentrates repair, tool-routing, and completion policy in a single seam
- contradiction and invalidation evidence are richer than before, but they are still mostly runtime-authored summaries rather than a reusable semantic ledger
- operator surfaces explain "why did this happen?" better than before, but they still cannot show "what changed?" across briefs, plans, verification, or prompt contracts
- Loader now has better workflow discipline, but it still lacks some of the day-two operator ergonomics that make claw-code and OMX easier to trust during long tasks

The next leverage point is to stop treating semantic drift and operator visibility as one-off summaries and start treating them as durable contracts:

- the turn runtime should classify assistant output and route it through narrower policy seams
- assumptions, contradictions, and acceptance anchors should survive across workflow phases as explicit state
- inspection should be able to show diffs between the artifacts and prompt contracts that drove behavior

This sprint is about making Loader more inspectable and its behavior less accidental:

- `turn_iteration` shrinks into narrower policy-oriented seams
- workflow invalidation gains an explicit assumption/contradiction ledger
- operator tooling gains artifact and prompt diff visibility
- Loader gets closer to claw-code not just in structure but also in debuggability

The references for this sprint are:

- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs`
- `refs/claw-code/rust/crates/runtime/src/prompt.rs`
- `refs/claw-code/PARITY.md`
- `refs/oh-my-codex/src/ralplan/runtime.ts`
- `refs/oh-my-codex/src/modes/base.ts`
- `refs/oh-my-codex/src/verification/verifier.ts`
- `refs/oh-my-codex/skills/deep-interview/SKILL.md`
- `refs/oh-my-codex/skills/ralplan/SKILL.md`

## Deliverables

### 1. Split `turn_iteration` into narrower response-policy seams

Sprint 12 made `conversation.py` coordinator-shaped. Sprint 13 should apply the same discipline to the still-heavy iteration seam.

Implementation targets:

- extract narrower helpers under `src/loader/runtime/`, likely around:
  - assistant-response classification
  - repair routing
  - final-answer routing
  - tool-batch routing
  - no-tool completion handoff
- make `turn_iteration.py` read as a short sequence:
  - request assistant turn
  - classify response
  - delegate the winning route
  - return loop-state deltas
- keep the main behavior unchanged while reducing policy density per module
- add direct controller tests so future iteration changes do not depend only on broad runtime integration coverage

The goal is not more files for their own sake. The goal is to make assistant-turn behavior easier to tune deliberately.
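
A minimal sketch of the target `turn_iteration` shape listed above (request, classify, delegate, return a loop-state delta) follows. Every name in it (`ResponseKind`, `AssistantResponse`, `LoopDelta`, `classify_response`, `ROUTES`) is hypothetical rather than taken from Loader's codebase, and the classification precedence (repair, then tools, then final answer, then no-tool handoff) is an assumption chosen only to illustrate the seam.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class ResponseKind(Enum):
    """Hypothetical route labels, one per narrower policy seam."""
    REPAIR = auto()
    TOOL_BATCH = auto()
    FINAL_ANSWER = auto()
    NO_TOOL_COMPLETION = auto()


@dataclass
class AssistantResponse:
    """Hypothetical, simplified view of one assistant turn."""
    text: str
    tool_calls: List[str] = field(default_factory=list)
    malformed: bool = False  # e.g. unparseable tool-call syntax
    is_final: bool = False   # assistant explicitly declared a final answer


@dataclass
class LoopDelta:
    """State change handed back to the outer loop instead of mutating it."""
    route: ResponseKind
    done: bool
    pending_tools: List[str] = field(default_factory=list)


def classify_response(resp: AssistantResponse) -> ResponseKind:
    # Assumed precedence: repair beats tools, tools beat a final answer.
    if resp.malformed:
        return ResponseKind.REPAIR
    if resp.tool_calls:
        return ResponseKind.TOOL_BATCH
    if resp.is_final:
        return ResponseKind.FINAL_ANSWER
    return ResponseKind.NO_TOOL_COMPLETION


def route_repair(resp: AssistantResponse) -> LoopDelta:
    return LoopDelta(ResponseKind.REPAIR, done=False)


def route_tool_batch(resp: AssistantResponse) -> LoopDelta:
    return LoopDelta(ResponseKind.TOOL_BATCH, done=False, pending_tools=list(resp.tool_calls))


def route_final_answer(resp: AssistantResponse) -> LoopDelta:
    return LoopDelta(ResponseKind.FINAL_ANSWER, done=True)


def route_no_tool_completion(resp: AssistantResponse) -> LoopDelta:
    # Hand off to completion checks rather than ending the loop outright.
    return LoopDelta(ResponseKind.NO_TOOL_COMPLETION, done=False)


# One plain function per route, so each seam can be tested in isolation.
ROUTES: Dict[ResponseKind, Callable[[AssistantResponse], LoopDelta]] = {
    ResponseKind.REPAIR: route_repair,
    ResponseKind.TOOL_BATCH: route_tool_batch,
    ResponseKind.FINAL_ANSWER: route_final_answer,
    ResponseKind.NO_TOOL_COMPLETION: route_no_tool_completion,
}


def turn_iteration(request_turn: Callable[[], AssistantResponse]) -> LoopDelta:
    response = request_turn()           # 1. request assistant turn
    kind = classify_response(response)  # 2. classify response
    return ROUTES[kind](response)       # 3-4. delegate the winning route, return its delta
```

Because each route is a plain function keyed by `ResponseKind`, direct controller tests can target `classify_response` and each route separately, without driving a full conversation.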

### 2. Assumption and contradiction ledger instead of one-off evidence summaries

Sprint 12 introduced richer drift evidence. Sprint 13 should make that evidence durable and reusable.

Implementation targets:

- define a typed workflow ledger contract under `src/loader/runtime/` for:
  - explicit assumptions
  - confirmed assumptions
  - contradicted assumptions
  - acceptance anchors
  - open decision boundaries
  - closed decision boundaries
- thread that ledger through clarify, planning, verification, and recovery instead of only summarizing evidence at refresh time
- persist enough structure to answer:
  - which assumption was invalidated?
  - which workflow phase introduced it?
  - what evidence contradicted it?
  - did the contradiction force a refresh, a reentry, or only inspection visibility?
- keep the first version pragmatic and text-first; do not try to build a symbolic reasoning engine

This is how Loader gets from "richer summaries" to a more explicit semantic workflow contract.
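
One way to picture the ledger contract is sketched below, under loud assumptions: the type names (`Phase`, `Effect`, `Assumption`, `WorkflowLedger`), the status strings, and the JSON shape are all hypothetical. It stays text-first, as the list asks: assumptions and evidence are plain strings, and each record carries just enough structure to answer the four questions above.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Phase(str, Enum):
    CLARIFY = "clarify"
    PLAN = "plan"
    VERIFY = "verify"
    RECOVER = "recover"


class Effect(str, Enum):
    """What a contradiction forced (hypothetical labels)."""
    REFRESH = "refresh"
    REENTRY = "reentry"
    INSPECTION_ONLY = "inspection_only"


@dataclass
class Assumption:
    text: str
    introduced_in: Phase
    status: str = "explicit"        # "explicit" | "confirmed" | "contradicted"
    evidence: Optional[str] = None  # what confirmed or contradicted it
    effect: Optional[Effect] = None  # only set when contradicted


@dataclass
class WorkflowLedger:
    assumptions: Dict[str, Assumption] = field(default_factory=dict)
    acceptance_anchors: List[str] = field(default_factory=list)
    open_decisions: List[str] = field(default_factory=list)
    closed_decisions: List[str] = field(default_factory=list)

    def assume(self, key: str, text: str, phase: Phase) -> None:
        self.assumptions[key] = Assumption(text, phase)

    def confirm(self, key: str, evidence: str) -> None:
        a = self.assumptions[key]
        a.status, a.evidence = "confirmed", evidence

    def contradict(self, key: str, evidence: str, effect: Effect) -> None:
        a = self.assumptions[key]
        a.status, a.evidence, a.effect = "contradicted", evidence, effect

    def close_decision(self, decision: str) -> None:
        self.open_decisions.remove(decision)
        self.closed_decisions.append(decision)

    def contradicted(self) -> List[Assumption]:
        return [a for a in self.assumptions.values() if a.status == "contradicted"]

    def to_json(self) -> str:
        # str-valued Enums serialize as their plain string values.
        return json.dumps(asdict(self), indent=2)
```

Recording `introduced_in` when an assumption is created, and `evidence` plus `effect` when it is contradicted, is what would let inspection answer "which phase introduced it?" and "did it force a refresh?" without re-deriving history.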

### 3. Artifact and prompt diff surfaces for operators

Loader can now show the latest prompt and workflow timeline. Sprint 13 should help operators see what changed.

Implementation targets:

- add diff-oriented inspection surfaces, likely around:
  - clarify brief vs refreshed brief
  - old plan vs refreshed plan
  - workflow ledger changes across reentry
  - prompt metadata or prompt-body diffs across relevant turns
- keep the product surface text-first and operator-friendly, for example via:
  - `loader workflow show --diff`
  - `loader prompt diff`
  - or an equivalent `loader artifact show` family if that is cleaner
- include concise change summaries by default and fuller diffs when explicitly requested
- avoid a visual UI in this sprint; prioritize fast CLI/TUI debugging value

The goal is to make workflow changes legible, not just persisted.
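
The "concise summary by default, fuller diff on request" behavior can be sketched with Python's standard-library `difflib.unified_diff`. The function name `render_artifact_diff` and the summary wording are hypothetical; this is an illustration, not a description of how the proposed `loader workflow show --diff` or `loader prompt diff` commands would be implemented.

```python
import difflib


def render_artifact_diff(name: str, old: str, new: str, full: bool = False) -> str:
    """Summarize (default) or fully diff two revisions of a text artifact."""
    # lineterm="" keeps every line newline-free, so joining with "\n" is safe
    # even when an artifact does not end with a trailing newline.
    diff = list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"{name} (previous)", tofile=f"{name} (current)", lineterm="",
    ))
    if not diff:
        return f"{name}: unchanged"
    if full:
        return "\n".join(diff)
    body = diff[2:]  # skip the '---' / '+++' file headers; '@@' hunk lines are not counted
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    return f"{name}: {added} line(s) added, {removed} line(s) removed"
```

For a brief whose scope line was rewritten and which gained a risk line, the default call returns `brief: 2 line(s) added, 1 line(s) removed`, and passing `full=True` returns the unified diff itself.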

### 4. Workflow/operator surfaces that explain semantic change, not only event history

Sprint 12 improved evidence visibility. Sprint 13 should improve semantic visibility.

Implementation targets:

- extend inspection surfaces so they can show:
  - which assumptions remain open
  - which assumptions were contradicted
  - which acceptance anchors changed across clarify/plan/verify
  - whether a refresh was forced by contradiction, touchpoint drift, or acceptance drift
- preserve concise defaults so everyday status remains readable
- make session/workflow output useful for long-running or resumed tasks, not only single-turn debugging

This brings Loader closer to claw-code's stronger operator trust model.
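
The anchor-change and refresh-cause questions in the list above could be answered from per-phase snapshots along these lines. `PhaseSnapshot`, `RefreshCause`, `anchor_changes`, `refresh_cause`, and `status_line` are hypothetical names, and the precedence rule (contradiction outranks acceptance drift, which outranks touchpoint drift) is an assumption for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional, Tuple


class RefreshCause(Enum):
    CONTRADICTION = "contradiction"
    ACCEPTANCE_DRIFT = "acceptance drift"
    TOUCHPOINT_DRIFT = "touchpoint drift"


@dataclass(frozen=True)
class PhaseSnapshot:
    """What one workflow phase (clarify, plan, verify) recorded."""
    phase: str
    anchors: FrozenSet[str] = frozenset()
    touchpoints: FrozenSet[str] = frozenset()   # files/modules the phase expected to touch
    contradicted: FrozenSet[str] = frozenset()  # assumption keys contradicted so far


def anchor_changes(before: PhaseSnapshot, after: PhaseSnapshot) -> Tuple[List[str], List[str]]:
    """Return (added, removed) acceptance anchors, sorted for stable output."""
    return sorted(after.anchors - before.anchors), sorted(before.anchors - after.anchors)


def refresh_cause(before: PhaseSnapshot, after: PhaseSnapshot) -> Optional[RefreshCause]:
    # Assumed precedence: a newly contradicted assumption explains more than drift does.
    if after.contradicted - before.contradicted:
        return RefreshCause.CONTRADICTION
    if after.anchors != before.anchors:
        return RefreshCause.ACCEPTANCE_DRIFT
    if after.touchpoints != before.touchpoints:
        return RefreshCause.TOUCHPOINT_DRIFT
    return None


def status_line(before: PhaseSnapshot, after: PhaseSnapshot) -> str:
    """Concise default output; fuller detail would be opt-in."""
    cause = refresh_cause(before, after)
    if cause is None:
        return f"{before.phase} -> {after.phase}: no refresh needed"
    added, removed = anchor_changes(before, after)
    return (f"{before.phase} -> {after.phase}: refresh forced by {cause.value} "
            f"(anchors +{len(added)}/-{len(removed)})")
```

Keeping the default to one status line per phase transition follows the "concise defaults" target; the sorted anchor lists are what a fuller, opt-in view would print.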

### 5. Keep the parity baseline honest while the runtime narrows again

Sprint 12 closed a big structural loop. Sprint 13 should protect that gain.

Implementation targets:

- add direct tests for the newly split iteration policy seams
- extend workflow/inspection coverage for diff and ledger behavior
- keep existing parity scenarios green after the iteration split
- update `PARITY.md` and the sprint audit only after the new surfaces and contracts are actually covered

## Testing strategy

- unit coverage for:
  - response classification and per-route delegation
  - assumption-ledger updates and contradiction recording
  - artifact/prompt diff formatting and summaries
  - workflow refresh decisions that read from the new ledger state
- CLI coverage for:
  - prompt/artifact/workflow diff surfaces
  - workflow/session output for contradiction-led refreshes
- deterministic/runtime coverage for:
  - a clarify answer that seeds assumptions later contradicted during verification
  - a plan refresh where the operator surface can show exactly what changed
  - a resumed session where workflow inspection still reflects semantic ledger state
  - Sprint 00-12 parity scenarios staying green after the deeper iteration split
- regression coverage:
  - iteration refactors should not regress verify/fix, permission, or explore contracts
  - diff surfaces should read persisted artifacts/session state rather than reconstructing history heuristically
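
The last regression rule implies that each artifact revision is stored when it is produced, so a diff surface only has to load two stored revisions. A minimal sketch of that idea follows, assuming a hypothetical JSON file per session that appends revisions under an artifact name; the `ArtifactStore` class and its file layout are illustrative, not Loader's actual persistence format.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


class ArtifactStore:
    """Append-only revisions per artifact, persisted as one JSON file per session."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> Dict[str, List[str]]:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def record(self, artifact: str, body: str) -> int:
        """Persist a new revision at the moment it is produced; return its index."""
        data = self._load()
        revisions = data.setdefault(artifact, [])
        revisions.append(body)
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        return len(revisions) - 1

    def revisions(self, artifact: str) -> List[str]:
        return self._load().get(artifact, [])

    def latest_pair(self, artifact: str) -> Tuple[str, str]:
        """Previous and current revision, read back from disk rather than re-derived."""
        revs = self.revisions(artifact)
        if len(revs) < 2:
            raise LookupError(f"{artifact} has fewer than two persisted revisions")
        return revs[-2], revs[-1]
```

Re-creating the store from the same path, as a resumed session would, returns exactly the revisions that were recorded, so `latest_pair` can feed a diff renderer without guessing at history.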

## Definition of done

- `turn_iteration.py` is slimmer and delegates through narrower response-policy seams
- assumptions and contradictions are persisted as explicit workflow state
- operators can inspect artifact or prompt diffs from the product surface
- workflow inspection explains semantic change, not only route history
- the full parity baseline remains green after the deeper iteration split

## Explicitly out of scope

- full OMX-style consensus planning
- a visual workflow diff UI
- AST-aware, LSP-aware, or symbol-aware editing
- a first-class permission rule editor
- multi-agent or team orchestration