tenseleyflow/loader / 67878a4

Plan Sprint 12 interview rigor work

Authored by espadonne
SHA: 67878a4e869c7ea6389f7dc55d2a86622fcecf81
Parents: 0adda1a
Tree: 8123ae2

2 changed files

Status  File                        +    -
M       .docs/sprints/index.md      4    0
A       .docs/sprints/sprint12.md   177  0
.docs/sprints/index.md (modified)
@@ -43,6 +43,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 11](sprint11.md) — Semantic Signals, Clarify Strategy, and Orchestrator Split
 
+## Phase 10: Interview Rigor and Recovery Evidence
+
+- [Sprint 12](sprint12.md) — Interview Pressure, Semantic Evidence, and Turn Orchestration
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.
.docs/sprints/sprint12.md (added)
@@ -0,0 +1,177 @@
+# Sprint 12: Interview Pressure, Semantic Evidence, and Turn Orchestration
+
+## Prerequisites
+
+Sprint 11
+
+## Goals
+
+Turn Loader's newer workflow structure into a more disciplined execution contract by deepening clarify beyond slot selection, making semantic invalidation rely on richer evidence than text overlap alone, and shrinking the main turn loop into a clearer orchestration shell.
+
+Sprint 11 closed several real gaps. Loader now has typed workflow signals, slot-aware clarify, semantic invalidation, better workflow inspection, and a slimmer coordinator. That is meaningful progress toward claw-code and OMX, but the audit is honest about what still hurts:
+
+- typed workflow signals are still hand-tuned runtime heuristics rather than a deeper ambiguity/evidence model
+- clarify is more intentional now, but it still lacks OMX's pressure-pass discipline, evidence-chasing, and codebase-backed interview style
+- artifact invalidation is broader than file drift, but it still reasons from lightweight text overlap instead of richer structured evidence
+- `conversation.py` is smaller, but it still owns the main assistant/recovery/completion orchestration loop that the refs spread across narrower runtime seams
+
+The next leverage point is to stop treating clarify as "ask a better next question" and start treating it as "run a bounded interview with explicit pressure passes, factual grounding, and a stronger handoff contract for later execution."
+
+This sprint is about execution rigor:
+
+- clarify gains pressure-pass behavior instead of only slot-follow-up behavior
+- semantic invalidation uses richer structured evidence and contradiction tracking
+- the main turn loop shrinks again by delegating orchestration checkpoints into dedicated runtime modules
+- Loader gets closer to closed-source agentic tools not by more prompt prose, but by stronger workflow contracts
+
+The references for this sprint are:
+
+- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
+- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs`
+- `refs/claw-code/rust/crates/runtime/src/prompt.rs`
+- `refs/oh-my-codex/src/ralplan/runtime.ts`
+- `refs/oh-my-codex/src/modes/base.ts`
+- `refs/oh-my-codex/skills/deep-interview/SKILL.md`
+- `refs/oh-my-codex/skills/ralplan/SKILL.md`
+
+## Deliverables
+
+### 1. Pressure-pass clarify controller instead of slot selection alone
+
+Sprint 11 made clarify targeted. Sprint 12 should make it disciplined.
+
+Implementation targets:
+
+- introduce a dedicated clarify controller under `src/loader/runtime/` that tracks:
+  - current interview stage
+  - weakest clarity dimension
+  - whether a pressure pass has occurred
+  - whether non-goals and decision boundaries are explicit
+  - how much interview budget remains
+- extend clarify reasoning beyond "what slot is unresolved?" to also ask:
+  - was the last answer too broad?
+  - has this assumption been challenged yet?
+  - do we still need an example, counterexample, tradeoff, or explicit stop boundary?
+- persist clarify progress in structured form so later workflow decisions can explain:
+  - which dimension clarify was targeting
+  - whether Loader was still gathering boundaries
+  - whether it stopped because the budget was exhausted or because readiness gates were met
+- keep it bounded and pragmatic:
+  - no unbounded interviews
+  - no long questionnaires
+  - one question at a time with explicit stop conditions
+
+The goal is not to copy OMX wholesale. The goal is to adopt the parts that materially reduce misaligned execution and premature planning.
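
The controller state listed in deliverable 1 could be sketched roughly as follows. This is an illustration only, not code from the commit: `InterviewStage`, `ClarifyState`, `next_step`, the dimension names, the 0.7 threshold, and the default budget of 3 are all hypothetical, and running the pressure pass before boundary gathering is an assumed ordering.

```python
from dataclasses import dataclass
from enum import Enum


class InterviewStage(Enum):
    GATHERING = "gathering"    # still filling unresolved slots
    PRESSURE = "pressure"      # challenging an assumption or an overly broad answer
    BOUNDARIES = "boundaries"  # pinning down non-goals and decision boundaries
    READY = "ready"            # readiness gates met
    EXHAUSTED = "exhausted"    # budget ran out before the gates were met


@dataclass
class ClarifyState:
    # Clarity score per dimension in [0, 1]; the dimension names are illustrative.
    clarity: dict[str, float]
    budget_remaining: int = 3
    pressure_pass_done: bool = False
    non_goals_explicit: bool = False
    boundaries_explicit: bool = False
    stage: InterviewStage = InterviewStage.GATHERING

    def weakest_dimension(self) -> str:
        """Return the dimension with the lowest clarity score."""
        return min(self.clarity, key=lambda name: self.clarity[name])

    def gates_met(self, threshold: float = 0.7) -> bool:
        """Readiness requires a pressure pass, explicit boundaries, and enough clarity."""
        return (
            self.pressure_pass_done
            and self.non_goals_explicit
            and self.boundaries_explicit
            and min(self.clarity.values()) >= threshold
        )


def next_step(state: ClarifyState) -> InterviewStage:
    """Pick the next stage; each call spends at most one question of budget."""
    if state.gates_met():
        state.stage = InterviewStage.READY
    elif state.budget_remaining <= 0:
        state.stage = InterviewStage.EXHAUSTED
    else:
        state.budget_remaining -= 1
        if not state.pressure_pass_done:
            state.stage = InterviewStage.PRESSURE
        elif not (state.non_goals_explicit and state.boundaries_explicit):
            state.stage = InterviewStage.BOUNDARIES
        else:
            state.stage = InterviewStage.GATHERING
    return state.stage
```

Because the final `stage` distinguishes `READY` from `EXHAUSTED`, a persisted copy of this state would be enough to explain why an interview stopped, which is the property the third bullet group asks for.
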
+
+### 2. Codebase-backed clarify grounding and stronger requirement artifacts
+
+Sprint 11 still relies mostly on the user answer plus task text. Sprint 12 should let clarify lean on facts Loader can gather directly.
+
+Implementation targets:
+
+- add a lightweight preflight/context seam for brownfield tasks that can feed clarify with discovered facts before asking the user for repository details
+- prefer evidence-backed clarify questions when Loader already knows something, for example:
+  - "I found X in Y. Should this change follow that pattern?"
+  - "The current touchpoints appear to be A and B. Should I keep C out of scope?"
+- persist richer clarify artifact metadata where it helps downstream runtime behavior, for example:
+  - explicit non-goal status
+  - explicit decision-boundary status
+  - whether a pressure pass occurred
+  - likely touchpoint evidence
+  - inferred vs confirmed boundaries
+- keep this grounded in Loader's existing tool surface rather than inventing a large research subsystem
+
+This moves Loader closer to OMX's "reduce user effort and don't ask for facts we can discover" principle.
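
One way to picture the artifact metadata and the evidence-backed question style from deliverable 2 is the sketch below. All names (`DiscoveredFact`, `ClarifyArtifact`, `clarify_question`) are hypothetical and do not come from the commit; only the question wording is taken from the sprint text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DiscoveredFact:
    subject: str   # what was found, e.g. a pattern or helper
    location: str  # where it was found, e.g. a file path


@dataclass
class ClarifyArtifact:
    non_goals_explicit: bool = False
    decision_boundaries_explicit: bool = False
    pressure_pass_occurred: bool = False
    touchpoint_evidence: list[DiscoveredFact] = field(default_factory=list)
    inferred_boundaries: set[str] = field(default_factory=set)
    confirmed_boundaries: set[str] = field(default_factory=set)

    def confirm(self, boundary: str) -> None:
        """Promote a boundary from inferred to confirmed once the user agrees."""
        self.inferred_boundaries.discard(boundary)
        self.confirmed_boundaries.add(boundary)


def clarify_question(facts: list[DiscoveredFact], fallback: str) -> str:
    """Prefer a question grounded in a discovered fact; otherwise use the fallback."""
    if not facts:
        return fallback
    fact = facts[0]
    return (
        f"I found {fact.subject} in {fact.location}. "
        "Should this change follow that pattern?"
    )
```

Keeping inferred and confirmed boundaries in separate sets lets downstream code treat an unconfirmed boundary as weaker evidence without a second data structure.
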
+
+### 3. Structured semantic evidence for invalidation and replan decisions
+
+Sprint 11 improved invalidation, but it still reasons mostly from text coverage. Sprint 12 should give recovery choices a stronger evidence model.
+
+Implementation targets:
+
+- define a structured invalidation/evidence contract under `src/loader/runtime/`, for example around:
+  - confirmed touchpoints
+  - inferred touchpoints
+  - acceptance anchors
+  - contradicted assumptions
+  - verification contradiction signals
+  - changed user boundaries after clarify
+- teach invalidation to distinguish:
+  - plan mismatch
+  - brief contradiction
+  - verification contradiction
+  - stale assumptions
+- improve recovery selection so Loader can explain not only what it chose, but what evidence forced that choice
+- preserve "smallest valid recovery move first" as the governing behavior
+
+This is how Loader gets from "semantic-ish refresh" to a more trustworthy workflow contract.
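
The evidence contract and the four invalidation kinds from deliverable 3 might look something like the sketch below. It rests on several assumptions that are not in the commit: the `Recovery` moves and their ordering, the mapping from each kind to a minimum move, the extra `planned_touchpoints` field, and the idea that a larger move subsumes the smaller ones.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Recovery(IntEnum):
    # Ordered from smallest to largest move; this ordering is an assumption.
    NONE = 0
    REFRESH_ASSUMPTIONS = 1
    TARGETED_REFRESH = 2
    REPLAN = 3
    RECLARIFY = 4


@dataclass
class InvalidationEvidence:
    confirmed_touchpoints: set[str] = field(default_factory=set)
    inferred_touchpoints: set[str] = field(default_factory=set)
    acceptance_anchors: set[str] = field(default_factory=set)
    contradicted_assumptions: set[str] = field(default_factory=set)
    verification_contradictions: list[str] = field(default_factory=list)
    changed_boundaries: set[str] = field(default_factory=set)
    # Hypothetical extra field: files the current plan intends to touch.
    planned_touchpoints: set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Finding:
    kind: str
    minimum_move: Recovery
    reason: str


def classify(ev: InvalidationEvidence) -> list[Finding]:
    """Map raw evidence onto the four invalidation kinds named in the sprint."""
    findings = []
    unplanned = ev.confirmed_touchpoints - ev.planned_touchpoints
    if unplanned:
        findings.append(Finding("plan mismatch", Recovery.REPLAN,
                                f"confirmed touchpoints not in plan: {sorted(unplanned)}"))
    if ev.changed_boundaries:
        findings.append(Finding("brief contradiction", Recovery.RECLARIFY,
                                f"boundaries changed: {sorted(ev.changed_boundaries)}"))
    if ev.verification_contradictions:
        findings.append(Finding("verification contradiction", Recovery.TARGETED_REFRESH,
                                f"signals: {ev.verification_contradictions}"))
    if ev.contradicted_assumptions:
        findings.append(Finding("stale assumptions", Recovery.REFRESH_ASSUMPTIONS,
                                f"contradicted: {sorted(ev.contradicted_assumptions)}"))
    return findings


def choose_recovery(ev: InvalidationEvidence) -> tuple[Recovery, list[Finding]]:
    """Return the smallest move that covers every finding, plus the evidence for it."""
    findings = classify(ev)
    if not findings:
        return Recovery.NONE, []
    return max(f.minimum_move for f in findings), findings
```

Returning the findings alongside the chosen move is what would let a caller report which evidence forced the choice, rather than only the choice itself.
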
+
+### 4. Turn orchestration split beyond lane execution
+
+Sprint 11 moved clarify/plan lanes out. Sprint 12 should keep shrinking the top-level turn loop.
+
+Implementation targets:
+
+- extract additional runtime seams under `src/loader/runtime/`, likely around:
+  - turn preparation/bootstrap
+  - workflow recovery/reentry control
+  - completion/continuation orchestration
+  - assistant-response repair routing
+- make `ConversationRuntime.run_turn(...)` read more like:
+  - initialize turn state
+  - prepare workflow contract
+  - delegate iteration/orchestration helpers
+  - finalize summary
+- avoid creating a new monolith module; prefer narrow orchestration seams with direct tests
+
+A good outcome is that the turn loop becomes easier to reason about and less likely to collect ad hoc behavior again.
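
The four-step shape described for `ConversationRuntime.run_turn(...)` could read roughly like this. Only the class and method names come from the sprint text; the constructor, `TurnState`, the injected helper callables, and the iteration cap of 8 are hypothetical stand-ins for whatever seams the sprint actually extracts.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnState:
    user_input: str
    contract: dict = field(default_factory=dict)
    events: list[str] = field(default_factory=list)
    done: bool = False


Step = Callable[[TurnState], None]


class ConversationRuntime:
    def __init__(
        self,
        prepare: Step,
        iterate: Step,
        finalize: Callable[[TurnState], str],
        max_iterations: int = 8,
    ) -> None:
        # Each helper is injected, so each seam can be tested on its own.
        self._prepare = prepare
        self._iterate = iterate
        self._finalize = finalize
        self._max_iterations = max_iterations

    def run_turn(self, user_input: str) -> str:
        state = TurnState(user_input)          # initialize turn state
        self._prepare(state)                   # prepare workflow contract
        for _ in range(self._max_iterations):  # delegate iteration/orchestration helpers
            self._iterate(state)
            if state.done:
                break
        return self._finalize(state)           # finalize summary
```

With the helpers injected, a test can pass trivial stand-ins and assert on the loop alone, which matches the "narrow orchestration seams with direct tests" target.
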
+
+### 5. Workflow/operator surfaces that explain evidence, not just decisions
+
+Sprint 11 made `loader workflow show` more useful. Sprint 12 should make it explain the evidence behind recovery and clarify pressure more directly.
+
+Implementation targets:
+
+- extend workflow inspection surfaces to show:
+  - whether a pressure pass occurred
+  - which clarify dimension was active
+  - which evidence triggered refresh or reentry
+  - which assumptions were still unresolved
+- keep the default UX concise, but expose richer detail when explicitly requested
+- avoid a visual UI in this sprint; prioritize text surfaces that make the runtime easier to debug immediately
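
A text surface with a concise default and richer detail on request might be rendered along these lines. `WorkflowView`, `render`, the `verbose` parameter, and the output layout are all hypothetical; the sketch does not describe the actual output or options of `loader workflow show`.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowView:
    pressure_pass_occurred: bool
    active_dimension: str
    trigger_evidence: list[str] = field(default_factory=list)
    unresolved_assumptions: list[str] = field(default_factory=list)


def render(view: WorkflowView, verbose: bool = False) -> str:
    """Two summary lines by default; per-item detail only when verbose is set."""
    lines = [
        f"clarify: dimension={view.active_dimension} "
        f"pressure_pass={'yes' if view.pressure_pass_occurred else 'no'}",
        f"recovery: {len(view.trigger_evidence)} evidence item(s), "
        f"{len(view.unresolved_assumptions)} unresolved assumption(s)",
    ]
    if verbose:
        lines += [f"  evidence: {item}" for item in view.trigger_evidence]
        lines += [f"  unresolved: {item}" for item in view.unresolved_assumptions]
    return "\n".join(lines)
```
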
+
+## Testing strategy
+
+- unit coverage for:
+  - clarify pressure-pass progression and readiness gates
+  - codebase-backed clarify question selection from discovered facts
+  - structured invalidation evidence and contradiction handling
+  - new orchestration seams preserving current turn behavior
+- CLI coverage for:
+  - workflow inspection showing clarify pressure/evidence
+  - session/workflow output for contradiction-driven reentry
+- deterministic/runtime coverage for:
+  - ambiguous brownfield tasks where Loader asks evidence-backed clarify questions
+  - tasks that need an assumption/tradeoff pressure pass before planning
+  - verification contradictions that trigger targeted refresh vs full re-plan
+  - Sprint 00-11 parity scenarios staying green after the deeper orchestration split
+- regression coverage:
+  - clarify should not ask the user for repository facts Loader can gather directly
+  - orchestration extraction should not regress the verify/fix or permission/runtime contracts
+
+## Definition of done
+
+- clarify uses a bounded pressure-pass controller rather than slot selection alone
+- brownfield clarify can ask evidence-backed questions from discovered facts
+- invalidation relies on richer structured evidence and contradiction tracking
+- workflow/operator surfaces explain clarify and recovery evidence more directly
+- `conversation.py` is slimmer again and more orchestration-shell-like
+- the full parity baseline remains green after the deeper clarify/orchestration split
+
+## Explicitly out of scope
+
+- full OMX-style consensus planning
+- a visual workflow timeline UI
+- a first-class permission rule editor
+- AST-aware, LSP-aware, or symbol-aware editing
+- multi-agent or team orchestration