tenseleyflow/loader / 1aaccad

Plan Sprint 11 semantic workflow work

Authored by espadonne
SHA: 1aaccadccb1bc955f673b9da61bef36cf13c099e
Parents: 49d8032
Tree: 222c6a5

2 changed files

Status  File                        +    -
M       .docs/sprints/index.md      4    0
A       .docs/sprints/sprint11.md   186  0
.docs/sprints/index.md (modified)
@@ -39,6 +39,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 10](sprint10.md) — Route Pressure, Clarify Depth, and Workflow Timeline
 
+## Phase 9: Semantic Workflow and Orchestration
+
+- [Sprint 11](sprint11.md) — Semantic Signals, Clarify Strategy, and Orchestrator Split
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.
.docs/sprints/sprint11.md (added)
@@ -0,0 +1,186 @@
+# Sprint 11: Semantic Signals, Clarify Strategy, and Orchestrator Split
+
+## Prerequisites
+
+Sprint 10
+
+## Goals
+
+Turn Loader's new workflow policy from a better scorecard into a more structured workflow contract, and keep shrinking the coordinator so policy and orchestration live in dedicated runtime seams instead of collecting back inside `conversation.py`.
+
+Sprint 10 was a meaningful step forward. Loader now has scored routing, bounded clarify follow-through, plan refresh, and a persisted workflow timeline. That closes a real gap with claw-code and OMX, but the audit is honest about what still hurts:
+
+- workflow scoring is still hand-tuned and text-heuristic rather than driven by a typed signal model
+- clarify has follow-through now, but the questioning strategy is still generic and shallow compared with OMX's deep-interview discipline
+- plan freshness is still mostly file-drift based instead of understanding broader semantic invalidation
+- workflow history is inspectable, but not yet filtered or summarized around the most useful operator questions
+- `conversation.py` is smaller than it was, but it still coordinates more workflow behavior than the refs
+
+The next leverage point is to stop asking only "what pressure score won?" and start asking "what concrete workflow signals are in play, which task boundaries remain unresolved, and which orchestration module should own the next move?"
+
+This sprint is about workflow structure:
+
+- route policy consumes typed workflow signals rather than leaning so heavily on inline heuristics
+- clarify becomes intent-aware instead of merely multi-round
+- replan discipline becomes more semantic than touched-file drift alone
+- workflow inspection becomes more useful for debugging why Loader stayed in or re-entered a lane
+- `conversation.py` shrinks again because orchestration moves into dedicated runtime modules
+
+The references for this sprint are:
+
+- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
+- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs`
+- `refs/claw-code/rust/crates/runtime/src/prompt.rs`
+- `refs/oh-my-codex/src/ralplan/runtime.ts`
+- `refs/oh-my-codex/src/modes/base.ts`
+- `refs/oh-my-codex/skills/deep-interview/SKILL.md`
+- `refs/oh-my-codex/skills/ralplan/SKILL.md`
+
+## Deliverables
+
+### 1. Typed workflow-signal extraction instead of score inputs assembled inline
+
+Sprint 10 made routing scored. Sprint 11 should make the inputs first-class.
+
+Implementation targets:
+
+- introduce a dedicated workflow-signal module under `src/loader/runtime/`, for example around:
+  - ambiguity signals
+  - complexity signals
+  - mutation / verification pressure
+  - unresolved clarification slots
+  - artifact availability and freshness
+  - explicit user workflow requests
+  - recent workflow timeline pressure
+- separate signal extraction from route scoring so policy code can reason over a typed signal packet rather than rebuilding context ad hoc
+- persist enough of the winning signal context to explain:
+  - why clarify won over execute
+  - why plan refresh was triggered
+  - why direct execution was still allowed despite ambiguity
+- keep route scoring tunable, but move the fragile task-text heuristics out of the coordinator path
+
+The goal is not to build a giant intent engine. The goal is to make workflow policy more explainable, testable, and less accidental.
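
Illustrative sketch, not part of this commit: one way the split between signal extraction and route scoring described in deliverable 1 could look. `WorkflowSignals`, `RouteDecision`, `decide_route`, the field names, and the weights are all hypothetical; the plan itself does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WorkflowSignals:
    """Hypothetical typed signal packet; every field name is illustrative."""
    ambiguity: float = 0.0           # 0.0 = clear task, 1.0 = very unclear
    complexity: float = 0.0          # 0.0 = trivial, 1.0 = spans subsystems
    mutation_pressure: float = 0.0   # how strongly the task wants file changes
    unresolved_slots: Tuple[str, ...] = ()
    plan_is_fresh: bool = False
    explicit_request: Optional[str] = None  # lane the user asked for, if any


@dataclass(frozen=True)
class RouteDecision:
    route: str
    scores: Dict[str, float]
    reasons: Tuple[str, ...]  # kept so the decision can be explained later


def decide_route(signals: WorkflowSignals) -> RouteDecision:
    """Score lanes from the signal packet alone; the weights are placeholders."""
    if signals.explicit_request in ("clarify", "plan", "execute"):
        return RouteDecision(signals.explicit_request, {}, ("explicit user request",))

    scores = {
        "clarify": signals.ambiguity + 0.25 * len(signals.unresolved_slots),
        "plan": signals.complexity + (0.0 if signals.plan_is_fresh else 0.25),
        "execute": signals.mutation_pressure + (0.5 if signals.plan_is_fresh else 0.0),
    }
    route = max(scores, key=lambda lane: scores[lane])

    reasons = []
    if signals.unresolved_slots:
        reasons.append("unresolved slots: " + ", ".join(signals.unresolved_slots))
    if route == "execute" and signals.ambiguity > 0:
        reasons.append("execution allowed despite ambiguity %.2f" % signals.ambiguity)
    if not signals.plan_is_fresh:
        reasons.append("no fresh plan artifact")
    return RouteDecision(route, scores, tuple(reasons))
```

Because `decide_route` sees only the packet, a unit test can build a `WorkflowSignals` value directly without task text, and the returned `reasons` tuple is the kind of context that could be persisted to explain why a lane won.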
+
+### 2. Intent-aware clarify strategy instead of generic follow-up rounds
+
+Loader can now clarify more than once, but it still asks questions in a relatively flat way.
+
+Implementation targets:
+
+- define typed clarify objectives or slots such as:
+  - desired outcome
+  - acceptance criteria
+  - constraints
+  - non-goals
+  - risk boundaries
+- choose the next clarify question from unresolved slots instead of using a mostly generic follow-up loop
+- adapt clarify behavior based on signal severity and task class while preserving a hard upper bound
+- persist why clarify stopped:
+  - enough boundaries gathered
+  - budget exhausted
+  - route pressure shifted toward plan or execute
+  - explicit user answer narrowed the scope sufficiently
+- carry unresolved slots forward into workflow state and artifacts so later plan/execute decisions can explain what was still uncertain
+
+This is how Loader gets closer to OMX's deeper interview rigor without turning every task into a long questionnaire.
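
Illustrative sketch, not part of this commit: slot-driven question selection with a hard round budget and a recorded stop reason. The slot names mirror the list above; `SLOT_ORDER`, `StopReason`, `ClarifyStep`, `next_clarify_step`, and the default budget of 3 are assumptions. Only two of the four stop reasons are decided here, since the other two would need route-pressure and answer context that this sketch does not model.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional, Tuple

# Hypothetical priority order: earlier slots are asked about first.
SLOT_ORDER = (
    "desired_outcome",
    "acceptance_criteria",
    "constraints",
    "non_goals",
    "risk_boundaries",
)


class StopReason(Enum):
    ENOUGH_BOUNDARIES = "enough boundaries gathered"
    BUDGET_EXHAUSTED = "budget exhausted"
    ROUTE_SHIFTED = "route pressure shifted toward plan or execute"
    SCOPE_NARROWED = "explicit user answer narrowed the scope sufficiently"


class ClarifyStep(NamedTuple):
    ask_about: Optional[str]          # slot to ask about next, or None when stopping
    stop_reason: Optional[StopReason]
    unresolved: Tuple[str, ...]       # carried forward into workflow state


def next_clarify_step(answers: Dict[str, str], rounds_used: int,
                      max_rounds: int = 3) -> ClarifyStep:
    """Pick the next question from unresolved slots, or explain why to stop."""
    unresolved = tuple(s for s in SLOT_ORDER if not answers.get(s, "").strip())
    if not unresolved:
        return ClarifyStep(None, StopReason.ENOUGH_BOUNDARIES, ())
    if rounds_used >= max_rounds:
        # Hard upper bound: stop even though slots remain open.
        return ClarifyStep(None, StopReason.BUDGET_EXHAUSTED, unresolved)
    return ClarifyStep(unresolved[0], None, unresolved)
```

Returning the unresolved slots alongside the stop reason is what would let a later plan or execute decision report what was still uncertain when clarify ended.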
+
+### 3. Semantic artifact invalidation and stronger re-plan discipline
+
+Sprint 10 made plan refresh possible. Sprint 11 should make refresh triggers smarter.
+
+Implementation targets:
+
+- enrich planning artifacts with more structured metadata where it materially helps, for example:
+  - expected touchpoints
+  - acceptance-criteria anchors
+  - planned files or subsystems
+  - known risks or assumptions
+- define broader invalidation triggers beyond file drift, for example:
+  - verification evidence contradicts the plan assumptions
+  - the implementation touched files or subsystems outside the expected scope
+  - acceptance criteria changed materially after clarify or verification
+  - the current task wording narrowed or expanded after the plan was written
+- distinguish between:
+  - targeted plan refresh
+  - clarify reentry
+  - full re-plan
+- keep the runtime disciplined: prefer the smallest valid recovery move instead of restarting workflow lanes casually
+
+This should move Loader closer to claw-code's stronger artifact discipline, where plans remain live contracts instead of just persisted markdown.
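
Illustrative sketch, not part of this commit: typed invalidation reasons mapped to the smallest recovery move that covers all of them. The enum members paraphrase the triggers listed above; `InvalidationReason`, `RecoveryMove`, `select_recovery`, the reason-to-move mapping, and the ordering of moves (targeted refresh, then clarify reentry, then full re-plan) are assumptions rather than anything the plan fixes.

```python
from enum import Enum, IntEnum
from typing import Iterable


class InvalidationReason(Enum):
    FILE_DRIFT = "file drift"
    OUT_OF_SCOPE_TOUCH = "touched files or subsystems outside expected scope"
    TASK_WORDING_CHANGED = "task wording narrowed or expanded"
    ACCEPTANCE_CRITERIA_CHANGED = "acceptance criteria changed materially"
    EVIDENCE_CONTRADICTS_PLAN = "verification evidence contradicts plan assumptions"


class RecoveryMove(IntEnum):
    """Ordered from cheapest to most disruptive (assumed ordering)."""
    NONE = 0
    TARGETED_REFRESH = 1
    CLARIFY_REENTRY = 2
    FULL_REPLAN = 3


# Hypothetical mapping: the cheapest move assumed sufficient for each reason.
_MINIMUM_MOVE = {
    InvalidationReason.FILE_DRIFT: RecoveryMove.TARGETED_REFRESH,
    InvalidationReason.OUT_OF_SCOPE_TOUCH: RecoveryMove.TARGETED_REFRESH,
    InvalidationReason.TASK_WORDING_CHANGED: RecoveryMove.CLARIFY_REENTRY,
    InvalidationReason.ACCEPTANCE_CRITERIA_CHANGED: RecoveryMove.CLARIFY_REENTRY,
    InvalidationReason.EVIDENCE_CONTRADICTS_PLAN: RecoveryMove.FULL_REPLAN,
}


def select_recovery(reasons: Iterable[InvalidationReason]) -> RecoveryMove:
    """Return the smallest move that still satisfies every active reason."""
    return max((_MINIMUM_MOVE[r] for r in reasons), default=RecoveryMove.NONE)
```

Taking the maximum over per-reason minimums keeps the runtime from escalating to a full re-plan unless at least one active reason actually requires it.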
+
+### 4. Workflow inspection that answers operator questions more directly
+
+Sprint 10 made workflow history visible. Sprint 11 should make it more usable.
+
+Implementation targets:
+
+- extend `loader workflow show` with higher-signal inspection affordances such as:
+  - filtering by mode or event kind
+  - limiting to the most recent meaningful items
+  - clearer summaries for refresh, reentry, and clarify-budget outcomes
+- expose the signal/reason context that most directly answers questions like:
+  - why did Loader ask again?
+  - why did Loader refresh the plan?
+  - why did Loader skip verify?
+- keep session surfaces concise by surfacing only the most recent or most important workflow events by default
+- avoid building a visual UI in this sprint; prioritize text inspection that reduces debugging time immediately
+
+The goal is not prettier output. The goal is faster workflow debugging and better operator trust.
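
Illustrative sketch, not part of this commit: the filtering and summarizing logic that a text-only inspection command could sit on top of. `TimelineEvent`, its fields, `filter_events`, and `summarize` are hypothetical, and no actual `loader workflow show` flags are implied.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TimelineEvent:
    """Hypothetical persisted timeline entry."""
    mode: str    # e.g. "clarify", "plan", "execute"
    kind: str    # e.g. "reentry", "refresh", "budget_exhausted"
    reason: str  # the signal/reason context to show the operator


def filter_events(events: Sequence[TimelineEvent],
                  mode: Optional[str] = None,
                  kind: Optional[str] = None,
                  limit: Optional[int] = None) -> List[TimelineEvent]:
    """Keep events matching mode/kind, then only the most recent `limit`."""
    selected = [
        e for e in events
        if (mode is None or e.mode == mode) and (kind is None or e.kind == kind)
    ]
    if limit is not None:
        # A plain [-0:] slice would return everything, so handle 0 explicitly.
        selected = selected[-limit:] if limit > 0 else []
    return selected


def summarize(event: TimelineEvent) -> str:
    """One-line text summary suitable for terminal output."""
    return "[%s] %s: %s" % (event.mode, event.kind, event.reason)
```

For example, filtering on `mode="clarify"` with `limit=1` would surface just the latest clarify event and its reason, which is the shape of answer the "why did Loader ask again?" question calls for.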
+
+### 5. Continue shrinking `conversation.py` into a coordinator over runtime modules
+
+Sprint 10 improved the split, but the coordinator still owns too much sequencing logic.
+
+Implementation targets:
+
+- extract additional orchestration seams under `src/loader/runtime/`, likely around:
+  - signal extraction
+  - clarify-lane control
+  - plan refresh / invalidation decisions
+  - workflow timeline append policy
+- make `ConversationRuntime.run_turn(...)` read more like:
+  - collect turn state
+  - compute workflow signals
+  - ask policy/orchestrator for the next lane decision
+  - delegate lane execution
+  - persist summary and timeline outcomes
+- keep completion and downstream workflow handoff logic out of the signal-extraction path
+- avoid replacing one monolith with another; new orchestration modules should have narrow responsibilities and direct tests
+
+A good outcome is that `conversation.py` keeps shrinking because ownership is clearer, not because behavior gets hidden.
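
Illustrative sketch, not part of this commit: the five-step shape listed above for `ConversationRuntime.run_turn(...)`, with every collaborator injected. The constructor parameters, the dict-based turn state, and the timeline entry layout are assumptions; only the class and method name come from the plan.

```python
from typing import Any, Callable, Dict, List


class ConversationRuntime:
    """Thin coordinator: sequencing only, with policy living in collaborators."""

    def __init__(self,
                 extract_signals: Callable[[Dict[str, Any]], Any],
                 decide_lane: Callable[[Any], str],
                 lanes: Dict[str, Callable[[Dict[str, Any]], Any]],
                 timeline: List[Dict[str, Any]]) -> None:
        self._extract_signals = extract_signals
        self._decide_lane = decide_lane
        self._lanes = lanes
        self._timeline = timeline

    def run_turn(self, user_message: str) -> Any:
        state = {"message": user_message}        # 1. collect turn state
        signals = self._extract_signals(state)   # 2. compute workflow signals
        lane = self._decide_lane(signals)        # 3. ask policy for the lane
        outcome = self._lanes[lane](state)       # 4. delegate lane execution
        self._timeline.append(                   # 5. persist timeline outcome
            {"lane": lane, "signals": signals, "outcome": outcome}
        )
        return outcome
```

With this shape each collaborator can be replaced by a stub in tests, which fits the stated aim of narrow modules with direct tests rather than a second monolith.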
+
+## Testing strategy
+
+- unit coverage for:
+  - typed workflow-signal extraction and normalization
+  - route-policy scoring over structured signals
+  - clarify-slot progression and stop reasons
+  - semantic invalidation triggers and targeted recovery selection
+- CLI coverage for:
+  - `loader workflow show` filtering and summarization
+  - session/workflow output for clarify exhaustion, plan refresh, and reentry reasons
+- deterministic/runtime coverage for:
+  - ambiguous tasks where clarify chooses different follow-up questions based on unresolved slots
+  - verification failure that triggers plan refresh vs clarify reentry based on typed invalidation reasons
+  - tasks that remain executable even with mild ambiguity because stronger signals favor direct execution
+  - Sprint 00-10 parity scenarios staying green after the workflow-policy split deepens again
+- regression coverage:
+  - route policy should consume typed signals rather than rebuilding them ad hoc inside the coordinator
+  - workflow inspection should continue to work after session resume and compaction
+
+## Definition of done
+
+- Loader extracts typed workflow signals before route scoring
+- clarify behavior is intent-aware and persists why it continued or stopped
+- plan refresh uses richer invalidation reasons than file drift alone
+- workflow inspection better explains reentry, refresh, and clarify behavior
+- `conversation.py` is slimmer again and more coordinator-like
+- the full parity baseline remains green after the deeper workflow-policy split
+
+## Explicitly out of scope
+
+- full OMX-style consensus planning
+- a visual workflow timeline UI
+- a first-class permission rule editor
+- AST-aware, LSP-aware, or symbol-aware editing
+- multi-agent or team orchestration