tenseleyflow/loader / f8f0c8a

Plan Sprint 19 policy accountability work

Authored by espadonne
SHA: f8f0c8aa69955fe4a6496fb0f166c2de4f0ce95d
Parents: 18026a3
Tree: 7d710aa

2 changed files

Status  File                        +    -
M       .docs/sprints/index.md      4    0
A       .docs/sprints/sprint19.md   163  0
.docs/sprints/index.md (modified)
@@ -71,6 +71,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 18](sprint18.md) — Shell Minimalism, Completion Contract, and Runtime Policy Trace
 
+## Phase 17: Facade Finalization and Policy Accountability
+
+- [Sprint 19](sprint19.md) — Facade Finalization, Continuation Hardening, and Unified Policy Timeline
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.
.docs/sprints/sprint19.md (added)
@@ -0,0 +1,163 @@
+# Sprint 19: Facade Finalization, Continuation Hardening, and Unified Policy Timeline
+
+## Prerequisites
+
+Sprint 18
+
+## Goals
+
+Take the next honest contraction after Sprint 18: finish thinning the public shell where it still reads like runtime glue, harden the last continuation heuristics that can still feel like soft rescue behavior, and unify the scattered policy/debug surfaces into one clearer operator story.
+
+Sprint 18 changed the shape of the remaining debt in a useful way:
+
+- completion policy is now explicit, persisted, and inspectable
+- public-shell helpers now own steering, session install/load, event-emitter normalization, and capability-refresh decisions
+- `src/loader/agent/loop.py` is materially thinner
+- but `Agent` still owns the public entrypoints and launcher glue
+- completion traces and workflow traces now both exist, but they are still separate operator surfaces
+- Loader still keeps bounded continuation nudges for some non-mutating tasks, and those nudges are explicit now but not yet deeply justified
+
+Sprint 19 should stay reference-guided, not reference-submissive.
+
+The standard remains:
+
+- use claw-code to sharpen runtime seams, policy ownership, and explicit lifecycle contracts
+- use OMX to sharpen follow-through, verifier pressure, and operator-facing runtime accountability
+- do not add a feature just because the refs have it
+- do pursue changes when the refs reveal that Loader is still too soft, too implicit, or too hard to audit
+
+`audit.txt` is still not the roadmap. It is useful only as a guardrail against sliding back into wrapper-heavy cleanup and soft model rescue behavior.
+
+The references for this sprint are:
+
+- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
+- `refs/claw-code/rust/crates/runtime/src/bootstrap.rs`
+- `refs/claw-code/rust/crates/runtime/src/session_control.rs`
+- `refs/claw-code/rust/crates/runtime/src/lane_events.rs`
+- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs`
+- `refs/claw-code/rust/crates/runtime/src/green_contract.rs`
+- `refs/claw-code/rust/crates/runtime/src/prompt.rs`
+- `refs/claw-code/PARITY.md`
+- `.docs/PARITY.md`
+- `.docs/audit.txt`
+- `.docs/audit_sprints/trunk_sitrep.md`
+- `.docs/sprints/sprint18.md`
+- `refs/oh-my-codex/src/autoresearch/contracts.ts`
+- `refs/oh-my-codex/src/autoresearch/runtime.ts`
+- `refs/oh-my-codex/src/verification/verifier.ts`
+- `refs/oh-my-codex/src/hooks/session.ts`
+- `refs/oh-my-codex/src/hooks/prompt-guidance-contract.ts`
+
+## Deliverables
+
+### 1. Finish the next public-shell contraction below `Agent`
+
+Sprint 18 moved more shell behavior into `src/loader/runtime/public_shell.py`, but `Agent` still owns the public entrypoint wrappers and some launch-time glue.
+
+Implementation targets:
+
+- inventory what remains in `src/loader/agent/loop.py` that still feels like runtime/public-shell plumbing instead of true public API ownership, especially:
+  - run / run_streaming / run_explore event-wrapper glue
+  - resume / clear lifecycle orchestration
+  - launcher construction and runtime-source preparation
+  - capability-refresh and prompt invalidation wiring
+- move what is reusable into runtime-owned helpers or a tighter launcher/public-shell seam
+- keep `Agent` focused on:
+  - public API shape
+  - compatibility-facing attributes
+  - minimal UI-facing integration points
+
+The goal is not “delete `Agent`.” The goal is for `Agent` to read like an intentionally tiny facade instead of a convenient place for leftover runtime glue.
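
The facade/helper split described in Deliverable 1 could look roughly like the sketch below. It is illustrative only: `PublicShell`, `RunRequest`, and every method body are hypothetical and are not taken from `src/loader/runtime/public_shell.py` or `src/loader/agent/loop.py`. The point is only that `Agent` keeps the public API shape while a runtime-owned helper holds the launch and lifecycle glue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunRequest:
    """Hypothetical value object describing one public entrypoint call."""
    prompt: str
    streaming: bool = False


@dataclass
class PublicShell:
    """Hypothetical runtime-owned helper that holds launch/lifecycle glue."""
    history: List[RunRequest] = field(default_factory=list)

    def launch(self, request: RunRequest) -> str:
        # Event wrapping, launcher construction, and capability refresh
        # would live here rather than on the facade.
        self.history.append(request)
        mode = "streaming" if request.streaming else "blocking"
        return f"{mode}:{request.prompt}"

    def clear(self) -> None:
        self.history.clear()


class Agent:
    """Facade: public API shape only; every call delegates to the shell."""

    def __init__(self, shell: Optional[PublicShell] = None) -> None:
        self._shell = shell if shell is not None else PublicShell()

    def run(self, prompt: str) -> str:
        return self._shell.launch(RunRequest(prompt))

    def run_streaming(self, prompt: str) -> str:
        return self._shell.launch(RunRequest(prompt, streaming=True))

    def clear(self) -> None:
        self._shell.clear()
```

In a layout like this, a new lifecycle concern lands in the helper by default, and a method on `Agent` that does more than delegate stands out in review.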
+
+### 2. Harden the remaining continuation contract
+
+Sprint 18 made continuation behavior visible. Sprint 19 should decide which parts of that behavior are still too soft.
+
+Implementation targets:
+
+- inventory the remaining continuation behavior across:
+  - `src/loader/runtime/completion_policy.py`
+  - `src/loader/runtime/turn_completion.py`
+  - `src/loader/runtime/assistant_turns.py`
+  - any nearby repair/finalization controller that can still nudge rather than stop
+- identify which continuation cases are still justified by explicit runtime evidence versus merely tolerated by textual heuristics
+- prefer:
+  - deletion
+  - a stricter typed stop/fail state
+  - explicit follow-through requirements derived from runtime artifacts or session state
+  over keeping broad “continue once more” behavior
+- where a continuation path remains, make the required evidence explicit and persisted
+
+The goal is to keep following the Sprint 13 / Sprint 17 / Sprint 18 line: the runtime should proceed for a clear typed reason or stop honestly, not continue because the model “probably meant well.”
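
A minimal sketch of "proceed for a clear typed reason or stop honestly" follows. The names (`TurnDecision`, `ContinuationEvidence`, `decide_turn`) and the specific evidence fields are assumptions made for illustration; they do not describe the contents of `completion_policy.py` or `turn_completion.py`. What the sketch tries to show is that continuation depends only on runtime facts, never on the wording of the model's reply, and that every outcome carries a reason string that can be persisted.

```python
from dataclasses import dataclass
from enum import Enum


class TurnDecision(Enum):
    CONTINUE = "continue"
    STOP_COMPLETE = "stop_complete"
    STOP_UNVERIFIED = "stop_unverified"


@dataclass(frozen=True)
class ContinuationEvidence:
    """Hypothetical runtime facts; none are inferred from model prose."""
    pending_tool_calls: int = 0
    unverified_mutations: int = 0
    verification_passed: bool = False


@dataclass(frozen=True)
class TurnOutcome:
    decision: TurnDecision
    reason: str  # stored with the decision so it can be inspected later


def decide_turn(
    evidence: ContinuationEvidence, nudges_used: int, max_nudges: int = 1
) -> TurnOutcome:
    """Continue only when explicit evidence requires it; otherwise stop."""
    if evidence.pending_tool_calls > 0:
        return TurnOutcome(TurnDecision.CONTINUE, "pending_tool_calls")
    if evidence.unverified_mutations > 0 and not evidence.verification_passed:
        if nudges_used < max_nudges:
            return TurnOutcome(TurnDecision.CONTINUE, "unverified_mutations")
        # The nudge budget is bounded: fail honestly instead of retrying.
        return TurnOutcome(TurnDecision.STOP_UNVERIFIED, "nudge_budget_exhausted")
    return TurnOutcome(TurnDecision.STOP_COMPLETE, "no_outstanding_evidence")
```

With this shape there is no branch that continues without naming its evidence, so a broad "continue once more" path has nowhere to hide.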
+
+### 3. Unify completion, workflow, and repair accountability into one operator-facing timeline
+
+Loader now has workflow timeline entries and a separate completion trace. That is better than hidden state, but still fragmented.
+
+Implementation targets:
+
+- define a compact unified policy timeline or policy event model that can carry:
+  - workflow routing/handoff decisions
+  - completion-policy outcomes
+  - repair / retry / recovery decisions
+  - terminal stop reasons
+- decide whether the existing workflow timeline should absorb completion/repair events or whether a sibling policy timeline is the cleaner contract
+- persist enough of that state to survive resume and make post-mortem inspection more useful
+- surface it through existing product seams, likely one of:
+  - `loader workflow show`
+  - `loader session show`
+  - a narrowly-scoped new policy-focused surface if and only if it is cleaner than overloading the workflow view
+
+The goal is that operators can answer “why did Loader keep going, stop, retry, or accept this result?” from one coherent surface instead of stitching together multiple tables by hand.
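
The "sibling policy timeline" option from Deliverable 3 could be modelled as in the sketch below. Everything in it is hypothetical: the `PolicyEvent` fields, the four `KINDS`, and the JSON layout are assumptions chosen to match the four categories listed in the plan, not an existing Loader schema. It shows one event type covering all four categories, a JSON round trip so the timeline can survive resume, and a one-line-per-event rendering for an operator view.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical event kinds, mirroring the four categories in the plan.
KINDS = ("workflow", "completion", "repair", "terminal")


@dataclass(frozen=True)
class PolicyEvent:
    seq: int
    kind: str
    decision: str
    reason: str
    evidence: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown policy event kind: {self.kind!r}")


@dataclass
class PolicyTimeline:
    events: List[PolicyEvent] = field(default_factory=list)

    def record(self, kind: str, decision: str, reason: str, **evidence: str) -> PolicyEvent:
        """Append one event; sequence numbers start at 1."""
        event = PolicyEvent(len(self.events) + 1, kind, decision, reason, dict(evidence))
        self.events.append(event)
        return event

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self.events])

    @classmethod
    def from_json(cls, payload: str) -> "PolicyTimeline":
        return cls([PolicyEvent(**item) for item in json.loads(payload)])

    def explain(self) -> List[str]:
        """One line per event, in order, for an operator-facing view."""
        return [f"{e.seq}. [{e.kind}] {e.decision}: {e.reason}" for e in self.events]
```

A command such as `loader session show` could print the output of `explain()`. Whether that belongs in the existing workflow timeline or in a sibling structure like this one is the design decision the plan leaves open.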
+
+### 4. Keep the ref relationship explicit and healthy
+
+Implementation targets:
+
+- use claw-code for:
+  - lane-event shape
+  - session/runtime control seams
+  - explicit policy and green-contract ownership
+- use OMX for:
+  - verifier/follow-through accountability
+  - session/runtime operator clarity
+  - stronger prompt/runtime contract thinking
+- do not add work just because the refs have it
+- do add work when the refs reveal a real Loader weakness in:
+  - honesty
+  - inspectability
+  - shell minimalism
+  - follow-through
+
+The goal is to keep Loader reference-guided and self-aware, not to drift into either blind feature copying or isolated local optimization.
+
+## Testing strategy
+
+- unit coverage for:
+  - any new public-shell or launcher helper that further reduces `Agent` ownership
+  - tightened continuation/terminal-stop decisions
+  - unified policy timeline serialization and restoration
+- runtime coverage for:
+  - no regression in normal follow-through on non-mutating and mutating tasks
+  - honest terminal behavior where continuation heuristics were deleted or narrowed
+  - session/workflow inspection of the unified policy/debug story
+- regression coverage for:
+  - no drift back toward `agent/loop.py` owning extracted shell glue
+  - no silent reintroduction of soft continuation phrasing after harder stop conditions
+  - no loss of the current workflow/completion/explore inspection surfaces while timelines are unified
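
The "no drift back toward `agent/loop.py`" regression item could be enforced with a structural check like the sketch below, which uses only the standard-library `ast` module. The helper names in `EXTRACTED_HELPERS` are placeholders invented for the example; a real guard would list whatever functions the sprint actually moves out of the module.

```python
import ast
from typing import Iterable, Set

# Placeholder names for helpers assumed to have moved out of agent/loop.py.
EXTRACTED_HELPERS = frozenset({"build_launcher", "normalize_event_emitter"})


def redefined_helpers(source: str, extracted: Iterable[str] = EXTRACTED_HELPERS) -> Set[str]:
    """Return the extracted helper names that this module source defines again."""
    banned = set(extracted)
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name in banned
    }
```

A regression test could read the real module text, for example with `pathlib.Path("src/loader/agent/loop.py").read_text()`, and assert that `redefined_helpers(...)` returns an empty set.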
+
+## Definition of done
+
+- `agent/loop.py` shrinks again or becomes materially more facade-like even if line count only drops modestly
+- Loader deletes or hardens more of the remaining continuation heuristics instead of merely explaining them better
+- operators can inspect a single, more coherent runtime policy story after the fact
+- Sprint 18’s completion-trace and public-shell gains remain green
+- the parity baseline remains green after the Sprint 19 shell and policy tightening
+
+## Explicitly out of scope
+
+- full claw-code policy-engine parity
+- multi-agent or team orchestration
+- AST-aware semantic diffs
+- a broad visual workflow UI
+- rich permission-rule editing UX