tenseleyflow/loader / cf8a60e

Plan Sprint 26 verification attempt timeline work

Authored by espadonne
SHA: cf8a60e64d087d6192cd1516511e3a218c19179a
Parents: 5ab1424
Tree: dc9732d

2 changed files

Status  File                        +    -
M       .docs/sprints/index.md      4    0
A       .docs/sprints/sprint26.md   165  0
.docs/sprints/index.md (modified)
@@ -99,6 +99,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 25](sprint25.md) — Public Runtime API, Verification Attempts, and Boundary Narrowing
 
+## Phase 24: Attempt Histories and Runtime API Delamination
+
+- [Sprint 26](sprint26.md) — Verification Attempt Timelines and Public Facade Delamination
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.
.docs/sprints/sprint26.md (added)
@@ -0,0 +1,165 @@
# Sprint 26: Verification Attempt Timelines and Public Facade Delamination

## Prerequisites

Sprint 25

## Goals

Take the next honest step after Sprint 25: record verification attempts as timelines rather than bare labels in the verifier lifecycle, make the runtime-owned API more authoritative as Loader's practical outer boundary, and keep shrinking the gap between "runtime-first internally" and "runtime-first by default" without pretending the public compatibility surface is ready to disappear.

Sprint 25 changed the remaining debt in a useful way:

- Loader now has a runtime-owned shell API boundary in `src/loader/runtime/runtime_api.py`
- verification lifecycle now preserves explicit attempt identity across planned, pending, stale, skipped, and observed states
- completion policy can explain active versus superseded proof in attempt-aware terms
- operator surfaces can now show runtime-boundary summaries and attempt-aware verification state in CLI and TUI
- but Loader still says more about attempt labels than about attempt timeline facts, such as when an attempt was first planned, when it actually started, when it completed, and what exactly superseded it
- and `Agent` plus `runtime.public_shell` still form the documented public compatibility shell, even though the runtime-owned boundary is much stronger now

Sprint 26 should keep using the references as architectural guardrails, not as a feature-copy list.

The standard remains:

- use claw-code to sharpen runtime/bootstrap/session ownership, policy/accountability event structure, and explicit outer runtime boundaries
- use OMX to sharpen verifier attempt visibility, freshness reasoning, and the audit trail around incomplete, stale, superseded, or resumed proof
- do not add work just because the refs have it
- do add work when the refs show that Loader is still too compatibility-shell-bound or too coarse in its verification-attempt history

`audit.txt` remains a guardrail against wrapper-heavy drift and compatibility-by-habit. It is not the factual roadmap.

The references for this sprint are:

- `refs/claw-code/rust/crates/runtime/src/bootstrap.rs`
- `refs/claw-code/rust/crates/runtime/src/session_control.rs`
- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
- `refs/claw-code/rust/crates/runtime/src/lane_events.rs`
- `refs/claw-code/rust/crates/runtime/src/green_contract.rs`
- `refs/claw-code/PARITY.md`
- `refs/oh-my-codex/src/verification/verifier.ts`
- `refs/oh-my-codex/src/autoresearch/runtime.ts`
- `refs/oh-my-codex/src/autoresearch/contracts.ts`
- `refs/oh-my-codex/src/hooks/session.ts`
- `.docs/PARITY.md`
- `.docs/audit.txt`
- `.docs/audit_sprints/trunk_sitrep.md`
- `.docs/sprints/sprint25.md`

## Deliverables

### 1. Promote verification attempts into richer timeline records

Sprint 25 gave Loader attempt identity. Sprint 26 should turn those attempts into real runtime history rather than labeled states alone.

Implementation targets:

- inventory where Loader already knows attempt-order or lifecycle facts across:
  - `src/loader/runtime/dod.py`
  - `src/loader/runtime/finalization.py`
  - `src/loader/runtime/tool_batches.py`
  - `src/loader/runtime/verification_observations.py`
  - `src/loader/runtime/workflow_policy.py`
  - `src/loader/runtime/policy_timeline.py`
  - `src/loader/runtime/workflow_timeline_read_model.py`
- define a typed verification-attempt timeline model that can preserve facts such as:
  - when an attempt was first planned
  - when it actually became active
  - when it completed or was skipped
  - when and why it became stale or was superseded
  - which commands and evidence bundle belonged to that attempt
- keep that richer attempt model inside the canonical policy/accountability story instead of creating a side log

The goal is to make Loader answer not only "which attempt is active?" but "what happened to attempt 2, when did attempt 3 actually start, and what proof belongs to each?"

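A minimal sketch of such a model, assuming attempt lifecycle facts are written as entries in the existing canonical policy timeline and each per-attempt record is a projection over those entries, so no side log exists. `AttemptEvent`, `AttemptEventKind`, `AttemptTimeline`, and `project_attempts` are hypothetical names, not existing Loader types, and the event vocabulary is an assumption:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class AttemptEventKind(Enum):
    # Hypothetical event vocabulary; Loader's policy timeline may name these differently.
    PLANNED = "planned"
    STARTED = "started"
    COMPLETED = "completed"
    SKIPPED = "skipped"
    STALE = "stale"
    SUPERSEDED = "superseded"


@dataclass(frozen=True)
class AttemptEvent:
    """One canonical policy-timeline entry that concerns a verification attempt (hypothetical)."""

    attempt: int
    kind: AttemptEventKind
    at: datetime
    reason: str | None = None
    commands: tuple[str, ...] = ()
    superseded_by: int | None = None


@dataclass
class AttemptTimeline:
    """Per-attempt read model, rebuilt from timeline events instead of stored in a side log."""

    attempt: int
    planned_at: datetime | None = None
    started_at: datetime | None = None
    finished_at: datetime | None = None
    skipped: bool = False
    invalidated_at: datetime | None = None
    invalidation_reason: str | None = None
    superseded_by: int | None = None
    commands: list[str] = field(default_factory=list)


def project_attempts(events: list[AttemptEvent]) -> dict[int, AttemptTimeline]:
    """Fold attempt events, in time order, into one timeline per attempt number."""
    timelines: dict[int, AttemptTimeline] = {}
    for event in sorted(events, key=lambda e: e.at):
        tl = timelines.setdefault(event.attempt, AttemptTimeline(attempt=event.attempt))
        tl.commands.extend(event.commands)  # evidence bundle accumulates per attempt
        kind = event.kind
        if kind is AttemptEventKind.PLANNED and tl.planned_at is None:
            tl.planned_at = event.at  # keep the first planning moment
        elif kind is AttemptEventKind.STARTED and tl.started_at is None:
            tl.started_at = event.at
        elif kind in (AttemptEventKind.COMPLETED, AttemptEventKind.SKIPPED):
            tl.finished_at = event.at
            tl.skipped = kind is AttemptEventKind.SKIPPED
        elif kind in (AttemptEventKind.STALE, AttemptEventKind.SUPERSEDED):
            if tl.invalidated_at is None:  # record when and why it first stopped counting
                tl.invalidated_at = event.at
                tl.invalidation_reason = event.reason
            if event.superseded_by is not None:
                tl.superseded_by = event.superseded_by
    return timelines
```

Because the projection is rebuilt from timeline entries, a resumed session could recover the same attempt history by replaying persisted entries instead of restoring a second store.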
### 2. Tighten completion/freshness policy around attempt history

Once attempt history is richer, completion policy should stop flattening that history into one summary string too early.

Implementation targets:

- connect completion and reentry decisions more directly to:
  - attempt planning versus active-start moments
  - completed versus superseded attempt ordering
  - freshness relative to later mutating work
  - explicit gaps between planned proof, running proof, and finished proof
- preserve a clear distinction between:
  - proof that is scheduled but has not started
  - proof that started but has not completed
  - proof that completed and passed
  - proof that completed and failed
  - proof that was once green but is now stale because a later attempt superseded it
- ensure the canonical policy story explains why Loader trusted, rejected, or waited on one attempt instead of another

The goal is to make Loader's stop/continue/retry logic more auditable when verification spans multiple retries or resumes.

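One possible way to keep the five proof states distinct and attach a reason to every completion decision is sketched below. It assumes the newest attempt is the only candidate for trust and that any mutation after an attempt completed makes that attempt stale; `ProofState`, `AttemptFacts`, `classify`, and `completion_decision` are hypothetical names, and the trust/wait/reject vocabulary is illustrative:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ProofState(Enum):
    SCHEDULED = "scheduled"  # planned, not yet started
    RUNNING = "running"      # started, not yet completed
    PASSED = "passed"
    FAILED = "failed"
    STALE = "stale"          # was green, then invalidated by later work or a newer attempt


@dataclass(frozen=True)
class AttemptFacts:
    """Hypothetical per-attempt facts the policy reads."""

    attempt: int
    planned_at: datetime
    started_at: datetime | None = None
    completed_at: datetime | None = None
    passed: bool | None = None  # only meaningful once completed_at is set


def classify(facts: AttemptFacts, last_mutation_at: datetime | None,
             superseded: bool = False) -> ProofState:
    """Map one attempt's facts onto the five proof states the policy keeps distinct."""
    if facts.started_at is None:
        return ProofState.SCHEDULED
    if facts.completed_at is None:
        return ProofState.RUNNING
    if not facts.passed:
        return ProofState.FAILED
    mutated_later = last_mutation_at is not None and last_mutation_at > facts.completed_at
    return ProofState.STALE if superseded or mutated_later else ProofState.PASSED


def completion_decision(attempts: list[AttemptFacts],
                        last_mutation_at: datetime | None) -> tuple[str, str]:
    """Return (decision, reason), where decision is "trust", "wait", or "reject"."""
    if not attempts:
        return "reject", "no verification attempt recorded"
    # Older attempts are never trusted here: the newest attempt implicitly supersedes them.
    newest = max(attempts, key=lambda a: a.attempt)
    state = classify(newest, last_mutation_at)
    if state is ProofState.PASSED:
        return "trust", f"attempt {newest.attempt} passed and is still fresh"
    if state in (ProofState.SCHEDULED, ProofState.RUNNING):
        return "wait", f"attempt {newest.attempt} is {state.value}"
    return "reject", f"attempt {newest.attempt} is {state.value}"
```

Returning the reason alongside the decision is what would let the canonical policy story record why an attempt was trusted, rejected, or waited on, rather than only the outcome.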
### 3. Narrow the public facade below `Agent` again

Sprint 25 added a runtime-owned shell API. Sprint 26 should make that boundary more authoritative in practice.

Implementation targets:

- inventory what still materially depends on:
  - `src/loader/agent/loop.py`
  - `src/loader/runtime/public_shell.py`
  - `src/loader/runtime/runtime_api.py`
  - `src/loader/runtime/runtime_handle.py`
  - `src/loader/cli/main.py`
  - `src/loader/ui/app.py`
- migrate at least one more real caller or ownership seam onto the runtime-owned API if it is still compatibility-shaped only by habit
- make remaining `Agent`-owned behavior explicitly compatibility-facing rather than ambiguous runtime glue
- add or extend boundary tests so future work does not drift back toward `Agent` as the assumed outer runtime owner

The goal is not to delete `Agent`. The goal is to make the runtime-owned API the default answer more often, and the compatibility facade the deliberate exception.

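A boundary test could be as simple as scanning imports with Python's standard `ast` module. The sketch below assumes the `src/` layout maps `src/loader/agent/` to the importable package `loader.agent`, and that a hand-maintained allowlist names the files still permitted to depend on the compatibility facade; `COMPAT_ALLOWED`, `FACADE_PACKAGE`, `facade_imports`, and `boundary_violations` are hypothetical:

```python
from __future__ import annotations

import ast

# Hypothetical allowlist: only these files may still import the compatibility facade.
COMPAT_ALLOWED = {
    "src/loader/agent/loop.py",
    "src/loader/runtime/public_shell.py",
}
# Assumed dotted path of the facade package (src/loader/agent -> loader.agent).
FACADE_PACKAGE = "loader.agent"


def _is_facade(name: str) -> bool:
    """True for `loader.agent` itself or anything below it, but not e.g. `loader.agentic`."""
    return name == FACADE_PACKAGE or name.startswith(FACADE_PACKAGE + ".")


def facade_imports(source: str) -> list[str]:
    """List facade names imported by one file (absolute imports only)."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found += [alias.name for alias in node.names if _is_facade(alias.name)]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_facade(node.module):
                found.append(node.module)
            else:
                # Also catches `from loader import agent`.
                qualified = (f"{node.module}.{alias.name}" for alias in node.names)
                found += [name for name in qualified if _is_facade(name)]
    return found


def boundary_violations(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each non-allowlisted file to the facade names it still imports."""
    violations: dict[str, list[str]] = {}
    for path, source in sources.items():
        if path in COMPAT_ALLOWED:
            continue
        hits = facade_imports(source)
        if hits:
            violations[path] = hits
    return violations
```

In a real test the `sources` mapping could be built with `{p.as_posix(): p.read_text() for p in Path("src/loader").rglob("*.py")}` and the result asserted empty. Relative imports are deliberately ignored here, so a stricter version would need to resolve them.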
### 4. Improve operator visibility for attempt history and the public boundary

Once attempt records get richer and the outer boundary gets cleaner, the existing surfaces should tell that story with less manual reconstruction.

Implementation targets:

- improve the current surfaces so users can answer:
  - which runtime/public boundary handled this session?
  - which verification attempt is active right now?
  - when did it become planned, active, stale, or superseded?
  - what evidence belongs to the active attempt versus an older one?
- prefer improving these existing surfaces over inventing a new command, unless a new command is clearly cleaner:
  - `loader status`
  - `loader session show`
  - `loader workflow show`
  - the TUI status surface
- lead with concise rollups, and expose deeper attempt history only where it materially improves debugging

The goal is to make Loader's runtime boundary and verifier history easier to audit after the fact, not simply more verbose.

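A minimal sketch of the rollup-first idea for a surface like `loader status` or the TUI: one summary line by default, per-attempt history only on request. `AttemptSummary`, `render_rollup`, the `verbose` flag, and the output wording are all hypothetical and do not describe current Loader output:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AttemptSummary:
    """Hypothetical display record for one verification attempt."""

    attempt: int
    state: str                    # e.g. "planned", "active", "stale", "superseded"
    since: str                    # display-ready timestamp of the last state change
    evidence: tuple[str, ...] = ()


def render_rollup(boundary: str, attempts: list[AttemptSummary], verbose: bool = False) -> str:
    """One-line rollup first; per-attempt history only when verbose is requested."""
    if not attempts:
        return f"boundary: {boundary} | verification: no attempts"
    ordered = sorted(attempts, key=lambda s: s.attempt, reverse=True)
    latest = ordered[0]
    lines = [
        f"boundary: {boundary} | verification: attempt {latest.attempt} "
        f"{latest.state} since {latest.since} ({len(ordered) - 1} earlier)"
    ]
    if verbose:
        # Newest first, so the active attempt's evidence is read before older proof.
        for summary in ordered:
            evidence = ", ".join(summary.evidence) or "no evidence recorded"
            lines.append(f"  #{summary.attempt} {summary.state:<10} {summary.since}  {evidence}")
    return "\n".join(lines)
```

Keeping the default to a single line keeps the common case cheap to scan, while the verbose form answers the "what evidence belongs to which attempt" question without a new command.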
135
+## Testing strategy
136
+
137
+- unit coverage for:
138
+  - richer verification-attempt timeline normalization and persistence
139
+  - completion/freshness decisions that now depend on attempt history
140
+  - the narrower runtime-owned API boundary and any new caller migration
141
+- runtime coverage for:
142
+  - a planned attempt that later becomes active and then completes
143
+  - a completed attempt that becomes stale after new mutating work
144
+  - a resumed session whose active attempt history and runtime-owner boundary remain coherent
145
+- regression coverage for:
146
+  - no duplicate verification-attempt truth beside the canonical policy timeline
147
+  - no drift back toward `Agent` as the assumed outer runtime owner when a runtime-owned API exists
148
+  - no regression in Sprint 25's attempt-aware completion/freshness and operator surfaces
149
+
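Timeline normalization can be unit-tested against a table of allowed lifecycle transitions, with the runtime scenarios above expressed as state sequences. The transition table below is an assumption (it splits "completed" into `passed` and `failed`), and `ALLOWED`, `check_history`, and `SCENARIOS` are hypothetical names; the `test_` functions use plain asserts, so pytest would also collect them:

```python
from __future__ import annotations

# Hypothetical lifecycle table; "passed"/"failed" split the plan's "completed" state.
ALLOWED: dict[str | None, frozenset[str]] = {
    None: frozenset({"planned"}),
    "planned": frozenset({"active", "skipped", "superseded"}),
    "active": frozenset({"passed", "failed", "superseded"}),
    "passed": frozenset({"stale", "superseded"}),
    "failed": frozenset({"superseded"}),
    "skipped": frozenset({"superseded"}),
    "stale": frozenset({"superseded"}),
    "superseded": frozenset(),
}


def check_history(states: list[str]) -> str | None:
    """Return None for a legal attempt history, else a description of the first bad step."""
    previous: str | None = None
    for state in states:
        if state not in ALLOWED.get(previous, frozenset()):
            return f"illegal transition {previous!r} -> {state!r}"
        previous = state
    return None


# Two of the runtime coverage scenarios, written as state sequences.
SCENARIOS = {
    "planned, then active, then completed": ["planned", "active", "passed"],
    "completed, then stale after new mutating work": ["planned", "active", "passed", "stale"],
}


def test_runtime_scenarios_are_legal() -> None:
    for name, states in SCENARIOS.items():
        assert check_history(states) is None, name


def test_illegal_jump_is_reported() -> None:
    assert check_history(["planned", "passed"]) == "illegal transition 'planned' -> 'passed'"
```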
## Definition of done

- Loader preserves verification-attempt history that goes beyond plain attempt labels, inside the canonical policy/accountability story
- completion and freshness policy can explain which attempt was planned, active, completed, stale, or superseded, and why
- the runtime-owned API below `Agent` is more authoritative in at least one more real integration seam, or remaining `Agent` ownership is explicitly justified as compatibility-only
- existing status/session/workflow/TUI surfaces expose the stronger boundary and attempt-history story without multiplying product commands
- Sprint 25's runtime-boundary and attempt-aware verification gains remain green

## Explicitly out of scope

- deleting `Agent` as the public compatibility surface
- full claw-code policy-engine parity
- model-authored verifier narratives as a required runtime dependency
- AST-aware semantic diffs
- a broad visual workflow UI redesign
- multi-agent or team orchestration