tenseleyflow/loader / 28a4d01

Plan Sprint 23 runtime-first integration work

Authored by espadonne
SHA: 28a4d01b270958f7d41797f28489c7d3d40a14c3
Parents: 0d10ade
Tree: fac4a1a

2 changed files

Status  File                       +    -
M       .docs/sprints/index.md     4    0
A       .docs/sprints/sprint23.md  158  0
.docs/sprints/index.md (modified)
@@ -87,6 +87,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 22](sprint22.md) — Runtime Entry API, Verification Observations, and Compatibility Narrowing
 
+## Phase 21: Runtime-First Integrations and Verification Producers
+
+- [Sprint 23](sprint23.md) — Runtime-First Integrations, Verification Producers, and Facade Narrowing
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.
 - Each sprint must end with stronger runtime reliability, not just more features.
.docs/sprints/sprint23.md (added)
@@ -0,0 +1,158 @@
+# Sprint 23: Runtime-First Integrations, Verification Producers, and Facade Narrowing
+
+## Prerequisites
+
+Sprint 22
+
+## Goals
+
+Take the next honest step after Sprint 22: stop treating the runtime-first owner as mostly a testing seam, move more real integrations onto that seam where `Agent` is only habit, and widen the new verification-observation contract from a finalization/result story into a more direct execution-time producer story.
+
+Sprint 22 improved the remaining debt in a useful way:
+
+- Loader now carries typed verification observations through canonical policy events and completion-stop decisions
+- the operator surfaces can now explain recent verification from canonical policy observations first instead of stitching together post-hoc summaries
+- the canonical workflow timeline is stronger as the accountability story
+- but the planned runtime-first entry promotion beyond tests did not land
+- `Agent` still remains the default construction seam for many real internal paths even though Loader now has a runtime-owned internal handle and explicit public facade boundaries
+- verification observations are still strongest at the DoD/finalization edge; Loader still captures less from the earlier verification lifecycle than it now knows how to represent
+
+Sprint 23 should keep using the references as architectural guardrails, not as a feature-copy list.
+
+The standard remains:
+
+- use claw-code to sharpen runtime-first bootstrap/session ownership, event capture, and narrower public boundaries
+- use OMX to sharpen direct verifier-observation capture and evidence-backed accountability
+- do not add work just because the refs have it
+- do add work when the refs show that Loader is still too shell-bound, too late in its evidence capture, or too fuzzy about which boundary owns what
+
+`audit.txt` remains a guardrail against wrapper-heavy drift and soft compatibility habits. It is not the factual roadmap.
+
+The references for this sprint are:
+
+- `refs/claw-code/rust/crates/runtime/src/bootstrap.rs`
+- `refs/claw-code/rust/crates/runtime/src/session_control.rs`
+- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
+- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs`
+- `refs/claw-code/rust/crates/runtime/src/lane_events.rs`
+- `refs/claw-code/rust/crates/runtime/src/green_contract.rs`
+- `refs/claw-code/PARITY.md`
+- `refs/oh-my-codex/src/verification/verifier.ts`
+- `refs/oh-my-codex/src/autoresearch/contracts.ts`
+- `refs/oh-my-codex/src/autoresearch/runtime.ts`
+- `refs/oh-my-codex/src/hooks/session.ts`
+- `.docs/PARITY.md`
+- `.docs/audit.txt`
+- `.docs/audit_sprints/trunk_sitrep.md`
+- `.docs/sprints/sprint22.md`
+
+## Deliverables
+
+### 1. Promote the runtime-first entry contract into real internal integrations
+
+Sprint 22 left this as explicit debt. Sprint 23 should land it for real.
+
+Implementation targets:
+
+- inventory internal call sites that still instantiate or route through `Agent` by default even though they are consuming runtime-owned behavior, especially around:
+  - launcher/bootstrap helpers
+  - CLI/TUI integration seams that are not actually testing the public compatibility contract
+  - harnesses, utilities, and inspection/session helpers that only need runtime ownership
+- define or refine a small runtime-first entry contract for internal consumers that need:
+  - runtime bootstrap/session ownership
+  - launcher/public-shell execution
+  - inspection, continuity, or workflow/accountability hooks
+- migrate a bounded but real set of internal integrations onto that seam
+- explicitly document what remains intentionally public-shell-only versus what is now runtime-first by default
+
+The goal is not to delete `Agent`. The goal is to stop using it internally by reflex where a runtime-owned seam is cleaner and already exists.
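
A runtime-first entry contract of the kind this deliverable describes could be as small as a structural protocol that both the runtime-owned handle and the public shell satisfy. The sketch below is illustrative only: `RuntimeEntry`, `RuntimeHandle`, `inspect_session`, and the `Agent` stand-in are hypothetical names and shapes, not Loader's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class RuntimeEntry(Protocol):
    """Hypothetical minimal contract that internal consumers depend on."""

    def start_session(self, session_id: str) -> None: ...

    def describe(self) -> Dict[str, str]: ...


@dataclass
class RuntimeHandle:
    """Hypothetical runtime-owned handle; a stand-in for the real internal one."""

    owner: str = "runtime"
    sessions: List[str] = field(default_factory=list)

    def start_session(self, session_id: str) -> None:
        self.sessions.append(session_id)

    def describe(self) -> Dict[str, str]:
        return {"owner": self.owner, "sessions": ",".join(self.sessions)}


@dataclass
class Agent:
    """Hypothetical public shell that delegates to the handle instead of owning state."""

    runtime: RuntimeHandle = field(default_factory=RuntimeHandle)

    def start_session(self, session_id: str) -> None:
        self.runtime.start_session(session_id)

    def describe(self) -> Dict[str, str]:
        return self.runtime.describe()


def inspect_session(entry: RuntimeEntry, session_id: str) -> Dict[str, str]:
    """Internal helper typed against the contract rather than against Agent."""
    entry.start_session(session_id)
    return entry.describe()
```

Under these assumptions, an inspection helper can be handed a `RuntimeHandle` directly, while callers that genuinely exercise the public compatibility contract can still pass an `Agent`.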
+
+### 2. Expand verification observations to earlier execution-time producers
+
+Sprint 22 made verification observations real, but they still enter the story relatively late.
+
+Implementation targets:
+
+- inventory where verification-related facts are currently available earlier than finalization across:
+  - `src/loader/runtime/dod.py`
+  - `src/loader/runtime/finalization.py`
+  - `src/loader/runtime/tool_batches.py`
+  - `src/loader/runtime/executor.py`
+  - `src/loader/runtime/workflow_lanes.py`
+  - `src/loader/runtime/workflow_policy.py`
+- identify which observation kinds should be emitted closer to execution, such as:
+  - verification command planned/requested
+  - verification command actually executed
+  - verification output observed and classified as passed/failed/contradictory
+  - verification was intentionally skipped, stale, or still pending
+  - observed artifact/touchpoint evidence materially backed or blocked completion before finalization
+- thread those earlier observations into the canonical policy story without creating a peer truth beside the workflow timeline
+
+The goal is to move Loader’s accountability closer to “this is what we observed while verification happened” instead of only “this is what the runtime concluded later.”
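
One way to picture the observation kinds listed above is as a small typed record appended to a single timeline by whichever producer sees the fact first. This is a minimal sketch under stated assumptions: `ObservationKind`, `VerificationObservation`, `PolicyTimeline`, and the producer labels are hypothetical and do not reflect Loader's real types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ObservationKind(Enum):
    """Hypothetical kinds mirroring the list in the plan."""

    PLANNED = "planned"
    EXECUTED = "executed"
    PASSED = "passed"
    FAILED = "failed"
    CONTRADICTORY = "contradictory"
    SKIPPED = "skipped"
    STALE = "stale"
    PENDING = "pending"


@dataclass(frozen=True)
class VerificationObservation:
    kind: ObservationKind
    command: str
    producer: str  # hypothetical label, e.g. "executor" or "finalization"


class PolicyTimeline:
    """Single append-only record, so earlier producers do not create a peer truth."""

    def __init__(self) -> None:
        self._events: List[VerificationObservation] = []

    def record(self, obs: VerificationObservation) -> None:
        self._events.append(obs)

    def latest_for(self, command: str) -> Optional[VerificationObservation]:
        # Walk backwards so the most recent observation for the command wins.
        for obs in reversed(self._events):
            if obs.command == command:
                return obs
        return None

    def observed_before_finalization(self) -> List[VerificationObservation]:
        return [o for o in self._events if o.producer != "finalization"]
```

Because every producer writes to the same timeline, a later finalization entry can summarize earlier observations without replacing them.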
+
+### 3. Narrow the remaining public facade boundary on purpose
+
+Sprint 20 settled `Agent` as the public shell. Sprint 23 should keep making that shell smaller and more explicit.
+
+Implementation targets:
+
+- inventory what still lives in `src/loader/agent/loop.py` and nearby public-shell glue
+- identify which pieces are:
+  - true public compatibility API
+  - UI integration seam
+  - leftover runtime ownership that can move below the shell now
+- move the still-obviously-runtime pieces below the public shell where that reduces ambiguity
+- add or extend direct boundary tests so internal code does not drift back toward `Agent` ownership once a runtime seam exists
+
+The goal is not “make the file smaller” for its own sake. The goal is that future work has a clearer answer to what the public shell is for.
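
A boundary test of the kind mentioned above can be approximated by statically scanning runtime modules for imports of the public shell. The sketch below uses only the standard-library `ast` module; the helper name `find_forbidden_imports` and the dotted module path `loader.agent` are assumptions for illustration, inferred from the `src/loader/agent/loop.py` path rather than confirmed against the codebase.

```python
import ast
from typing import List


def find_forbidden_imports(source: str, forbidden_prefix: str) -> List[str]:
    """Return absolute imports in `source` that target `forbidden_prefix`.

    Relative imports (`from . import x`) are not resolved in this sketch.
    """
    found: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Match the package itself or any submodule, but not lookalike names.
            if name == forbidden_prefix or name.startswith(forbidden_prefix + "."):
                found.append(name)
    return found
```

A regression test could read each file under the runtime package and assert that this helper returns an empty list, so a reintroduced dependency on the shell fails loudly instead of drifting in unnoticed.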
+
+### 4. Sharpen operator visibility for runtime-first ownership and observed verification
+
+Once more of the real integration paths go runtime-first and more observations are captured earlier, the existing product surfaces should make that easier to audit.
+
+Implementation targets:
+
+- improve the existing surfaces so users can answer:
+  - which runtime-owned path produced the current session/accountability state?
+  - what verification was actually observed earlier in the turn versus only concluded at finalization?
+  - what evidence is pending, contradicted, or already satisfied?
+- prefer improving:
+  - `loader status`
+  - `loader session show`
+  - `loader workflow show`
+  over inventing a new command unless a new surface is clearly cleaner
+- keep concise rollups first, and expose deeper ownership/observation detail only where it materially improves post-mortem debugging
+
+The goal is to make Loader easier to audit after the fact, not simply more verbose.
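
The "concise rollups first" idea can be illustrated with a one-line summary that collapses a stream of evidence updates into counts of satisfied, pending, and contradicted items. This is a hypothetical sketch: the `rollup` function, the `(name, state)` tuple shape, and the output format are assumptions and do not describe what `loader status` actually prints.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Fixed display order so the summary line is stable between runs.
STATES = ("satisfied", "pending", "contradicted")


def rollup(observations: List[Tuple[str, str]]) -> str:
    """Summarize (evidence_name, state) updates; the last state per name wins."""
    latest: Dict[str, str] = {}
    for name, state in observations:
        latest[name] = state
    counts = Counter(latest.values())
    return ", ".join(f"{state}={counts[state]}" for state in STATES)
```

Deeper per-item detail could then be reserved for a more verbose view, keeping the default output short.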
+
+## Testing strategy
+
+- unit coverage for:
+  - runtime-first entry helpers adopted in real internal integration paths
+  - earlier verification-observation producer normalization and persistence
+  - public-shell boundary helpers and import/boundary guards
+- runtime coverage for:
+  - successful completion with observed verification facts emitted before finalization
+  - failed or pending verification that now leaves a clearer producer-backed policy trail
+  - internal integration paths that no longer need `Agent` by default
+- regression coverage for:
+  - no drift back toward `Agent` as the default internal seam when a runtime-owned seam exists
+  - no duplicate truth beside the canonical policy/accountability story
+  - no regression in Sprint 22’s observed-verification inspection and stop/continue honesty
+
+## Definition of done
+
+- Loader has at least one more real internal integration path using a runtime-first entry seam below `Agent`
+- verification observations are emitted from at least one earlier execution-time producer instead of only the finalization edge
+- the remaining public-shell boundary is smaller or more explicitly defended on purpose
+- existing status/session/workflow surfaces expose the stronger runtime-first and verification-observation story without multiplying commands
+- Sprint 22’s observed-verification and accountability gains remain green
+
+## Explicitly out of scope
+
+- deleting `Agent` as the public compatibility surface
+- full claw-code policy-engine parity
+- model-authored verifier narratives as a required runtime dependency
+- AST-aware semantic diffs
+- a broad visual workflow UI
+- multi-agent or team orchestration