tenseleyflow/loader / 2671488

Plan Sprint 21 evidence provenance work

Authored by espadonne
SHA: 2671488d11f03876982586d418af401919668afe
Parents: c3f6a1b
Tree: 3653d1e

2 changed files

Status  File                       +    -
M       .docs/sprints/index.md     4    0
A       .docs/sprints/sprint21.md  159  0

.docs/sprints/index.md (modified)
@@ -79,6 +79,10 @@ The plan was reshaped after a deeper validation pass against `refs/claw-code` an
 
 - [Sprint 20](sprint20.md) — Canonical Policy Events, Verifier-Backed Follow-Through, and Facade Settlement
 
+## Phase 19: Evidence Provenance and Runtime-First Narrowing
+
+- [Sprint 21](sprint21.md) — Evidence Provenance, Read-Model Cleanup, and Runtime-First API
+
 ## Working principles
 
 - Each sprint must end with stronger runtime reliability, not just more features.

.docs/sprints/sprint21.md (added)
@@ -0,0 +1,159 @@
+# Sprint 21: Evidence Provenance, Read-Model Cleanup, and Runtime-First API
+
+## Prerequisites
+
+Sprint 20
+
+## Goals
+
+Take the next honest step after Sprint 20: move Loader's completion and verification story from "better heuristics with canonical policy events" toward stronger evidence provenance, reduce compatibility read-model duplication where the canonical workflow timeline already carries the truth, and begin narrowing internal callers toward a runtime-first API instead of treating `Agent` as the only natural seam.
+
+Sprint 20 changed the remaining debt in a useful way:
+
+- the workflow timeline is now the canonical policy/accountability artifact, including live completion-trace projection
+- follow-through checks now use stronger DoD/runtime evidence instead of only textual heuristics
+- the remaining `Agent` shell is explicitly documented and guarded as a public facade
+- but completion/verification evidence is still mostly flattened into human-readable strings rather than typed provenance
+- compatibility/read-model surfaces still rely on a few projections that are honest but not yet minimal
+- internal callers still treat `Agent` as the default runtime entry seam even though the shell is now explicitly a compatibility/public facade
+
+Sprint 21 should keep using the references as architectural guardrails, not as a feature-copy list.
+
+The standard remains:
+
+- use claw-code to sharpen canonical event ownership, green-contract discipline, and runtime-first seams
+- use OMX to sharpen verifier/accountability provenance and evidence-backed follow-through
+- do not add work just because the refs have it
+- do add work when the refs show that Loader is still too stringly-typed, too duplicative, or too dependent on a compatibility shell
+
+`audit.txt` remains a guardrail against wrapper-heavy drift and soft rescue behavior. It is not the factual roadmap.
+
+The references for this sprint are:
+
+- `refs/claw-code/rust/crates/runtime/src/policy_engine.rs`
+- `refs/claw-code/rust/crates/runtime/src/green_contract.rs`
+- `refs/claw-code/rust/crates/runtime/src/lane_events.rs`
+- `refs/claw-code/rust/crates/runtime/src/session_control.rs`
+- `refs/claw-code/rust/crates/runtime/src/bootstrap.rs`
+- `refs/claw-code/rust/crates/runtime/src/conversation.rs`
+- `refs/claw-code/PARITY.md`
+- `refs/oh-my-codex/src/verification/verifier.ts`
+- `refs/oh-my-codex/src/autoresearch/contracts.ts`
+- `refs/oh-my-codex/src/autoresearch/runtime.ts`
+- `refs/oh-my-codex/src/hooks/session.ts`
+- `refs/oh-my-codex/src/hooks/prompt-guidance-contract.ts`
+- `.docs/PARITY.md`
+- `.docs/audit.txt`
+- `.docs/audit_sprints/trunk_sitrep.md`
+- `.docs/sprints/sprint20.md`
+
+## Deliverables
+
+### 1. Introduce typed evidence provenance for completion and verification
+
+Sprint 20 strengthened follow-through, but most of the contract still collapses into free-text evidence summaries too early.
+
+Implementation targets:
+
+- inventory where completion/verification evidence is currently flattened into strings across:
+  - `src/loader/runtime/task_completion.py`
+  - `src/loader/runtime/completion_trace.py`
+  - `src/loader/runtime/policy_timeline.py`
+  - `src/loader/runtime/finalization.py`
+  - `src/loader/runtime/dod.py`
+  - `src/loader/runtime/workflow_policy.py`
+- define a small typed provenance model that can represent things like:
+  - verification command ran and passed/failed
+  - verification command was still missing
+  - tracked work item remained incomplete
+  - artifact/touchpoint evidence existed or was contradicted
+  - claimed runtime outcome was backed by observed output
+- prefer structured provenance that can still be rendered into human-readable summaries, instead of making strings the primary contract
+- thread that provenance through completion policy and canonical policy events where it materially improves honesty or inspectability
+
+The goal is not to build a fake theorem prover. The goal is to stop throwing away runtime evidence structure too early.
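
Reviewer note: the typed provenance model described in Deliverable 1 could look roughly like the sketch below. This is only an illustration under stated assumptions. `EvidenceKind`, `EvidenceRecord`, `BLOCKING_KINDS`, and `summarize` are hypothetical names that do not appear in the commit, and the real model in `src/loader/runtime/` may be shaped quite differently. The point it shows is that the record stays structured and the string is derived from it on demand.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class EvidenceKind(Enum):
    """Hypothetical evidence categories mirroring the bullets in Deliverable 1."""
    VERIFICATION_PASSED = "verification_passed"
    VERIFICATION_FAILED = "verification_failed"
    VERIFICATION_MISSING = "verification_missing"
    WORK_ITEM_INCOMPLETE = "work_item_incomplete"
    ARTIFACT_PRESENT = "artifact_present"
    ARTIFACT_CONTRADICTED = "artifact_contradicted"
    OUTCOME_OBSERVED = "outcome_observed"


# Assumption: these kinds should prevent a "complete" verdict.
BLOCKING_KINDS = frozenset({
    EvidenceKind.VERIFICATION_FAILED,
    EvidenceKind.VERIFICATION_MISSING,
    EvidenceKind.WORK_ITEM_INCOMPLETE,
    EvidenceKind.ARTIFACT_CONTRADICTED,
})


@dataclass(frozen=True)
class EvidenceRecord:
    """One piece of typed provenance; the string form is derived, not stored."""
    kind: EvidenceKind
    subject: str      # command line, work item id, or artifact path
    detail: str = ""  # optional free-text context

    @property
    def blocking(self) -> bool:
        return self.kind in BLOCKING_KINDS

    def render(self) -> str:
        text = f"{self.kind.value}: {self.subject}"
        return f"{text} ({self.detail})" if self.detail else text


def summarize(records: Iterable[EvidenceRecord]) -> str:
    """Render a human-readable summary from structured records."""
    return "; ".join(record.render() for record in records)
```

With this shape, a completion policy could branch on `record.blocking` while operator views call `summarize`, so neither side has to parse text.
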
+
+### 2. Reduce read-model duplication around the canonical workflow timeline
+
+Sprint 20 made the workflow timeline canonical, but a few read models still feel more coupled than they need to be.
+
+Implementation targets:
+
+- inventory where compatibility/read-model projections still depend on direct mutation or duplicated logic across:
+  - `src/loader/runtime/completion_trace.py`
+  - `src/loader/runtime/session.py`
+  - `src/loader/runtime/inspection.py`
+  - `src/loader/runtime/events.py`
+  - any nearby status/session helper that reconstructs policy state manually
+- make sure projections like completion traces and latest-policy summaries are clearly derivations from canonical policy events instead of semi-independent contracts
+- remove any remaining direct writes or state bookkeeping that are only there to keep parallel policy read models in sync
+- keep compact operator-facing read models where they help, but make their derived nature explicit in code and tests
+
+The goal is one canonical truth plus honest projections, not a forest of near-duplicates.
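
Reviewer note: one way to make the "derived, not parallel" property of Deliverable 2 explicit is to write each read model as a pure function over the canonical events, with no state of its own. The sketch below assumes a simplified event shape. `PolicyEvent`, its fields, and both function names are hypothetical and are not taken from Loader's code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PolicyEvent:
    """Hypothetical canonical timeline entry."""
    sequence: int
    kind: str      # e.g. "completion_check" or "finalization"
    decision: str  # e.g. "continue" or "stop"
    summary: str


def completion_trace(events: Iterable[PolicyEvent]) -> list[PolicyEvent]:
    """Projection: completion checks only, ordered by sequence number."""
    checks = (event for event in events if event.kind == "completion_check")
    return sorted(checks, key=lambda event: event.sequence)


def latest_policy_summary(events: Iterable[PolicyEvent]) -> Optional[str]:
    """Projection: one-line summary of the most recent event, or None if empty."""
    latest = max(events, key=lambda event: event.sequence, default=None)
    if latest is None:
        return None
    return f"{latest.kind}: {latest.decision} ({latest.summary})"
```

Because both functions take the timeline as input and store nothing, there is no second copy of policy state to keep in sync, and a test can assert the projection directly against a list of events.
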
+
+### 3. Start the runtime-first internal API transition below the public `Agent` facade
+
+Sprint 20 settled `Agent` as the public compatibility shell. Sprint 21 should stop using that shell as the default internal seam where it no longer needs to be.
+
+Implementation targets:
+
+- inventory current internal call sites that still instantiate or consume `Agent` when a runtime-first seam would be cleaner, especially in:
+  - launcher/bootstrap helpers
+  - CLI/TUI integration code
+  - tests that are really exercising runtime behavior rather than public compatibility
+- define a small runtime-first entry contract for internal consumers where it clearly reduces shell coupling
+- keep `Agent` as the public compatibility surface, but begin migrating internal runtime-oriented callers away from assuming that `Agent` is the only valid execution owner
+- document what remains intentionally public-shell-only versus what is now runtime-first
+
+The goal is not to delete `Agent`. The goal is to make `Agent` clearly public/compatibility-facing while runtime internals use runtime-first seams by default.
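
Reviewer note: a "small runtime-first entry contract" as described in Deliverable 3 might be expressed as a `typing.Protocol` that internal callers depend on, with the facade delegating to it. Everything in the sketch below is an assumption made for illustration: `RuntimeEntry`, `Runtime`, `run_turn`, and `launch` are invented names, and the `Agent` class here is a toy stand-in, not Loader's actual `Agent`.

```python
from typing import Protocol


class RuntimeEntry(Protocol):
    """Hypothetical minimal contract for internal, runtime-oriented callers."""

    def run_turn(self, prompt: str) -> str: ...


class Runtime:
    """Toy runtime that satisfies RuntimeEntry structurally."""

    def run_turn(self, prompt: str) -> str:
        return f"handled: {prompt}"


class Agent:
    """Stand-in for the public facade: it owns no logic and only delegates."""

    def __init__(self, runtime: RuntimeEntry) -> None:
        self._runtime = runtime

    def run(self, prompt: str) -> str:
        return self._runtime.run_turn(prompt)


def launch(entry: RuntimeEntry, prompt: str) -> str:
    """Internal helper typed against the contract rather than against Agent."""
    return entry.run_turn(prompt)
```

A launcher or test written against `RuntimeEntry` never needs to construct the facade, while external users of `Agent` see no change in behavior.
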
+
+### 4. Sharpen operator visibility for evidence-backed stop/continue decisions
+
+Sprint 20 improved policy summaries, but the evidence itself is still only partially visible.
+
+Implementation targets:
+
+- improve the existing operator views so users can answer:
+  - what exact evidence was missing when Loader stopped?
+  - what exact evidence satisfied the completion contract?
+  - which policy event carried that evidence?
+- prefer improving:
+  - `loader workflow show`
+  - `loader session show`
+  - `loader status`
+  over inventing a new command unless a new surface is clearly cleaner
+- add concise rollups first, and expose deeper provenance only where it materially helps post-mortem inspection
+
+The goal is to make Loader easier to audit after the fact, not simply more verbose.
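
Reviewer note: the three operator questions in Deliverable 4 (what was missing, what satisfied the contract, which event carried it) suggest a compact rollup along the lines of the sketch below. The output format, the `EvidenceLine` type, and the `rollup` function are all hypothetical. The commit does not specify how `loader status` or the other existing views would render this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceLine:
    """Hypothetical flattened view of one evidence item for operator output."""
    event_id: str     # id of the policy event that carried the evidence
    description: str
    satisfied: bool


def rollup(lines: list[EvidenceLine]) -> str:
    """Render a headline count, then missing evidence before satisfied evidence."""
    missing = [line for line in lines if not line.satisfied]
    met = [line for line in lines if line.satisfied]
    out = [f"evidence: {len(met)} satisfied, {len(missing)} missing"]
    for line in missing:
        out.append(f"  missing   [{line.event_id}] {line.description}")
    for line in met:
        out.append(f"  satisfied [{line.event_id}] {line.description}")
    return "\n".join(out)
```

Listing missing evidence first keeps the stop reason at the top of the view, and the bracketed event id points back to the canonical timeline for deeper inspection.
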
+
+## Testing strategy
+
+- unit coverage for:
+  - typed evidence-provenance normalization/rendering
+  - derived read-model projections from the canonical workflow timeline
+  - any new runtime-first internal entry contract below `Agent`
+- runtime coverage for:
+  - honest finalization with explicit evidence provenance when completion still fails
+  - successful completion paths that now surface structured proof instead of only summary strings
+  - status/session/workflow inspection of the evidence-backed policy story
+- regression coverage for:
+  - no drift back toward peer policy artifacts beside the canonical workflow timeline
+  - no drift back toward `Agent` as the default internal seam when a runtime-first contract exists
+  - no loss of the current compact operator read models while provenance becomes richer
+
+## Definition of done
+
+- Loader preserves one canonical policy/accountability artifact while making evidence provenance more structured
+- completion/verification evidence is less stringly-typed and more inspectable without weakening honesty
+- internal runtime-oriented code has at least one cleaner runtime-first seam below the public `Agent` facade
+- existing status/session/workflow surfaces answer stop/continue questions with clearer evidence context
+- Sprint 20's canonical-policy and facade-settlement gains remain green
+
+## Explicitly out of scope
+
+- full claw-code policy-engine parity
+- model-authored verifier narratives as a mandatory dependency
+- multi-agent or team orchestration
+- AST-aware semantic diffs
+- a broad visual workflow UI
+- rich permission-rule editing UX