tenseleyflow/loader / e5bc716


Record sprint 12 workflow contract status

Authored by espadonne
SHA: e5bc716340fcdf7b9cbd9913748a252e701ec9b3
Parents: 012f8e4
Tree: 28f5cf7

4 changed files

Status File + -
M .docs/PARITY.md 9 6
M .docs/audit_sprints/index.md 2 2
M .docs/audit_sprints/sprint12.md 10 0
M tests/fixtures/runtime_parity_manifest.json 2 2
.docs/PARITY.md modified
@@ -22,6 +22,9 @@ This file tracks the current deterministic runtime baseline for Loader. It stays
 - durable project memory in `.loader/project-memory.json` and working notes in `.loader/notepad.md`
 - native memory tools for `project_memory_*` and `notepad_*`
 - heuristic workflow routing across `clarify` → `plan` → `execute` → `verify`
+- clarify mode as an explicit single-question brief flow that returns to execute mode
+- plan mode as explicit single-pass implementation and verification artifact generation
+- persisted workflow-artifact status and source metadata in session state when execute consumes or reuses workflow artifacts
 - mode-specific system prompts for clarify, plan, execute, and verify
 - explicit verify/fix loops for mutating tasks, with a bounded retry budget
 - verify/fix retries return to execute mode without re-triggering clarify or plan
@@ -47,10 +50,10 @@ This file tracks the current deterministic runtime baseline for Loader. It stays
 ## Known weak spots
 
 - the core turn loop moved into [`src/loader/runtime/conversation.py`](../src/loader/runtime/conversation.py), but it still owns workflow routing, remaining loop safeguards, and other coordination logic that remains more heuristic-heavy than the reference runtime in `refs/claw-code`
-- planning, decomposition, and several helper behaviors still live in [`src/loader/agent/loop.py`](../src/loader/agent/loop.py), so ownership is cleaner than Sprint 00 but not fully simplified yet
+- workflow routing is cleaner than Sprint 00, but the router and artifact bridge still live in [`src/loader/runtime/conversation.py`](../src/loader/runtime/conversation.py) and remain more heuristic than the reference runtimes
 - the mode router is still heuristic-only; Loader does not yet implement OMX's deeper ambiguity scoring, pressure-pass discipline, or branch-specific routing policy
-- clarify mode currently stops after one structured question and one brief artifact; it does not yet run a deeper Socratic loop
-- plan mode is still a single-pass artifact generator, not a Planner/Architect/Critic consensus loop
+- clarify mode is now explicitly a single-question brief flow, not a deeper Socratic protocol
+- plan mode is now explicitly a single-pass artifact generator, not a Planner/Architect/Critic consensus loop
 - DoD acceptance criteria and pending items are stronger than Sprint 02, but todo progress is still lightly structured compared with claw-code's richer workflow state
 - evidence summaries are deterministic runtime summaries of captured output, not model-written verification narratives
 - session compaction summaries are heuristic runtime summaries, not model-assisted continuity artifacts
@@ -109,13 +112,13 @@ The auditable manifest lives at [`tests/fixtures/runtime_parity_manifest.json`](
 
 As of 2026-04-07:
 
-- `uv run pytest -q`: 210 passed
+- `uv run pytest -q`: 211 passed
 - `tests/test_runtime_harness.py` is fully green, including permission-mode parity, DoD verify/fix coverage, workflow routing parity, and the original contract regression
 - `tests/test_dod.py` covers persistence, sizing boundaries, and verification command derivation
 - `tests/test_workflow.py` covers router heuristics, clarify/plan artifact round trips, DoD workflow links, and todo-to-DoD syncing
 - `tests/test_workflow_runtime.py` covers clarify routing, plan routing, and verify-fix workflow handoff
 - `tests/test_workflow_tools.py` and `tests/test_workflow_runtime_tools.py` cover `TodoWrite`, `AskUserQuestion`, and runtime callback plumbing
-- `tests/test_session_state.py` covers session persistence, resume, rotation, compaction persistence, cumulative usage rollups, and persisted permission-policy metadata
+- `tests/test_session_state.py` covers session persistence, resume, rotation, compaction persistence, cumulative usage rollups, persisted permission-policy metadata, and persisted workflow-artifact state
 - `tests/test_compaction.py` covers claw-style line compression and compacted continuation-message behavior
 - `tests/test_memory_tools.py` covers project-memory writes, notepad writes, lifecycle-hook mirroring, and DoD-summary capture into project memory
 - `tests/test_cli_resume.py` covers `--resume` argument rewriting for latest and named-session restore
@@ -135,7 +138,7 @@ As of 2026-04-07:
 - Sprint 01 turned the original `tool_call_id` regression green by fixing the message contract, not by weakening the test.
 - Sprint 02 replaced "looks done" completion for mutating tasks with a real verify/fix gate, but it has not yet reached the richer workflow contracts described in the report and Sprint 04+.
 - Sprint 03 established permission modes, hooks, and tool hardening, but it intentionally stops short of claw-code's fuller rule engine and prompt/allow permission variants.
-- Sprint 04 adds routing, artifacts, and structured user questions, but it is still a first-pass workflow layer rather than full OMX consensus planning or deep interview rigor.
+- Sprint 04's workflow layer is now explicitly scoped as lightweight: single-question clarify, single-pass planning, explicit artifact bridging, and no legacy decomposition path.
 - Sprint 05 adds durable sessions, resume, compaction, and native memory/notepad tools, but it stops short of Sprint 06's inspectable session/status product surfaces and still uses heuristic continuity summaries rather than richer semantic memory extraction.
 - Sprint 06 adds inspectable product surfaces, a constrained explore lane, and a broader tool registry, but it still stops short of interactive explore workflows, richer git ergonomics, AST/LSP-aware editing, or any multi-agent/team runtime.
 - Sprint 07 is complete: Loader now has prompt/allow modes, rule-based permission policy, policy-backed prompting, persisted policy inspection state, and smaller assistant-turn/tool-batch/finalization runtime seams, but it still stops short of a richer rule UX, deeper policy sandboxing, and the more opinionated workflow/runtime contracts in the refs.
.docs/audit_sprints/index.md modified
@@ -4,14 +4,14 @@ These sprints translate the 2026-04-07 audit in `.docs/audit.txt` into a post-Sp
 
 The repo has moved since the audit snapshot. On this planning branch:
 
-- `uv run pytest -q` is green with `210 passed`
+- `uv run pytest -q` is green with `211 passed`
 - Sprint 08's prompt builder, turn-phase tracking, and permission inspection surfaces are already present on `HEAD`
 - Sprint 09 interactive validation has started; `loader doctor` now distinguishes metadata reachability from live chat readiness, and both native-capable and `json_tag` Ollama lanes currently fail the live chat probe on `/api/chat` with HTTP 500
 - Sprint 10's runtime-ownership inversion is now materially in place: `src/loader/runtime/` no longer reaches into `Agent` directly, and the remaining legacy dependencies are explicit `RuntimeLegacyServices` seams
 - Sprint 11 has already deleted several puppet behaviors and collapsed the raw-text fallback stack onto the shared parser used by the runtime and Ollama text fallback paths
 - the central debt still remains:
   - the runtime still carries some recovery and safety heuristics around the main turn contract, even though the inline completion/critique rescue layers have now been deleted
-  - clarify/plan workflows still persist artifacts without enforcing the deeper protocol the refs rely on
+  - workflow modes are now honestly scoped as lightweight single-question and single-pass flows, but the refs' deeper protocol and routing discipline are still absent
   - `agent/loop.py`, `agent/reasoning.py`, `agent/safeguards.py`, and `agent/recovery.py` are still the load-bearing legacy tree
 
 ## Sprint 09 Ownership Baseline
.docs/audit_sprints/sprint12.md modified
@@ -1,5 +1,15 @@
 # Sprint 12: Workflow Protocol Hardening and Decomposition Decision
 
+## Status on `cleanup-audit-plan`
+
+- repo verification is currently `211 passed`
+- clarify mode is now explicitly a single-question brief flow in prompts, runtime behavior, and persisted clarify artifacts
+- plan mode is now explicitly single-pass implementation and verification artifact generation in prompts, runtime behavior, and persisted plans
+- execute now records workflow-artifact status and artifact sources in session state when it activates or reuses the workflow bridge
+- the legacy decomposition CLI flag and `agent/loop.py` decomposition orchestration have been deleted
+- the sprint's explicit workflow-contract goals are now met
+  - the remaining gap is not hidden workflow depth; it is the absence of the refs' deeper routing discipline and the broader legacy tree still living under `agent/`
+
 ## Prerequisites
 
 Sprint 11
tests/fixtures/runtime_parity_manifest.json modified
@@ -147,12 +147,12 @@
   {
     "name": "ambiguous_prompt_routes_to_clarify",
     "category": "workflow",
-    "description": "Ambiguous prompts enter clarify mode, ask one structured question, and persist a brief artifact."
+    "description": "Ambiguous prompts enter clarify mode, ask one structured question, persist a single-question brief artifact, and hand off to execute."
   },
   {
     "name": "complex_prompt_routes_to_plan",
     "category": "workflow",
-    "description": "Complex prompts enter plan mode, persist implementation and verification artifacts, and use planned verification commands."
+    "description": "Complex prompts enter plan mode, persist single-pass implementation and verification artifacts, and use planned verification commands without legacy decomposition."
   },
   {
     "name": "verify_failure_fix_loop_does_not_reroute_workflow",