`e5bc716`

Record sprint 12 workflow contract status

Authored by

espadonne 1 month ago

SHA: e5bc716340fcdf7b9cbd9913748a252e701ec9b3
Parents: 012f8e4
Tree: 28f5cf7

4 changed files

Status	File	+	-
M	`.docs/PARITY.md`	9	6
M	`.docs/audit_sprints/index.md`	2	2
M	`.docs/audit_sprints/sprint12.md`	10	0
M	`tests/fixtures/runtime_parity_manifest.json`	2	2

.docs/PARITY.mdmodified

  - durable project memory in `.loader/project-memory.json` and working notes in `.loader/notepad.md`
  - native memory tools for `project_memory_*` and `notepad_*`
  - heuristic workflow routing across `clarify` → `plan` → `execute` → `verify`
 +- clarify mode as an explicit single-question brief flow that returns to execute mode
 +- plan mode as explicit single-pass implementation and verification artifact generation
 +- persisted workflow-artifact status and source metadata in session state when execute consumes or reuses workflow artifacts
  - mode-specific system prompts for clarify, plan, execute, and verify
  - explicit verify/fix loops for mutating tasks, with a bounded retry budget
  - verify/fix retries return to execute mode without re-triggering clarify or plan
  ## Known weak spots
  - the core turn loop moved into [`src/loader/runtime/conversation.py`](../src/loader/runtime/conversation.py), but it still owns workflow routing, remaining loop safeguards, and other coordination logic that remains more heuristic-heavy than the reference runtime in `refs/claw-code`
 -- planning, decomposition, and several helper behaviors still live in [`src/loader/agent/loop.py`](../src/loader/agent/loop.py), so ownership is cleaner than Sprint 00 but not fully simplified yet
 +- workflow routing is cleaner than Sprint 00, but the router and artifact bridge still live in [`src/loader/runtime/conversation.py`](../src/loader/runtime/conversation.py) and remain more heuristic than the reference runtimes
  - the mode router is still heuristic-only; Loader does not yet implement OMX's deeper ambiguity scoring, pressure-pass discipline, or branch-specific routing policy
 -- clarify mode currently stops after one structured question and one brief artifact; it does not yet run a deeper Socratic loop
 -- plan mode is still a single-pass artifact generator, not a Planner/Architect/Critic consensus loop
 +- clarify mode is now explicitly a single-question brief flow, not a deeper Socratic protocol
 +- plan mode is now explicitly a single-pass artifact generator, not a Planner/Architect/Critic consensus loop
  - DoD acceptance criteria and pending items are stronger than Sprint 02, but todo progress is still lightly structured compared with claw-code's richer workflow state
  - evidence summaries are deterministic runtime summaries of captured output, not model-written verification narratives
  - session compaction summaries are heuristic runtime summaries, not model-assisted continuity artifacts
  As of 2026-04-07:
 -- `uv run pytest -q`: 210 passed
 +- `uv run pytest -q`: 211 passed
  - `tests/test_runtime_harness.py` is fully green, including permission-mode parity, DoD verify/fix coverage, workflow routing parity, and the original contract regression
  - `tests/test_dod.py` covers persistence, sizing boundaries, and verification command derivation
  - `tests/test_workflow.py` covers router heuristics, clarify/plan artifact round trips, DoD workflow links, and todo-to-DoD syncing
  - `tests/test_workflow_runtime.py` covers clarify routing, plan routing, and verify-fix workflow handoff
  - `tests/test_workflow_tools.py` and `tests/test_workflow_runtime_tools.py` cover `TodoWrite`, `AskUserQuestion`, and runtime callback plumbing
 -- `tests/test_session_state.py` covers session persistence, resume, rotation, compaction persistence, cumulative usage rollups, and persisted permission-policy metadata
 +- `tests/test_session_state.py` covers session persistence, resume, rotation, compaction persistence, cumulative usage rollups, persisted permission-policy metadata, and persisted workflow-artifact state
  - `tests/test_compaction.py` covers claw-style line compression and compacted continuation-message behavior
  - `tests/test_memory_tools.py` covers project-memory writes, notepad writes, lifecycle-hook mirroring, and DoD-summary capture into project memory
  - `tests/test_cli_resume.py` covers `--resume` argument rewriting for latest and named-session restore
  - Sprint 01 turned the original `tool_call_id` regression green by fixing the message contract, not by weakening the test.
  - Sprint 02 replaced "looks done" completion for mutating tasks with a real verify/fix gate, but it has not yet reached the richer workflow contracts described in the report and Sprint 04+.
  - Sprint 03 established permission modes, hooks, and tool hardening, but it intentionally stops short of claw-code's fuller rule engine and prompt/allow permission variants.
 -- Sprint 04 adds routing, artifacts, and structured user questions, but it is still a first-pass workflow layer rather than full OMX consensus planning or deep interview rigor.
 +- Sprint 04's workflow layer is now explicitly scoped as lightweight: single-question clarify, single-pass planning, explicit artifact bridging, and no legacy decomposition path.
  - Sprint 05 adds durable sessions, resume, compaction, and native memory/notepad tools, but it stops short of Sprint 06's inspectable session/status product surfaces and still uses heuristic continuity summaries rather than richer semantic memory extraction.
  - Sprint 06 adds inspectable product surfaces, a constrained explore lane, and a broader tool registry, but it still stops short of interactive explore workflows, richer git ergonomics, AST/LSP-aware editing, or any multi-agent/team runtime.
  - Sprint 07 is complete: Loader now has prompt/allow modes, rule-based permission policy, policy-backed prompting, persisted policy inspection state, and smaller assistant-turn/tool-batch/finalization runtime seams, but it still stops short of a richer rule UX, deeper policy sandboxing, and the more opinionated workflow/runtime contracts in the refs.

.docs/audit_sprints/index.mdmodified

  The repo has moved since the audit snapshot. On this planning branch:
 -- `uv run pytest -q` is green with `210 passed`
 +- `uv run pytest -q` is green with `211 passed`
  - Sprint 08's prompt builder, turn-phase tracking, and permission inspection surfaces are already present on `HEAD`
  - Sprint 09 interactive validation has started; `loader doctor` now distinguishes metadata reachability from live chat readiness, and both native-capable and `json_tag` Ollama lanes currently fail the live chat probe on `/api/chat` with HTTP 500
  - Sprint 10's runtime-ownership inversion is now materially in place: `src/loader/runtime/` no longer reaches into `Agent` directly, and the remaining legacy dependencies are explicit `RuntimeLegacyServices` seams
  - Sprint 11 has already deleted several puppet behaviors and collapsed the raw-text fallback stack onto the shared parser used by the runtime and Ollama text fallback paths
  - the central debt still remains:
    - the runtime still carries some recovery and safety heuristics around the main turn contract, even though the inline completion/critique rescue layers have now been deleted
 -  - clarify/plan workflows still persist artifacts without enforcing the deeper protocol the refs rely on
 +  - workflow modes are now honestly scoped as lightweight single-question and single-pass flows, but the refs' deeper protocol and routing discipline are still absent
    - `agent/loop.py`, `agent/reasoning.py`, `agent/safeguards.py`, and `agent/recovery.py` are still the load-bearing legacy tree
  ## Sprint 09 Ownership Baseline

.docs/audit_sprints/sprint12.mdmodified

  # Sprint 12: Workflow Protocol Hardening and Decomposition Decision
 +## Status on `cleanup-audit-plan`
++
 +- repo verification is currently `211 passed`
 +- clarify mode is now explicitly a single-question brief flow in prompts, runtime behavior, and persisted clarify artifacts
 +- plan mode is now explicitly single-pass implementation and verification artifact generation in prompts, runtime behavior, and persisted plans
 +- execute now records workflow-artifact status and artifact sources in session state when it activates or reuses the workflow bridge
 +- the legacy decomposition CLI flag and `agent/loop.py` decomposition orchestration have been deleted
 +- the sprint's explicit workflow-contract goals are now met
 +  - the remaining gap is not hidden workflow depth; it is the absence of the refs' deeper routing discipline and the broader legacy tree still living under `agent/`
++
  ## Prerequisites
  Sprint 11

tests/fixtures/runtime_parity_manifest.jsonmodified


   {
     "name": "ambiguous_prompt_routes_to_clarify",
     "category": "workflow",
-    "description": "Ambiguous prompts enter clarify mode, ask one structured question, and persist a brief artifact."
+    "description": "Ambiguous prompts enter clarify mode, ask one structured question, persist a single-question brief artifact, and hand off to execute."
   },
   {
     "name": "complex_prompt_routes_to_plan",
     "category": "workflow",
-    "description": "Complex prompts enter plan mode, persist implementation and verification artifacts, and use planned verification commands."
+    "description": "Complex prompts enter plan mode, persist single-pass implementation and verification artifacts, and use planned verification commands without legacy decomposition."
   },
   {
     "name": "verify_failure_fix_loop_does_not_reroute_workflow",

`@@ -147,12 +147,12 @@`
147	147	{
148	148	"name": "ambiguous_prompt_routes_to_clarify",
149	149	"category": "workflow",
150		- "description": "Ambiguous prompts enter clarify mode, ask one structured question, and persist a brief artifact."
	150	+ "description": "Ambiguous prompts enter clarify mode, ask one structured question, persist a single-question brief artifact, and hand off to execute."
151	151	},
152	152	{
153	153	"name": "complex_prompt_routes_to_plan",
154	154	"category": "workflow",
155		- "description": "Complex prompts enter plan mode, persist implementation and verification artifacts, and use planned verification commands."
	155	+ "description": "Complex prompts enter plan mode, persist single-pass implementation and verification artifacts, and use planned verification commands without legacy decomposition."
156	156	},
157	157	{
158	158	"name": "verify_failure_fix_loop_does_not_reroute_workflow",