`aaefb0b`

Refresh parity checkpoint for Sprint 15 runtime ownership

Authored by

espadonne 1 month ago

SHA: aaefb0b8a860ca14d4425de34e38970e69f5e0cc
Parents: 170ee44
Tree: a0994f8

1 changed file

| Status | File | Lines added | Lines removed |
|---|---|---|---|
| M | `.docs/PARITY.md` | 12 | 4 |

`.docs/PARITY.md` modified

  - `loader workflow show [session-id]` with `--mode`, `--kind`, and `--limit` filters plus operator-focused workflow highlights, recent timeline snippets in `loader session show`, and `--diff` / `--full-diff` artifact comparison for persisted workflow artifacts
  - `loader explore <prompt>` as a one-shot read-only lookup lane with its own prompt, constrained registry, and no DoD or workflow routing
  - `RuntimeContext` is now the primary runtime seam for workflow state, turn phases, response repair, no-tool completion, response routing, turn looping, finalization, workflow lanes, and workflow recovery; the older `RuntimeLegacyServices` shim has been removed
 +- shared runtime bootstrap through `runtime.bootstrap.build_runtime_context(...)` / `sync_runtime_context(...)`, with both conversation and explore runtimes constructing typed context through the same runtime-owned contract
 +- runtime-owned safeguard and reasoning helpers now have canonical homes under `src/loader/runtime/`; `src/loader/agent/safeguards.py` and `src/loader/agent/reasoning.py` are compatibility-export layers rather than the primary implementations
 +- the old `Agent._build_runtime_context()` helper, dead legacy planner module, and dead raw-extraction / self-critique helper paths have been removed from the agent shell, leaving a smaller and more honest `src/loader/agent/loop.py`
  - CLI and TUI status surfaces for model, capability profile, mode, workflow mode, workflow reason, last transition summary, permission mode, explicit turn phase, prompt format/sections, DoD phase, pending items, last verification result, and active session id
  - CLI and TUI workflow-mode visibility plus artifact notifications
  - CLI and TUI permission-mode visibility with color-coded status
  ## Known weak spots
 -- the hot runtime path is now substantially context-owned, but [`src/loader/runtime/conversation.py`](../src/loader/runtime/conversation.py) and [`src/loader/runtime/explore.py`](../src/loader/runtime/explore.py) still bootstrap from `Agent._build_runtime_context()` and keep a thin wrapper-level dependency for initial prompt/capability synchronization
 -- planning, decomposition, and several helper behaviors still live in [`src/loader/agent/loop.py`](../src/loader/agent/loop.py), so ownership is cleaner than Sprint 00 but not fully simplified yet
 +- the hot runtime path no longer depends on a hidden bootstrap helper, but [`src/loader/runtime/conversation.py`](../src/loader/runtime/conversation.py) and [`src/loader/runtime/explore.py`](../src/loader/runtime/explore.py) still start from an `Agent`-shaped bootstrap source at the public entrypoint layer
 +- [`src/loader/agent/loop.py`](../src/loader/agent/loop.py) is much smaller and less misleading than the pre-Sprint-15 shell, but it still owns conversational handling, decomposition orchestration, and session/UI-facing entrypoint glue instead of collapsing fully to a minimal runtime launcher
 +- [`src/loader/agent/reasoning.py`](../src/loader/agent/reasoning.py) and [`src/loader/agent/safeguards.py`](../src/loader/agent/safeguards.py) are now compatibility shims rather than primary implementations, but they remain as export layers until Loader narrows its external compatibility surface further
  - [`src/loader/runtime/tool_batches.py`](../src/loader/runtime/tool_batches.py) and parts of [`src/loader/runtime/workflow_lanes.py`](../src/loader/runtime/workflow_lanes.py) are narrower and more directly tested than before, but they still carry more heuristic policy than the tightest reference seams in `refs/claw-code`
  - the workflow policy now consumes typed signals, but signal extraction is still heuristic and hand-tuned; Loader does not yet implement OMX's deeper ambiguity analysis, richer pressure-pass discipline, or branch-specific policy depth
  - clarify is now intent-aware, pressure-aware, and codebase-grounded, but it is still much shallower than OMX's deep-interview behavior and does not adapt its budget or questioning style by task class
  ## Deterministic parity scenarios
 -The auditable manifest lives at [`tests/fixtures/runtime_parity_manifest.json`](../tests/fixtures/runtime_parity_manifest.json) and is exercised by [`tests/test_runtime_harness.py`](../tests/test_runtime_harness.py). Sprint 04 adds focused workflow integration coverage in [`tests/test_workflow_runtime.py`](../tests/test_workflow_runtime.py) and artifact/router unit coverage in [`tests/test_workflow.py`](../tests/test_workflow.py). Sprint 06 adds inspection/explore coverage in [`tests/test_inspection.py`](../tests/test_inspection.py), [`tests/test_explore_runtime.py`](../tests/test_explore_runtime.py), and [`tests/test_expanded_tools.py`](../tests/test_expanded_tools.py). Sprint 10 extends that workflow coverage in [`tests/test_workflow_policy.py`](../tests/test_workflow_policy.py), [`tests/test_workflow_runtime.py`](../tests/test_workflow_runtime.py), and [`tests/test_inspection.py`](../tests/test_inspection.py) for scored routing, clarify-budget behavior, plan refresh, and workflow timeline inspection. Sprint 11 adds [`tests/test_workflow_signals.py`](../tests/test_workflow_signals.py), [`tests/test_clarify_strategy.py`](../tests/test_clarify_strategy.py), [`tests/test_artifact_invalidation.py`](../tests/test_artifact_invalidation.py), and expanded inspection/runtime coverage for signal summaries, intent-aware clarify, semantic replan recovery, and workflow timeline filtering/highlights. Sprint 12 adds [`tests/test_clarify_grounding.py`](../tests/test_clarify_grounding.py), [`tests/test_turn_preparation.py`](../tests/test_turn_preparation.py), [`tests/test_turn_completion.py`](../tests/test_turn_completion.py), [`tests/test_turn_iteration.py`](../tests/test_turn_iteration.py), [`tests/test_turn_preamble.py`](../tests/test_turn_preamble.py), [`tests/test_workflow_state.py`](../tests/test_workflow_state.py), and [`tests/test_turn_loop.py`](../tests/test_turn_loop.py) for grounded clarify, structured recovery evidence, and the controllerized turn runtime. Sprint 13 adds [`tests/test_runtime_repair_flows.py`](../tests/test_runtime_repair_flows.py), [`tests/test_response_routing.py`](../tests/test_response_routing.py), [`tests/test_workflow_ledger.py`](../tests/test_workflow_ledger.py), and expanded [`tests/test_session_state.py`](../tests/test_session_state.py) / [`tests/test_inspection.py`](../tests/test_inspection.py) coverage for honest repair behavior, dedicated response routing, persisted semantic ledger state, prompt snapshot history, and prompt/artifact diff inspection.
 +The auditable manifest lives at [`tests/fixtures/runtime_parity_manifest.json`](../tests/fixtures/runtime_parity_manifest.json) and is exercised by [`tests/test_runtime_harness.py`](../tests/test_runtime_harness.py). Sprint 04 adds focused workflow integration coverage in [`tests/test_workflow_runtime.py`](../tests/test_workflow_runtime.py) and artifact/router unit coverage in [`tests/test_workflow.py`](../tests/test_workflow.py). Sprint 06 adds inspection/explore coverage in [`tests/test_inspection.py`](../tests/test_inspection.py), [`tests/test_explore_runtime.py`](../tests/test_explore_runtime.py), and [`tests/test_expanded_tools.py`](../tests/test_expanded_tools.py). Sprint 10 extends that workflow coverage in [`tests/test_workflow_policy.py`](../tests/test_workflow_policy.py), [`tests/test_workflow_runtime.py`](../tests/test_workflow_runtime.py), and [`tests/test_inspection.py`](../tests/test_inspection.py) for scored routing, clarify-budget behavior, plan refresh, and workflow timeline inspection. Sprint 11 adds [`tests/test_workflow_signals.py`](../tests/test_workflow_signals.py), [`tests/test_clarify_strategy.py`](../tests/test_clarify_strategy.py), [`tests/test_artifact_invalidation.py`](../tests/test_artifact_invalidation.py), and expanded inspection/runtime coverage for signal summaries, intent-aware clarify, semantic replan recovery, and workflow timeline filtering/highlights. Sprint 12 adds [`tests/test_clarify_grounding.py`](../tests/test_clarify_grounding.py), [`tests/test_turn_preparation.py`](../tests/test_turn_preparation.py), [`tests/test_turn_completion.py`](../tests/test_turn_completion.py), [`tests/test_turn_iteration.py`](../tests/test_turn_iteration.py), [`tests/test_turn_preamble.py`](../tests/test_turn_preamble.py), [`tests/test_workflow_state.py`](../tests/test_workflow_state.py), and [`tests/test_turn_loop.py`](../tests/test_turn_loop.py) for grounded clarify, structured recovery evidence, and the controllerized turn runtime. Sprint 13 adds [`tests/test_runtime_repair_flows.py`](../tests/test_runtime_repair_flows.py), [`tests/test_response_routing.py`](../tests/test_response_routing.py), [`tests/test_workflow_ledger.py`](../tests/test_workflow_ledger.py), and expanded [`tests/test_session_state.py`](../tests/test_session_state.py) / [`tests/test_inspection.py`](../tests/test_inspection.py) coverage for honest repair behavior, dedicated response routing, persisted semantic ledger state, prompt snapshot history, and prompt/artifact diff inspection. Sprint 15 adds [`tests/test_runtime_bootstrap.py`](../tests/test_runtime_bootstrap.py), [`tests/test_safeguard_services.py`](../tests/test_safeguard_services.py), [`tests/test_reasoning_compat.py`](../tests/test_reasoning_compat.py), and updated [`tests/test_runtime_context.py`](../tests/test_runtime_context.py) coverage for the shared bootstrap seam plus the runtime-owned safeguards/reasoning compatibility contract.
  - `streaming_text`: green
  - `read_file_roundtrip`: green
  As of 2026-04-08:
 -- `uv run pytest -q`: 303 passed
 +- `uv run pytest -q`: 312 passed
  - `tests/test_runtime_harness.py` is fully green, including permission-mode parity, DoD verify/fix coverage, workflow routing parity, and the original contract regression
  - `tests/test_prompt_builder.py` covers section rendering, native-vs-ReAct formatting, and prompt metadata persistence
  - `tests/test_turn_state_machine.py` covers allowed/disallowed turn transitions and terminal transition metadata
  - `tests/test_runtime_phases.py` covers repair/completion phase transitions plus persisted transition metadata in runtime events and session state
  - `tests/test_runtime_repair_flows.py` covers honest empty-response retries, no synthetic prefill on first turns, and the removal of the older no-tool puppeting/scolding reroutes
  - `tests/test_runtime_context.py` and `tests/test_runtime_state_controllers.py` cover typed runtime-context construction plus direct workflow-state and phase-tracker behavior without relying on a full `Agent`
 +- `tests/test_runtime_bootstrap.py` covers the shared runtime bootstrap contract, prompt/capability synchronization, and direct conversation/explore construction through the runtime bootstrap seam
 +- `tests/test_safeguard_services.py` covers the canonical runtime safeguard implementation plus the compatibility-export path under `loader.agent.safeguards`
 +- `tests/test_reasoning_compat.py` covers runtime-owned deliberation/completion helpers plus the compatibility-export path under `loader.agent.reasoning`
  - `tests/test_repair.py` covers raw-text fallback through the runtime parser and active registry, including `TodoWrite` recovery
  - `tests/test_completion_policy.py` covers direct text-loop bailout and continuation-prompt behavior on the typed runtime context
  - `tests/test_response_routing.py` covers direct final-answer routing and halted tool-batch routing at the new response-policy seam
  - Sprint 12 is complete: Loader now has pressure-pass clarify behavior, codebase-backed clarify grounding, structured recovery evidence, controllerized turn-runtime seams, and a genuinely coordinator-shaped `runtime.conversation`, but it still stops short of OMX-style deep interview depth, richer semantic artifact reasoning, artifact/prompt diff ergonomics, and the broader operator/runtime sophistication in the refs.
  - Sprint 13 is complete: Loader now avoids synthetic prefill and no-tool puppeting, persists a semantic workflow ledger, exposes prompt/artifact diff surfaces, and routes assistant responses through a dedicated response-policy seam, but it still stops short of claw-code's narrower response/tool policy factoring, deeper planning rigor, and broader operator ergonomics.
  - Sprint 14 is complete: Loader now treats `RuntimeContext` as the primary seam across workflow state, turn phases, response policy, turn looping, workflow recovery, and finalization; the old runtime legacy shim is gone, raw-text tool recovery no longer depends on hidden agent extractors, and the hot path is substantially more runtime-owned, but bootstrap ownership still begins at the agent wrapper and Loader still stops short of claw-code's fuller policy engine, OMX's deeper workflow rigor, and richer operator/runtime surfaces.
 +- Sprint 15 is complete: Loader now has a shared runtime bootstrap seam, runtime-owned safeguards and deliberation helpers, compatibility-only `agent/reasoning.py` and `agent/safeguards.py`, no remaining `Agent._build_runtime_context()` helper, and a much smaller `agent/loop.py` after dead planner/raw-extraction cleanup, but it still stops short of a minimal entrypoint shell, deeper explore ergonomics, claw-code's tighter policy engine, and OMX's richer planning/interview rigor.
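The shared bootstrap seam this commit describes (one runtime-owned construction path used by both the conversation and explore runtimes, plus a separate step that re-synchronizes prompt/capability state from an `Agent`-shaped source) can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not Loader's implementation: only the names `build_runtime_context` and `sync_runtime_context` come from the doc, while the `BootstrapSource` protocol, the `RuntimeContext` fields, the `ConversationRuntime` / `ExploreRuntime` classes, the `FakeAgent` stub, and every signature shown are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class BootstrapSource(Protocol):
    """Hypothetical Agent-shaped source: the attributes a runtime reads at startup."""

    system_prompt: str
    capability_profile: str


@dataclass
class RuntimeContext:
    """Hypothetical typed context owned by the runtime, not by the agent shell."""

    system_prompt: str
    capability_profile: str
    # Runtime-owned state that a sync step must never overwrite (assumed field).
    events: list[str] = field(default_factory=list)


def build_runtime_context(source: BootstrapSource) -> RuntimeContext:
    # Single construction path shared by every runtime entrypoint (assumed signature).
    return RuntimeContext(
        system_prompt=source.system_prompt,
        capability_profile=source.capability_profile,
    )


def sync_runtime_context(ctx: RuntimeContext, source: BootstrapSource) -> RuntimeContext:
    # Refresh prompt/capability fields in place from the source,
    # leaving runtime-owned state such as `events` untouched (assumed signature).
    ctx.system_prompt = source.system_prompt
    ctx.capability_profile = source.capability_profile
    return ctx


class ConversationRuntime:
    """Hypothetical stand-in for the conversation runtime's entrypoint."""

    def __init__(self, source: BootstrapSource) -> None:
        self.context = build_runtime_context(source)


class ExploreRuntime:
    """Hypothetical stand-in for the read-only explore runtime's entrypoint."""

    def __init__(self, source: BootstrapSource) -> None:
        self.context = build_runtime_context(source)


@dataclass
class FakeAgent:
    """Hypothetical bootstrap source used only to exercise the sketch."""

    system_prompt: str = "You are Loader."
    capability_profile: str = "native-tools"
```

The design point the sketch tries to capture is that neither runtime reaches into a private agent helper: each gets an independent context from the same public function, and a later capability change on the source only reaches a runtime that explicitly calls the sync step.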