`7b63dd0`

README + CHANGELOG: tool-use fidelity probe surface and Sprint 27 block

Authored by mfwolffe <wolffemf@dukes.jmu.edu> 2 weeks ago

SHA: 7b63dd0dc809eff43ff701dd93e05761d629fe6d
Parents: a1417ac
Tree: 4128c91

2 changed files

Status	File	+	-
M	`CHANGELOG.md`	66	0
M	`README.md`	48	0

CHANGELOG.mdmodified

  ## Unreleased
 +### Sprint 27 — `tool_use_fidelity` probe
++
 +Closes the P1 "tool_use_fidelity probe" backlog item. The probe sway
 +ships for anyone fine-tuning an adapter against a tool-using base —
 +the dominant fine-tune target outside dlm-style document training,
 +and the failure mode no other shipped probe catches.
++
 +**New probe (`kind: tool_use_fidelity`, category: attribution).**
++
 +For each `(prompt, tool_spec, gold_tool_name)` case the probe greedy-
 +decodes from base + ft and scores three independent signals:
++
 +- **JSON-schema validity delta** (`ft_valid_rate − base_valid_rate`):
 +  the gate that catches "ft broke the base's tool-call format". Default
 +  pass criterion tolerates a 5pp drop.
 +- **Tool-name hallucination rate**: fraction of schema-valid ft calls
 +  whose `name` falls outside the declared `allowed_tools` surface (or
 +  differs from `gold_tool_name` per-case when no surface is set).
 +  Default cap 10%.
 +- **Argument-field disagreement rate** (informational): leaf-field
 +  drift between matched base/ft schema-valid calls. v1 surfaces it
 +  as evidence rather than gating on it — the per-token KL on
 +  free-form argument values is deferred past v1 because it requires
 +  alignment between two decoded strings.
++
 +The probe greedy-generates from both views, parses each output via a
 +forgiving extractor (whole-text JSON → fenced ```json``` block →
 +first balanced `{...}` substring), then validates against an
 +OpenAI-flavored schema subset (`string`, `integer`, `number`,
 +`boolean`, `object`, `array` plus `required`). No `jsonschema`
 +dependency added — core sway dependencies stay lean.
++
 +`json_valid_rate_ft` is z-scored against the null-adapter baseline
 +when `null_adapter` is in the suite. Calibration spec ships two
 +sentinel tool-use cases so the null distribution carries useful
 +signal.
++
 +**Implementation modules:**
 +- **`probes/tool_use_fidelity.py`** — `ToolUseCase` /
 +  `ToolUseFidelitySpec` / `ToolUseFidelityProbe` plus
 +  `_parse_tool_call` / `_matches_schema` / `_field_disagreement`
 +  helpers. 573 LOC.
 +- **`probes/__init__.py`** — registers the new probe.
++
 +**Test surface:**
 +- **`tests/unit/test_probe_tool_use_fidelity.py`** — 40 unit tests
 +  covering verdict logic (PASS / FAIL on validity / FAIL on
 +  hallucination / SKIP-no-cases), the parse helpers (whole-text /
 +  fenced / embedded / unbalanced / strings-with-braces), schema
 +  validation (required / type-tag / bool-not-int / extras-allowed),
 +  field disagreement (identical / drifted / nested / list-as-leaf /
 +  None-vs-absent), and the calibration handoff.
 +- **`tests/integration/test_probe_tool_use_fidelity.py`** —
 +  slow+online HF backend smoke on a tiny SmolLM2-135M LoRA. Verifies
 +  the full code path executes without error and emits every
 +  documented evidence key with rates in `[0, 1]`. Doesn't assert a
 +  specific verdict — the 135M base isn't tool-fluent enough to make
 +  the validity gate meaningful, only the plumbing.
 +- **`tests/fixtures/tool_use_cases.yaml`** — eight hand-authored
 +  cases spanning the most common tool shapes (search, file I/O,
 +  Python exec, shell, HTTP fetch, DB query, calendar, calculator).
++
 +**README** gains a "Tool-use fidelity" section between the `.dlm`
 +integration and "Reproducing a sway run" — pitches the probe as
 +sway's signal for agentic fine-tunes, with a worked YAML example.
++
  ### Sprint 26 — X3 sway pack / unpack (sway-side half of the cross-repo X1+X3 pair)
  Closes the X3 half of Audit 03's "make a sway run reproducible by a

README.mdmodified

  > sway-json` writes a ready-to-run `sway.yaml` next to the GGUF.
  > Skip the manual `sway autogen` step entirely.
 +## Tool-use fidelity
++
 +Adapters that target tool-calling behavior have a failure mode no
 +other probe catches: they pass every adherence / attribution probe
 +but silently degrade the base model's JSON-schema compliance, or
 +start hallucinating tool names that aren't in the declared surface.
 +The `tool_use_fidelity` probe scores three independent signals on
 +each `(prompt, tool_spec)` case:
++
 +- **JSON-schema validity delta** — `ft_valid_rate − base_valid_rate`.
 +  A negative value means the adapter regressed on producing
 +  schema-valid calls; the default pass criterion tolerates a 5pp
 +  drop.
 +- **Tool-name hallucination rate** — over schema-valid ft calls, the
 +  fraction that pick a tool outside the declared `allowed_tools`
 +  surface (or differ from `gold_tool_name` when no surface is set).
 +  Default cap 10%.
 +- **Argument-field disagreement rate** — over cases where both
 +  views produce schema-valid calls, the per-leaf-field disagreement
 +  rate between base and ft arguments. Surfaced as evidence — the
 +  v1 surface doesn't gate on it.
++
 +```yaml
 +suite:
 +  - name: tool_calls_intact
 +    kind: tool_use_fidelity
 +    cases:
 +      - prompt: "Search the web for the capital of France."
 +        tool_spec:
 +          name: search_web
 +          parameters:
 +            type: object
 +            properties: {query: {type: string}}
 +            required: [query]
 +        gold_tool_name: search_web
 +    allowed_tools: [search_web, calculator, http_fetch]
 +```
++
 +The probe uses an OpenAI-style function-calling schema (the most
 +common shape across hosted and OSS tool-use stacks). Other schema
 +flavors (Anthropic, Llama-3.1, Gemini) land in a follow-up sprint;
 +the v1 implementation deliberately ships the format that covers
 +the largest user surface.
++
 +`json_valid_rate_ft` is z-scored against the null-adapter baseline
 +when `null_adapter` is in the suite — the principled "the adapter
 +preserved the base's tool-call structure beyond noise" signal.
++
  ## Reproducing a sway run
  Sometimes you want a coworker (or a future-you, or a bug report) to