# Testing guide (contributor-facing)

Everything you need to run the test suite locally and understand what each
layer does.

## Layers

```
tests/
  test_smoke.py    package + CLI boot
  unit/            fast, in-process, no network
  integration/     crosses 2+ modules (e.g. parser + store)
  e2e/             full CLI against tmp stores
  fixtures/        factories + mocks (see below)
  golden/          checked-in JSON goldens per (name, torch_version)
```

## Markers

| marker | meaning | default |
|---|---|---|
| (none) | fast unit, <1s each | run |
| `slow` | expensive; may load the tiny model | **skipped** |
| `gpu` | requires CUDA | skipped on CPU/MPS |
| `online` | touches the network (HF Hub) | skipped offline |

`pyproject.toml` sets `addopts = ["-m", "not slow and not gpu and not online"]`,
so the default `uv run pytest` is always the fast, local subset.
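
The marker setup described above could look roughly like this in
`pyproject.toml` (a sketch; the exact section in the real file may differ):

```toml
[tool.pytest.ini_options]
# Markers must be registered, or pytest warns on unknown marks.
markers = [
    "slow: expensive; may load the tiny model",
    "gpu: requires CUDA",
    "online: touches the network (HF Hub)",
]
# Deselect the expensive layers by default.
addopts = ["-m", "not slow and not gpu and not online"]
```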
| 29 | + |
## Running

```
uv run pytest                            # fast subset, default
uv run pytest -m slow                    # tiny-model and long-running paths
uv run pytest -m "slow and online"       # tiny-model download + inference
uv run pytest --update-goldens           # regenerate goldens (see below)
uv run pytest -v path/to/test_file.py    # single-file verbose
```
| 39 | + |
## Fixtures

### `tests/fixtures/dlm_factory.py`

Builds synthetic `.dlm` text with a stable shape that matches Sprint 03's
parser.

```python
from tests.fixtures.dlm_factory import make_dlm, prose, instruction, preference

text = make_dlm(
    sections=[
        prose("# intro\n\nbody\n"),
        instruction(("Q1?", "A1."), ("Q2?", "A2.")),
        preference(("prompt", "good", "bad")),
    ],
    base_model="smollm2-135m",
    dlm_id="01HZ...",  # omit for a fresh ULID
    training_overrides={"lora_r": 16},
)
```
| 60 | + |
### `tests/fixtures/hardware_mocks.py`

Context managers that simulate backends without real hardware.

```python
from tests.fixtures.hardware_mocks import force_cuda, force_mps, force_cpu

with force_cuda(sm=(8, 9), vram_gb=24.0):
    # torch.cuda.is_available() is True, capability (8, 9), mem 24GB
    ...

with force_mps():
    # MPS is available; CUDA is not
    ...
```

Contexts nest: exiting the inner context restores whatever state the outer
one set up.
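
A minimal sketch of how such a nestable mock can be built on
`unittest.mock.patch.object`, which records the previous value and restores
it on exit. The real `hardware_mocks.py` may differ; `_FakeCuda` and the
attribute names here are illustrative:

```python
from contextlib import contextmanager
from unittest.mock import patch

class _FakeCuda:
    """Stand-in for the torch.cuda-shaped surface the real mocks patch."""
    available = False
    capability = None

backend = _FakeCuda()

@contextmanager
def force_cuda(sm=(8, 9)):
    # patch.object saves the previous value and restores it on __exit__,
    # which is exactly what makes nesting work.
    with patch.object(backend, "available", True), \
         patch.object(backend, "capability", sm):
        yield backend

with force_cuda(sm=(8, 0)):
    with force_cuda(sm=(8, 9)):
        assert backend.capability == (8, 9)
    # leaving the inner context restored the outer mock's state
    assert backend.capability == (8, 0)
assert backend.available is False
```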
| 78 | + |
| 79 | +### `tests/fixtures/tiny_model.py` |
| 80 | + |
| 81 | +SmolLM2-135M-Instruct as a session-scoped fixture. Download is gated behind |
| 82 | +`@pytest.mark.online`; the session-scoped `tiny_model_dir` fixture returns the |
| 83 | +cached path. |
| 84 | + |
| 85 | +```python |
| 86 | +import pytest |
| 87 | + |
| 88 | +@pytest.mark.online |
| 89 | +@pytest.mark.slow |
| 90 | +def test_something(tiny_model_dir): |
| 91 | + # tiny_model_dir is a pathlib.Path to the cached model |
| 92 | + ... |
| 93 | +``` |
| 94 | + |
| 95 | +The revision is pinned via `DLM_TINY_MODEL_REVISION` (defaulting to `main` |
| 96 | +until Sprint 06's base-model registry owns the SHA). |
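
A hedged sketch of how the revision pin could resolve inside the fixture.
The helper names, the repo id, and the `huggingface_hub` call are
assumptions; the real fixture may differ:

```python
import os
from pathlib import Path

def tiny_model_revision() -> str:
    # Pinned via the environment, defaulting to "main" for now.
    return os.environ.get("DLM_TINY_MODEL_REVISION", "main")

def download_tiny_model() -> Path:
    # Lazy import keeps test collection fast (see "Common pitfalls").
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        "HuggingFaceTB/SmolLM2-135M-Instruct",  # assumed repo id
        revision=tiny_model_revision(),
    )
    return Path(path)
```

The session-scoped `tiny_model_dir` fixture would wrap something like
`download_tiny_model()` and cache the result for the whole run.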
| 97 | + |
### `tests/fixtures/golden.py`

```python
from tests.fixtures.golden import assert_golden

def test_loss_curve():
    values = compute_loss_curve()
    assert_golden({"loss": values}, name="loss-curve-v1")
```

Goldens live at `tests/golden/<name>.torch-<version>.json`. Bumping torch
creates a new golden file; the old one stays until it is deliberately
removed.
| 110 | + |
## Regenerating goldens

```
uv run pytest --update-goldens
```

This flips `assert_golden` into write mode. Review the diff before
committing:

```
git diff tests/golden/
```

A two-person review is mandatory for golden changes; they are determinism
contracts. See Sprint 15's `scripts/regen-determinism-golden.py` for the
heavier regeneration workflow once that lands.
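
The `--update-goldens` flag itself could be registered in `conftest.py`
roughly like this (a sketch; the real hook wiring may differ):

```python
def pytest_addoption(parser):
    # Registers the flag; assert_golden can then read it via
    # request.config.getoption("--update-goldens").
    parser.addoption(
        "--update-goldens",
        action="store_true",
        default=False,
        help="rewrite golden files instead of comparing against them",
    )
```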
| 127 | + |
## CI layout

Three GitHub Actions jobs:

1. **lint / typecheck / test** (ubuntu-latest + macos-latest matrix):
   runs `ruff check`, `ruff format --check`, `mypy`, and the default
   pytest selection.
2. **no-network sandbox** (ubuntu-latest): blocks egress via iptables,
   then runs the local-only CLI surfaces (`dlm --version`, `--help`,
   and later `init`/`doctor`/`show`). Asserts the "no telemetry, ever"
   promise.
3. **slow tests (hf-cache)** (ubuntu-latest): restores the HF cache keyed
   on `(pyproject.toml hash, TINY_MODEL_REVISION)`, pre-warms the tiny
   model, then runs `pytest -m slow`.
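
Job 3's cache wiring could look roughly like this; the paths, key shape,
and pre-warm step are assumptions, not the real workflow:

```yaml
slow-tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/cache@v4
      with:
        path: ~/.cache/huggingface
        key: hf-${{ hashFiles('pyproject.toml') }}-${{ env.TINY_MODEL_REVISION }}
    # Pre-warm the tiny model so the slow tests never race the download.
    - run: uv run pytest -m "slow and online" -k tiny_model
    - run: uv run pytest -m slow
```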
| 141 | + |
## Offline-first autouse

`tests/conftest.py` sets `HF_HUB_OFFLINE=1`, `TRANSFORMERS_OFFLINE=1`, and
`HF_DATASETS_OFFLINE=1` via an autouse fixture. The `tiny_model_dir`
fixture temporarily clears these for its scope when an online test opts
in. This means a test that *accidentally* touches HF without the fixture
will fail fast instead of downloading silently.
| 149 | + |
## Common pitfalls

- **Importing torch at test-collection time is slow** (~5s). Fixtures that
  need it import it lazily inside functions.
- **Hardware mocks don't simulate actual CUDA computation.** They only
  toggle `is_available`-shaped attributes. Tests that need a real GPU use
  the `gpu` marker.
- **Golden drift on torch bumps is expected.** Regeneration is the fix;
  review the old and new checksums side-by-side before approval.