tenseleyflow/documentlanguagemodel / bf50f91

docs: add contributor testing guide

Authored by espadonne
SHA: bf50f913ea5fb47d41266d8d034ddb504df55bb4
Parents: 2381a8e
Tree: abac5ea

1 changed file

A  docs-internal/README-testing.md  (added, +158 −0)
# Testing guide (contributor-facing)

Everything you need to run the test suite locally and understand what each
layer does.

## Layers

```
tests/
  test_smoke.py           package + CLI boot
  unit/                   fast, in-process, no network
  integration/            crosses 2+ modules (e.g. parser + store)
  e2e/                    full CLI against tmp stores
  fixtures/               factories + mocks (see below)
  golden/                 checked-in JSON goldens per (name, torch_version)
```

## Markers

| marker | meaning | default |
|---|---|---|
| (none) | fast unit, <1s each | run |
| `slow` | expensive; may load the tiny model | **skipped** |
| `gpu` | requires CUDA | skipped on CPU/MPS |
| `online` | touches the network (HF Hub) | skipped offline |

`pyproject.toml` sets `addopts = ["-m", "not slow and not gpu and not online"]`
so the default `uv run pytest` is always the fast, local subset.
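Registered in `pyproject.toml`, the setup might look roughly like this (a sketch: the `addopts` line is quoted from this guide, while the `markers` registrations are assumed from the table above):

```toml
[tool.pytest.ini_options]
addopts = ["-m", "not slow and not gpu and not online"]
markers = [
    "slow: expensive; may load the tiny model",
    "gpu: requires CUDA",
    "online: touches the network (HF Hub)",
]
```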

## Running

```
uv run pytest                         # fast subset, default
uv run pytest -m slow                 # tiny-model and long-running paths
uv run pytest -m "slow and online"    # tiny-model download + inference
uv run pytest --update-goldens        # regenerate goldens (see below)
uv run pytest -v path/to/test_file.py # single-file verbose
```

## Fixtures

### `tests/fixtures/dlm_factory.py`

Builds synthetic `.dlm` text with a stable shape matching Sprint 03's parser.

```python
from tests.fixtures.dlm_factory import make_dlm, prose, instruction, preference

text = make_dlm(
    sections=[
        prose("# intro\n\nbody\n"),
        instruction(("Q1?", "A1."), ("Q2?", "A2.")),
        preference(("prompt", "good", "bad")),
    ],
    base_model="smollm2-135m",
    dlm_id="01HZ...",                # omit for a fresh ULID
    training_overrides={"lora_r": 16},
)
```

### `tests/fixtures/hardware_mocks.py`

Context managers for backend simulation without real hardware.

```python
from tests.fixtures.hardware_mocks import force_cuda, force_mps, force_cpu

with force_cuda(sm=(8, 9), vram_gb=24.0):
    # torch.cuda.is_available() is True, capability (8, 9), mem 24GB
    ...

with force_mps():
    # MPS is available; CUDA is not
    ...
```

Nesting works; the inner context is restored on exit.
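The save-and-restore behavior behind this nesting can be sketched with plain attribute patching (a minimal illustration; `fake_flag` and `Backend` are invented names, not part of the fixture API, and this only toggles availability-shaped attributes, as the pitfalls section notes):

```python
import contextlib
from unittest import mock

class Backend:
    # Stand-in for the module-level attributes the real mocks patch.
    cuda_available = False

@contextlib.contextmanager
def fake_flag(obj, name, value):
    # mock.patch.object restores the previous value on exit, which is
    # exactly why nested contexts unwind correctly.
    with mock.patch.object(obj, name, value):
        yield

with fake_flag(Backend, "cuda_available", True):
    assert Backend.cuda_available
    with fake_flag(Backend, "cuda_available", False):
        assert not Backend.cuda_available
    assert Backend.cuda_available    # inner context restored on exit
assert not Backend.cuda_available    # outer context restored too
```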
78
+
79
+### `tests/fixtures/tiny_model.py`
80
+
81
+SmolLM2-135M-Instruct as a session-scoped fixture. Download is gated behind
82
+`@pytest.mark.online`; the session-scoped `tiny_model_dir` fixture returns the
83
+cached path.
84
+
85
+```python
86
+import pytest
87
+
88
+@pytest.mark.online
89
+@pytest.mark.slow
90
+def test_something(tiny_model_dir):
91
+    # tiny_model_dir is a pathlib.Path to the cached model
92
+    ...
93
+```
94
+
95
+The revision is pinned via `DLM_TINY_MODEL_REVISION` (defaulting to `main`
96
+until Sprint 06's base-model registry owns the SHA).
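That pinning reduces to a one-line environment lookup (a sketch; the helper name is invented):

```python
import os

def tiny_model_revision() -> str:
    # Pinned via env var; "main" is the documented fallback until the
    # base-model registry owns the SHA.
    return os.environ.get("DLM_TINY_MODEL_REVISION", "main")
```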

### `tests/fixtures/golden.py`

```python
from tests.fixtures.golden import assert_golden

def test_loss_curve():
    values = compute_loss_curve()
    assert_golden({"loss": values}, name="loss-curve-v1")
```

Goldens live at `tests/golden/<name>.torch-<version>.json`. Bumping torch
creates a new key; the old one stays until deliberately removed.

## Regenerating goldens

```
uv run pytest --update-goldens
```

This flips `assert_golden` into write mode. Review the diff before
committing:

```
git diff tests/golden/
```

A two-person review is mandatory for golden changes, because they're
determinism contracts. See Sprint 15's `scripts/regen-determinism-golden.py`
for the heavier regeneration workflow once that lands.

## CI layout

Three GitHub Actions jobs:

1. **lint / typecheck / test**: ubuntu-latest + macos-latest matrix.
   Runs ruff, `ruff format --check`, mypy, and the default pytest selection.
2. **no-network sandbox**: ubuntu-latest. Blocks egress via iptables,
   then runs the local-only CLI surfaces (`dlm --version`, `--help`,
   and later `init`/`doctor`/`show`). Asserts the "no telemetry, ever"
   promise.
3. **slow tests (hf-cache)**: ubuntu-latest. Restores the HF cache keyed
   on `(pyproject.toml hash, TINY_MODEL_REVISION)`, pre-warms the tiny
   model, then runs `pytest -m slow`.
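The cache restore in job 3 might look roughly like the following workflow fragment (an assumption for illustration, not the repository's actual YAML):

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/huggingface
    key: hf-${{ hashFiles('pyproject.toml') }}-${{ env.TINY_MODEL_REVISION }}
```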

## Offline-first autouse

`tests/conftest.py` sets `HF_HUB_OFFLINE=1`, `TRANSFORMERS_OFFLINE=1`, and
`HF_DATASETS_OFFLINE=1` via an autouse fixture. The `tiny_model_dir`
fixture temporarily clears these for its scope when an online test opts
in. This means a test that *accidentally* touches HF without the fixture
will fail fast instead of downloading silently.

## Common pitfalls

- **Importing torch at test collection is slow** (~5s). Fixtures that
  need it import it lazily inside functions.
- **Hardware mocks don't simulate actual CUDA computation.** They only
  toggle `is_available`-shaped attributes. Tests that need a real GPU use
  the `gpu` marker.
- **Golden drift on torch bumps is expected.** Regeneration is the fix;
  review the old vs. new checksums side by side before approval.
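The lazy-import pattern from the first pitfall can be as simple as deferring the import into a function body (a sketch, shown with `math` so it runs anywhere; in the real fixtures the deferred module would be `torch`):

```python
import importlib

def lazy(name):
    # Nothing heavy is imported at module (or collection) time; the
    # import cost is paid only when a test actually calls this.
    return importlib.import_module(name)

math = lazy("math")   # real fixtures would defer "torch" this way
assert math.sqrt(9) == 3.0
```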