# Testing guide (contributor-facing)

Everything you need to run the test suite locally and understand what each
layer does.

## Layers

```
tests/
  test_smoke.py   package + CLI boot
  unit/           fast, in-process, no network
  integration/    crosses 2+ modules (e.g. parser + store)
  e2e/            full CLI against tmp stores
  fixtures/       factories + mocks (see below)
  golden/         checked-in JSON goldens per (name, torch_version)
```

## Markers

| marker | meaning | default |
|---|---|---|
| (none) | fast unit, <1s each | run |
| `slow` | expensive; may load the tiny model | **skipped** |
| `gpu` | requires CUDA | skipped on CPU/MPS |
| `online` | touches the network (HF Hub) | skipped offline |

`pyproject.toml` sets `addopts = ["-m", "not slow and not gpu and not online"]`
so the default `uv run pytest` is always the fast, local subset.
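
A sketch of the corresponding `pyproject.toml` block: the `addopts` value is
quoted above, while the `markers` registrations are an assumption drawn from
the marker table:

```toml
[tool.pytest.ini_options]
# Default selection: fast, local-only tests.
addopts = ["-m", "not slow and not gpu and not online"]
markers = [
    "slow: expensive; may load the tiny model",
    "gpu: requires CUDA",
    "online: touches the network (HF Hub)",
]
```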

## Running

```
uv run pytest                           # fast subset, default
uv run pytest -m slow                   # tiny-model and long-running paths
uv run pytest -m "slow and online"      # tiny-model download + inference
uv run pytest --update-goldens          # regenerate goldens (see below)
uv run pytest -v path/to/test_file.py   # single-file verbose
```

## Fixtures

### `tests/fixtures/dlm_factory.py`

Builds synthetic `.dlm` text with a stable shape matching Sprint 03's parser.

```python
from tests.fixtures.dlm_factory import make_dlm, prose, instruction, preference

text = make_dlm(
    sections=[
        prose("# intro\n\nbody\n"),
        instruction(("Q1?", "A1."), ("Q2?", "A2.")),
        preference(("prompt", "good", "bad")),
    ],
    base_model="smollm2-135m",
    dlm_id="01HZ...",  # omit for a fresh ULID
    training_overrides={"lora_r": 16},
)
```

### `tests/fixtures/hardware_mocks.py`

Context managers for backend simulation without real hardware.

```python
from tests.fixtures.hardware_mocks import force_cuda, force_mps, force_cpu

with force_cuda(sm=(8, 9), vram_gb=24.0):
    # torch.cuda.is_available() is True, capability (8, 9), mem 24 GB
    ...

with force_mps():
    # MPS is available; CUDA is not
    ...
```
Nesting works — the inner context is restored on exit.
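
For example, in this usage sketch built from the managers above, the outer
CUDA state comes back when the inner MPS block exits:

```python
from tests.fixtures.hardware_mocks import force_cuda, force_mps

with force_cuda(sm=(8, 9), vram_gb=24.0):
    # outer scope: CUDA is the visible backend
    with force_mps():
        # inner scope: MPS is available, CUDA is not
        ...
    # back here, the outer CUDA state is restored
    ...
```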

### `tests/fixtures/tiny_model.py`

SmolLM2-135M-Instruct as a session-scoped fixture. The download is gated
behind `@pytest.mark.online`; the `tiny_model_dir` fixture returns the
cached path.

```python
import pytest

@pytest.mark.online
@pytest.mark.slow
def test_something(tiny_model_dir):
    # tiny_model_dir is a pathlib.Path to the cached model
    ...
```

The revision is pinned via `DLM_TINY_MODEL_REVISION` (defaulting to `main`
until Sprint 06's base-model registry owns the SHA).
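
A minimal sketch of how such a fixture can be wired, assuming
`huggingface_hub.snapshot_download` and the `HuggingFaceTB/SmolLM2-135M-Instruct`
repo id; the real definition lives in `tests/fixtures/tiny_model.py`:

```python
import os
import pathlib

import pytest


@pytest.fixture(scope="session")
def tiny_model_dir() -> pathlib.Path:
    # Lazy import keeps huggingface_hub out of test collection.
    from huggingface_hub import snapshot_download

    # Pin the revision; "main" is the fallback until Sprint 06.
    path = snapshot_download(
        "HuggingFaceTB/SmolLM2-135M-Instruct",
        revision=os.environ.get("DLM_TINY_MODEL_REVISION", "main"),
    )
    return pathlib.Path(path)
```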

### `tests/fixtures/golden.py`

```python
from tests.fixtures.golden import assert_golden

def test_loss_curve():
    values = compute_loss_curve()
    assert_golden({"loss": values}, name="loss-curve-v1")
```

Goldens live at `tests/golden/<name>.torch-<version>.json`. Bumping torch
creates a new key; the old one stays until deliberately removed.
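
A sketch of what that naming scheme implies for the lookup (hypothetical
helpers; the real logic is in `tests/fixtures/golden.py`):

```python
import json
import pathlib

import torch

GOLDEN_DIR = pathlib.Path("tests/golden")


def golden_path(name: str) -> pathlib.Path:
    # One file per (name, torch version); a torch bump keys a new golden.
    return GOLDEN_DIR / f"{name}.torch-{torch.__version__}.json"


def assert_golden(actual: dict, name: str, update: bool = False) -> None:
    path = golden_path(name)
    if update:  # --update-goldens flips this on
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return
    assert actual == json.loads(path.read_text())
```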

## Regenerating goldens

```
uv run pytest --update-goldens
```

This flips `assert_golden` into write mode. Review the diff before
committing:

```
git diff tests/golden/
```

A two-person review is mandatory for golden changes — they're determinism
contracts. See Sprint 15's `scripts/regen-determinism-golden.py` for the
heavier regeneration workflow once that lands.

## CI layout

Three GitHub Actions jobs:

1. **lint / typecheck / test** — ubuntu-latest + macos-latest matrix.
   Runs ruff, `ruff format --check`, mypy, and the default pytest selection.
2. **no-network sandbox** — ubuntu-latest. Blocks egress via iptables,
   then runs the local-only CLI surfaces (`dlm --version`, `--help`,
   and later `init`/`doctor`/`show`). Asserts the "no telemetry, ever"
   promise.
3. **slow tests (hf-cache)** — ubuntu-latest. Restores the HF cache keyed
   on `(pyproject.toml hash, TINY_MODEL_REVISION)`, pre-warms the tiny
   model, then runs `pytest -m slow` (cache step sketched below).
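
The cache step in job 3 might look roughly like this; a sketch with
`actions/cache`, where the env wiring for `TINY_MODEL_REVISION` is an
assumption:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.cache/huggingface
    # New pyproject.toml or model revision => fresh cache.
    key: hf-${{ hashFiles('pyproject.toml') }}-${{ env.TINY_MODEL_REVISION }}
```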

## Offline-first autouse

`tests/conftest.py` sets `HF_HUB_OFFLINE=1`, `TRANSFORMERS_OFFLINE=1`, and
`HF_DATASETS_OFFLINE=1` via an autouse fixture. The `tiny_model_dir`
fixture temporarily clears these for its scope when an online test opts
in. This means a test that *accidentally* touches HF without the fixture
will fail fast instead of downloading silently.
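
A minimal sketch of that autouse fixture, assuming the standard `monkeypatch`
approach; the real one lives in `tests/conftest.py`:

```python
import pytest


@pytest.fixture(autouse=True)
def hf_offline(monkeypatch: pytest.MonkeyPatch) -> None:
    # Every test starts offline; tiny_model_dir clears these for
    # its own scope when a test opts in via @pytest.mark.online.
    for var in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_DATASETS_OFFLINE"):
        monkeypatch.setenv(var, "1")
```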

## Common pitfalls

- **Importing torch during test collection is slow** (~5s). Fixtures that
  need it import lazily inside functions (see the sketch after this list).
- **Hardware mocks don't simulate actual CUDA computation.** They only
  toggle `is_available`-shaped attributes. Tests that need a real GPU use
  the `gpu` marker.
- **Golden drift on torch bumps is expected.** Regeneration is the fix;
  review the old vs. new checksums side by side before approval.
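
The lazy-import pattern from the first pitfall, as a sketch
(`cuda_capability` is a hypothetical fixture name):

```python
import pytest


@pytest.fixture
def cuda_capability() -> tuple[int, int]:
    # Import inside the function so collection stays fast even
    # when torch is never needed by the selected tests.
    import torch

    return torch.cuda.get_device_capability()
```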