Add determinism golden index

Status	File	+	-
A	`.determinism/lock.json`	5	0
M	`docs/determinism.md`	8	3
M	`scripts/regen-determinism-golden.py`	17	6
M	`src/dlm/base_models/license.py`	1	1
M	`src/dlm/lock/__init__.py`	34	6
M	`src/dlm/lock/errors.py`	18	0
A	`src/dlm/lock/golden_index.py`	136	0
A	`tests/unit/lock/test_golden_index.py`	128	0

.determinism/lock.json (added)

++{
++  "goldens": [],
++  "lock_version": 1,
++  "updated_at": "2026-04-22T04:25:32"
++}
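The empty index above is what the writer produces before any tuple has been approved. The minimal stdlib sketch below shows how those exact bytes follow from the serialization settings in `write_golden_index` (`indent=2`, `sort_keys=True`, trailing newline) and from the second-resolution, offset-free timestamp returned by `_utcnow`. The helper names are hypothetical. The sketch also assumes that pydantic's JSON dump of a naive `datetime` matches `isoformat()`.

```python
import json
from datetime import datetime, timezone


def utc_now_seconds() -> datetime:
    # Same normalization as `_utcnow` in the diff: UTC, tz stripped, no microseconds.
    # (timezone.utc is equivalent to datetime.UTC on Python 3.11+.)
    return datetime.now(timezone.utc).replace(tzinfo=None, microsecond=0)


def render_empty_index(updated_at: datetime) -> str:
    # Hypothetical helper: serialize an index with no approved goldens yet.
    payload = {
        "lock_version": 1,
        "updated_at": updated_at.isoformat(),
        "goldens": [],
    }
    # sort_keys + indent=2 + trailing newline mirror write_golden_index.
    return json.dumps(payload, indent=2, sort_keys=True) + "\n"


text = render_empty_index(datetime(2026, 4, 22, 4, 25, 32))
print(text, end="")
```

Sorting keys means the file's key order never depends on how the model fields happen to be declared, so regenerating an unchanged index yields an unchanged file.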

docs/determinism.md (modified)

  Proved by `tests/integration/lock/test_determinism_golden.py`, which
  runs two fresh training cycles on the tiny model and asserts the
--adapter SHAs match.
++adapter SHAs match. Approved tuple goldens are tracked at the repo
++level in `.determinism/lock.json`.
  ## What's in `dlm.lock`
  2. Runs the tiny-model training twice; confirms the two SHAs match.
  3. Writes `tests/golden/determinism/tuple-<hash>.json` keyed by a
     SHA-256 of the sorted version tuple + platform.
++4. Upserts `.determinism/lock.json` with the tuple path, adapter SHA,
++   platform, and pinned versions.
  Each tuple gets its own golden; the tuple file is keyed by content so
--running on a new platform simply writes a new golden file. The
++running on a new platform simply writes a new golden file. The repo-level
--reviewer checks in the new golden alongside the dep bump.
++index keeps the checked-in set explicit and avoids overloading the
++per-store `dlm.lock` name with a second meaning. The reviewer checks in
++the tuple file and the index update alongside the dep bump.
  ## Non-goals
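The docs say each tuple golden is keyed by a SHA-256 of the sorted version tuple plus the platform. The index schema further restricts golden file names to `tuple-` followed by 16 lowercase hex characters. The sketch below shows one way such a content-addressed name could be derived. The script's own `_tuple_filename` is not shown in this diff, so the `name=version` joining scheme and the way the platform is appended are assumptions.

```python
import hashlib
import re
from collections.abc import Mapping

# Pattern copied from DeterminismGoldenEntry.golden_relpath in the diff.
GOLDEN_RELPATH_RE = re.compile(r"^tests/golden/determinism/tuple-[0-9a-f]{16}\.json$")


def tuple_filename(versions: Mapping[str, str], platform: str) -> str:
    """Hypothetical: derive a content-addressed golden filename."""
    # Sorting makes the key independent of dict insertion order.
    parts = [f"{name}={version}" for name, version in sorted(versions.items())]
    parts.append(f"platform={platform}")
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    # Truncate to 16 hex chars so the name fits the index schema's pattern.
    return f"tuple-{digest[:16]}.json"


a = tuple_filename({"torch": "2.5.1", "peft": "0.14.0"}, "darwin-arm64")
b = tuple_filename({"peft": "0.14.0", "torch": "2.5.1"}, "darwin-arm64")
c = tuple_filename({"peft": "0.14.0", "torch": "2.5.1"}, "linux-x86_64")
# Order of the pins does not matter; the platform does.
print(a == b, a == c)
```

Because the name is derived from content, a dependency bump or a new platform produces a new file instead of overwriting an existing golden. That is the property the docs rely on when they say a new platform "simply writes a new golden file".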

scripts/regen-determinism-golden.py (modified)

     - `regenerated_at` — UTC timestamp
     - `dlm_sha256` — hash of the synthetic training doc (reproducible
       across runs when the factory's ULID seed is pinned)
--5. Compare against the prior golden (if one existed) and print a diff.
++5. Upsert `.determinism/lock.json` with the checked-in tuple metadata.
--6. Exit non-zero unless `--approve` is passed. The default is
++6. Compare against the prior golden (if one existed) and print a diff.
++7. Exit non-zero unless `--approve` is passed. The default is
     dry-run-and-report so a stray script invocation doesn't silently
     overwrite a baseline.
      uv run python scripts/regen-determinism-golden.py           # dry run
      uv run python scripts/regen-determinism-golden.py --approve # write
--The matching root-level `dlm.lock` (distinct from the per-store
++The matching repo-level `.determinism/lock.json` records which tuples
--`dlm.lock`) records which tuples have a checked-in golden. CI computes
++have a checked-in golden. It is distinct from the per-store
--the current golden and fails iff that lock asserts a tuple has a
++`dlm.lock`, which captures one training run's determinism contract.
--golden but the on-disk file differs (catches silent drift on dep bump).
  """
  from __future__ import annotations
      import tempfile
++    from dlm.lock.golden_index import GOLDEN_INDEX_RELATIVE_PATH, upsert_golden_index
++
      versions = _current_versions()
      filename = _tuple_filename(versions)
      target = _GOLDEN_DIR / filename
++    golden_relpath = target.relative_to(_REPO_ROOT).as_posix()
      prior = None
      if target.is_file():
          try:
      _GOLDEN_DIR.mkdir(parents=True, exist_ok=True)
      target.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
++    upsert_golden_index(
++        _REPO_ROOT,
++        golden_relpath=golden_relpath,
++        adapter_sha256=sha_a,
++        platform=payload["platform"],
++        pinned_versions=versions,
++    )
      print(f"[wrote] {target.relative_to(_REPO_ROOT)}")
++    print(f"[wrote] {GOLDEN_INDEX_RELATIVE_PATH}")
      return 0
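The script records `golden_relpath` with `relative_to(_REPO_ROOT).as_posix()` rather than `str(...)`. With `as_posix()`, the path in the index uses forward slashes on every OS and can satisfy the `tests/golden/determinism/tuple-…` pattern that `DeterminismGoldenEntry` enforces. The illustration below uses `pathlib`'s pure path classes, so it runs the same way on any host. The repo roots are made up.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Pattern copied from DeterminismGoldenEntry.golden_relpath in the diff.
GOLDEN_RELPATH_RE = re.compile(r"^tests/golden/determinism/tuple-[0-9a-f]{16}\.json$")

# Made-up repo roots and golden targets, for illustration only.
win_root = PureWindowsPath(r"C:\src\dlm")
win_target = win_root / "tests" / "golden" / "determinism" / "tuple-0123456789abcdef.json"
posix_root = PurePosixPath("/home/ci/dlm")
posix_target = posix_root / "tests/golden/determinism/tuple-0123456789abcdef.json"

win_str = str(win_target.relative_to(win_root))  # backslash-separated on Windows
win_posix = win_target.relative_to(win_root).as_posix()  # always forward slashes
posix_posix = posix_target.relative_to(posix_root).as_posix()

print(win_str)
print(win_posix == posix_posix)
```

With `str()`, an approval run on Windows would produce a relpath that the schema rejects. It would also produce index entries that differ from those approved on POSIX hosts for the same tuple.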

src/dlm/base_models/license.py (modified)


 - `manifest.json.license_acceptance`: the per-store durable record;
   read on every subsequent `dlm train` to verify the acceptance
   fingerprint is still present.
-- Repo-level `dlm.lock.license_acceptance`: the determinism-contract
+- Per-store `dlm.lock.license_acceptance`: the determinism-contract
   mirror; divergence between the two triggers a lock re-check.
 
 The interactive prompt in `dlm init` lives in the CLI layer; this

src/dlm/lock/__init__.py (modified)

  """Per-store `dlm.lock` — determinism contract for one `.dlm`.
--Separate from the repo-level `uv.lock` (tool-dep pins) and from the
++Separate from the repo-level `uv.lock` (tool-dep pins), from the
--`manifest.json` (training run narrative). The store-level `dlm.lock`
++repo-level determinism-golden index at `.determinism/lock.json`, and
--pins the tuple `(torch, transformers, peft, trl, bitsandbytes,
++from `manifest.json` (training run narrative). The store-level
--accelerate, llama.cpp tag, cuda/rocm, hardware_tier, seed,
++`dlm.lock` pins the tuple `(torch, transformers, peft, trl,
--determinism_flags, determinism_class)` and carries:
++bitsandbytes, accelerate, llama.cpp tag, cuda/rocm, hardware_tier,
++seed, determinism_flags, determinism_class)` and carries:
  - the hash of the `.dlm` source at the time the lock was written
  - the base-model revision + content hash
  from __future__ import annotations
  from dlm.lock.builder import build_lock, hardware_tier_from_backend, hash_dlm_file
--from dlm.lock.errors import LockError, LockSchemaError, LockValidationError, LockWriteError
++from dlm.lock.errors import (
++    GoldenIndexSchemaError,
++    GoldenIndexWriteError,
++    LockError,
++    LockSchemaError,
++    LockValidationError,
++    LockWriteError,
++)
++from dlm.lock.golden_index import (
++    CURRENT_GOLDEN_INDEX_VERSION,
++    GOLDEN_INDEX_RELATIVE_PATH,
++    DeterminismGoldenEntry,
++    DeterminismGoldenIndex,
++    golden_index_path,
++    load_golden_index,
++    upsert_golden_index,
++    write_golden_index,
++)
  from dlm.lock.policy import Severity, classify_mismatches
  from dlm.lock.schema import CURRENT_LOCK_VERSION, LOCK_FILENAME, DlmLock
  from dlm.lock.validator import LockDecision, LockMode, validate_lock
  __all__ = [
      "CURRENT_LOCK_VERSION",
++    "CURRENT_GOLDEN_INDEX_VERSION",
++    "GOLDEN_INDEX_RELATIVE_PATH",
++    "DeterminismGoldenEntry",
++    "DeterminismGoldenIndex",
++    "GoldenIndexSchemaError",
++    "GoldenIndexWriteError",
      "LOCK_FILENAME",
      "DlmLock",
      "LockDecision",
      "build_lock",
      "classify_mismatches",
      "hardware_tier_from_backend",
++    "golden_index_path",
      "hash_dlm_file",
      "load_lock",
++    "load_golden_index",
      "lock_path",
++    "upsert_golden_index",
      "validate_lock",
++    "write_golden_index",
      "write_lock",
  ]

src/dlm/lock/errors.py (modified)

          self.reasons = list(reasons)
          joined = "; ".join(reasons)
          super().__init__(f"{path}: lock validation failed ({joined})")
++
++
++class GoldenIndexSchemaError(LockError):
++    """Repo-level determinism-golden index is unreadable or schema-invalid."""
++
++    def __init__(self, path: Path, reason: str) -> None:
++        self.path = path
++        self.reason = reason
++        super().__init__(f"{path}: {reason}")
++
++
++class GoldenIndexWriteError(LockError):
++    """Programmer error on the repo-level determinism-golden index write path."""
++
++    def __init__(self, *, path: Path, reason: str) -> None:
++        self.path = path
++        self.reason = reason
++        super().__init__(f"{path}: write refused: {reason}")
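Both new errors subclass `LockError`. An existing `except LockError` handler around lock I/O therefore also catches golden-index failures, while callers that care can still match the narrower type. The self-contained mirror below reproduces the two classes from the diff. It uses a stub `LockError`, since that class's body is not part of this change, and shows the resulting message shapes. The `describe` helper is hypothetical.

```python
from collections.abc import Callable
from pathlib import Path


class LockError(Exception):
    """Stub standing in for dlm.lock.errors.LockError (body not shown in the diff)."""


class GoldenIndexSchemaError(LockError):
    # Mirrors the diff: positional (path, reason).
    def __init__(self, path: Path, reason: str) -> None:
        self.path = path
        self.reason = reason
        super().__init__(f"{path}: {reason}")


class GoldenIndexWriteError(LockError):
    # Mirrors the diff: keyword-only (path=, reason=).
    def __init__(self, *, path: Path, reason: str) -> None:
        self.path = path
        self.reason = reason
        super().__init__(f"{path}: write refused: {reason}")


def describe(make_error: Callable[[], LockError]) -> str:
    # A single broad LockError handler catches both golden-index errors.
    try:
        raise make_error()
    except LockError as exc:
        return f"{type(exc).__name__}: {exc}"


index = Path(".determinism/lock.json")
print(describe(lambda: GoldenIndexSchemaError(index, "invalid JSON: ...")))
print(describe(lambda: GoldenIndexWriteError(path=index, reason="lock_version=2 != 1")))
```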

src/dlm/lock/golden_index.py (added)

++"""Repo-level index of checked-in determinism goldens.
++
++Separate from the per-store `dlm.lock`: this file tracks which
++runtime tuples have an approved golden under `tests/golden/determinism/`.
++The canonical path is `.determinism/lock.json`.
++"""
++
++from __future__ import annotations
++
++import json
++from collections.abc import Mapping
++from datetime import UTC, datetime
++from pathlib import Path
++from typing import Final
++
++from pydantic import BaseModel, ConfigDict, Field
++
++from dlm.io.atomic import write_text
++from dlm.lock.errors import GoldenIndexSchemaError, GoldenIndexWriteError
++
++GOLDEN_INDEX_RELATIVE_PATH: Final[str] = ".determinism/lock.json"
++CURRENT_GOLDEN_INDEX_VERSION: Final[int] = 1
++
++
++class DeterminismGoldenEntry(BaseModel):
++    """One approved tuple golden tracked at repo scope."""
++
++    model_config = ConfigDict(extra="forbid", frozen=True)
++
++    golden_relpath: str = Field(
++        ...,
++        pattern=r"^tests/golden/determinism/tuple-[0-9a-f]{16}\.json$",
++    )
++    adapter_sha256: str = Field(..., pattern=r"^[0-9a-f]{64}$")
++    platform: str = Field(..., min_length=1)
++    pinned_versions: dict[str, str] = Field(default_factory=dict)
++
++
++class DeterminismGoldenIndex(BaseModel):
++    """Checked-in set of approved determinism goldens."""
++
++    model_config = ConfigDict(extra="forbid", frozen=True)
++
++    lock_version: int = Field(CURRENT_GOLDEN_INDEX_VERSION, ge=1)
++    updated_at: datetime
++    goldens: tuple[DeterminismGoldenEntry, ...] = ()
++
++
++def golden_index_path(repo_root: Path) -> Path:
++    """Return `<repo_root>/.determinism/lock.json`."""
++
++    return repo_root / GOLDEN_INDEX_RELATIVE_PATH
++
++
++def write_golden_index(repo_root: Path, index: DeterminismGoldenIndex) -> Path:
++    """Atomically persist the repo-level determinism-golden index."""
++
++    target = golden_index_path(repo_root)
++    if index.lock_version != CURRENT_GOLDEN_INDEX_VERSION:
++        raise GoldenIndexWriteError(
++            path=target,
++            reason=(
++                f"lock_version={index.lock_version!r} != writer's "
++                f"CURRENT_GOLDEN_INDEX_VERSION={CURRENT_GOLDEN_INDEX_VERSION}"
++            ),
++        )
++    target.parent.mkdir(parents=True, exist_ok=True)
++    payload = index.model_dump(mode="json")
++    text = json.dumps(payload, indent=2, sort_keys=True) + "\n"
++    write_text(target, text)
++    return target
++
++
++def load_golden_index(repo_root: Path) -> DeterminismGoldenIndex | None:
++    """Read `.determinism/lock.json`, returning `None` when absent."""
++
++    path = golden_index_path(repo_root)
++    if not path.is_file():
++        return None
++
++    try:
++        raw = path.read_text(encoding="utf-8")
++    except OSError as exc:
++        raise GoldenIndexSchemaError(path, f"unreadable: {exc}") from exc
++
++    try:
++        payload = json.loads(raw)
++    except json.JSONDecodeError as exc:
++        raise GoldenIndexSchemaError(path, f"invalid JSON: {exc}") from exc
++
++    if not isinstance(payload, dict):
++        raise GoldenIndexSchemaError(
++            path,
++            f"top-level JSON must be an object, got {type(payload).__name__}",
++        )
++
++    version = payload.get("lock_version")
++    if version != CURRENT_GOLDEN_INDEX_VERSION:
++        raise GoldenIndexSchemaError(
++            path,
++            f"unsupported lock_version {version!r} (reader expects {CURRENT_GOLDEN_INDEX_VERSION})",
++        )
++
++    try:
++        return DeterminismGoldenIndex.model_validate(payload)
++    except Exception as exc:
++        raise GoldenIndexSchemaError(path, f"schema validation: {exc}") from exc
++
++
++def upsert_golden_index(
++    repo_root: Path,
++    *,
++    golden_relpath: str,
++    adapter_sha256: str,
++    platform: str,
++    pinned_versions: Mapping[str, str],
++) -> Path:
++    """Insert or replace one tuple golden in `.determinism/lock.json`."""
++
++    current = load_golden_index(repo_root)
++    entries = {} if current is None else {entry.golden_relpath: entry for entry in current.goldens}
++    entries[golden_relpath] = DeterminismGoldenEntry(
++        golden_relpath=golden_relpath,
++        adapter_sha256=adapter_sha256,
++        platform=platform,
++        pinned_versions=dict(sorted(pinned_versions.items())),
++    )
++    updated = DeterminismGoldenIndex(
++        updated_at=_utcnow(),
++        goldens=tuple(sorted(entries.values(), key=lambda entry: entry.golden_relpath)),
++    )
++    return write_golden_index(repo_root, updated)
++
++
++def _utcnow() -> datetime:
++    return datetime.now(UTC).replace(tzinfo=None, microsecond=0)
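`upsert_golden_index` performs a read-merge-write. Entries are keyed by `golden_relpath`, so re-running the script for a tuple that is already indexed replaces its entry instead of appending a duplicate. Both the entry list and each `pinned_versions` map are sorted, which keeps the written JSON stable and its diffs small. The stdlib-only sketch below covers just the merge step over plain dicts, with no pydantic validation and no file I/O. `merge_entry` is a hypothetical name.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def merge_entry(
    goldens: Sequence[Mapping[str, Any]],
    *,
    golden_relpath: str,
    adapter_sha256: str,
    platform: str,
    pinned_versions: Mapping[str, str],
) -> list[dict[str, Any]]:
    """Hypothetical stdlib analogue of the merge step in upsert_golden_index."""
    # Key existing entries by path so re-approving a tuple replaces its entry.
    entries = {entry["golden_relpath"]: dict(entry) for entry in goldens}
    entries[golden_relpath] = {
        "golden_relpath": golden_relpath,
        "adapter_sha256": adapter_sha256,
        "platform": platform,
        # Sorted pins keep the serialized entry byte-stable across runs.
        "pinned_versions": dict(sorted(pinned_versions.items())),
    }
    # Deterministic list order by path, the same sort key the diff uses.
    return sorted(entries.values(), key=lambda entry: entry["golden_relpath"])


BASE = "tests/golden/determinism/"
existing = [
    {
        "golden_relpath": BASE + "tuple-ffffffffffffffff.json",
        "adapter_sha256": "a" * 64,
        "platform": "darwin-arm64",
        "pinned_versions": {"torch": "2.5.1", "peft": "0.14.0"},
    }
]
# Re-approve the existing tuple with a new adapter SHA...
merged = merge_entry(
    existing,
    golden_relpath=BASE + "tuple-ffffffffffffffff.json",
    adapter_sha256="c" * 64,
    platform="darwin-arm64",
    pinned_versions={"torch": "2.5.1", "peft": "0.14.0"},
)
# ...then add a second tuple, which sorts ahead of the first.
merged = merge_entry(
    merged,
    golden_relpath=BASE + "tuple-aaaaaaaaaaaaaaaa.json",
    adapter_sha256="b" * 64,
    platform="linux-x86_64",
    pinned_versions={"torch": "2.6.0"},
)
```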

tests/unit/lock/test_golden_index.py (added)

++"""Repo-level determinism-golden index I/O."""
++
++from __future__ import annotations
++
++from datetime import UTC, datetime
++from pathlib import Path
++
++import pytest
++
++from dlm.lock.errors import GoldenIndexSchemaError
++from dlm.lock.golden_index import (
++    GOLDEN_INDEX_RELATIVE_PATH,
++    DeterminismGoldenEntry,
++    DeterminismGoldenIndex,
++    golden_index_path,
++    load_golden_index,
++    upsert_golden_index,
++    write_golden_index,
++)
++
++
++def _index(*entries: DeterminismGoldenEntry) -> DeterminismGoldenIndex:
++    return DeterminismGoldenIndex(
++        updated_at=datetime(2026, 4, 22, 4, 25, 32, tzinfo=UTC),
++        goldens=entries,
++    )
++
++
++def _entry(
++    *,
++    golden_relpath: str = "tests/golden/determinism/tuple-0123456789abcdef.json",
++    adapter_sha256: str = "a" * 64,
++    platform: str = "darwin-arm64",
++) -> DeterminismGoldenEntry:
++    return DeterminismGoldenEntry(
++        golden_relpath=golden_relpath,
++        adapter_sha256=adapter_sha256,
++        platform=platform,
++        pinned_versions={"peft": "0.14.0", "torch": "2.5.1"},
++    )
++
++
++class TestGoldenIndexPath:
++    def test_returns_repo_relative_path(self, tmp_path: Path) -> None:
++        assert golden_index_path(tmp_path) == tmp_path / GOLDEN_INDEX_RELATIVE_PATH
++
++
++class TestWriteGoldenIndex:
++    def test_writes_readable_json(self, tmp_path: Path) -> None:
++        written = write_golden_index(tmp_path, _index(_entry()))
++        assert written.is_file()
++        text = written.read_text(encoding="utf-8")
++        assert text.endswith("\n")
++        assert text.index('"golden_relpath"') < text.index('"platform"')
++
++    def test_round_trip_equal(self, tmp_path: Path) -> None:
++        original = _index(_entry())
++        write_golden_index(tmp_path, original)
++        loaded = load_golden_index(tmp_path)
++        assert loaded == original
++
++
++class TestLoadGoldenIndex:
++    def test_missing_file_returns_none(self, tmp_path: Path) -> None:
++        assert load_golden_index(tmp_path) is None
++
++    def test_invalid_json_raises(self, tmp_path: Path) -> None:
++        golden_index_path(tmp_path).parent.mkdir(parents=True)
++        golden_index_path(tmp_path).write_text("{not valid", encoding="utf-8")
++        with pytest.raises(GoldenIndexSchemaError, match="invalid JSON"):
++            load_golden_index(tmp_path)
++
++    def test_non_object_top_level_raises(self, tmp_path: Path) -> None:
++        golden_index_path(tmp_path).parent.mkdir(parents=True)
++        golden_index_path(tmp_path).write_text("[]", encoding="utf-8")
++        with pytest.raises(GoldenIndexSchemaError, match="must be an object"):
++            load_golden_index(tmp_path)
++
++    def test_newer_version_is_rejected(self, tmp_path: Path) -> None:
++        golden_index_path(tmp_path).parent.mkdir(parents=True)
++        golden_index_path(tmp_path).write_text('{"lock_version": 99}', encoding="utf-8")
++        with pytest.raises(GoldenIndexSchemaError, match="unsupported lock_version"):
++            load_golden_index(tmp_path)
++
++
++class TestUpsertGoldenIndex:
++    def test_creates_index_when_absent(self, tmp_path: Path) -> None:
++        upsert_golden_index(
++            tmp_path,
++            golden_relpath="tests/golden/determinism/tuple-0123456789abcdef.json",
++            adapter_sha256="a" * 64,
++            platform="darwin-arm64",
++            pinned_versions={"torch": "2.5.1", "peft": "0.14.0"},
++        )
++        loaded = load_golden_index(tmp_path)
++        assert loaded is not None
++        assert [entry.golden_relpath for entry in loaded.goldens] == [
++            "tests/golden/determinism/tuple-0123456789abcdef.json"
++        ]
++
++    def test_overwrites_existing_entry_and_sorts(self, tmp_path: Path) -> None:
++        write_golden_index(
++            tmp_path,
++            _index(
++                _entry(golden_relpath="tests/golden/determinism/tuple-ffffffffffffffff.json"),
++                _entry(
++                    golden_relpath="tests/golden/determinism/tuple-aaaaaaaaaaaaaaaa.json",
++                    adapter_sha256="b" * 64,
++                ),
++            ),
++        )
++
++        upsert_golden_index(
++            tmp_path,
++            golden_relpath="tests/golden/determinism/tuple-ffffffffffffffff.json",
++            adapter_sha256="c" * 64,
++            platform="linux-x86_64",
++            pinned_versions={"torch": "2.6.0"},
++        )
++
++        loaded = load_golden_index(tmp_path)
++        assert loaded is not None
++        assert [entry.golden_relpath for entry in loaded.goldens] == [
++            "tests/golden/determinism/tuple-aaaaaaaaaaaaaaaa.json",
++            "tests/golden/determinism/tuple-ffffffffffffffff.json",
++        ]
++        assert loaded.goldens[1].adapter_sha256 == "c" * 64
++        assert loaded.goldens[1].platform == "linux-x86_64"
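`test_newer_version_is_rejected` writes only `{"lock_version": 99}`. It passes because `load_golden_index` checks the version before handing the payload to pydantic; otherwise the missing `updated_at` would surface as a schema error instead. The stdlib sketch below shows the same staged ordering: syntax, then top-level shape, then version gate, then fields. `parse_index` and `IndexFormatError` are hypothetical stand-ins. The final field check is deliberately partial, since the real code delegates that stage to `DeterminismGoldenIndex.model_validate`.

```python
import json
from typing import Any

CURRENT_VERSION = 1  # mirrors CURRENT_GOLDEN_INDEX_VERSION in the diff


class IndexFormatError(ValueError):
    """Hypothetical stand-in for GoldenIndexSchemaError."""


def parse_index(raw: str) -> dict[str, Any]:
    # Stage 1: syntax.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise IndexFormatError(f"invalid JSON: {exc}") from exc
    # Stage 2: top-level shape.
    if not isinstance(payload, dict):
        raise IndexFormatError(
            f"top-level JSON must be an object, got {type(payload).__name__}"
        )
    # Stage 3: version gate before any field checks, so a future-format file
    # reports "unsupported lock_version" rather than a confusing field error.
    version = payload.get("lock_version")
    if version != CURRENT_VERSION:
        raise IndexFormatError(f"unsupported lock_version {version!r}")
    # Stage 4: partial field checks (the real code uses pydantic here).
    missing = {"updated_at"} - set(payload)
    if missing:
        raise IndexFormatError(f"schema validation: missing {sorted(missing)}")
    if not isinstance(payload.get("goldens", []), list):
        raise IndexFormatError("schema validation: goldens must be a list")
    return payload
```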

tenseleyflow/documentlanguagemodel / `15b524b`

8 changed files
