# DocumentLanguageModel — Codex Session Boot Context

> This file is read on every session start. Keep it dense, authoritative, and
> aligned with the living docs. When anything here conflicts with `.docs/`,
> `.docs/` wins and this file must be updated.

## One-line

A text file with a `.dlm` extension becomes a local, reproducible, trainable
LLM. Edit the document, retrain, share. Not a toy — LoRA/QLoRA on a real
pretrained base, exportable to Ollama.

## Current stage

- ✅ Stage 1 — Planning & reference exploration (see `.docs/findings.md`)
- ✅ Stage 2 — Revised overview + 29 sprint files across 7 phases
- ✅ Stage 3 — this file
- ✅ Stage 4 — Audit 01 (YELLOW → patched). Blockers F01–F04 and majors
  F05–F22 triaged into Sprint 12b + inline sprint amendments. See
  `.docs/audits/01-initial-plan-audit.md` and the end of this file for
  the triage summary.
- ⏳ Stage 5 — Implementation (begin at Sprint 01)

## Where things live

```
.docs/overview.md          Canonical project description (read this first)
.docs/findings.md          Stage 1 digest from 8 parallel ref explorations
.docs/sprints/00-index.md  Master index of the 29 sprints
.docs/sprints/phase-*/     Sprint files; each has DoD and risks
.docs/audits/              Stage 4+ audit outputs
.refs/                     Cloned reference repos (gitignored)
AGENTS.md                  You are here. Gitignored.
```

`.docs/` and `AGENTS.md` are in `.gitignore` by user choice — planning
artifacts stay local.

## Crystallized architecture

**Training paradigm**: LoRA / QLoRA on a user-selected pretrained base. No
from-scratch transformers. The base registry ships with Qwen 2.5
(0.5B–3B + Coder-1.5B), Llama-3.2 (1B, 3B), SmolLM2 (135M–1.7B), and
Phi-3.5-mini. Any HF model via `hf:org/name` with compatibility probes.

**Document shape**: `mydoc.dlm` is a single UTF-8 text file — YAML
frontmatter + markdown body with section fences (`::instruction::`,
`::preference::`, default prose). A stable `dlm_id` in the frontmatter
binds the document to a content-addressed store at `~/.dlm/store/<dlm_id>/`.
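
For orientation, a hypothetical minimal document; the exact fence grammar
and any frontmatter key beyond `dlm_id` are placeholders (the sprint files
own the real schema):

```
---
dlm_id: 7f3e...            # stable; binds to ~/.dlm/store/<dlm_id>/
base: qwen2.5-0.5b         # placeholder registry key
---

Prose outside any fence trains as default text.

::instruction::
Answer questions about the document tersely.

::preference::
Prefer concrete examples over abstractions.
```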

**Retention**: single rolling adapter trained on the current document +
recency-weighted sample from a zstd-compressed replay corpus accumulating
every prior document version. Rejected alternative: versioned adapters
with weighted merge (LoRA-only, SVD cost, harder determinism).
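
A minimal sketch of the recency weighting, assuming exponential decay over
version age; `half_life`, the one-file-per-version layout, and the function
itself are invented for illustration:

```python
import random
from pathlib import Path

import zstandard as zstd


def sample_replay(corpus_dir: Path, k: int, half_life: float = 4.0) -> list[str]:
    """Draw k prior document versions, biased toward recent ones."""
    versions = sorted(corpus_dir.glob("*.zst"))  # oldest -> newest by name
    n = len(versions)
    # Exponential decay: a version half_life steps older gets half the weight.
    weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    picks = random.choices(versions, weights=weights, k=k)
    dctx = zstd.ZstdDecompressor()
    return [dctx.decompress(p.read_bytes()).decode("utf-8") for p in picks]
```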

**Export**: separate `base.gguf` + `adapter.gguf` + generated Modelfile with
`ADAPTER` directive. `--merged` opt-in produces a single file (QLoRA
requires explicit `--dequantize`).
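
What the generated Modelfile might look like on the default separate-adapter
path; the paths are illustrative and the ChatML-style template body fits
Qwen-family bases only (each base substitutes its own registry template,
per pitfall 1 below):

```
FROM ./base.gguf
ADAPTER ./adapter.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```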

**Hardware tiers** (detection sketch below):
- NVIDIA CUDA (SM ≥ 8.0): first-class; bf16 + QLoRA 4-bit + FlashAttention
- NVIDIA CUDA (SM < 8.0): second-class; fp16 LoRA
- Apple Silicon MPS: first-class training (fp16 LoRA), optional MLX inference in Phase 5
- CPU: inference-only by default; training refused except with `--force` on ≤200M bases
- AMD ROCm: experimental; Phase 5 promotes it to Tier 2
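
A sketch of how `dlm doctor` might classify the tier; the function name and
tier labels are placeholders:

```python
import torch


def detect_tier() -> str:
    if torch.cuda.is_available():
        if torch.version.hip is not None:  # ROCm reports through torch.cuda
            return "rocm-experimental"
        major, minor = torch.cuda.get_device_capability()
        return "cuda-tier1" if (major, minor) >= (8, 0) else "cuda-tier2"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"  # inference-only unless --force on a ≤200M base
```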

## Stack

**In**: Python 3.11+, PyTorch ≥ 2.4, HuggingFace `transformers`/`peft`/`trl`/
`accelerate`/`datasets`, `bitsandbytes` (CUDA-gated), `safetensors`,
`zstandard`, llama.cpp (vendored git submodule) for GGUF export,
Ollama (user-installed), `typer`, `rich`, `uv`, `pytest`, `mypy --strict`,
`ruff`.

**Out**:
- Unsloth (monkeypatch fragility, transformers-version pinning hell, CUDA-only, Apple Silicon excluded)
- MLX for training (adapter `.npz` format is not PEFT-compatible)
- From-scratch transformers
- DeepSpeed / ZeRO through v1.0
- Windows first-class (best-effort only; Linux + macOS are the supported tiers)

## Pitfalls to always remember

1. **Ollama uses Go `text/template`, not Jinja2.** Ollama fuzzy-matches the
   GGUF's embedded Jinja chat template and fails silently when it finds no
   match. We always emit an explicit `TEMPLATE "..."` in the Modelfile from
   our per-base-model Go template registry. Round-trip tests assert
   token-identity with the HF Jinja reference.
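
   A sketch of the HF side of that round-trip test; `render_go_template`
   is our (hypothetical) registry renderer:

   ```python
   from transformers import AutoTokenizer

   tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
   messages = [{"role": "user", "content": "hello"}]

   # Reference rendering from the HF Jinja chat template.
   ref = tok.apply_chat_template(
       messages, tokenize=False, add_generation_prompt=True
   )

   # The contract: both renderings must tokenize to identical IDs.
   # assert tok(ref)["input_ids"] == tok(render_go_template(messages))["input_ids"]
   ```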

2. **`peft.save_pretrained` does NOT save optimizer / scheduler / RNG.** We
   write a separate `training_state.pt` sidecar with optimizer state,
   scheduler state, AMP scaler, torch/cuda/numpy/python RNGs, step, epoch,
   pinned versions. Without this, resume is not deterministic.
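
   A sketch of the sidecar payload, assuming a single `torch.save` dict
   (key names are ours to define; not the final schema):

   ```python
   import random

   import numpy as np
   import torch


   def save_training_state(path, optimizer, scheduler, scaler, step, epoch):
       torch.save({
           "optimizer": optimizer.state_dict(),
           "scheduler": scheduler.state_dict(),
           "scaler": scaler.state_dict() if scaler is not None else None,
           "rng": {
               "torch": torch.get_rng_state(),
               "cuda": torch.cuda.get_rng_state_all() if torch.cuda.is_available() else [],
               "numpy": np.random.get_state(),
               "python": random.getstate(),
           },
           "step": step,
           "epoch": epoch,
       }, path)
   ```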

3. **`merge_and_unload` on 4-bit QLoRA base is precision-unsafe.** Refuse
   the merged export path on QLoRA unless `--dequantize` is explicit; then
   dequantize to fp16 before merge.
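
   The guard, roughly; `merge_for_export` and its arguments are invented,
   but `is_loaded_in_4bit` and `merge_and_unload` are the real
   transformers/peft surfaces involved:

   ```python
   import torch
   from peft import PeftModel
   from transformers import AutoModelForCausalLM


   def merge_for_export(model, base_id: str, adapter_dir: str, dequantize: bool):
       """Refuse precision-unsafe merges; dequantize is an explicit opt-in."""
       if getattr(model, "is_loaded_in_4bit", False):
           if not dequantize:
               raise SystemExit("QLoRA base is 4-bit: pass --dequantize to merge")
           # Reload the base in fp16, re-apply the adapter, merge there.
           base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
           model = PeftModel.from_pretrained(base, adapter_dir)
       return model.merge_and_unload()
   ```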

4. **Pad token must NOT default to EOS.** Pad-as-EOS corrupts labels
   whenever EOS appears mid-sequence. Fallback: use unk_token if present,
   else add `<|pad|>` (which forces
   `modules_to_save=["embed_tokens","lm_head"]`, inflating adapter size;
   warn loudly).
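
   The fallback chain, sketched; the helper name and return convention
   are invented:

   ```python
   def ensure_pad_token(tokenizer, model) -> list[str]:
       """Returns extra modules_to_save if the vocab had to grow."""
       if tokenizer.pad_token is not None and tokenizer.pad_token != tokenizer.eos_token:
           return []
       if tokenizer.unk_token is not None:
           tokenizer.pad_token = tokenizer.unk_token
           return []
       tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
       model.resize_token_embeddings(len(tokenizer))
       # New embedding/lm_head rows must be trained and saved with the
       # adapter; this inflates it, so warn loudly upstream.
       return ["embed_tokens", "lm_head"]
   ```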

5. **Pre-tokenizer hash table in llama.cpp** is a silent-failure surface.
   Sprint 06 probes at registry-build time + on `dlm init --base hf:...`;
   Sprint 11 re-verifies at `dlm export` preflight. Bumping
   `vendor/llama.cpp` re-runs the registry probe suite via
   `scripts/bump-llama-cpp.sh`.

6. **Sample packing without FlashAttention** causes `position_ids` drift on
   MPS. Doctor disables packing when FlashAttention is unavailable, or
   whenever packing is otherwise unsafe.
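
   Doctor's rule, roughly; the function name is a placeholder and the
   full policy lives in the sprint file:

   ```python
   import importlib.util


   def packing_allowed(device_type: str) -> bool:
       # Safe packing needs FlashAttention's block-diagonal masking; without
       # it, packed samples attend across boundaries and position_ids drift
       # (observed on MPS).
       return device_type == "cuda" and importlib.util.find_spec("flash_attn") is not None
   ```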

7. **`target_modules="all-linear"` on small models** causes memory blowup
   and instability. Use the per-architecture registry from Sprint 06 as
   the default.
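
   What a registry default might look like for a Llama-style decoder;
   rank, alpha, and the module list are illustrative, not the shipped
   values:

   ```python
   from peft import LoraConfig

   # Attention projections only: the conservative default. The Sprint 06
   # registry maps each architecture to its own list.
   config = LoraConfig(
       r=16,
       lora_alpha=32,
       target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
       lora_dropout=0.05,
       task_type="CAUSAL_LM",
   )
   ```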

8. **Determinism is a contract**: fixed seed, `use_deterministic_algorithms`,
   `CUBLAS_WORKSPACE_CONFIG=:4096:8`, pinned versions recorded in
   `dlm.lock`. Any code change that breaks the golden determinism test is
   a breaking change.
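
   The setup half of the contract, sketched (lock-file recording omitted):

   ```python
   import os
   import random

   import numpy as np
   import torch


   def pin_determinism(seed: int) -> None:
       # Must be set before the first cuBLAS call to take effect.
       os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
       random.seed(seed)
       np.random.seed(seed)
       torch.manual_seed(seed)  # also seeds CUDA generators
       torch.use_deterministic_algorithms(True)
   ```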

Full inventory in `.docs/findings.md#9`.

## Contract boundaries (audit F25)

Four load-bearing files; keep them distinct when editing.

- **`manifest.json`** (per-store): running narrative of training runs,
  exports, content hashes, adapter version. Mutable on every run. Owned
  by Sprint 04; extended by Sprints 09, 11, 12, 12b.
- **`dlm.lock`** (per-store): version pins + hardware tier + determinism
  flags + license acceptance fingerprint. Written once per run; stable.
  Owned by Sprint 15; extended by Sprint 12b (license) and Sprint 23
  (world_size + accelerate).
- **`training_state.pt`** (per-store, per-adapter-version): optimizer,
  scheduler, scaler, all RNGs, step/epoch. Required for bit-exact resume.
  Owned by Sprint 09. Two-phase commit with adapter directory.
- **`exports/<quant>/export_manifest.json`** (per-export): checksums,
  quant level, pinned llama.cpp tag, smoke output. Owned by Sprint 11;
  appended via Sprint 12.

And one repo-level file:

- **`dlm.lock`** at the repo root: records which `(torch, transformers,
  peft, trl, bnb, platform)` tuples have a checked-in determinism golden.
  Different from the per-store `dlm.lock`. Owned by Sprint 15.

## Development guidelines

- **Commit often, commit small.** Avoid monolithic commits; maximize commits
  per feature so the history shows a narrative. One commit per distinct
  change (a file, a config, a fix), not per day's work.
- **Commit message style**: imperative, terse, one line unless a technical
  choice requires elaboration. **No coauthorship** on any commit.
- **Avoid `git add -A`.** Stage specific files by name; that makes it
  harder to leak secrets or commit unrelated changes.
- **No shortcuts when a robust approach exists.** If you find yourself
  writing "the simplest approach is…", stop and ask whether this produces
  a trainable LLM. If not, reapproach.
- **Senior AI-engineering discipline.** Write efficient, well-engineered
  code. Respect the pitfall inventory.
- **Strict validation, fail fast.** Axolotl's permissive warnings are the
  anti-pattern. Our Pydantic schemas reject unknown keys, wrong types, and
  inconsistent combinations at parse time (see the sketch after this list).
- **Determinism is a contract.** See above.
- **Tests before implementation** for anything touching training dynamics,
  tokenization, or GGUF export. The tiny-model fixture (Sprint 02) makes
  end-to-end CI feasible; use it.
- **`mypy --strict` from day one.** Never loosen; fix the type at source.
- **Per-sprint definition of Done is binary.** A sprint is not Done until
  every DoD checkbox passes and the sprint file is marked Done.
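
The strict-validation sketch, using Pydantic v2; every field here is
illustrative, not the real config schema:

```python
from pydantic import BaseModel, ConfigDict, model_validator


class TrainConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys are hard errors

    base: str
    seed: int
    max_steps: int | None = None
    packing: bool = False
    flash_attention: bool = False

    @model_validator(mode="after")
    def _consistent(self) -> "TrainConfig":
        # Reject inconsistent combinations at parse time (pitfall 6).
        if self.packing and not self.flash_attention:
            raise ValueError("packing requires FlashAttention")
        return self
```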

## Workflow inside a sprint

1. Read `.docs/sprints/phase-N/NN-*.md` in full.
2. Cross-check against `.docs/findings.md` where the sprint references
   pitfalls or patterns (the sprints do cite sections).
3. Implement incrementally. Commit per file / per logical unit.
4. Write tests alongside (or before) the code.
5. Check every DoD item manually before flipping Status to Done.
6. Update the `.docs/sprints/00-index.md` status column if we maintain one.

## CLI surface by release

**v1.0** (Phase 3 end):
```
dlm init <path> [--base <key>] [--template <name>] [--i-accept-license]
dlm train <path> [--resume|--fresh] [--seed N] [--max-steps N] [--gpus ...]
          [--strict-lock|--update-lock|--ignore-lock]
dlm prompt <path> [query] [--max-tokens N] [--temp F] [--adapter <name,...>]
dlm export <path> [--quant Q] [--merged [--dequantize]] [--name N] [--no-smoke]
           [--adapter-mix name:w,...]
dlm pack <path> [--out X] [--include-exports] [--include-base
         [--i-am-the-licensee <url>]]
dlm unpack <path> [--home DIR] [--force]
dlm migrate <path> [--dry-run] [--no-backup]
dlm doctor [--json]
dlm show <path> [--json]
```

**v2** (Phases 4–6):
```
dlm repl <path>
dlm train <path> --watch [--repl]
dlm metrics <path> [--json|--csv]
dlm metrics watch <path>
dlm templates list [--refresh]
```

**v2+** (Phase 7):
```
dlm push <path> [--to hf:org/name | --to <url>] [--sign]
dlm pull <source>
dlm serve <path> [--public [--i-know-this-is-public]]
```

## Stage gates

- Stage 4 — **Patched (YELLOW → triaged)**. New Sprint 12b owns F01–F04;
  17 majors amended inline into existing sprints; 9 minors deferred to
  first touch of their owning sprints. A re-audit pass is recommended
  before declaring GREEN and entering Stage 5.
- Stage 5 — begin Sprint 01 (scaffolding) once Stage 4 is GREEN.

## Context for future sessions

- Always load `.docs/overview.md`, `.docs/findings.md`, and
  `.docs/sprints/00-index.md` before working on a sprint. Skim the
  relevant sprint file in full.
- The user prefers concise, direct engineering discussion. Surface
  tradeoffs; make recommendations with reasoning.
- When in doubt about an implementation choice, check findings §10
  (adoption matrix per reference repo) — it's the opinionated source of
  truth for "why are we doing it this way, not that way."