# DocumentLanguageModel — Codex Session Boot Context

> This file is read on every session start. Keep it dense, authoritative, and
> aligned with the living docs. When anything here conflicts with `.docs/`,
> `.docs/` wins and this file must be updated.

## One-line

A text file with a `.dlm` extension becomes a local, reproducible, trainable
LLM. Edit the document, retrain, share. Not a toy — LoRA/QLoRA on a real
pretrained base, exportable to Ollama.

## Current stage

- ✅ Stage 1 — Planning & reference exploration (see `.docs/findings.md`)
- ✅ Stage 2 — Revised overview + 29 sprint files across 7 phases
- ✅ Stage 3 — this file
- ✅ Stage 4 — Audit 01 (YELLOW → patched). Blockers F01–F04 and majors
  F05–F22 triaged into Sprint 12b + inline sprint amendments. See
  `.docs/audits/01-initial-plan-audit.md` and the end of this file for
  the triage summary.
- ⏳ Stage 5 — Implementation (begin at Sprint 01)

## Where things live

```
.docs/overview.md          Canonical project description (read this first)
.docs/findings.md          Stage 1 digest from 8 parallel ref explorations
.docs/sprints/00-index.md  Master index of the 29 sprints
.docs/sprints/phase-*/     Sprint files; each has DoD and risks
.docs/audits/              Stage 4+ audit outputs
.refs/                     Cloned reference repos (gitignored)
AGENTS.md                  You are here. Gitignored.
```

`.docs/` and `AGENTS.md` are in `.gitignore` by user choice — planning
artifacts stay local.

## Crystallized architecture

**Training paradigm**: LoRA / QLoRA on a user-selected pretrained base. No
from-scratch transformers. The base registry ships with Qwen 2.5
(0.5B–3B + Coder-1.5B), Llama-3.2 (1B, 3B), SmolLM2 (135M–1.7B), and
Phi-3.5-mini. Any HF model via `hf:org/name` with compatibility probes.

**Document shape**: `mydoc.dlm` is a single UTF-8 text file — YAML
frontmatter + markdown body with section fences (`::instruction::`,
`::preference::`, default-prose). A stable `dlm_id` in the frontmatter
binds the document to a content-addressed store at `~/.dlm/store/<dlm_id>/`.
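
To make the shape concrete, a minimal `.dlm` might look like the sketch
below. Only `dlm_id` and the fence names come from the architecture above;
the `base` key and the `::end::` closer are hypothetical placeholders:

```
---
dlm_id: 7f3c1a9e            # stable; binds to ~/.dlm/store/7f3c1a9e/
base: qwen2.5-0.5b          # hypothetical key naming the pretrained base
---

::instruction::
Answer in terse, imperative engineering prose.
::end::

::preference::
Prefer Python examples over pseudocode.
::end::

Unfenced prose like this trains as default document text.
```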

**Retention**: single rolling adapter trained on the current document +
recency-weighted sample from a zstd-compressed replay corpus accumulating
every prior document version. Rejected alternative: versioned adapters
with weighted merge (LoRA-only, SVD cost, harder determinism).
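
A stdlib sketch of the recency weighting, assuming exponential decay (the
`half_life` value and both function names are illustrative, not the shipped
API):

```python
import random

def recency_weights(ages, half_life=4.0):
    """Weight each archived document version by age (in versions ago).

    The newest version gets weight ~1.0; weight halves every `half_life`
    versions, so old replay data is sampled but never dominates.
    """
    return [0.5 ** (age / half_life) for age in ages]

def sample_replay(versions, k, seed=0):
    """Draw k replay samples, newest-biased, deterministically under a seed."""
    ages = list(range(len(versions)))   # index 0 = newest version
    weights = recency_weights(ages)
    rng = random.Random(seed)           # explicit RNG keeps sampling reproducible
    return rng.choices(versions, weights=weights, k=k)
```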

**Export**: separate `base.gguf` + `adapter.gguf` + generated Modelfile with
an `ADAPTER` directive. `--merged` opt-in produces a single file (QLoRA
requires explicit `--dequantize`).
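
For illustration, the generated Modelfile might look like the following. The
paths and the ChatML-style template assume a Qwen-family base; the real
template string comes from the per-base Go template registry:

```
FROM ./base.gguf
ADAPTER ./adapter.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```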

**Hardware tiers**:
- NVIDIA CUDA (SM ≥ 8.0): first-class, bf16 + QLoRA 4-bit + FlashAttention
- NVIDIA CUDA (SM < 8.0): second-class, fp16 LoRA
- Apple Silicon MPS: first-class training (fp16 LoRA), optional MLX inference in Phase 5
- CPU: inference-only by default, training refused except `--force` on ≤200M bases
- AMD ROCm: experimental; Phase 5 promotes to Tier 2
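
The tier table above reduces to a small decision function; a hedged sketch
(function name and return strings are hypothetical):

```python
def select_tier(cuda_sm=None, mps=False, rocm=False):
    """Map probed hardware to a training tier.

    cuda_sm: CUDA compute capability as a (major, minor) tuple, or None.
    """
    if cuda_sm is not None:
        # SM >= 8.0 unlocks bf16 + QLoRA 4-bit + FlashAttention.
        return "cuda-tier1" if cuda_sm >= (8, 0) else "cuda-tier2"
    if mps:
        return "mps-tier1"           # fp16 LoRA training
    if rocm:
        return "rocm-experimental"
    return "cpu-inference-only"      # training refused without --force
```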

## Stack

**In**: Python 3.11+, PyTorch ≥ 2.4, HuggingFace `transformers`/`peft`/`trl`/
`accelerate`/`datasets`, `bitsandbytes` (CUDA-gated), `safetensors`,
`zstandard`, llama.cpp (vendored git submodule) for GGUF export,
Ollama (user-installed), `typer`, `rich`, `uv`, `pytest`, `mypy --strict`,
`ruff`.

**Out**:
- Unsloth (monkeypatch fragility, transformers-version pinning hell, CUDA-only, Apple Silicon excluded)
- MLX for training (adapter `.npz` format is not PEFT-compatible)
- From-scratch transformers
- DeepSpeed / ZeRO through v1.0
- Windows first-class support (best-effort only; Linux + macOS are the supported tiers)

## Pitfalls to always remember

1. **Ollama uses Go `text/template`, not Jinja2.** The GGUF's Jinja
   chat template is fuzzy-matched by Ollama and fails silently when
   unmatched. We always emit an explicit `TEMPLATE "..."` in the Modelfile
   from our per-base-model Go template registry. Round-trip tests assert
   token identity with the HF Jinja reference.

2. **`peft.save_pretrained` does NOT save optimizer / scheduler / RNG.** We
   write a separate `training_state.pt` sidecar with optimizer state,
   scheduler state, AMP scaler, torch/cuda/numpy/python RNGs, step, epoch,
   and pinned versions. Without this, resume is not deterministic.

3. **`merge_and_unload` on a 4-bit QLoRA base is precision-unsafe.** Refuse
   the merged export path on QLoRA unless `--dequantize` is explicit; then
   dequantize to fp16 before merging.

4. **Pad token must NOT default to EOS.** Doing so corrupts loss labels
   whenever EOS appears mid-sequence. Fallback order: use the unk_token if
   one exists, else add `<|pad|>` (which forces
   `modules_to_save=["embed_tokens","lm_head"]` and inflates the adapter;
   warn loudly).

5. **The pre-tokenizer hash table in llama.cpp** is a silent-failure surface.
   Sprint 06 probes at registry-build time and on `dlm init --base hf:...`;
   Sprint 11 re-verifies at `dlm export` preflight. Bumping
   `vendor/llama.cpp` re-runs the registry probe suite via
   `scripts/bump-llama-cpp.sh`.

6. **Sample packing without FlashAttention** causes `position_ids` drift on
   MPS. Doctor disables packing whenever FlashAttention is unavailable,
   since packing is then unsafe.

7. **`target_modules="all-linear"` on small models** causes memory blowup
   and instability. Use the per-architecture registry from Sprint 06 as
   the default.

8. **Determinism is a contract**: fixed seed, `use_deterministic_algorithms`,
   `CUBLAS_WORKSPACE_CONFIG=:4096:8`, pinned versions recorded in
   `dlm.lock`. Any code change that breaks the golden determinism test is
   a breaking change.

Full inventory in `.docs/findings.md#9`.
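
Pitfall 4's fallback chain is easy to get wrong, so here is a stdlib sketch
of the selection logic (function and argument names are hypothetical; the
real code inspects an HF tokenizer):

```python
def choose_pad_token(eos_token, unk_token=None, pad_token=None):
    """Pick a pad token that never aliases EOS.

    Returns (token, needs_new_embedding). needs_new_embedding=True means a
    fresh token was added, which forces
    modules_to_save=["embed_tokens", "lm_head"] and inflates the adapter.
    """
    if pad_token is not None and pad_token != eos_token:
        return pad_token, False      # tokenizer already has a safe pad
    if unk_token is not None and unk_token != eos_token:
        return unk_token, False      # reuse unk; no embedding resize needed
    return "<|pad|>", True           # last resort: add a fresh token
```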

## Contract boundaries (audit F25)

Four load-bearing files; keep them distinct when editing.

- **`manifest.json`** (per-store): running narrative of training runs,
  exports, content hashes, adapter version. Mutable on every run. Owned
  by Sprint 04; extended by Sprints 09, 11, 12, 12b.
- **`dlm.lock`** (per-store): version pins + hardware tier + determinism
  flags + license acceptance fingerprint. Written once per run; stable.
  Owned by Sprint 15; extended by Sprint 12b (license) and Sprint 23
  (world_size + accelerate).
- **`training_state.pt`** (per-store, per-adapter-version): optimizer,
  scheduler, scaler, all RNGs, step/epoch. Required for bit-exact resume.
  Owned by Sprint 09. Two-phase commit with the adapter directory.
- **`exports/<quant>/export_manifest.json`** (per-export): checksums,
  quant level, pinned llama.cpp tag, smoke output. Owned by Sprint 11;
  appended via Sprint 12.
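
What `training_state.pt` must capture can be sketched with stdlib stand-ins
(the real sidecar is torch-serialized and also holds optimizer, scheduler,
scaler, and torch/cuda/numpy RNG states; all names here are illustrative):

```python
import io
import pickle
import random

def save_training_state(fileobj, step, epoch, rng):
    """Persist what peft.save_pretrained omits (here: python RNG, step, epoch)."""
    state = {"step": step, "epoch": epoch, "py_rng": rng.getstate()}
    pickle.dump(state, fileobj)

def load_training_state(fileobj, rng):
    """Restore the sidecar so resumed sampling is bit-exact."""
    state = pickle.load(fileobj)
    rng.setstate(state["py_rng"])
    return state["step"], state["epoch"]
```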

And one repo-level file:

- **`dlm.lock`** at the repo root: records which `(torch, transformers,
  peft, trl, bnb, platform)` tuples have a checked-in determinism golden.
  Different from the per-store `dlm.lock`. Owned by Sprint 15.
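
As a sketch, the per-store `dlm.lock` might carry something like the
following (every field name here is hypothetical; Sprint 15 owns the real
schema):

```
{
  "versions": {"torch": "2.4.1", "transformers": "4.44.2", "peft": "0.12.0"},
  "hardware_tier": "cuda-tier1",
  "determinism": {"seed": 1234, "cublas_workspace_config": ":4096:8"},
  "license_fingerprint": "sha256:…"
}
```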

## Development guidelines

- **Commit often, commit small.** Avoid monolithic commits; maximize commits
  per feature so the history tells a narrative. One commit per distinct
  change (a file, a config, a fix), not per day's work.
- **Commit message style**: imperative, terse, one line unless a technical
  choice requires elaboration. **No coauthorship** on any commit.
- **Avoid `git add -A`.** Stage specific files by name; that makes it
  harder to leak secrets or commit unrelated changes.
- **No shortcuts when a robust approach exists.** If you find yourself
  writing "the simplest approach is…", stop and ask whether this produces
  a trainable LLM. If not, reapproach.
- **Senior AI-engineering discipline.** Write efficient, well-engineered
  code. Respect the pitfall inventory.
- **Strict validation, fail fast.** Axolotl's permissive warnings are the
  anti-pattern. Our Pydantic schemas reject unknown keys, wrong types, and
  inconsistent combinations at parse time.
- **Determinism is a contract.** See above.
- **Tests before implementation** for anything touching training dynamics,
  tokenization, or GGUF export. The tiny-model fixture (Sprint 02) makes
  end-to-end CI feasible; use it.
- **`mypy --strict` from day one.** Never loosen; fix the type at source.
- **Per-sprint definition of Done is binary.** A sprint is not Done until
  every DoD checkbox passes and the sprint file is marked Done.
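
The determinism guideline, for instance, is mechanical enough to sketch
(stdlib only; the real preamble also seeds torch/numpy and calls
`torch.use_deterministic_algorithms(True)`):

```python
import os
import random

def determinism_preamble(seed):
    """Set the pieces of the determinism contract reachable from stdlib."""
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    return random.Random(seed)   # one explicit RNG instead of global state

# Two runs with the same seed must produce identical sample streams.
r1 = determinism_preamble(1234)
r2 = determinism_preamble(1234)
```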

## Workflow inside a sprint

1. Read `.docs/sprints/phase-N/NN-*.md` in full.
2. Cross-check against `.docs/findings.md` where the sprint references
   pitfalls or patterns (the sprints do cite sections).
3. Implement incrementally. Commit per file / per logical unit.
4. Write tests alongside (or before) the code.
5. Check every DoD item manually before flipping Status to Done.
6. Update the status column in `.docs/sprints/00-index.md` if we maintain one.

## CLI surface by release

**v1.0** (Phase 3 end):
```
dlm init <path> [--base <key>] [--template <name>] [--i-accept-license]
dlm train <path> [--resume|--fresh] [--seed N] [--max-steps N] [--gpus ...]
          [--strict-lock|--update-lock|--ignore-lock]
dlm prompt <path> [query] [--max-tokens N] [--temp F] [--adapter <name,...>]
dlm export <path> [--quant Q] [--merged [--dequantize]] [--name N] [--no-smoke]
           [--adapter-mix name:w,...]
dlm pack <path> [--out X] [--include-exports] [--include-base
         [--i-am-the-licensee <url>]]
dlm unpack <path> [--home DIR] [--force]
dlm migrate <path> [--dry-run] [--no-backup]
dlm doctor [--json]
dlm show <path> [--json]
```
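
A hypothetical first session stitching these commands together (the base
key, quant name, and overall flow are illustrative, not verified behavior):

```
$ dlm init notes.dlm --base qwen2.5-0.5b --i-accept-license
$ dlm train notes.dlm --seed 1234
$ dlm export notes.dlm --quant Q4_K_M --name notes
$ ollama run notes
```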

**v2** (Phases 4–6):
```
dlm repl <path>
dlm train <path> --watch [--repl]
dlm metrics <path> [--json|--csv]
dlm metrics watch <path>
dlm templates list [--refresh]
```

**v2+** (Phase 7):
```
dlm push <path> [--to hf:org/name | --to <url>] [--sign]
dlm pull <source>
dlm serve <path> [--public [--i-know-this-is-public]]
```

## Stage gates

- Stage 4 — **Patched (YELLOW → triaged)**. New Sprint 12b owns F01–F04.
  17 majors amended inline into existing sprints. 9 minors deferred to
  first touch of their owning sprints. A re-audit pass is recommended
  before declaring GREEN and entering Stage 5.
- Stage 5 — begin Sprint 01 (scaffolding) once Stage 4 is GREEN.

## Context for future sessions

- Always load `.docs/overview.md`, `.docs/findings.md`, and
  `.docs/sprints/00-index.md` before working on a sprint. Skim the
  relevant sprint file in full.
- The user prefers concise, direct engineering discussion. Surface
  tradeoffs; make recommendations with reasoning.
- When in doubt about an implementation choice, check findings §10
  (the adoption matrix per reference repo) — it's the opinionated source of
  truth for "why are we doing it this way, not that way."