tenseleyflow/documentlanguagemodel / cae284f


docs+ci: VL emitter trajectory + minisign install for signing-path CI coverage

Authored by espadonne
SHA: cae284f3bd9db1bb2692520b414a27bf579c13f1
Parents: be66b4a
Tree: cb120d4

2 changed files

Status  File                                   +    -
M       .github/workflows/ci.yml               16   0
M       docs/cookbook/multimodal-training.md   37   15
.github/workflows/ci.yml (modified)
@@ -35,6 +35,22 @@ jobs:
       - name: Sync dependencies
         run: uv sync --all-extras --dev

+      - name: Install minisign (for share/signing coverage)
+        # The signing code path probes `shutil.which("minisign")` and
+        # refuses with a typed error when absent. CI installs it so the
+        # "available → sign/verify" branch runs alongside the "absent"
+        # refusal branch that's exercised on developer machines without
+        # it. Best-effort: if the install fails (e.g. Homebrew rate
+        # limit), tests still pass via the refusal path.
+        run: |
+          if [ "${{ matrix.os }}" = "ubuntu-latest" ]; then
+            sudo apt-get update -qq
+            sudo apt-get install -y minisign || true
+          elif [ "${{ matrix.os }}" = "macos-latest" ]; then
+            brew install minisign || true
+          fi
+          command -v minisign && minisign -v || echo "minisign not available; tests use the refusal path"
+
       - name: Ruff lint
         run: uv run ruff check .

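The workflow comment above describes the probe/refusal contract only in prose. For illustration, a minimal Python sketch of that shape: only `shutil.which("minisign")` and the refuse-with-a-typed-error behavior come from the commit; the exception type, message, and helper name are invented.

```python
# Hypothetical sketch of the probe-and-refuse pattern the CI step
# exercises. Only shutil.which("minisign") comes from the workflow
# comment; the error type, message, and sign_artifact() helper are
# illustration, not the repository's actual code.
import shutil
import subprocess


class SigningToolMissingError(RuntimeError):
    """Typed refusal raised when minisign is not on PATH."""


def sign_artifact(artifact: str, seckey: str) -> None:
    if shutil.which("minisign") is None:
        # The "absent" branch: refuse up front instead of failing
        # inside a subprocess call.
        raise SigningToolMissingError(
            "minisign not found on PATH; install it to sign shares"
        )
    # The "available" branch: -S signs, -s selects the secret key,
    # -m names the file to sign (standard minisign flags).
    subprocess.run(["minisign", "-S", "-s", seckey, "-m", artifact], check=True)
```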
docs/cookbook/multimodal-training.md (modified)
@@ -200,21 +200,43 @@ text content needs an image to anchor the placeholder.

 PaliGemma + batch=1 fits on 16 GB but leaves little headroom for
 background processes. Close your browser, VS Code, etc. For
-persistent OOM, swap to CUDA or wait for Sprint 35.4's quantization
-support.
-
-## What's not yet in Sprint 35 v1
-
-- **Other VL bases.** Qwen2-VL-2B-Instruct + InternVL2-2B landed in
-  Sprint 35.3 — use `--base qwen2-vl-2b-instruct` or `--base
-  internvl2-2b`. See the base-selection section above.
-- **Audio.** Sprint 35.2 ships `::audio path="..." transcript="..."::`.
-- **GGUF export.** Sprint 35.4 shipped the llama.cpp arch detection
-  + VL-aware Modelfile renderer. The final piece is the dlm-side
-  single-file GGUF emitter that actually invokes
-  `convert_hf_to_gguf.py` for a VL adapter; until that lands, even
-  SUPPORTED bases fall through to HF-snapshot. The dispatcher's
-  banner tells you which verdict your base hit.
+persistent OOM, swap to CUDA (VL QLoRA is a planned follow-up).
+
+## Known limitations
+
 - **Multi-image in one section.** Each `::image::` fence carries one
   image; prompts can stack multiple `<image>` tokens by repeating
   `--image` on the CLI.
+- **Audio ingest.** Audio is a separate path —
+  `::audio path="..." transcript="..."::` on an audio-language base.
+  See [audio-training.md](audio-training.md).
+
+## VL GGUF emitter trajectory
+
+The VL export path today routes every verdict through HF-snapshot
+and prints a banner. Going from that to single-file VL GGUF needs
+three pieces to line up, in order:
+
+1. **Upstream llama.cpp** registers the VL arch class in
+   `convert_hf_to_gguf.py` (currently only Qwen2-VL; PaliGemma and
+   InternVL2 are UNSUPPORTED at the pinned tag). Our
+   `scripts/bump-llama-cpp.sh` re-runs the arch probe on every bump
+   and caches verdicts in `vendor/llama_cpp_vl_arch_support.json`,
+   so re-verdicting is mechanical once a new llama.cpp tag lands.
+2. **The dlm-side emitter** invokes the upstream converter on a
+   merged VL adapter, packages the resulting GGUF, and hands it to
+   `render_vl_modelfile` for the Ollama-compatible Modelfile. The
+   renderer, arch probe, version guard, and per-family stops are
+   already in place; only the emitter orchestration is missing.
+3. **An integration test** picks one SUPPORTED base, trains a
+   1-step adapter on the fixture, converts to GGUF, runs
+   `ollama create`, and smoke-tests inference. The test scaffold
+   (auto-skip while UNSUPPORTED) is already checked in; the body
+   fills in when step 2 lands.
+
+Until all three align, `dlm export` on a VL base writes an
+HF-snapshot tarball — the same artifact a downstream recipient loads
+via `AutoModelForImageTextToText.from_pretrained` +
+`PeftModel.from_pretrained`. See
+[docs/hardware/vl-memory.md](../hardware/vl-memory.md#llamacpp-gguf-support-matrix-sprint-354)
+for the current per-arch verdicts.
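Step 1's verdict cache is the one piece downstream code can consume today. A hedged illustration of reading it: the file path comes from the trajectory above, but the flat `{"arch": "VERDICT"}` schema is a guess about its shape.

```python
# Reading the step-1 verdict cache. The path comes from the doc; the
# flat {"arch": "VERDICT"} schema is an assumption, not the real file.
import json
from pathlib import Path

verdicts = json.loads(Path("vendor/llama_cpp_vl_arch_support.json").read_text())
supported = sorted(arch for arch, v in verdicts.items() if v == "SUPPORTED")
print("single-file GGUF candidates:", supported or "none at the pinned tag")
```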
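Step 2's missing orchestration is small once the converter supports the arch. A sketch under assumed names: the `convert_hf_to_gguf.py` flags (`--outfile`, `--outtype`) are llama.cpp's documented interface, but the function names, directory layout, and the renderer stub are guesses; `render_vl_modelfile`'s real signature is not shown in the doc.

```python
# Sketch of the dlm-side emitter orchestration from step 2. The
# converter's --outfile/--outtype flags are real llama.cpp CLI; the
# function names, layout, and renderer stub are assumptions.
import subprocess
from pathlib import Path


def render_vl_modelfile(gguf_path: Path) -> str:
    # Stand-in for the already-landed renderer named in the doc; its
    # real signature is unknown, so this stub only marks the handoff.
    return f"FROM {gguf_path.name}\n"


def emit_vl_gguf(merged_model_dir: Path, out_dir: Path, llama_cpp_dir: Path) -> Path:
    """Convert a merged VL checkpoint to a single GGUF plus Modelfile."""
    gguf_path = out_dir / "model.gguf"
    # Invoke the upstream converter on the merged VL checkpoint.
    subprocess.run(
        [
            "python",
            str(llama_cpp_dir / "convert_hf_to_gguf.py"),
            str(merged_model_dir),
            "--outfile", str(gguf_path),
            "--outtype", "f16",
        ],
        check=True,
    )
    # Hand the packaged GGUF to the Modelfile renderer.
    (out_dir / "Modelfile").write_text(render_vl_modelfile(gguf_path))
    return gguf_path
```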
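Step 3's auto-skip scaffold plausibly keys off the same verdict cache. A sketch in which `pytest.mark.skipif` is the only real API; the cache key, test name, and body plan are assumptions drawn from the list above.

```python
# Shape of the step-3 auto-skip gate. pytest.mark.skipif is real; the
# cache key "qwen2-vl" and the test name are assumptions.
import json
from pathlib import Path

import pytest

_VERDICTS = json.loads(Path("vendor/llama_cpp_vl_arch_support.json").read_text())


@pytest.mark.skipif(
    _VERDICTS.get("qwen2-vl") != "SUPPORTED",
    reason="no SUPPORTED VL arch at the pinned llama.cpp tag",
)
def test_vl_gguf_roundtrip():
    # Fills in when step 2 lands: train a 1-step adapter on the
    # fixture, convert to GGUF, run `ollama create`, smoke-test.
    ...
```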
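Finally, the fallback path that exists today: the HF-snapshot tarball loads with exactly the two calls the closing paragraph names. A minimal consumer sketch; the base checkpoint id and unpacked adapter path are illustrative, not the exporter's actual layout.

```python
# Loading the HF-snapshot fallback. The two from_pretrained calls are
# named in the doc; the checkpoint id and adapter path are examples.
from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoProcessor

BASE = "google/paligemma-3b-pt-224"  # illustrative VL base
base = AutoModelForImageTextToText.from_pretrained(BASE)
processor = AutoProcessor.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, "./dlm-export/adapter")
```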