Contributing
Hey — glad you're here. DLM is a small, opinionated project, and patches that fit the project's shape are very welcome. No sign-up, no CLA, just open an issue if the change is non-trivial and a PR when you're ready.
Getting set up
You'll need Python 3.11+ and [uv](https://github.com/astral-sh/uv).
git clone https://github.com/tenseleyFlow/DocumentLanguageModel.git
cd DocumentLanguageModel
uv sync --all-extras --dev
uv run dlm --help
The four checks CI runs — run them locally before pushing:
uv run ruff check .
uv run ruff format --check .
uv run mypy src/dlm
uv run pytest
If you've only touched one module, you can run that module's tests
directly (uv run pytest tests/unit/pack -q, etc.). The full suite
takes around eight seconds.
mypy --strict is non-negotiable — if you need to loosen a type,
please fix the type at its source instead.
Pre-commit hooks
Install once per clone to catch ruff / mypy / non-slow-pytest failures before they reach CI:
uv run pre-commit install
The config lives at .pre-commit-config.yaml. The local hook runs
pytest -m "not slow and not gpu and not online", so it's ~8 seconds
on a warm cache.
Testing conventions (markers, fixtures, the tiny-model fixture, golden outputs) are documented separately at docs-internal/README-testing.md.
Coverage
We keep each shipped module above 95% line coverage; CI enforces it.
When you add a new module, add a matching gate in .github/workflows/ci.yml
next to the existing ones. When you add a branch that's hard to exercise
in unit tests — a real-GPU path, a subprocess that needs a full HF
model — mark it # pragma: no cover with a short reason, and write a
slow-marked integration test for it.
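Here's a rough sketch of what that pairing can look like. The function name `detect_device` and the `nvidia-smi` check are made up for illustration and are not DLM's actual code; the `slow` and `gpu` markers are the ones the pre-commit hook already filters on.

```python
import shutil

import pytest


def detect_device() -> str:
    """Hypothetical helper: pick "cuda" when an NVIDIA driver is visible."""
    # The pragma sits on the `if` line, so coverage.py excludes the whole
    # branch. The short reason after it tells reviewers why.
    if shutil.which("nvidia-smi") is not None:  # pragma: no cover - needs a real GPU host
        return "cuda"
    return "cpu"


@pytest.mark.slow
@pytest.mark.gpu
def test_detect_device_on_gpu_host() -> None:
    """Integration test for the excluded branch; skipped without a GPU."""
    if shutil.which("nvidia-smi") is None:
        pytest.skip("no NVIDIA driver on this host")
    assert detect_device() == "cuda"
```

The unit suite still covers the `cpu` path; the excluded branch only runs when the slow and gpu tests are explicitly selected.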
Commits
I care about commits being readable later — not because there's a
style police, but because git log is the one place future-you reads
history when something breaks.
- One commit per logical change. A new source file plus its tests is usually one commit. Unrelated fixes go in separate commits.
- Imperative subject line, under ~72 chars: `feat(export): emit explicit Go template in Modelfile`, not `Updated the export module to emit the template`.
- If the why needs a paragraph, put it in the commit body. If it doesn't, one line is fine.
- Stage files by name (`git add path/to/file.py`). `git add -A` picks up stray files we don't want in the repo.
- No coauthor trailers. No `--no-verify`. If a pre-commit hook fails, fix what it's telling you about.
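If you want a quick self-check before committing, something like the sketch below works. It is not a hook the project ships: `subject_problems` is a hypothetical helper, and the `type(scope): summary` shape is an assumption inferred from the single example above. Imperative mood can't be checked mechanically, so that part stays on you.

```python
import re

# Assumed shape: "type(scope): summary", with the scope optional.
SUBJECT_RE = re.compile(r"^[a-z]+(\([a-z0-9_-]+\))?: \S.*$")
MAX_SUBJECT_CHARS = 72


def subject_problems(subject: str) -> list[str]:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems: list[str] = []
    if len(subject) > MAX_SUBJECT_CHARS:
        problems.append(
            f"subject is {len(subject)} chars; keep it under ~{MAX_SUBJECT_CHARS}"
        )
    if SUBJECT_RE.match(subject) is None:
        problems.append("subject does not look like 'type(scope): summary'")
    return problems


if __name__ == "__main__":
    for line in (
        "feat(export): emit explicit Go template in Modelfile",
        "Updated the export module to emit the template",
    ):
        print(line, "->", subject_problems(line) or "ok")
```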
Pull requests
- Link the issue or discussion your change resolves, if there is one.
- If your change could alter training output under a fixed seed, call it out: "this breaks the determinism golden, here's the regeneration path." Better to know up front than discover it in a retrain.
- For larger cross-cutting changes (a new dependency, a new on-disk format, anything that touches the manifest schema), open an issue first so we can nail down the design before you write the code.
Scope
DLM has a clear story — edit a document, train a LoRA, export to Ollama, do it locally, don't forget on retrain. Contributions that fit that story are the easy ones to land. Things that add scope (new training paradigms, cloud integrations, alternate inference backends) are worth discussing in an issue first; sometimes the answer is "yes, but later," and it saves you from writing code that won't merge.
A few things we actively don't want:
- Silent-failure surfaces. If a preflight can't verify something, it refuses rather than warns.
- Backwards-compat shims for code that hasn't shipped yet. If a v1 hasn't gone out the door, you can rename a function without a deprecation wrapper.
- Telemetry or network calls outside of model download. Ever.
Releasing
Tag-driven. Pushing v* triggers .github/workflows/release.yml,
which runs the full CI gate, builds a "fat" source tarball (includes
vendor/llama.cpp/ so the Homebrew formula can drop the convert
scripts into libexec without cloning submodules), and creates a
GitHub release with the tarball + computed sha256.
Docs are built strict in CI but not hosted — read them in-repo or in the brew-installed tarball. Hosting (GitHub Pages or a custom subdomain) is a separate change, deferred post-v0.9.0.
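We publish to PyPI as [document-language-model](https://pypi.org/project/document-language-model/).
The `PYPI_TOKEN` secret in GitHub Actions handles automated releases via
`.github/workflows/release.yml`. We also maintain a Homebrew tap at
[tenseleyFlow/homebrew-tap](https://github.com/tenseleyFlow/homebrew-tap).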
Conservative versioning
Stay below v1.0.0 until a human has trained + exported +
ollama run'd an adapter end-to-end. That's the only contract v1.0
actually owes users. Current target: v0.9.0 for the first tagged
release.
Pre-flight (run locally before tagging)
uv run ruff check .
uv run ruff format --check .
uv run mypy src/dlm
uv run pytest
uv sync --group docs
uv run mkdocs build --strict
Bump the version in pyproject.toml, move ## [Unreleased] entries
under a new ## [X.Y.Z] heading in CHANGELOG.md, and land both in
one commit.
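The changelog half of that step is easy to get wrong by hand, so here is a minimal sketch of the move. It assumes the `## [Unreleased]` heading stays at the top (now empty) and the existing entries end up under the new version heading; `promote_unreleased` is a hypothetical helper, not a script in the repo.

```python
def promote_unreleased(changelog: str, version: str) -> str:
    """Insert a '## [X.Y.Z]' heading directly under '## [Unreleased]'."""
    marker = "## [Unreleased]"
    if marker not in changelog:
        raise ValueError("no '## [Unreleased]' heading found")
    # Only the first occurrence is touched; the entries that followed
    # the Unreleased heading now sit under the new version heading.
    return changelog.replace(marker, f"{marker}\n\n## [{version}]", 1)


if __name__ == "__main__":
    before = "# Changelog\n\n## [Unreleased]\n\n- Example entry.\n"
    print(promote_unreleased(before, "0.9.0"))
```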
Tagging
git tag v0.9.0
git push origin v0.9.0
release.yml classifies the tag via
packaging.version.Version.is_prerelease:
- Prerelease (`v0.9.0rc1`, `v0.9.0a1`, `v0.9.0-rc1`): GitHub release gets the `prerelease` flag so it doesn't show as "latest."
- Release (`v0.9.0`, `v0.9.1`): standard GitHub release.
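To see how a tag will be classified before you push it, you can ask `packaging` directly. This sketch assumes the workflow hands the tag string to `Version` as-is (PEP 440 parsing tolerates the leading `v` and the `-rc1` spelling); `is_prerelease_tag` is a name made up for this example.

```python
from packaging.version import Version


def is_prerelease_tag(tag: str) -> bool:
    """True when the tag parses as a PEP 440 pre-release or dev release."""
    return Version(tag).is_prerelease


if __name__ == "__main__":
    for tag in ("v0.9.0rc1", "v0.9.0a1", "v0.9.0-rc1", "v0.9.0", "v0.9.1"):
        print(f"{tag}: prerelease={is_prerelease_tag(tag)}")
```

A tag that isn't valid PEP 440 raises `packaging.version.InvalidVersion`, which is a decent early warning that the tag name is off.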
Bumping the Homebrew formula
After the release workflow finishes, it prints the fat-tarball sha256
in the release notes. Bump Formula/dlm.rb in the tap:
url "https://github.com/tenseleyFlow/DocumentLanguageModel/releases/download/v0.9.0/dlm-v0.9.0.tar.gz"
sha256 "<copy from release notes>"
Then:
cd ~/path/to/homebrew-tap
brew install --build-from-source ./Formula/dlm.rb # local smoke
brew test ./Formula/dlm.rb # runs the `test do` block
git commit -am "dlm: bump to v0.9.0"
git push
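Before pushing the formula bump, it can be worth checking that the sha256 you pasted matches the tarball you'd actually download. A minimal sketch using only the standard library; the local file name in the usage note is an assumption based on the URL pattern above.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large tarball never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Download the release asset, run `sha256_of_file(Path("dlm-v0.9.0.tar.gz"))`, and compare the result with the value in the release notes and in `Formula/dlm.rb`.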
Rollback
Homebrew rollback is straightforward: delete the bad GitHub release
(or mark it as a draft), then revert the formula bump in the tap. Users who
already installed the bad version can run `brew uninstall dlm && brew install dlm` to pick up the revert.
Thanks again — reach out in issues if anything's unclear.
-mfw