# Smart-HTTP git protocol

S12 ships `git clone`, `git fetch`, and `git push` over HTTPS. Authentication is HTTP Basic with either a password or a personal access token (PAT). The handlers shell out to canonical `git`'s `upload-pack` / `receive-pack` plumbing rather than re-implementing the protocol. Pack data streams without buffering, and cancellation propagates to the subprocess so a closed client doesn't leave git running.

## What's wired

- `internal/git/protocol/pktline.go`: minimal pkt-line writer used to prepend the `# service=...` header to the info/refs response.
- `internal/git/protocol/exec.go`: `Cmd(ctx, svc, gitDir, advertiseRefs, env)` builds the `*exec.Cmd` for `git-{upload,receive}-pack --stateless-rpc [--advertise-refs] <repo>`. `DrainStderr` runs a goroutine that copies stderr into a 16 KiB capped buffer, so the OS pipe never fills and the caller's stdout copy never deadlocks. `WaitDelay = 250ms` so a stuck subprocess doesn't pin a worker after ctx cancel.
- `internal/web/handlers/githttp/auth.go`: `resolveBasicAuth(ctx, header)` parses HTTP Basic, prefers the PAT path when the secret carries the canonical `shithub_pat_` prefix, and falls back to argon2id password verification. Constant-time discipline: an unknown username still runs `password.VerifyAgainstDummy`, so timing doesn't leak whether the account exists.
- `internal/web/handlers/githttp/handler.go`: the four route bodies + inline owner-only authorization. `MountSmartHTTP(r)` registers them on a CSRF/timeout/compression-exempt route group.
- `internal/web/githttp_wiring.go`: `buildGitHTTPHandlers` constructs the handler set from cfg + pool.
- `internal/web/handlers/handlers.go`: `GitHTTPMounter` Dep field; the route group sits alongside the static / CSRF-protected groups.
- `internal/web/server.go`: Compress + Timeout are no longer global; they're applied per-group in handlers.go so the git group can stream uncompressed for many minutes.

## Routes

All four are registered without CSRF, without response compression, and without the global request timeout. Request bodies on the POST endpoints are capped at `Deps.MaxPushBytes` (default 2 GiB) via `http.MaxBytesReader`.

| Route | Method | Notes |
|---|---|---|
| `/{owner}/{repo}.git/info/refs?service=git-upload-pack` | GET | public allowed; private 401 with `WWW-Authenticate: Basic realm="shithub"` |
| `/{owner}/{repo}.git/info/refs?service=git-receive-pack` | GET | always 401 if anon; otherwise inline owner-only check |
| `/{owner}/{repo}.git/git-upload-pack` | POST | streams `req.Body` → upload-pack stdin → `w` |
| `/{owner}/{repo}.git/git-receive-pack` | POST | same shape; sets `SHITHUB_*` env vars on the subprocess (S14 hooks consume them) |

## Auth shape

```
                header == ""
                     │
                     ▼
              Anonymous=true ──┐
                               │
   header has Basic creds → decode user:secret
                               │
              secret has "shithub_pat_" prefix?
              ┌───────────yes────────────┐ no
              ▼                          ▼  │
        resolveViaPAT              resolveViaPassword
              │                          │  │
            found?                username found? & pw ok?
        ┌─────yes──┐               ┌─────yes──┐
        ▼          ▼               ▼          ▼
   ResolvedAuth   fall        ResolvedAuth   errBadCredentials
                  through to                  (constant-time discipline:
                  password path                runs VerifyAgainstDummy
                                               when user lookup fails)
```

The `ResolvedAuth` result carries `UserID`, `Username`, and `ViaPAT`. Anonymous resolves only when the header is missing entirely. Bad credentials always become `errBadCredentials` regardless of which path failed, so callers can't probe for valid usernames or distinguish "wrong password" from "non-existent user."
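
The decision flow above can be sketched in a few lines of Go. This is a minimal illustration, not the contents of `auth.go`: the `resolver` type, its two lookup hooks, and the `Anonymous` field are assumptions made for the example, and the password hook is assumed to run the dummy verification itself when the username is unknown.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

const patPrefix = "shithub_pat_"

var errBadCredentials = errors.New("bad credentials")

// ResolvedAuth mirrors the fields named in the text. The Anonymous field is
// an assumption based on the diagram's "Anonymous=true".
type ResolvedAuth struct {
	UserID    int64
	Username  string
	ViaPAT    bool
	Anonymous bool
}

// resolver holds hypothetical lookup hooks standing in for the database.
// verifyPassword is assumed to hash against a dummy when the user is unknown.
type resolver struct {
	lookupPAT      func(token string) (ResolvedAuth, bool)
	verifyPassword func(username, password string) (ResolvedAuth, bool)
}

func (r resolver) resolve(header string) (ResolvedAuth, error) {
	if header == "" {
		return ResolvedAuth{Anonymous: true}, nil // only a missing header is anonymous
	}
	scheme, encoded, ok := strings.Cut(header, " ")
	if !ok || !strings.EqualFold(scheme, "Basic") {
		return ResolvedAuth{}, errBadCredentials
	}
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return ResolvedAuth{}, errBadCredentials
	}
	username, secret, ok := strings.Cut(string(raw), ":")
	if !ok {
		return ResolvedAuth{}, errBadCredentials
	}
	if strings.HasPrefix(secret, patPrefix) {
		if auth, found := r.lookupPAT(secret); found {
			auth.ViaPAT = true
			return auth, nil
		}
		// Unknown token: fall through to the password path, as in the diagram.
	}
	if auth, ok := r.verifyPassword(username, secret); ok {
		return auth, nil
	}
	return ResolvedAuth{}, errBadCredentials // same error for every failure
}

// basicHeader builds an Authorization header value for the demo.
func basicHeader(user, secret string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+secret))
}

// demoResolver wires in-memory credentials in place of real storage.
func demoResolver() resolver {
	return resolver{
		lookupPAT: func(token string) (ResolvedAuth, bool) {
			if token == patPrefix+"abc123" {
				return ResolvedAuth{UserID: 1, Username: "alice"}, true
			}
			return ResolvedAuth{}, false
		},
		verifyPassword: func(username, password string) (ResolvedAuth, bool) {
			if username == "alice" && password == "hunter2" {
				return ResolvedAuth{UserID: 1, Username: "alice"}, true
			}
			return ResolvedAuth{}, false
		},
	}
}

func main() {
	r := demoResolver()
	headers := []string{"", basicHeader("anything", patPrefix+"abc123"), basicHeader("alice", "wrong")}
	for _, h := range headers {
		auth, err := r.resolve(h)
		fmt.Printf("%+v err=%v\n", auth, err)
	}
}
```

Every failure returns the same sentinel error, which is what keeps "wrong password" and "no such user" indistinguishable to the caller.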

## Permissions (S15 will refactor)

Inline V1 rules:

- **Read** (upload-pack):
  - public repo → anonymous OK; auth'd user OK as long as not suspended.
  - private repo → must be the owner.
- **Write** (receive-pack):
  - must be authenticated.
  - must be the owner.
  - repo must NOT be archived (we surface "repository is archived; pushes are disabled" so `git push`'s stderr shows it).
  - repo must NOT be soft-deleted (410 Gone).

S15 lands `policy.Can`, at which point this whole block becomes a single function call.
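
As a rough illustration, the V1 rules fit in one function that maps an actor and a repo to an HTTP status. The `actor` and `repo` types and the `authorize` name are hypothetical. The text only fixes 401 for anonymous access and 410 for soft-deleted repos; the 403 and 404 choices and the order of the checks are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"net/http"
)

// actor and repo are hypothetical stand-ins for the real models.
type actor struct {
	Anonymous bool
	UserID    int64
	Suspended bool
}

type repo struct {
	OwnerID  int64
	Private  bool
	Archived bool
	Deleted  bool // soft-deleted
}

// authorize returns the HTTP status the inline V1 rules would produce.
func authorize(a actor, r repo, write bool) int {
	if write {
		switch {
		case a.Anonymous:
			return http.StatusUnauthorized // from the text: must be authenticated
		case r.Deleted:
			return http.StatusGone // from the text: 410 Gone
		case a.UserID != r.OwnerID:
			return http.StatusForbidden // assumption: status for non-owners
		case r.Archived:
			return http.StatusForbidden // assumption: status for archived repos
		}
		return http.StatusOK
	}
	if !r.Private {
		if !a.Anonymous && a.Suspended {
			return http.StatusForbidden // assumption: status for suspended users
		}
		return http.StatusOK
	}
	switch {
	case a.Anonymous:
		return http.StatusUnauthorized // from the text: private repo answers 401
	case a.UserID != r.OwnerID:
		return http.StatusNotFound // assumption: hide private repos from non-owners
	}
	return http.StatusOK
}

func main() {
	owner := actor{UserID: 1}
	anon := actor{Anonymous: true}
	private := repo{OwnerID: 1, Private: true}

	fmt.Println("anon read private:", authorize(anon, private, false))
	fmt.Println("owner read private:", authorize(owner, private, false))
	fmt.Println("owner push archived:", authorize(owner, repo{OwnerID: 1, Archived: true}, true))
}
```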

## Subprocess lifecycle

```
http handler                          protocol.Cmd                  git
─────────────────────────────────────────────────────────────────
authorize        ─────────────────►   CommandContext (ctx-bound)
                 ◄────────────────────  *exec.Cmd
DrainStderr(cmd) ─────────────────►   StderrPipe + goroutine
cmd.Stdout = w                          │
cmd.Stdin  = req.Body                   │
cmd.Run          ─────────────────►   fork + exec git
                 ◄────────────────────  stdout streamed
                                       (stderr drained to bounded buf in goroutine)
                 ◄────────────────────  exit
ctx cancel anywhere here ─────────────────►   cmd.Cancel = SIGKILL
                                       WaitDelay 250ms
```

Why we drain stderr: when `git-receive-pack` writes a lot to stderr (failed hook, push to a non-existent ref, etc.), an undrained pipe fills the OS buffer (~64 KiB) and the subprocess blocks on its next write, so our `cmd.Run` never returns. The goroutine + capped buffer prevents that.

Why a bounded buffer: a malicious or pathological subprocess could otherwise OOM us by spamming stderr. We keep the first 16 KiB, then keep reading but drop the rest. The captured bytes are surfaced for logging via the closure returned by `DrainStderr`.

Why `WaitDelay`: ctx cancel sends SIGKILL, but a pipe can stay held open after the kill (for example by a helper process git spawned), and without `WaitDelay`, `cmd.Wait` waits indefinitely for the output copy to finish. 250 ms is enough for a clean exit and short enough that a worker doesn't stay pinned after a client disconnect.
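
The three choices above can be tried in isolation with a short program. This is a sketch under stated assumptions rather than the code in `exec.go`: it assigns a capped writer to `cmd.Stderr` and lets `os/exec` run the copy goroutine, which is a simpler stand-in for the `StderrPipe` + goroutine approach in the diagram. `cappedBuffer` and `newCmd` are hypothetical names, and `sleep` stands in for git.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"sync"
	"time"
)

const stderrCap = 16 << 10 // 16 KiB, the cap named in the text

// cappedBuffer keeps the first limit bytes and drops the rest. It always
// reports a full write so the goroutine feeding it never stalls.
type cappedBuffer struct {
	mu    sync.Mutex
	buf   []byte
	limit int
}

func (c *cappedBuffer) Write(p []byte) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if room := c.limit - len(c.buf); room > 0 {
		if len(p) < room {
			room = len(p)
		}
		c.buf = append(c.buf, p[:room]...)
	}
	return len(p), nil
}

func (c *cappedBuffer) String() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return string(c.buf)
}

// newCmd builds a ctx-bound command. With exec.CommandContext the default
// Cancel kills the process when ctx is done; WaitDelay bounds how long Wait
// lingers on I/O afterwards.
func newCmd(ctx context.Context, name string, args ...string) (*exec.Cmd, *cappedBuffer) {
	cmd := exec.CommandContext(ctx, name, args...)
	sink := &cappedBuffer{limit: stderrCap}
	cmd.Stderr = sink
	cmd.WaitDelay = 250 * time.Millisecond
	return cmd, sink
}

func main() {
	// Simulate a client that disconnects after 100 ms.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	cmd, stderr := newCmd(ctx, "sleep", "30")
	start := time.Now()
	err := cmd.Run()
	fmt.Printf("returned after %v, err=%v, stderr=%q\n",
		time.Since(start).Round(time.Millisecond), err, stderr.String())
}
```

On a system with `sleep` on `PATH`, `Run` should return shortly after the context expires instead of after 30 seconds.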

## Hook environment

The receive-pack subprocess gets these env vars so S14's post-receive hook can identify the pusher:

| Var | Value |
|---|---|
| `SHITHUB_USER_ID` | numeric users.id of the auth'd user |
| `SHITHUB_USERNAME` | the user's lowercase username |
| `SHITHUB_REPO_ID` | numeric repos.id |
| `SHITHUB_REPO_FULL_NAME` | `<owner>/<repo>` |
| `SHITHUB_PROTOCOL` | `http` (S13 adds `ssh`) |
| `SHITHUB_REMOTE_IP` | client IP from RealIP middleware |
| `SHITHUB_REQUEST_ID` | request ID from RequestID middleware |
| `PATH` | inherited from the parent process |

Git propagates the parent environment to its hook subprocesses, so the post-receive script can `echo "$SHITHUB_USER_ID"` and get the right value.

`PATH` is passed through explicitly so `git` itself can find its sub-helpers; the parent process's `PATH` is the only sane source for it.
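
A minimal sketch of assembling that environment follows. The `pusher` struct and `hookEnv` function are hypothetical names chosen for the example; only the variable names and their meanings come from the table above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// pusher is a hypothetical bundle of the request-scoped values in the table.
type pusher struct {
	UserID    int64
	Username  string
	RepoID    int64
	Owner     string
	Repo      string
	RemoteIP  string
	RequestID string
}

// hookEnv renders the KEY=value pairs in the same order as the table.
func hookEnv(p pusher, path string) []string {
	return []string{
		"SHITHUB_USER_ID=" + strconv.FormatInt(p.UserID, 10),
		"SHITHUB_USERNAME=" + strings.ToLower(p.Username),
		"SHITHUB_REPO_ID=" + strconv.FormatInt(p.RepoID, 10),
		"SHITHUB_REPO_FULL_NAME=" + p.Owner + "/" + p.Repo,
		"SHITHUB_PROTOCOL=http",
		"SHITHUB_REMOTE_IP=" + p.RemoteIP,
		"SHITHUB_REQUEST_ID=" + p.RequestID,
		"PATH=" + path,
	}
}

func main() {
	env := hookEnv(pusher{
		UserID: 42, Username: "Alice", RepoID: 7,
		Owner: "alice", Repo: "demo",
		RemoteIP: "203.0.113.9", RequestID: "req-123",
	}, os.Getenv("PATH"))
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```

If a slice like this is assigned to `cmd.Env` as-is, it replaces the inherited environment entirely, which is one reason `PATH` has to be listed explicitly.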

## Body cap

`http.MaxBytesReader(w, r.Body, MaxPushBytes)` wraps the request body before it reaches the subprocess. The default cap is 2 GiB (configurable via `Deps.MaxPushBytes`). When the cap is exceeded, the next read errors and the subprocess sees stdin EOF; receive-pack returns a non-zero exit and we log it. From the client's point of view, the `git push` fails. This is good enough for V1; S36 will add per-repo overrides.
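
The read-side behaviour of the cap can be observed with a tiny limit. The sketch below is illustrative only: `readCapped` is a hypothetical helper, the 8-byte limit stands in for `MaxPushBytes`, and `io.Discard` stands in for the subprocess's stdin.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// readCapped pushes body through http.MaxBytesReader and reports how many
// bytes got through and whether the cap was hit.
func readCapped(body string, limit int64) (int64, bool) {
	req := httptest.NewRequest(http.MethodPost,
		"/alice/demo.git/git-receive-pack", strings.NewReader(body))
	rec := httptest.NewRecorder()

	capped := http.MaxBytesReader(rec, req.Body, limit)
	n, err := io.Copy(io.Discard, capped)

	// The reader fails with *http.MaxBytesError once the limit is crossed.
	var tooBig *http.MaxBytesError
	return n, errors.As(err, &tooBig)
}

func main() {
	for _, body := range []string{"small", "this body is larger than the cap"} {
		n, hit := readCapped(body, 8)
		fmt.Printf("read %d bytes, cap exceeded: %v\n", n, hit)
	}
}
```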

## Streaming

The handler writes the `Content-Type` + cache headers BEFORE `cmd.Run`. After that, every byte git writes to stdout goes straight to the response writer. Go's `net/http` server keeps only a small write buffer and sends data as that buffer fills or when `Flush` is called, and because no `Content-Length` is set, HTTP/1.1 responses go out with chunked transfer encoding. The `ResponseWriter` the handler receives through `chi` implements `http.Flusher`; manual verification via `tcpdump` was done in dev to confirm bytes leave the server in real time.
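
If per-write flushing ever needs to be explicit rather than left to `net/http`, a thin wrapper around the response writer is one option. This is a sketch, not what the handler does today: `flushWriter`, `newFlushWriter`, and `streamResult` are hypothetical names, and the `Cache-Control` value is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// flushWriter flushes after every Write so each chunk leaves immediately.
type flushWriter struct {
	w http.ResponseWriter
	f http.Flusher
}

func (fw flushWriter) Write(p []byte) (int, error) {
	n, err := fw.w.Write(p)
	fw.f.Flush()
	return n, err
}

// newFlushWriter wraps w when it supports flushing and returns it unchanged
// otherwise.
func newFlushWriter(w http.ResponseWriter) io.Writer {
	if f, ok := w.(http.Flusher); ok {
		return flushWriter{w: w, f: f}
	}
	return w
}

// streamResult sets the headers first, then copies src to the client.
func streamResult(w http.ResponseWriter, src io.Reader) error {
	w.Header().Set("Content-Type", "application/x-git-upload-pack-result")
	w.Header().Set("Cache-Control", "no-cache") // assumed cache policy
	_, err := io.Copy(newFlushWriter(w), src)
	return err
}

// demo runs streamResult against an in-memory recorder.
func demo() (contentType, body string, flushed bool) {
	rec := httptest.NewRecorder()
	if err := streamResult(rec, strings.NewReader("0008NAK\n")); err != nil {
		panic(err)
	}
	return rec.Header().Get("Content-Type"), rec.Body.String(), rec.Flushed
}

func main() {
	ct, body, flushed := demo()
	fmt.Printf("content-type=%s body=%q flushed=%v\n", ct, body, flushed)
}
```

In a real handler the same wrapper could be assigned to `cmd.Stdout` in place of the bare response writer.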

## Tests

`internal/web/handlers/githttp/githttp_test.go`: five end-to-end scenarios against a real `git` CLI:

- `TestGitHTTP_AnonClonePublic`: anon clone of a public repo with one initial commit succeeds; `rev-list --count HEAD` reports 1.
- `TestGitHTTP_AnonClonePrivateFails`: anon clone of a private repo fails (with `GIT_TERMINAL_PROMPT=0` so we don't hang on the credential prompt).
- `TestGitHTTP_PATClonePrivate`: clone with the PAT in the URL userinfo succeeds.
- `TestGitHTTP_PATPushRoundtrip`: clone, commit, push to private repo with PAT, re-clone elsewhere, verify the new commit is visible.
- `TestGitHTTP_PushToArchivedRejected`: archive the repo, attempt push, expect non-zero exit AND "archived" in stderr.

`internal/git/protocol/protocol_test.go`: pkt-line format, over-length rejection, service advertisement shape, and a context-cancel test that asserts the subprocess dies within 5 seconds.

Tests skip cleanly when `SHITHUB_TEST_DATABASE_URL` is unset (matches the rest of the suite).

## Pitfalls / what to remember

- **Never reach for global Compress / Timeout.** They're per-group now. If you add a new top-level route group, copy the `r.Use(middleware.Compress)` + `r.Use(middleware.Timeout(...))` pair from the static / app groups.
- **Never log the Authorization header.** S08's redaction is wired globally in the access log; cross-test on git routes if you change the logging stack.
- **Username in HTTP Basic is informational.** git's credential helper passes whatever the user typed at the "Username for shithub:" prompt. We DON'T require it to match the resolved user: for PATs the username is irrelevant, and for passwords it's just the lookup key. Document this in user-facing help when we ship credential setup docs.
- **Stderr drain is required, not optional.** A bug that disables the drainer eventually deadlocks under load. The bounded buffer is the second backstop.
- **Body cap is on the receive-pack side only, but enforced before we open the subprocess.** Don't skip the `http.MaxBytesReader` wrap because "git will reject malformed packs anyway": that's true, but the malicious payload has already hit RAM by then.
- **Clones of empty repos work.** info/refs returns just the service-advertisement preamble (`001e# service=...\n0000`) when the repo has zero refs; git handles the empty case fine.
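
The `001e` in the preamble above is a pkt-line length: four lowercase hex digits counting the payload plus the four header bytes themselves. A minimal sketch of that framing follows; the `pktLine` and `serviceAdvertisement` names are hypothetical and not taken from `pktline.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// maxPktPayload is the largest payload one pkt-line may carry: 65520 bytes
// in total minus the 4-byte length header.
const maxPktPayload = 65516

// flushPkt is the special zero-length packet that ends a section.
const flushPkt = "0000"

// pktLine frames payload as a single pkt-line.
func pktLine(payload string) (string, error) {
	if len(payload) > maxPktPayload {
		return "", errors.New("pkt-line payload too long")
	}
	return fmt.Sprintf("%04x%s", len(payload)+4, payload), nil
}

// serviceAdvertisement builds the preamble that precedes git's own ref
// advertisement in a smart-HTTP info/refs response.
func serviceAdvertisement(service string) (string, error) {
	line, err := pktLine("# service=" + service + "\n")
	if err != nil {
		return "", err
	}
	return line + flushPkt, nil
}

func main() {
	for _, svc := range []string{"git-upload-pack", "git-receive-pack"} {
		adv, err := serviceAdvertisement(svc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", adv)
	}
}
```

The upload-pack payload is 26 bytes, giving `001e`; the receive-pack service name is one byte longer, so its prefix is `001f`.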

## Open follow-ups (deferred)

- **Smart-SSH (S13)** reuses the auth/permission shape; the protocol-level differences are in the transport layer, not the authz layer.
- **Push processing pipeline (S14)** consumes the env vars set here. Today the receive-pack subprocess's hooks dir is empty; S14 wires post-receive.
- **Branch protection (S20)** lands as a pre-receive hook installed by S14.
- **LFS** is post-MVP.
- **Performance / large packs (S36)** will profile the streaming path once we have real-world push sizes.

## Related docs

- `docs/internal/storage.md`: RepoFS layout that produces the bare-repo path.
- `docs/internal/repo-create.md`: S11's repo creation flow whose output we're serving.
- `docs/internal/auth.md`: sessions + recent-2FA gate (irrelevant here; git over HTTP doesn't use sessions).
- `docs/internal/tokens.md`: PAT issuance and the hash format.