# SSH git protocol

S13 lights up `git clone git@host:owner/repo.git`, `git fetch`, and `git push` over SSH. The S07 `ssh-shell` placeholder is now the real dispatcher: it parses `SSH_ORIGINAL_COMMAND`, resolves user + repo against the DB, runs an inline owner-only authorization check, sets the `SHITHUB_*` hook environment, closes the DB pool, and calls `syscall.Exec` on the canonical `git-upload-pack` or `git-receive-pack` binary. After exec, sshd's stdin/stdout/stderr flow directly to git.

## What's wired

- `internal/git/protocol/ssh_command.go`: `ParseSSHCommand` strict-parses `SSH_ORIGINAL_COMMAND` into `Service + Owner + Repo`. Whitelist regex; matched-quote alternation (`'a/b'` or `a/b` only; `a/b'` is rejected); a single trailing `.git` is stripped; owner and repo run through the storage-shape rules.
- `internal/git/protocol/ssh_dispatch.go`: `PrepareDispatch` does the DB + authz + env work. It returns the binary path, argv, and env vector the caller needs to exec git. It also exposes `FriendlyMessageFor(err, requestID)` so the caller writes a consistent stderr line, and `ParseRemoteIP(SSH_CONNECTION)` for the env var.
- `cmd/shithubd/ssh.go::sshShellCmd`: the cobra command sshd invokes. Replaces the S07 stub. Manages the DB pool lifecycle and calls `syscall.Exec`.

The `ssh-authkeys` flow from S07 (the `AuthorizedKeysCommand` handler) is unchanged; it still emits the `command="shithubd ssh-shell <user_id>",no-pty,…` line, which is what binds an inbound SSH connection to a known user before this dispatcher ever runs.

## End-to-end flow

```
client                 sshd                         shithubd
──────────────────────────────────────────────────────────────────────
ssh git@host ───►
  git-upload-pack 'alice/foo.git'
                       ssh-authkeys SHA256:…
                       ◄─── command="shithubd ssh-shell 42",no-pty,…
                 ───── pubkey auth OK
                       fork; setenv SSH_ORIGINAL_COMMAND, SSH_CONNECTION
                       exec shithubd ssh-shell 42
                                                    │
                                                    ├─ ParseSSHCommand
                                                    ├─ DB pool (max 2)
                                                    ├─ GetUserByID, GetUserByUsername, GetRepoByOwnerUserAndName
                                                    ├─ Inline authz
                                                    ├─ Build SHITHUB_* env
                                                    ├─ pool.Close()
                                                    └─ syscall.Exec git-upload-pack /data/repos/al/alice/foo.git
                                                    │
                                                    ▼
                                                    git pack protocol
                                                    (sshd's stdio is now git's)
```

## Strict command parsing

The whitelist regex is intentionally narrow:

```
^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$
```

The path group is one of two alternatives: quoted with matching quotes, or unquoted with no spaces. Anything else (mismatched quotes, escaped quotes, command chaining, leading or trailing whitespace, alternate services like `git-archive`) returns `ErrUnknownSSHCommand`, and the dispatcher writes "shithub does not allow shell access" to stderr.
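
To make the accept/reject behaviour concrete, here is a minimal sketch that runs the regex above against a few inputs. `matchSSHCommand` is a hypothetical helper for illustration, not the real `ParseSSHCommand`, and the sentinel's error text is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// ErrUnknownSSHCommand mirrors the sentinel named in this document;
// its message text here is an assumption.
var ErrUnknownSSHCommand = errors.New("unknown ssh command")

// sshCommandRE is the whitelist regex quoted above, unchanged.
var sshCommandRE = regexp.MustCompile(`^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$`)

// matchSSHCommand (hypothetical) returns the service and the raw path,
// or ErrUnknownSSHCommand when the input does not match the whitelist.
func matchSSHCommand(raw string) (service, path string, err error) {
	m := sshCommandRE.FindStringSubmatch(raw)
	if m == nil {
		return "", "", ErrUnknownSSHCommand
	}
	// Exactly one of the two path groups is non-empty on a match.
	path = m[2]
	if path == "" {
		path = m[3]
	}
	return m[1], path, nil
}

func main() {
	for _, raw := range []string{
		"git-upload-pack 'alice/foo.git'",
		"git-receive-pack alice/foo",
		"git-upload-pack alice/foo'",
		"git-upload-pack 'a/b'; rm -rf /",
		"git-archive 'alice/foo.git'",
	} {
		service, path, err := matchSSHCommand(raw)
		fmt.Printf("%q -> service=%q path=%q err=%v\n", raw, service, path, err)
	}
}
```

A quoted path may still contain spaces at this stage; the path validation that follows is what rejects it.
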

After parse, the path goes through:

1. Strip a single trailing `.git`.
2. Split on the first `/`.
3. Lowercase both halves (the storage layer is lowercase-only).
4. Validate `owner` against `^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$` (S05 username shape).
5. Validate `repo` against `^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$` (S11 repo shape).
6. Reject if either contains `..`, a leading dot, or extra slashes.

These validators live inside the `protocol` package rather than importing `internal/infra/storage`, so that an SSH connection (which runs a brand-new shithubd process every time) doesn't pay the storage package's init cost.
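
The six steps above can be sketched as one function. This is an illustration under assumptions: `splitOwnerRepo` is a hypothetical name, and the real validators in the `protocol` package may be organised differently. The two regexes are copied from steps 4 and 5.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// ErrInvalidSSHPath mirrors the sentinel named in this document;
// its message text here is an assumption.
var ErrInvalidSSHPath = errors.New("invalid ssh path")

var (
	ownerRE = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$`)
	repoRE  = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$`)
)

// splitOwnerRepo (hypothetical) applies steps 1-6 in order.
func splitOwnerRepo(path string) (string, string, error) {
	path = strings.TrimSuffix(path, ".git")   // 1: strip a single trailing .git
	owner, repo, ok := strings.Cut(path, "/") // 2: split on the first /
	if !ok {
		return "", "", ErrInvalidSSHPath
	}
	owner, repo = strings.ToLower(owner), strings.ToLower(repo) // 3: lowercase
	if !ownerRE.MatchString(owner) || !repoRE.MatchString(repo) { // 4 and 5
		return "", "", ErrInvalidSSHPath
	}
	// 6: the regexes already exclude a leading dot and extra slashes;
	// ".." inside a repo name still has to be rejected explicitly.
	if strings.Contains(repo, "..") {
		return "", "", ErrInvalidSSHPath
	}
	return owner, repo, nil
}

func main() {
	for _, p := range []string{"Alice/Foo.git", "alice/a..b", "/etc/passwd", "alice/foo/bar"} {
		owner, repo, err := splitOwnerRepo(p)
		fmt.Printf("%q -> owner=%q repo=%q err=%v\n", p, owner, repo, err)
	}
}
```

An absolute path fails at step 4 because the text before the first `/` is empty, so no filesystem path is ever built from it.
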

## Authorization (S15 will refactor)

Inline V1 rules, mirroring the HTTP path:

- **upload-pack**:
  - public repo → any non-suspended user.
  - private repo → must be the owner; non-owners get the not-found message (no existence leak).
- **receive-pack**:
  - must be the owner.
  - repo not archived (else `MsgArchived`).
  - repo not soft-deleted (else `MsgRepoNotFound`).

Suspended or soft-deleted users are rejected up-front with `MsgSuspended`.

S15 lifts these into `policy.Can(actor, action, resource)`.
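
The rules above fit in one small decision function. The sketch below is an assumption-heavy illustration: the `actor` and `repository` types, the `authorize` name, and the order of the receive-pack checks are all hypothetical, and it returns the error sentinels from the catalogue rather than the `Msg*` strings.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names come from this document; the message texts are assumptions.
var (
	ErrSSHRepoNotFound = errors.New("repo not found")
	ErrSSHPermDenied   = errors.New("permission denied")
	ErrSSHArchived     = errors.New("repo archived")
	ErrSSHSuspended    = errors.New("user suspended")
)

// actor and repository are hypothetical stand-ins for the DB rows.
type actor struct {
	ID        int64
	Suspended bool
	Deleted   bool
}

type repository struct {
	OwnerID  int64
	Private  bool
	Archived bool
	Deleted  bool
}

// authorize (hypothetical) applies the inline V1 rules.
func authorize(a actor, r repository, service string) error {
	// Suspended or soft-deleted users are rejected up-front.
	if a.Suspended || a.Deleted {
		return ErrSSHSuspended
	}
	isOwner := a.ID == r.OwnerID
	switch service {
	case "git-upload-pack":
		// Non-owners of a private repo get not-found (no existence leak).
		if r.Private && !isOwner {
			return ErrSSHRepoNotFound
		}
		return nil
	case "git-receive-pack":
		// The order of these three checks is an assumption of this sketch.
		switch {
		case r.Deleted:
			return ErrSSHRepoNotFound
		case !isOwner:
			return ErrSSHPermDenied
		case r.Archived:
			return ErrSSHArchived
		}
		return nil
	default:
		return ErrSSHPermDenied // fail closed on anything unexpected
	}
}

func main() {
	owner := actor{ID: 1}
	other := actor{ID: 2}
	private := repository{OwnerID: 1, Private: true}
	fmt.Println(authorize(owner, private, "git-upload-pack"))
	fmt.Println(authorize(other, private, "git-upload-pack"))
	fmt.Println(authorize(other, repository{OwnerID: 1}, "git-receive-pack"))
}
```
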

## Hook environment

Same as HTTP, plus protocol distinction:

| Var | Value |
|---|---|
| `SHITHUB_USER_ID` | numeric `users.id` of the authenticated client |
| `SHITHUB_USERNAME` | the user's lowercase username |
| `SHITHUB_REPO_ID` | numeric `repos.id` |
| `SHITHUB_REPO_FULL_NAME` | `<owner>/<repo>` (lowercased) |
| `SHITHUB_PROTOCOL` | `ssh` |
| `SHITHUB_REMOTE_IP` | first field of `SSH_CONNECTION` |
| `SHITHUB_REQUEST_ID` | freshly generated 16-byte hex token |
| `PATH` | inherited from the parent process so git's helpers resolve |

Because git propagates its environment to its hook subprocesses, S14's post-receive can `echo "$SHITHUB_USER_ID"` and get the right value.

`PATH` is included so `git-receive-pack` can find `git-pack-objects`, `git-index-pack`, and friends.
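
As a rough sketch of how the env vector in the table might be assembled: `hookIdentity`, `buildHookEnv`, `parseRemoteIP`, and `newRequestID` are hypothetical names, and reading "16-byte hex token" as 16 random bytes hex-encoded to 32 characters is an assumption.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// hookIdentity is a hypothetical bundle of the resolved user and repo.
type hookIdentity struct {
	UserID   int64
	Username string
	RepoID   int64
	Owner    string
	Repo     string
}

// parseRemoteIP returns the first whitespace-separated field of
// SSH_CONNECTION ("client_ip client_port server_ip server_port").
func parseRemoteIP(sshConnection string) string {
	fields := strings.Fields(sshConnection)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// newRequestID hex-encodes 16 random bytes (assumed token format).
func newRequestID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// buildHookEnv returns the full, curated environment for git: only the
// SHITHUB_* variables plus PATH, nothing inherited implicitly.
func buildHookEnv(id hookIdentity, sshConnection, requestID, path string) []string {
	return []string{
		"SHITHUB_USER_ID=" + strconv.FormatInt(id.UserID, 10),
		"SHITHUB_USERNAME=" + id.Username,
		"SHITHUB_REPO_ID=" + strconv.FormatInt(id.RepoID, 10),
		"SHITHUB_REPO_FULL_NAME=" + id.Owner + "/" + id.Repo,
		"SHITHUB_PROTOCOL=ssh",
		"SHITHUB_REMOTE_IP=" + parseRemoteIP(sshConnection),
		"SHITHUB_REQUEST_ID=" + requestID,
		"PATH=" + path,
	}
}

func main() {
	reqID, err := newRequestID()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	id := hookIdentity{UserID: 42, Username: "alice", RepoID: 7, Owner: "alice", Repo: "foo"}
	for _, kv := range buildHookEnv(id, "203.0.113.7 51234 192.0.2.1 22", reqID, os.Getenv("PATH")) {
		fmt.Println(kv)
	}
}
```
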

## syscall.Exec discipline

Three things are non-negotiable here:

1. **Close the DB pool BEFORE `syscall.Exec`.** Go's `defer` does NOT fire on `syscall.Exec`: the OS replaces the current process image and the runtime never returns. Without an explicit close, the pgx pool's TCP connections to Postgres are never shut down cleanly. Go marks the sockets it opens close-on-exec, so the kernel simply drops them at exec, and any descriptor not so marked would leak into git's FD table. We close explicitly.
2. **Resolve the binary via `exec.LookPath`.** `syscall.Exec` does not search `$PATH`; it needs the full path to the binary. `exec.LookPath("git-upload-pack")` walks `$PATH` and returns the first match. If git isn't on the daemon user's `$PATH`, the user sees `shithub: server misconfigured` in their git client.
3. **No writes to stdout AFTER the exec call.** If the exec call returns, it failed (typically EACCES on the binary or ENOENT). We write a friendly error and exit non-zero. On success we never see another instruction in this process.

`sysExec` is a package-level var pointing at `syscall.Exec` so unit tests can stub it. The production binding takes the `//nolint:gosec` for G204 because the inputs are constrained: `bin` is the LookPath result of a fixed service name, and `argv[1]` is a sanitized path from `storage.RepoFS`.
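
A minimal sketch of that ordering, for Unix-like systems only. `sysExec` is the stubbable var described above; `lookPath`, `execGit`, and the `closePool` callback are hypothetical names introduced for illustration, and `main` swaps in stubs so that running the sketch does not replace the process.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// sysExec points at syscall.Exec so tests can stub it.
var sysExec = syscall.Exec

// lookPath is made swappable the same way (an assumption of this sketch).
var lookPath = exec.LookPath

// execGit resolves the binary, closes the pool explicitly, then execs.
// On success it never returns; any return value is a failure.
func execGit(closePool func(), service, repoPath string, env []string) error {
	bin, err := lookPath(service)
	if err != nil {
		return fmt.Errorf("resolve %s: %w", service, err)
	}
	closePool() // defers will not run across exec, so clean up here
	return sysExec(bin, []string{service, repoPath}, env)
}

func main() {
	// Stubs for demonstration only.
	lookPath = func(name string) (string, error) {
		return "/usr/lib/git-core/" + name, nil
	}
	sysExec = func(bin string, argv, env []string) error {
		fmt.Println("would exec:", bin, argv, env)
		return nil
	}
	err := execGit(
		func() { fmt.Println("pool closed") },
		"git-upload-pack",
		"/data/repos/al/alice/foo.git",
		[]string{"SHITHUB_PROTOCOL=ssh"},
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, "shithub: internal error")
		os.Exit(1)
	}
}
```

Resolving the path before closing the pool means a missing git binary is reported while the process can still log normally.
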

## Friendly error catalogue

| Error sentinel | Stderr line |
|---|---|
| `ErrSSHRepoNotFound` | `shithub: repository not found` |
| `ErrSSHPermDenied` | `shithub: permission denied` |
| `ErrSSHArchived` | `shithub: this repository is archived; pushes are disabled` |
| `ErrSSHSuspended` | `shithub: your account is suspended` |
| `ErrUnknownSSHCommand` | `shithub does not allow shell access` |
| `ErrInvalidSSHPath` | `shithub: repository not found` (collapses to the same message: bad paths are indistinguishable from missing repos) |
| (anything else) | `shithub: internal error (request_id=<hex>)` |

The friendly message is what the user sees in `git`'s output. The structured slog event is what the operator reads: it logs at `WARN` for denials and `INFO` for successful dispatches, with `(user_id, op, owner, repo, remote_ip)` fields.
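
The catalogue maps naturally onto a single `errors.Is` switch. The following is a sketch, not the real `FriendlyMessageFor`: the stderr lines are taken from the table, while the sentinel message texts and the `errBoom` demo error are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names come from the table; the message texts are assumptions.
var (
	ErrSSHRepoNotFound   = errors.New("repo not found")
	ErrSSHPermDenied     = errors.New("permission denied")
	ErrSSHArchived       = errors.New("repo archived")
	ErrSSHSuspended      = errors.New("user suspended")
	ErrUnknownSSHCommand = errors.New("unknown ssh command")
	ErrInvalidSSHPath    = errors.New("invalid ssh path")

	// errBoom stands in for any unexpected failure (demo only).
	errBoom = errors.New("boom")
)

// friendlyMessageFor maps an error to the stderr line shown to the user.
// errors.Is is used so that wrapped sentinels still match.
func friendlyMessageFor(err error, requestID string) string {
	switch {
	case errors.Is(err, ErrSSHRepoNotFound), errors.Is(err, ErrInvalidSSHPath):
		return "shithub: repository not found"
	case errors.Is(err, ErrSSHPermDenied):
		return "shithub: permission denied"
	case errors.Is(err, ErrSSHArchived):
		return "shithub: this repository is archived; pushes are disabled"
	case errors.Is(err, ErrSSHSuspended):
		return "shithub: your account is suspended"
	case errors.Is(err, ErrUnknownSSHCommand):
		return "shithub does not allow shell access"
	default:
		return fmt.Sprintf("shithub: internal error (request_id=%s)", requestID)
	}
}

func main() {
	wrapped := fmt.Errorf("lookup: %w", ErrSSHRepoNotFound)
	fmt.Println(friendlyMessageFor(wrapped, "abc123"))
	fmt.Println(friendlyMessageFor(ErrInvalidSSHPath, "abc123"))
	fmt.Println(friendlyMessageFor(errBoom, "abc123"))
}
```
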

## Tests

`internal/git/protocol/ssh_command_test.go`: accepts (quoted, unquoted, dotted names, hyphenated names, `.git` suffix), rejects (wrong service, mismatched quotes, command injection attempts, leading/trailing whitespace, absolute paths, dot-dot, leading dot in repo name).

`internal/git/protocol/ssh_dispatch_test.go` (DB-integration):

- `TestDispatch_PublicCloneByOwner`: full round-trip, asserts on argv path + env vars.
- `TestDispatch_PublicCloneByOther`: non-owner pull of public repo OK.
- `TestDispatch_PrivateCloneByOtherIsNotFound`: non-owner sees `ErrSSHRepoNotFound`.
- `TestDispatch_PushByNonOwnerIsPermDenied`: non-owner push → `ErrSSHPermDenied`.
- `TestDispatch_PushToArchivedIsArchived`.
- `TestDispatch_SuspendedUserSuspended`.
- `TestDispatch_UnknownCommandIsRejected`.
- `TestFriendlyMessageFor` covers the message catalogue.
- `TestParseRemoteIP` covers `SSH_CONNECTION` parsing edge cases.

We don't unit-test the cobra command's `syscall.Exec` path; the whole point of `syscall.Exec` is that the test process would be replaced. The dispatcher tests cover everything up to the exec call; the deploy doc has a manual smoke-test recipe.

## Deploy checklist

Review once before going live:

- The shithubd binary runs as a dedicated `shithub-ssh` system user.
- That user has read+write on `/data/repos/`.
- That user's `$PATH` includes the directory containing `git-upload-pack` and `git-receive-pack` (typically `/usr/lib/git-core/`).
- sshd's `AuthorizedKeysCommand` points at `shithubd ssh-authkeys`.
- sshd's `AuthorizedKeysCommandUser` is set to `shithub-ssh` (NOT root).
- The `command=` directive in the AKC line forces `shithubd ssh-shell <user_id>`. A user can't bypass it via `~/.ssh/authorized_keys` because the AKC is the only key source we trust.

Failure mode for missing config: any unset cfg value (`cfg.DB.URL`, `cfg.Storage.ReposRoot`) surfaces as `shithub: server misconfigured` in the user's git client. The operator reads the structured slog line on the daemon side.

## Pitfalls / what to remember

- **Strict command parsing is the security boundary.** Don't relax the regex without writing a fresh fixture set. Shell injection attempts are caught here, not later.
- **Path validation is the second boundary.** `..`, absolute paths, and embedded slashes are caught BEFORE we ever construct a filesystem path.
- **`syscall.Exec` doesn't run defers.** Anything that needs cleanup runs explicitly before the exec call. Today that's just the pool; if you add a tempfile or a flock, close it first.
- **Stdin/stdout flow directly to git after exec.** sshd doesn't insert a layer: what git writes is what the client reads. No buffering hooks are possible after this point; if you need to inspect bytes, do it before exec.
- **No environment leak.** We replace the entire env with the curated `SHITHUB_*` set + `PATH`. `SSH_CONNECTION`, `SSH_ORIGINAL_COMMAND`, etc. are not propagated to git.
- **Concurrent pushes serialize on `refs/...lock`.** Multiple SSH connections pushing to the same ref take turns; different refs go in parallel. This is git's own behavior, not ours.

## Open follow-ups (deferred)

- **SSH certificates / CA-signed user keys** are post-MVP; today it's strictly key-by-key.
- **`git-lfs` over SSH** is post-MVP; the dispatcher rejects `git-lfs-authenticate` (and anything else) as unknown.
- **Pull-mirror / federation** is post-MVP.
- **S14** wires post-receive hooks; this dispatcher is the source of the env vars they read.
- **S15** refactors the inline owner-only check into `policy.Can`. Both this dispatcher and the HTTP handler swap atomically.

## Related docs

- `docs/internal/git-http.md`: the parallel HTTP transport, same auth/permission shape.
- `docs/internal/repo-create.md`: repo creation flow whose output we serve.
- `docs/internal/storage.md`: `RepoFS` layout that produces the bare-repo path.