SSH git protocol

S13 lights up git clone git@host:owner/repo.git, git fetch, and git push over SSH. The S07 ssh-shell placeholder is now the real dispatcher: it parses SSH_ORIGINAL_COMMAND, resolves user + repo against the DB, runs an inline owner-only authz, sets the SHITHUB_* hook environment, closes the DB pool, and syscall.Execs the canonical git-upload-pack or git-receive-pack binary. After exec, sshd's stdin/stdout/stderr flow directly to git.

What's wired

  • internal/git/protocol/ssh_command.go: ParseSSHCommand strict-parses SSH_ORIGINAL_COMMAND into Service + Owner + Repo. Whitelist regex; matched-quote alternation ('a/b' or a/b only; a/b' is rejected); single trailing .git stripped; owner + repo run through the storage-shape rules.
  • internal/git/protocol/ssh_dispatch.go: PrepareDispatch does the DB + authz + env work. Returns the binary path, argv, and env vector the caller needs to exec git. Also exposes FriendlyMessageFor(err, requestID) so the caller writes a consistent stderr line and ParseRemoteIP(SSH_CONNECTION) for the env var.
  • cmd/shithubd/ssh.go::sshShellCmd: the cobra command sshd invokes. Replaces the S07 stub. Manages the DB pool lifecycle and calls syscall.Exec.

The ssh-authkeys flow from S07 (the AuthorizedKeysCommand handler) is unchanged; it still emits the command="shithubd ssh-shell <user_id>",no-pty,… line, which is what binds an inbound SSH connection to a known user before this dispatcher ever runs.

End-to-end flow

client                          sshd                       shithubd
──────────────────────────────────────────────────────────────────────
ssh git@host                    ───►
git-upload-pack 'alice/foo.git'
                                ssh-authkeys SHA256:…
                                ◄─── command="shithubd ssh-shell 42",no-pty,…
                                ───── pubkey auth OK
                                fork; setenv SSH_ORIGINAL_COMMAND, SSH_CONNECTION
                                exec shithubd ssh-shell 42
                                                            │
                                                            ├─ ParseSSHCommand
                                                            ├─ DB pool (max 2)
                                                            ├─ GetUserByID, GetUserByUsername, GetRepoByOwnerUserAndName
                                                            ├─ Inline authz
                                                            ├─ Build SHITHUB_* env
                                                            ├─ pool.Close()
                                                            └─ syscall.Exec git-upload-pack /data/repos/al/alice/foo.git
                                                                              │
                                                                              ▼
                                                                       git pack protocol
                                                                       (sshd's stdio is now git's)

Strict command parsing

The whitelist regex is intentionally narrow:

^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$

The path group is one of two alternatives — quoted-with-matching-quotes OR no-quotes-and-no-spaces. Anything else (mismatched quotes, escaped quotes, command chaining, extra whitespace, alternate services like git-archive) returns ErrUnknownSSHCommand and the dispatcher writes "shithub does not allow shell access" to stderr.
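
A minimal sketch of how the regex above separates accepted from rejected commands. The helper name matchSSHCommand and the error text are assumptions for illustration; only the pattern and the ErrUnknownSSHCommand name come from this document.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// ErrUnknownSSHCommand reuses the sentinel name from the text; it is
// redeclared here only so the sketch is self-contained.
var ErrUnknownSSHCommand = errors.New("unknown ssh command")

// sshCommandRE is the whitelist regex quoted above.
var sshCommandRE = regexp.MustCompile(`^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$`)

// matchSSHCommand (hypothetical helper) returns the service name and the
// raw, not yet validated, path.
func matchSSHCommand(cmd string) (service, path string, err error) {
	m := sshCommandRE.FindStringSubmatch(cmd)
	if m == nil {
		return "", "", ErrUnknownSSHCommand
	}
	path = m[2] // quoted alternative
	if path == "" {
		path = m[3] // unquoted alternative
	}
	return m[1], path, nil
}

func main() {
	for _, cmd := range []string{
		"git-upload-pack 'alice/foo.git'",
		"git-receive-pack alice/foo",
		"git-upload-pack alice/foo'",                // mismatched quote
		"git-upload-pack 'alice/foo.git'; rm -rf /", // command chaining
		"git-archive 'alice/foo.git'",               // service not whitelisted
	} {
		svc, path, err := matchSSHCommand(cmd)
		fmt.Printf("%q -> service=%q path=%q err=%v\n", cmd, svc, path, err)
	}
}
```

Because the pattern is anchored at both ends, anything after the closing quote makes the whole match fail instead of being silently ignored.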

After parse, the path goes through:

  1. Strip a single trailing .git.
  2. Split on the first /.
  3. Lowercase both halves (the storage layer is lowercase-only).
  4. Validate owner against ^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$ (S05 username shape).
  5. Validate repo against ^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$ (S11 repo shape).
  6. Reject if either contains .., leading dot, or extra slashes.

These validators live inside the protocol package rather than importing internal/infra/storage so that an SSH connection — which runs a brand-new shithubd process every time — doesn't pay the storage package's init cost.
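
The normalization steps above can be sketched as follows. The function name splitOwnerRepo and the error text are hypothetical; the two patterns are the ones quoted in steps 4 and 5, and the real validators may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// ErrInvalidSSHPath reuses the sentinel name from the text; redeclared
// here so the sketch compiles on its own.
var ErrInvalidSSHPath = errors.New("invalid ssh path")

var (
	ownerRE = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$`)
	repoRE  = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$`)
)

// splitOwnerRepo (hypothetical helper) applies steps 1 to 6 to the raw path.
func splitOwnerRepo(path string) (string, string, error) {
	path = strings.TrimSuffix(path, ".git")  // step 1: strips at most one suffix
	owner, repo, ok := strings.Cut(path, "/") // step 2: split on the first slash
	if !ok {
		return "", "", ErrInvalidSSHPath
	}
	owner, repo = strings.ToLower(owner), strings.ToLower(repo) // step 3
	// Steps 4 and 5. The patterns already exclude a leading dot and any
	// further slash, and an absolute path leaves owner empty.
	if !ownerRE.MatchString(owner) || !repoRE.MatchString(repo) {
		return "", "", ErrInvalidSSHPath
	}
	// Step 6: ".." can still slip through the repo pattern (for example "a..b").
	if strings.Contains(owner, "..") || strings.Contains(repo, "..") {
		return "", "", ErrInvalidSSHPath
	}
	return owner, repo, nil
}

func main() {
	for _, p := range []string{"Alice/Foo.git", "alice/foo/bar", "/etc/passwd", "alice/a..b"} {
		owner, repo, err := splitOwnerRepo(p)
		fmt.Printf("%q -> owner=%q repo=%q err=%v\n", p, owner, repo, err)
	}
}
```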

Authorization (S15 will refactor)

Inline V1 rules, mirroring the HTTP path:

  • upload-pack:
    • public repo → any non-suspended user.
    • private repo → must be the owner; non-owners get the not-found message (no existence leak).
  • receive-pack:
    • must be the owner.
    • repo not archived (else MsgArchived).
    • repo not soft-deleted (else MsgRepoNotFound).

Suspended or soft-deleted users are rejected up-front with MsgSuspended.

S15 lifts these into policy.Can(actor, action, resource).
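
A sketch of the inline rules as a single decision function, using the sentinel names from the error catalogue. The User and Repo types, the authorize name, and the order of the checks are assumptions. In particular, the text does not say whether a soft-deleted repo is also hidden from upload-pack or whether a non-owner push to a private repo should collapse to not-found; this sketch hides deleted repos for both services and returns permission denied for every non-owner push.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names come from the text; redeclared so the sketch is self-contained.
var (
	ErrSSHRepoNotFound = errors.New("repository not found")
	ErrSSHPermDenied   = errors.New("permission denied")
	ErrSSHArchived     = errors.New("repository archived")
	ErrSSHSuspended    = errors.New("account suspended")
)

// User and Repo are hypothetical, trimmed-down stand-ins for the DB rows.
type User struct {
	ID        int64
	Suspended bool
	Deleted   bool
}

type Repo struct {
	OwnerID  int64
	Private  bool
	Archived bool
	Deleted  bool
}

// authorize applies the V1 owner-only rules to one request.
func authorize(service string, u User, r Repo) error {
	if u.Suspended || u.Deleted {
		return ErrSSHSuspended // rejected up-front, before any repo check
	}
	if r.Deleted {
		return ErrSSHRepoNotFound // assumption: applied to both services
	}
	isOwner := u.ID == r.OwnerID
	switch service {
	case "git-upload-pack":
		if r.Private && !isOwner {
			return ErrSSHRepoNotFound // no existence leak
		}
		return nil
	case "git-receive-pack":
		if !isOwner {
			return ErrSSHPermDenied
		}
		if r.Archived {
			return ErrSSHArchived
		}
		return nil
	default:
		return ErrSSHPermDenied // unreachable if parsing is strict
	}
}

func main() {
	owner, other := User{ID: 1}, User{ID: 2}
	fmt.Println(authorize("git-upload-pack", other, Repo{OwnerID: 1, Private: true}))
	fmt.Println(authorize("git-receive-pack", owner, Repo{OwnerID: 1, Archived: true}))
}
```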

Hook environment

Same as HTTP, plus protocol distinction:

| Var | Value |
|---|---|
| `SHITHUB_USER_ID` | numeric `users.id` of the authenticated client |
| `SHITHUB_USERNAME` | the user's lowercase username |
| `SHITHUB_REPO_ID` | numeric `repos.id` |
| `SHITHUB_REPO_FULL_NAME` | `<owner>/<repo>` (lowercased) |
| `SHITHUB_PROTOCOL` | `ssh` |
| `SHITHUB_REMOTE_IP` | first field of `SSH_CONNECTION` |
| `SHITHUB_REQUEST_ID` | freshly generated 16-byte hex token |
| `PATH` | inherited from the parent process so git's helpers resolve |

Because git propagates its environment to its hook subprocesses, S14's post-receive can echo "$SHITHUB_USER_ID" and get the right value.

PATH is included so git-receive-pack can find git-pack-objects, git-index-pack, and friends.
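
A sketch of how the curated environment could be assembled. All three function names are hypothetical (the real ParseRemoteIP may also validate the address), and the request ID is assumed to be 16 random bytes rendered as 32 hex characters, which is one reading of "16-byte hex token".

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseRemoteIP returns the first whitespace-separated field of
// SSH_CONNECTION ("client_ip client_port server_ip server_port").
func parseRemoteIP(sshConnection string) string {
	fields := strings.Fields(sshConnection)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// newRequestID returns 16 random bytes as hex (an assumed format).
func newRequestID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// buildEnv returns the complete env vector handed to git. Nothing from the
// parent environment is copied except the PATH value passed in.
func buildEnv(userID int64, username string, repoID int64, fullName, remoteIP, requestID, path string) []string {
	return []string{
		"SHITHUB_USER_ID=" + strconv.FormatInt(userID, 10),
		"SHITHUB_USERNAME=" + username,
		"SHITHUB_REPO_ID=" + strconv.FormatInt(repoID, 10),
		"SHITHUB_REPO_FULL_NAME=" + fullName,
		"SHITHUB_PROTOCOL=ssh",
		"SHITHUB_REMOTE_IP=" + remoteIP,
		"SHITHUB_REQUEST_ID=" + requestID,
		"PATH=" + path,
	}
}

func main() {
	id, err := newRequestID()
	if err != nil {
		fmt.Fprintln(os.Stderr, "request id:", err)
		os.Exit(1)
	}
	ip := parseRemoteIP("203.0.113.7 51234 10.0.0.5 22")
	for _, kv := range buildEnv(42, "alice", 7, "alice/foo", ip, id, os.Getenv("PATH")) {
		fmt.Println(kv)
	}
}
```

Building the slice from scratch, instead of filtering os.Environ(), means a variable has to be added on purpose to reach git and its hooks.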

syscall.Exec discipline

Three things are non-negotiable here:

  1. Close the DB pool BEFORE syscall.Exec. Go's defer does NOT fire on syscall.Exec: the OS replaces the current process image and the runtime never returns. Without an explicit close, the pgx pool's TCP connections to Postgres are never shut down cleanly, and any descriptor not marked close-on-exec would leak into git's FD table. We close explicitly.
  2. Resolve the binary via exec.LookPath. syscall.Exec does not search $PATH; it needs the path to the executable itself. exec.LookPath("git-upload-pack") walks $PATH and returns the first match. If git isn't on the daemon user's $PATH, the user sees shithub: server misconfigured in their git client.
  3. No writes to stdout AFTER the exec call. Stdout carries the pack protocol, so stray bytes there would corrupt the stream the client is parsing. If the exec call returns, it failed (typically EACCES on the binary or ENOENT). We write a friendly error to stderr and exit non-zero. On success we never see another instruction in this process.

sysExec is a package-level var pointing at syscall.Exec so unit tests can stub it. The production binding takes the //nolint:gosec for G204 because the inputs are constrained: bin is the LookPath result of a fixed service name, argv[1] is a sanitized path from storage.RepoFS.
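
A sketch of the exec sequence with the stubbable sysExec variable. The execGit helper, its closePool callback, and the lookPath variable are assumptions added so the example runs without git installed and without replacing the running process; the real command may order or name things differently.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// sysExec points at syscall.Exec so tests can swap it, as described above.
var sysExec = syscall.Exec

// lookPath is a variable only in this sketch (an assumption), so the demo
// below can run on a machine without git.
var lookPath = exec.LookPath

// execGit resolves the binary, runs explicit cleanup, then execs.
func execGit(service, repoPath string, env []string, closePool func()) error {
	bin, err := lookPath(service)
	if err != nil {
		return fmt.Errorf("resolve %s: %w", service, err)
	}
	// Deferred calls never run once the process image is replaced,
	// so cleanup has to happen here, explicitly.
	closePool()
	// On success this call does not return. If it does return, exec failed
	// and the caller should report the error on stderr and exit non-zero.
	return sysExec(bin, []string{service, repoPath}, env)
}

func main() {
	// Stub both hooks so the demo does not replace this process.
	lookPath = func(name string) (string, error) {
		return "/usr/lib/git-core/" + name, nil
	}
	sysExec = func(argv0 string, argv []string, envv []string) error {
		fmt.Println("would exec:", argv0, argv)
		return nil
	}
	err := execGit("git-upload-pack", "/data/repos/al/alice/foo.git",
		[]string{"SHITHUB_PROTOCOL=ssh"},
		func() { fmt.Println("pool closed") })
	fmt.Println("err:", err)
}
```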

Friendly error catalogue

| Error sentinel | Stderr line |
|---|---|
| `ErrSSHRepoNotFound` | `shithub: repository not found` |
| `ErrSSHPermDenied` | `shithub: permission denied` |
| `ErrSSHArchived` | `shithub: this repository is archived; pushes are disabled` |
| `ErrSSHSuspended` | `shithub: your account is suspended` |
| `ErrUnknownSSHCommand` | `shithub does not allow shell access` |
| `ErrInvalidSSHPath` | `shithub: repository not found` (collapses to the same message; bad paths are indistinguishable from missing repos) |
| (anything else) | `shithub: internal error (request_id=<hex>)` |

The friendly message is what the user sees in git's output. The structured slog event is what the operator reads — it logs at WARN for denials and INFO for successful dispatches with (user_id, op, owner, repo, remote_ip) fields.
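
A sketch of the catalogue as a lookup, assuming the sentinels are matched with errors.Is so wrapped errors still map correctly. The body of the real FriendlyMessageFor is not shown in this document; the lowercase friendlyMessageFor below only mirrors its signature and the table above.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names come from the catalogue; the error texts are placeholders.
var (
	ErrSSHRepoNotFound   = errors.New("ssh: repo not found")
	ErrSSHPermDenied     = errors.New("ssh: permission denied")
	ErrSSHArchived       = errors.New("ssh: archived")
	ErrSSHSuspended      = errors.New("ssh: suspended")
	ErrUnknownSSHCommand = errors.New("ssh: unknown command")
	ErrInvalidSSHPath    = errors.New("ssh: invalid path")
)

// friendlyMessageFor maps an error to the stderr line shown to the user.
func friendlyMessageFor(err error, requestID string) string {
	switch {
	case errors.Is(err, ErrSSHRepoNotFound), errors.Is(err, ErrInvalidSSHPath):
		// Bad paths and missing repos deliberately share one message.
		return "shithub: repository not found"
	case errors.Is(err, ErrSSHPermDenied):
		return "shithub: permission denied"
	case errors.Is(err, ErrSSHArchived):
		return "shithub: this repository is archived; pushes are disabled"
	case errors.Is(err, ErrSSHSuspended):
		return "shithub: your account is suspended"
	case errors.Is(err, ErrUnknownSSHCommand):
		return "shithub does not allow shell access"
	default:
		// Unknown failures expose only the request ID, never the cause.
		return fmt.Sprintf("shithub: internal error (request_id=%s)", requestID)
	}
}

func main() {
	wrapped := fmt.Errorf("lookup alice/foo: %w", ErrInvalidSSHPath)
	fmt.Println(friendlyMessageFor(wrapped, "abc123"))
	fmt.Println(friendlyMessageFor(errors.New("connection reset"), "abc123"))
}
```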

Tests

internal/git/protocol/ssh_command_test.go — accepts (quoted, unquoted, dotted names, hyphenated names, .git suffix), rejects (wrong service, mismatched quotes, command injection attempts, leading/trailing whitespace, absolute paths, dot-dot, leading dot in repo name).

internal/git/protocol/ssh_dispatch_test.go (DB-integration):

  • TestDispatch_PublicCloneByOwner — full round-trip, asserts on argv path + env vars.
  • TestDispatch_PublicCloneByOther — non-owner pull of public repo OK.
  • TestDispatch_PrivateCloneByOtherIsNotFound — non-owner sees ErrSSHRepoNotFound.
  • TestDispatch_PushByNonOwnerIsPermDenied — non-owner push → ErrSSHPermDenied.
  • TestDispatch_PushToArchivedIsArchived.
  • TestDispatch_SuspendedUserSuspended.
  • TestDispatch_UnknownCommandIsRejected.
  • TestFriendlyMessageFor covers the message catalogue.
  • TestParseRemoteIP covers SSH_CONNECTION parsing edge cases.

We don't unit-test the cobra command's syscall.Exec path; the whole point of syscall.Exec is that the test process would be replaced. The dispatcher tests cover everything up to the exec call; the deploy doc has a manual smoke-test recipe.

Deploy checklist

Reviewed once before going live:

  • The shithubd binary runs as a dedicated shithub-ssh system user.
  • That user has read+write on /data/repos/.
  • That user's $PATH includes git-upload-pack and git-receive-pack (typically /usr/lib/git-core/).
  • sshd's AuthorizedKeysCommand points at shithubd ssh-authkeys.
  • sshd's AuthorizedKeysCommandUser is set to shithub-ssh (NOT root).
  • The command= directive in the AKC line forces shithubd ssh-shell <user_id> — a user can't bypass it via ~/.ssh/authorized_keys because the AKC is the only key source we trust.

Failure mode for missing config: any unset cfg value (cfg.DB.URL, cfg.Storage.ReposRoot) lights up shithub: server misconfigured in the user's git client. Operator reads the structured slog line on the daemon side.

Pitfalls / what to remember

  • Strict command parsing is the security boundary. Don't relax the regex without writing a fresh fixture set. Shell injection attempts are caught here, not later.
  • Path validation is the second boundary. .., absolute paths, and embedded slashes are caught BEFORE we ever construct a filesystem path.
  • syscall.Exec doesn't run defers. Anything that needs cleanup runs explicitly before the exec call. Today that's just the pool; if you add a tempfile or a flock, close it first.
  • Stdin/stdout flow directly to git after exec. sshd doesn't insert a layer — what git writes is what the client reads. No buffering hooks possible after this point; if you need to inspect bytes, do it before exec.
  • No environment leak. We replace the entire env with the curated SHITHUB_* set + PATH. SSH_CONNECTION, SSH_ORIGINAL_COMMAND, etc. are not propagated to git.
  • Concurrent pushes serialize on refs/...lock. Multiple SSH connections pushing to the same ref take turns; different refs go in parallel. This is git's own behavior, not ours.

Open follow-ups (deferred)

  • SSH certificates / CA-signed user keys are post-MVP — today it's strictly key-by-key.
  • git-lfs over SSH is post-MVP; the dispatcher rejects git-lfs-authenticate (and anything else) as unknown.
  • Pull-mirror / federation is post-MVP.
  • S14 wires post-receive hooks; this dispatcher is the source of the env vars they read.
  • S15 refactors the inline owner-only check into policy.Can. Both this dispatcher and the HTTP handler swap atomically.

Related docs

  • docs/internal/git-http.md: the parallel HTTP transport, same auth/permission shape.
  • docs/internal/repo-create.md: repo creation flow whose output we serve.
  • docs/internal/storage.md: RepoFS layout that produces the bare-repo path.