@@ -0,0 +1,170 @@ |
| | 1 | +# SSH git protocol |
| | 2 | + |
| | 3 | +S13 lights up `git clone git@host:owner/repo.git`, `git fetch`, and `git push` over SSH. The S07 `ssh-shell` placeholder is now the real dispatcher: it parses `SSH_ORIGINAL_COMMAND`, resolves user + repo against the DB, runs an inline owner-only authz, sets the `SHITHUB_*` hook environment, closes the DB pool, and `syscall.Exec`s the canonical `git-upload-pack` or `git-receive-pack` binary. After exec, sshd's stdin/stdout/stderr flow directly to git. |
| | 4 | + |
| | 5 | +## What's wired |
| | 6 | + |
| | 7 | +- `internal/git/protocol/ssh_command.go` — `ParseSSHCommand` strict-parses `SSH_ORIGINAL_COMMAND` into `Service + Owner + Repo`. Whitelist regex; matched-quote alternation (`'a/b'` or `a/b` only — `a/b'` rejected); single trailing `.git` stripped; owner + repo run through the storage-shape rules. |
| | 8 | +- `internal/git/protocol/ssh_dispatch.go` — `PrepareDispatch` does the DB + authz + env work. Returns the binary path, argv, and env vector the caller needs to exec git. Also exposes `FriendlyMessageFor(err, requestID)` so the caller writes a consistent stderr line and `ParseRemoteIP(SSH_CONNECTION)` for the env var. |
| | 9 | +- `cmd/shithubd/ssh.go::sshShellCmd` — the cobra command sshd invokes. Replaces the S07 stub. Manages the DB pool lifecycle and calls `syscall.Exec`. |
| | 10 | + |
| | 11 | +The `ssh-authkeys` flow from S07 (the `AuthorizedKeysCommand` handler) is unchanged; it still emits the `command="shithubd ssh-shell <user_id>",no-pty,…` line, which is what binds an inbound SSH connection to a known user before this dispatcher ever runs. |
| | 12 | + |
| | 13 | +## End-to-end flow |
| | 14 | + |
| | 15 | +``` |
| | 16 | +client sshd shithubd |
| | 17 | +────────────────────────────────────────────────────────────────────── |
| | 18 | +ssh git@host ───► |
| | 19 | +git-upload-pack 'alice/foo.git' |
| | 20 | + ssh-authkeys SHA256:… |
| | 21 | + ◄─── command="shithubd ssh-shell 42",no-pty,… |
| | 22 | + ───── pubkey auth OK |
| | 23 | + fork; setenv SSH_ORIGINAL_COMMAND, SSH_CONNECTION |
| | 24 | + exec shithubd ssh-shell 42 |
| | 25 | + │ |
| | 26 | + ├─ ParseSSHCommand |
| | 27 | + ├─ DB pool (max 2) |
| | 28 | + ├─ GetUserByID, GetUserByUsername, GetRepoByOwnerUserAndName |
| | 29 | + ├─ Inline authz |
| | 30 | + ├─ Build SHITHUB_* env |
| | 31 | + ├─ pool.Close() |
| | 32 | + └─ syscall.Exec git-upload-pack /data/repos/al/alice/foo.git |
| | 33 | + │ |
| | 34 | + ▼ |
| | 35 | + git pack protocol |
| | 36 | + (sshd's stdio is now git's) |
| | 37 | +``` |
| | 38 | + |
| | 39 | +## Strict command parsing |
| | 40 | + |
| | 41 | +The whitelist regex is intentionally narrow: |
| | 42 | + |
| | 43 | +``` |
| | 44 | +^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$ |
| | 45 | +``` |
| | 46 | + |
| | 47 | +The path group is one of two alternatives: quoted with matching quotes, or unquoted with no spaces. Anything else (mismatched quotes, escaped quotes, command chaining, leading or trailing whitespace, alternate services like `git-archive`) returns `ErrUnknownSSHCommand`, and the dispatcher writes "shithub does not allow shell access" to stderr.
| | 48 | + |
| | 49 | +After parse, the path goes through: |
| | 50 | + |
| | 51 | +1. Strip a single trailing `.git`. |
| | 52 | +2. Split on the first `/`. |
| | 53 | +3. Lowercase both halves (the storage layer is lowercase-only). |
| | 54 | +4. Validate `owner` against `^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$` (S05 username shape). |
| | 55 | +5. Validate `repo` against `^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$` (S11 repo shape). |
| | 56 | +6. Reject if either contains `..`, leading dot, or extra slashes. |
| | 57 | + |
| | 58 | +These validators live inside the `protocol` package rather than importing `internal/infra/storage` so that an SSH connection — which runs a brand-new shithubd process every time — doesn't pay the storage package's init cost. |
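
The parse-then-validate sequence above can be sketched roughly as follows. This is a minimal illustration, not the code in `ssh_command.go`: the three regexes and the two sentinel names are taken from this document, while the `ParsedSSHCommand` struct layout, the error messages, and the exact order of checks are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// Sentinel names come from the doc; the messages here are placeholders.
var (
	ErrUnknownSSHCommand = errors.New("unknown ssh command")
	ErrInvalidSSHPath    = errors.New("invalid ssh path")
)

var (
	commandRE = regexp.MustCompile(`^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$`)
	ownerRE   = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$`)
	repoRE    = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$`)
)

// ParsedSSHCommand is a hypothetical result type (Service + Owner + Repo).
type ParsedSSHCommand struct {
	Service, Owner, Repo string
}

// ParseSSHCommand applies the whitelist regex, then steps 1-6 from the list.
func ParseSSHCommand(raw string) (ParsedSSHCommand, error) {
	m := commandRE.FindStringSubmatch(raw)
	if m == nil {
		return ParsedSSHCommand{}, ErrUnknownSSHCommand
	}
	path := m[2] // quoted alternative
	if path == "" {
		path = m[3] // unquoted alternative
	}
	path = strings.TrimSuffix(path, ".git")   // step 1: strip one trailing .git
	owner, repo, ok := strings.Cut(path, "/") // step 2: split on the first "/"
	if !ok {
		return ParsedSSHCommand{}, ErrInvalidSSHPath
	}
	owner, repo = strings.ToLower(owner), strings.ToLower(repo) // step 3
	// Steps 4-6: the shape regexes already reject leading dots and extra
	// slashes; ".." inside a repo name needs the explicit check.
	if !ownerRE.MatchString(owner) || !repoRE.MatchString(repo) || strings.Contains(repo, "..") {
		return ParsedSSHCommand{}, ErrInvalidSSHPath
	}
	return ParsedSSHCommand{Service: m[1], Owner: owner, Repo: repo}, nil
}

func main() {
	for _, raw := range []string{
		"git-upload-pack 'alice/foo.git'",
		"git-receive-pack Alice/Foo",
		"git-upload-pack alice/foo'",
		"git-upload-pack 'alice/../etc'",
	} {
		p, err := ParseSSHCommand(raw)
		fmt.Printf("%q -> %+v, err=%v\n", raw, p, err)
	}
}
```

Note that the quoted alternative admits spaces and shell metacharacters inside the quotes; in this sketch those inputs pass the command regex and are then rejected by the owner/repo shape checks.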
| | 59 | + |
| | 60 | +## Authorization (S15 will refactor) |
| | 61 | + |
| | 62 | +Inline V1 rules, mirroring the HTTP path: |
| | 63 | + |
| | 64 | +- **upload-pack**: |
| | 65 | + - public repo → any non-suspended user. |
| | 66 | + - private repo → must be the owner; non-owners get the not-found message (no existence leak). |
| | 67 | +- **receive-pack**: |
| | 68 | + - must be the owner. |
| | 69 | + - repo not archived (else `MsgArchived`). |
| | 70 | + - repo not soft-deleted (else `MsgRepoNotFound`). |
| | 71 | + |
| | 72 | +Suspended or soft-deleted users are rejected up-front with `MsgSuspended`. |
| | 73 | + |
| | 74 | +S15 lifts these into `policy.Can(actor, action, resource)`. |
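
For orientation, the V1 rules might look something like the sketch below. The sentinel names match the error catalogue in this document; the `User` and `Repo` structs, the `authorize` function name, and the order in which the receive-pack checks run are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names follow the doc's error catalogue; messages are placeholders.
var (
	ErrSSHRepoNotFound = errors.New("repository not found")
	ErrSSHPermDenied   = errors.New("permission denied")
	ErrSSHArchived     = errors.New("repository archived")
	ErrSSHSuspended    = errors.New("account suspended")
)

// User and Repo are hypothetical, trimmed-down stand-ins for the DB rows.
type User struct {
	ID                 int64
	Suspended, Deleted bool
}

type Repo struct {
	OwnerID                    int64
	Private, Archived, Deleted bool
}

// authorize applies the inline owner-only rules for one service.
func authorize(service string, u User, r Repo) error {
	// Suspended or soft-deleted users are rejected up-front.
	if u.Suspended || u.Deleted {
		return ErrSSHSuspended
	}
	isOwner := u.ID == r.OwnerID
	switch service {
	case "git-upload-pack":
		// Private repos look nonexistent to non-owners (no existence leak).
		if r.Private && !isOwner {
			return ErrSSHRepoNotFound
		}
		return nil
	case "git-receive-pack":
		if !isOwner {
			return ErrSSHPermDenied
		}
		if r.Deleted { // assumed to be checked before the archived flag
			return ErrSSHRepoNotFound
		}
		if r.Archived {
			return ErrSSHArchived
		}
		return nil
	default:
		// The parser should never let another service through; deny anyway.
		return ErrSSHPermDenied
	}
}

func main() {
	owner, other := User{ID: 1}, User{ID: 2}
	private := Repo{OwnerID: 1, Private: true}
	fmt.Println(authorize("git-upload-pack", owner, private))
	fmt.Println(authorize("git-upload-pack", other, private))
	fmt.Println(authorize("git-receive-pack", other, Repo{OwnerID: 1}))
}
```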
| | 75 | + |
| | 76 | +## Hook environment |
| | 77 | + |
| | 78 | +Same as HTTP, plus protocol distinction: |
| | 79 | + |
| | 80 | +| Var | Value | |
| | 81 | +|---|---| |
| | 82 | +| `SHITHUB_USER_ID` | numeric `users.id` of the authenticated client | |
| | 83 | +| `SHITHUB_USERNAME` | the user's lowercase username | |
| | 84 | +| `SHITHUB_REPO_ID` | numeric `repos.id` | |
| | 85 | +| `SHITHUB_REPO_FULL_NAME` | `<owner>/<repo>` (lowercased) | |
| | 86 | +| `SHITHUB_PROTOCOL` | `ssh` | |
| | 87 | +| `SHITHUB_REMOTE_IP` | first field of `SSH_CONNECTION` | |
| | 88 | +| `SHITHUB_REQUEST_ID` | freshly generated 16-byte hex token | |
| | 89 | +| `PATH` | inherited from the parent process so git's helpers resolve | |
| | 90 | + |
| | 91 | +Because git propagates its environment to its hook subprocesses, S14's post-receive can `echo "$SHITHUB_USER_ID"` and get the right value. |
| | 92 | + |
| | 93 | +`PATH` is included so `git-receive-pack` can find `git-pack-objects`, `git-index-pack`, and friends. |
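
A rough sketch of assembling that env vector is shown below. The variable names come from the table; `hookIdentity`, `buildHookEnv`, and `newRequestID` are hypothetical helpers, and reading "16-byte hex token" as 16 random bytes hex-encoded to 32 characters is an assumption. `ParseRemoteIP` is named in this document, but its body here is a guess at the behaviour (first whitespace-separated field).

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ParseRemoteIP returns the first field of SSH_CONNECTION
// ("client_ip client_port server_ip server_port"), or "" if it is empty.
func ParseRemoteIP(sshConnection string) string {
	fields := strings.Fields(sshConnection)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// newRequestID hex-encodes 16 random bytes (assumed reading of the table).
func newRequestID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// hookIdentity is a hypothetical bundle of the values resolved from the DB.
type hookIdentity struct {
	UserID       int64
	Username     string
	RepoID       int64
	RepoFullName string
}

// buildHookEnv returns the complete env for git: SHITHUB_* plus PATH only.
func buildHookEnv(id hookIdentity, remoteIP, requestID, path string) []string {
	return []string{
		"SHITHUB_USER_ID=" + strconv.FormatInt(id.UserID, 10),
		"SHITHUB_USERNAME=" + id.Username,
		"SHITHUB_REPO_ID=" + strconv.FormatInt(id.RepoID, 10),
		"SHITHUB_REPO_FULL_NAME=" + id.RepoFullName,
		"SHITHUB_PROTOCOL=ssh",
		"SHITHUB_REMOTE_IP=" + remoteIP,
		"SHITHUB_REQUEST_ID=" + requestID,
		"PATH=" + path,
	}
}

func main() {
	reqID, err := newRequestID()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	id := hookIdentity{UserID: 42, Username: "alice", RepoID: 7, RepoFullName: "alice/foo"}
	remoteIP := ParseRemoteIP("203.0.113.7 51234 10.0.0.5 22")
	for _, kv := range buildHookEnv(id, remoteIP, reqID, os.Getenv("PATH")) {
		fmt.Println(kv)
	}
}
```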
| | 94 | + |
| | 95 | +## syscall.Exec discipline |
| | 96 | + |
| | 97 | +Three things are non-negotiable here: |
| | 98 | + |
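| | 99 | +1. **Close the DB pool BEFORE `syscall.Exec`.** Go's `defer` does NOT fire on `syscall.Exec`: the OS replaces the current process image and the call never returns. Without an explicit close, the pgx pool's TCP connections to Postgres are never shut down cleanly. Go opens its sockets close-on-exec, so they would normally be dropped abruptly at exec rather than handed to git; any descriptor opened without that flag would leak into git's FD table. We close explicitly.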
| | 100 | +2. **Resolve the binary via `exec.LookPath`.** `syscall.Exec` does not search `$PATH`; it needs the path to the binary itself. `exec.LookPath("git-upload-pack")` walks `$PATH` and returns the first match. If git isn't on the daemon user's `$PATH`, the user sees `shithub: server misconfigured` in their git client.
| | 101 | +3. **No writes to stdout AFTER the exec call.** If the exec call returns, it failed (typically EACCES on the binary or ENOENT). We write a friendly error to stderr and exit non-zero. On success, no further instruction in this process ever runs.
| | 102 | + |
| | 103 | +`sysExec` is a package-level var pointing at `syscall.Exec` so unit tests can stub it. The production binding takes the `//nolint:gosec` for G204 because the inputs are constrained: `bin` is the LookPath result of a fixed service name, `argv[1]` is a sanitized path from `storage.RepoFS`. |
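
The three rules and the `sysExec` seam could be combined along these lines. This is a sketch under stated assumptions: `sysExec` is the var described above, but `lookPath`, `poolCloser`, `fakePool`, and `execGit` are hypothetical names introduced here, and `main` is a dry run that stubs both seams so nothing is actually exec'd.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// sysExec matches the doc's package-level var; lookPath is a hypothetical
// second seam added so this sketch also runs on hosts without git.
var (
	sysExec  = syscall.Exec
	lookPath = exec.LookPath
)

// poolCloser is an assumed interface; pgxpool.Pool's Close() satisfies it.
type poolCloser interface{ Close() }

// fakePool stands in for the real pool in the dry run below.
type fakePool struct{ closed bool }

func (p *fakePool) Close() { p.closed = true }

// execGit resolves the binary, closes the pool explicitly, then execs.
func execGit(service string, argv, env []string, pool poolCloser) error {
	bin, err := lookPath(service)
	pool.Close() // explicit: a deferred Close would never run after exec
	if err != nil {
		return fmt.Errorf("resolve %s: %w", service, err)
	}
	// On success this call never returns; any return value is a failure.
	return sysExec(bin, argv, env)
}

func main() {
	// Dry run: stub both seams and print what would have been exec'd.
	lookPath = func(name string) (string, error) { return "/usr/lib/git-core/" + name, nil }
	sysExec = func(bin string, argv, env []string) error {
		fmt.Printf("would exec %s %v\n", bin, argv[1:])
		return nil
	}
	pool := &fakePool{}
	argv := []string{"git-upload-pack", "/data/repos/al/alice/foo.git"}
	if err := execGit(argv[0], argv, []string{"PATH=" + os.Getenv("PATH")}, pool); err != nil {
		fmt.Fprintln(os.Stderr, "shithub: server misconfigured")
		os.Exit(1)
	}
	fmt.Println("pool closed before exec:", pool.closed)
}
```

A test can assert ordering the same way: have the stubbed `sysExec` record whether the pool was already closed at the moment it was called.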
| | 104 | + |
| | 105 | +## Friendly error catalogue |
| | 106 | + |
| | 107 | +| Error sentinel | Stderr line | |
| | 108 | +|---|---| |
| | 109 | +| `ErrSSHRepoNotFound` | `shithub: repository not found` | |
| | 110 | +| `ErrSSHPermDenied` | `shithub: permission denied` | |
| | 111 | +| `ErrSSHArchived` | `shithub: this repository is archived; pushes are disabled` | |
| | 112 | +| `ErrSSHSuspended` | `shithub: your account is suspended` | |
| | 113 | +| `ErrUnknownSSHCommand` | `shithub does not allow shell access` | |
| | 114 | +| `ErrInvalidSSHPath` | `shithub: repository not found` (collapses to the same message — bad paths are indistinguishable from missing repos) | |
| | 115 | +| (anything else) | `shithub: internal error (request_id=<hex>)` | |
| | 116 | + |
| | 117 | +The friendly message is what the user sees in `git`'s output. The structured slog event is what the operator reads — it logs at `WARN` for denials and `INFO` for successful dispatches with `(user_id, op, owner, repo, remote_ip)` fields. |
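
The catalogue maps naturally onto a switch over `errors.Is`. The sketch below reproduces the table's sentinels and stderr lines; the function body itself is an assumption about how `FriendlyMessageFor` might be written, and using `errors.Is` (so wrapped errors still match) is a design choice made here, not something this document confirms.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names follow the catalogue; their messages are placeholders.
var (
	ErrSSHRepoNotFound   = errors.New("repo not found")
	ErrSSHPermDenied     = errors.New("permission denied")
	ErrSSHArchived       = errors.New("archived")
	ErrSSHSuspended      = errors.New("suspended")
	ErrUnknownSSHCommand = errors.New("unknown ssh command")
	ErrInvalidSSHPath    = errors.New("invalid ssh path")
)

// FriendlyMessageFor returns the stderr line for err. Unrecognised errors
// collapse to a generic line carrying only the request ID.
func FriendlyMessageFor(err error, requestID string) string {
	switch {
	case errors.Is(err, ErrSSHRepoNotFound), errors.Is(err, ErrInvalidSSHPath):
		// Bad paths are indistinguishable from missing repos.
		return "shithub: repository not found"
	case errors.Is(err, ErrSSHPermDenied):
		return "shithub: permission denied"
	case errors.Is(err, ErrSSHArchived):
		return "shithub: this repository is archived; pushes are disabled"
	case errors.Is(err, ErrSSHSuspended):
		return "shithub: your account is suspended"
	case errors.Is(err, ErrUnknownSSHCommand):
		return "shithub does not allow shell access"
	default:
		return fmt.Sprintf("shithub: internal error (request_id=%s)", requestID)
	}
}

func main() {
	wrapped := fmt.Errorf("dispatch: %w", ErrSSHArchived)
	fmt.Println(FriendlyMessageFor(wrapped, "0123abcd"))
	fmt.Println(FriendlyMessageFor(errors.New("pgx: timeout"), "0123abcd"))
}
```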
| | 118 | + |
| | 119 | +## Tests |
| | 120 | + |
| | 121 | +`internal/git/protocol/ssh_command_test.go` — accepts (quoted, unquoted, dotted names, hyphenated names, `.git` suffix), rejects (wrong service, mismatched quotes, command injection attempts, leading/trailing whitespace, absolute paths, dot-dot, leading dot in repo name). |
| | 122 | + |
| | 123 | +`internal/git/protocol/ssh_dispatch_test.go` (DB-integration): |
| | 124 | +- `TestDispatch_PublicCloneByOwner` — full round-trip, asserts on argv path + env vars. |
| | 125 | +- `TestDispatch_PublicCloneByOther` — non-owner pull of public repo OK. |
| | 126 | +- `TestDispatch_PrivateCloneByOtherIsNotFound` — non-owner sees `ErrSSHRepoNotFound`. |
| | 127 | +- `TestDispatch_PushByNonOwnerIsPermDenied` — non-owner push → `ErrSSHPermDenied`. |
| | 128 | +- `TestDispatch_PushToArchivedIsArchived`. |
| | 129 | +- `TestDispatch_SuspendedUserSuspended`. |
| | 130 | +- `TestDispatch_UnknownCommandIsRejected`. |
| | 131 | +- `TestFriendlyMessageFor` covers the message catalogue. |
| | 132 | +- `TestParseRemoteIP` covers `SSH_CONNECTION` parsing edge cases. |
| | 133 | + |
| | 134 | +We don't unit-test the cobra command's `syscall.Exec` path; the whole point of `syscall.Exec` is that the test process would be replaced. The dispatcher tests cover everything up to the exec call; the deploy doc has a manual smoke-test recipe. |
| | 135 | + |
| | 136 | +## Deploy checklist |
| | 137 | + |
| | 138 | +Reviewed once before going live: |
| | 139 | + |
| | 140 | +- The shithubd binary runs as a dedicated `shithub-ssh` system user. |
| | 141 | +- That user has read+write on `/data/repos/`. |
| | 142 | +- That user's `$PATH` includes the directory containing `git-upload-pack` and `git-receive-pack` (typically `/usr/lib/git-core/`).
| | 143 | +- sshd's `AuthorizedKeysCommand` points at `shithubd ssh-authkeys`. |
| | 144 | +- sshd's `AuthorizedKeysCommandUser` is set to `shithub-ssh` (NOT root). |
| | 145 | +- The `command=` directive in the AKC line forces `shithubd ssh-shell <user_id>` — a user can't bypass it via `~/.ssh/authorized_keys` because the AKC is the only key source we trust. |
| | 146 | + |
| | 147 | +Failure mode for missing config: any unset cfg value (`cfg.DB.URL`, `cfg.Storage.ReposRoot`) lights up `shithub: server misconfigured` in the user's git client. The operator reads the structured slog line on the daemon side.
| | 148 | + |
| | 149 | +## Pitfalls / what to remember |
| | 150 | + |
| | 151 | +- **Strict command parsing is the security boundary.** Don't relax the regex without writing a fresh fixture set. Shell injection attempts are caught here, not later. |
| | 152 | +- **Path validation is the second boundary.** `..`, absolute paths, and embedded slashes are caught BEFORE we ever construct a filesystem path. |
| | 153 | +- **`syscall.Exec` doesn't run defers.** Anything that needs cleanup runs explicitly before the exec call. Today that's just the pool; if you add a tempfile or a flock, close it first. |
| | 154 | +- **Stdin/stdout flow directly to git after exec.** sshd doesn't insert a layer — what git writes is what the client reads. No buffering hooks possible after this point; if you need to inspect bytes, do it before exec. |
| | 155 | +- **No environment leak.** We replace the entire env with the curated `SHITHUB_*` set + `PATH`. `SSH_CONNECTION`, `SSH_ORIGINAL_COMMAND`, etc. are not propagated to git. |
| | 156 | +- **Concurrent pushes serialize on `refs/...lock`.** Multiple SSH connections pushing to the same ref take turns; different refs go in parallel. This is git's own behavior, not ours. |
| | 157 | + |
| | 158 | +## Open follow-ups (deferred) |
| | 159 | + |
| | 160 | +- **SSH certificates / CA-signed user keys** are post-MVP — today it's strictly key-by-key. |
| | 161 | +- **`git-lfs` over SSH** is post-MVP; the dispatcher rejects `git-lfs-authenticate` (and anything else) as unknown. |
| | 162 | +- **Pull-mirror / federation** is post-MVP. |
| | 163 | +- **S14** wires post-receive hooks; this dispatcher is the source of the env vars they read. |
| | 164 | +- **S15** refactors the inline owner-only check into `policy.Can`. This dispatcher and the HTTP handler switch over together in the same change.
| | 165 | + |
| | 166 | +## Related docs |
| | 167 | + |
| | 168 | +- `docs/internal/git-http.md` — the parallel HTTP transport, same auth/permission shape. |
| | 169 | +- `docs/internal/repo-create.md` — repo creation flow whose output we serve. |
| | 170 | +- `docs/internal/storage.md` — `RepoFS` layout that produces the bare-repo path. |