tenseleyflow/shithub / 1c8fbce

S13: docs/internal/git-ssh.md — sprint reference

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1c8fbcee0de508910c96247627eb7ffc8122da17
Parents: f810a0c
Tree: cf5820b

1 changed file

Status  File                      +    -
A       docs/internal/git-ssh.md  170  0

docs/internal/git-ssh.md (added)
@@ -0,0 +1,170 @@
1
+# SSH git protocol
2
+
3
+S13 lights up `git clone git@host:owner/repo.git`, `git fetch`, and `git push` over SSH. The S07 `ssh-shell` placeholder is now the real dispatcher: it parses `SSH_ORIGINAL_COMMAND`, resolves user + repo against the DB, runs an inline owner-only authz, sets the `SHITHUB_*` hook environment, closes the DB pool, and `syscall.Exec`s the canonical `git-upload-pack` or `git-receive-pack` binary. After exec, sshd's stdin/stdout/stderr flow directly to git.
4
+
5
+## What's wired
6
+
7
+- `internal/git/protocol/ssh_command.go` — `ParseSSHCommand` strict-parses `SSH_ORIGINAL_COMMAND` into `Service + Owner + Repo`. Whitelist regex; matched-quote alternation (`'a/b'` or `a/b` only — `a/b'` rejected); single trailing `.git` stripped; owner + repo run through the storage-shape rules.
8
+- `internal/git/protocol/ssh_dispatch.go` — `PrepareDispatch` does the DB + authz + env work. Returns the binary path, argv, and env vector the caller needs to exec git. Also exposes `FriendlyMessageFor(err, requestID)` so the caller writes a consistent stderr line and `ParseRemoteIP(SSH_CONNECTION)` for the env var.
9
+- `cmd/shithubd/ssh.go::sshShellCmd` — the cobra command sshd invokes. Replaces the S07 stub. Manages the DB pool lifecycle and calls `syscall.Exec`.
10
+
11
+The `ssh-authkeys` flow from S07 (the `AuthorizedKeysCommand` handler) is unchanged; it still emits the `command="shithubd ssh-shell <user_id>",no-pty,…` line, which is what binds an inbound SSH connection to a known user before this dispatcher ever runs.
12
+
13
+## End-to-end flow
14
+
15
+```
16
+client                          sshd                       shithubd
17
+──────────────────────────────────────────────────────────────────────
18
+ssh git@host                    ───►
19
+git-upload-pack 'alice/foo.git'
20
+                                ssh-authkeys SHA256:…
21
+                                ◄─── command="shithubd ssh-shell 42",no-pty,…
22
+                                ───── pubkey auth OK
23
+                                fork; setenv SSH_ORIGINAL_COMMAND, SSH_CONNECTION
24
+                                exec shithubd ssh-shell 42
25
+                                                            │
26
+                                                            ├─ ParseSSHCommand
27
+                                                            ├─ DB pool (max 2)
28
+                                                            ├─ GetUserByID, GetUserByUsername, GetRepoByOwnerUserAndName
29
+                                                            ├─ Inline authz
30
+                                                            ├─ Build SHITHUB_* env
31
+                                                            ├─ pool.Close()
32
+                                                            └─ syscall.Exec git-upload-pack /data/repos/al/alice/foo.git
33
+                                                                              │
34
+                                                                              ▼
35
+                                                                       git pack protocol
36
+                                                                       (sshd's stdio is now git's)
37
+```
38
+
39
+## Strict command parsing
40
+
41
+The whitelist regex is intentionally narrow:
42
+
43
+```
44
+^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$
45
+```
46
+
47
+The path group is one of two alternatives: quoted with matching quotes, or unquoted with no spaces. Anything else (mismatched quotes, escaped quotes, command chaining, leading or trailing whitespace, alternate services like `git-archive`) returns `ErrUnknownSSHCommand`, and the dispatcher writes "shithub does not allow shell access" to stderr.
48
+
49
+After parse, the path goes through:
50
+
51
+1. Strip a single trailing `.git`.
52
+2. Split on the first `/`.
53
+3. Lowercase both halves (the storage layer is lowercase-only).
54
+4. Validate `owner` against `^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$` (S05 username shape).
55
+5. Validate `repo` against `^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$` (S11 repo shape).
56
+6. Reject if either contains `..`, a leading dot, or extra slashes.
57
+
58
+These validators live inside the `protocol` package rather than importing `internal/infra/storage` so that an SSH connection — which runs a brand-new shithubd process every time — doesn't pay the storage package's init cost.
59
+
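The parse and validation steps above can be sketched as follows. This is a minimal illustration, not the code in `ssh_command.go`: the regexes are copied from this document, while `parseSSHCommand`, `ParsedCommand`, and the sentinel definitions are hypothetical stand-ins, and the choice of `ErrInvalidSSHPath` for a path with no `/` is an assumption.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// Sentinel names follow the document; these definitions are assumptions.
var (
	ErrUnknownSSHCommand = errors.New("unknown ssh command")
	ErrInvalidSSHPath    = errors.New("invalid ssh path")
)

// Patterns copied from the document.
var (
	commandRE = regexp.MustCompile(`^(git-upload-pack|git-receive-pack)\s+(?:'([^']+)'|([^'\s]+))$`)
	ownerRE   = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9-]{0,37}[a-z0-9])?$`)
	repoRE    = regexp.MustCompile(`^[a-z0-9](?:[a-z0-9._-]{0,98}[a-z0-9_])?$`)
)

// ParsedCommand is a hypothetical result type for this sketch.
type ParsedCommand struct {
	Service, Owner, Repo string
}

// parseSSHCommand applies the whitelist regex, then the path steps in order.
func parseSSHCommand(raw string) (ParsedCommand, error) {
	m := commandRE.FindStringSubmatch(raw)
	if m == nil {
		return ParsedCommand{}, ErrUnknownSSHCommand
	}
	path := m[2] // quoted alternative
	if path == "" {
		path = m[3] // unquoted alternative
	}
	path = strings.TrimSuffix(path, ".git")  // step 1: strip one trailing .git
	owner, repo, ok := strings.Cut(path, "/") // step 2: split on the first slash
	if !ok {
		return ParsedCommand{}, ErrInvalidSSHPath
	}
	owner, repo = strings.ToLower(owner), strings.ToLower(repo) // step 3
	// Steps 4-6: the shape regexes already exclude leading dots and slashes,
	// so only ".." needs a separate check.
	if !ownerRE.MatchString(owner) || !repoRE.MatchString(repo) || strings.Contains(repo, "..") {
		return ParsedCommand{}, ErrInvalidSSHPath
	}
	return ParsedCommand{Service: m[1], Owner: owner, Repo: repo}, nil
}
```

An absolute path such as `/alice/foo` yields an empty owner after the split and fails the owner shape, so it never reaches path construction.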
60
+## Authorization (S15 will refactor)
61
+
62
+Inline V1 rules, mirroring the HTTP path:
63
+
64
+- **upload-pack**:
65
+  - public repo → any non-suspended user.
66
+  - private repo → must be the owner; non-owners get the not-found message (no existence leak).
67
+- **receive-pack**:
68
+  - must be the owner.
69
+  - repo not archived (else `MsgArchived`).
70
+  - repo not soft-deleted (else `MsgRepoNotFound`).
71
+
72
+Suspended or soft-deleted users are rejected up-front with `MsgSuspended`.
73
+
74
+S15 lifts these into `policy.Can(actor, action, resource)`.
75
+
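A minimal sketch of the inline rules as a pure function, assuming hypothetical `user` and `repo` structs and an `authorize` helper (none of these names come from the codebase). The order of the checks, and applying the soft-delete check to reads as well as pushes, are assumptions rather than confirmed behavior.

```go
package main

import "errors"

// Sentinel names follow the document; these definitions are assumptions.
var (
	ErrSSHRepoNotFound = errors.New("repository not found")
	ErrSSHPermDenied   = errors.New("permission denied")
	ErrSSHArchived     = errors.New("repository archived")
	ErrSSHSuspended    = errors.New("account suspended")
)

// user and repo are hypothetical projections of the DB rows.
type user struct {
	ID        int64
	Suspended bool
	Deleted   bool
}

type repo struct {
	OwnerID  int64
	Private  bool
	Archived bool
	Deleted  bool
}

// authorize applies the V1 owner-only rules for one service.
func authorize(service string, u user, r repo) error {
	// Suspended or soft-deleted users are rejected up-front.
	if u.Suspended || u.Deleted {
		return ErrSSHSuspended
	}
	// Assumption: a soft-deleted repo looks missing for reads and pushes alike.
	if r.Deleted {
		return ErrSSHRepoNotFound
	}
	isOwner := u.ID == r.OwnerID
	switch service {
	case "git-upload-pack":
		// Private repos answer not-found to non-owners (no existence leak).
		if r.Private && !isOwner {
			return ErrSSHRepoNotFound
		}
		return nil
	case "git-receive-pack":
		if !isOwner {
			return ErrSSHPermDenied
		}
		if r.Archived {
			return ErrSSHArchived
		}
		return nil
	}
	// Unknown services should have been rejected by the parser already.
	return ErrSSHPermDenied
}
```

Keeping the decision free of DB and I/O makes the later move to `policy.Can` a matter of swapping one call.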
76
+## Hook environment
77
+
78
+Same as HTTP, plus protocol distinction:
79
+
80
+| Var | Value |
81
+|---|---|
82
+| `SHITHUB_USER_ID` | numeric `users.id` of the authenticated client |
83
+| `SHITHUB_USERNAME` | the user's lowercase username |
84
+| `SHITHUB_REPO_ID` | numeric `repos.id` |
85
+| `SHITHUB_REPO_FULL_NAME` | `<owner>/<repo>` (lowercased) |
86
+| `SHITHUB_PROTOCOL` | `ssh` |
87
+| `SHITHUB_REMOTE_IP` | first field of `SSH_CONNECTION` |
88
+| `SHITHUB_REQUEST_ID` | freshly generated 16-byte hex token |
89
+| `PATH` | inherited from the parent process so git's helpers resolve |
90
+
91
+Because git propagates its environment to its hook subprocesses, S14's post-receive can `echo "$SHITHUB_USER_ID"` and get the right value.
92
+
93
+`PATH` is included so `git-receive-pack` can find `git-pack-objects`, `git-index-pack`, and friends.
94
+
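A sketch of how the env vector in the table might be assembled. `hookContext`, `buildHookEnv`, `parseRemoteIP`, and `newRequestID` are hypothetical names, and reading "16-byte hex token" as 16 random bytes hex-encoded to 32 characters is an assumption.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"os"
	"strconv"
	"strings"
)

// parseRemoteIP returns the first field of SSH_CONNECTION, which sshd sets to
// "client_ip client_port server_ip server_port". Empty input yields "".
func parseRemoteIP(sshConnection string) string {
	fields := strings.Fields(sshConnection)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// newRequestID hex-encodes 16 random bytes (assumed interpretation).
func newRequestID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// hookContext is a hypothetical bundle of the values resolved by the dispatcher.
type hookContext struct {
	UserID       int64
	Username     string
	RepoID       int64
	RepoFullName string
	RemoteIP     string
	RequestID    string
}

// buildHookEnv returns the complete environment handed to git: the curated
// SHITHUB_* set plus PATH, and nothing else from the parent process.
func buildHookEnv(c hookContext) []string {
	return []string{
		"SHITHUB_USER_ID=" + strconv.FormatInt(c.UserID, 10),
		"SHITHUB_USERNAME=" + c.Username,
		"SHITHUB_REPO_ID=" + strconv.FormatInt(c.RepoID, 10),
		"SHITHUB_REPO_FULL_NAME=" + c.RepoFullName,
		"SHITHUB_PROTOCOL=ssh",
		"SHITHUB_REMOTE_IP=" + c.RemoteIP,
		"SHITHUB_REQUEST_ID=" + c.RequestID,
		"PATH=" + os.Getenv("PATH"),
	}
}
```

Building the slice from scratch, rather than filtering `os.Environ()`, is what keeps `SSH_CONNECTION` and `SSH_ORIGINAL_COMMAND` out of git's environment.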
95
+## syscall.Exec discipline
96
+
97
+Three things are non-negotiable here:
98
+
99
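+1. **Close the DB pool BEFORE `syscall.Exec`.** Go's `defer` does NOT fire on `syscall.Exec`: the OS replaces the current process image and the call never returns. A deferred `pool.Close()` would therefore never run, and the pgx pool's TCP connections to Postgres would be dropped abruptly at exec instead of shut down cleanly (or, for any descriptor not marked close-on-exec, leak into git's FD table). We close explicitly.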
100
+2. **Resolve the binary via `exec.LookPath`.** `syscall.Exec` does not search `$PATH`; it needs the path to the binary itself. `exec.LookPath("git-upload-pack")` walks `$PATH` and returns the first match. If git isn't on the daemon user's `$PATH`, the user sees `shithub: server misconfigured` in their git client.
101
+3. **No writes to stdout AFTER the exec call.** If the exec call returns, it failed (typically EACCES on the binary or ENOENT). We write a friendly error to stderr and exit non-zero. On success we never see another instruction in this process.
102
+
103
+`sysExec` is a package-level var pointing at `syscall.Exec` so unit tests can stub it. The production binding carries a `//nolint:gosec` annotation for G204 because the inputs are constrained: `bin` is the LookPath result of a fixed service name, `argv[1]` is a sanitized path from `storage.RepoFS`.
104
+
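The three rules can be sketched together. `sysExec` mirrors the seam described above; `lookPath`, `execGit`, `errMisconfigured`, and the `closePool` callback are hypothetical names added so the sketch can be exercised without git installed. It assumes a Unix target, since `syscall.Exec` is not available on Windows.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

// errMisconfigured is a hypothetical sentinel for the "server misconfigured" case.
var errMisconfigured = errors.New("shithub: server misconfigured")

// sysExec is the stubbable seam described in the document.
var sysExec = syscall.Exec

// lookPath is a second, hypothetical seam so tests need no real git binary.
var lookPath = exec.LookPath

// execGit resolves the service binary, closes the pool, and replaces the
// process. It only ever returns on failure.
func execGit(service, repoPath string, env []string, closePool func()) error {
	bin, err := lookPath(service)
	// Close explicitly on every path: a defer would never run once exec succeeds.
	closePool()
	if err != nil {
		// Real code would also log err for the operator.
		return errMisconfigured
	}
	// argv[0] is the program name by convention; argv[1] is the bare-repo path.
	err = sysExec(bin, []string{service, repoPath}, env)
	// Reaching this line means exec failed; the caller reports it on stderr.
	return fmt.Errorf("exec %s: %w", bin, err)
}
```

In a test, both seams can be replaced with closures that record their arguments, which covers the ordering (pool closed before exec) without replacing the test process.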
105
+## Friendly error catalogue
106
+
107
+| Error sentinel | Stderr line |
108
+|---|---|
109
+| `ErrSSHRepoNotFound` | `shithub: repository not found` |
110
+| `ErrSSHPermDenied` | `shithub: permission denied` |
111
+| `ErrSSHArchived` | `shithub: this repository is archived; pushes are disabled` |
112
+| `ErrSSHSuspended` | `shithub: your account is suspended` |
113
+| `ErrUnknownSSHCommand` | `shithub does not allow shell access` |
114
+| `ErrInvalidSSHPath` | `shithub: repository not found` (collapses to the same message — bad paths are indistinguishable from missing repos) |
115
+| (anything else) | `shithub: internal error (request_id=<hex>)` |
116
+
117
+The friendly message is what the user sees in `git`'s output. The structured slog event is what the operator reads — it logs at `WARN` for denials and `INFO` for successful dispatches with `(user_id, op, owner, repo, remote_ip)` fields.
118
+
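A sketch of the catalogue as a mapping function. The message strings are taken from the table above; the sentinel definitions and the lowercase `friendlyMessageFor` are stand-ins, not the real `FriendlyMessageFor`, and matching with `errors.Is` (so wrapped sentinels still map) is an assumption about how the real code compares errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names follow the document; these definitions are assumptions.
var (
	ErrSSHRepoNotFound   = errors.New("repo not found")
	ErrSSHPermDenied     = errors.New("permission denied")
	ErrSSHArchived       = errors.New("archived")
	ErrSSHSuspended      = errors.New("suspended")
	ErrUnknownSSHCommand = errors.New("unknown command")
	ErrInvalidSSHPath    = errors.New("invalid path")
)

// friendlyMessageFor maps an error to the single stderr line the client sees.
func friendlyMessageFor(err error, requestID string) string {
	switch {
	// Bad paths collapse into not-found so they reveal nothing extra.
	case errors.Is(err, ErrSSHRepoNotFound), errors.Is(err, ErrInvalidSSHPath):
		return "shithub: repository not found"
	case errors.Is(err, ErrSSHPermDenied):
		return "shithub: permission denied"
	case errors.Is(err, ErrSSHArchived):
		return "shithub: this repository is archived; pushes are disabled"
	case errors.Is(err, ErrSSHSuspended):
		return "shithub: your account is suspended"
	case errors.Is(err, ErrUnknownSSHCommand):
		return "shithub does not allow shell access"
	default:
		// Anything unrecognized carries the request ID for log correlation.
		return fmt.Sprintf("shithub: internal error (request_id=%s)", requestID)
	}
}
```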
119
+## Tests
120
+
121
+`internal/git/protocol/ssh_command_test.go` — accepts (quoted, unquoted, dotted names, hyphenated names, `.git` suffix), rejects (wrong service, mismatched quotes, command injection attempts, leading/trailing whitespace, absolute paths, dot-dot, leading dot in repo name).
122
+
123
+`internal/git/protocol/ssh_dispatch_test.go` (DB-integration):
124
+- `TestDispatch_PublicCloneByOwner` — full round-trip, asserts on argv path + env vars.
125
+- `TestDispatch_PublicCloneByOther` — non-owner pull of public repo OK.
126
+- `TestDispatch_PrivateCloneByOtherIsNotFound` — non-owner sees `ErrSSHRepoNotFound`.
127
+- `TestDispatch_PushByNonOwnerIsPermDenied` — non-owner push → `ErrSSHPermDenied`.
128
+- `TestDispatch_PushToArchivedIsArchived`.
129
+- `TestDispatch_SuspendedUserSuspended`.
130
+- `TestDispatch_UnknownCommandIsRejected`.
131
+- `TestFriendlyMessageFor` covers the message catalogue.
132
+- `TestParseRemoteIP` covers `SSH_CONNECTION` parsing edge cases.
133
+
134
+We don't unit-test the cobra command's `syscall.Exec` path; the whole point of `syscall.Exec` is that the test process would be replaced. The dispatcher tests cover everything up to the exec call; the deploy doc has a manual smoke-test recipe.
135
+
136
+## Deploy checklist
137
+
138
+Reviewed once before going live:
139
+
140
+- The shithubd binary runs as a dedicated `shithub-ssh` system user.
141
+- That user has read+write on `/data/repos/`.
142
+- That user's `$PATH` includes the directory holding `git-upload-pack` and `git-receive-pack` (typically `/usr/lib/git-core/`).
143
+- sshd's `AuthorizedKeysCommand` points at `shithubd ssh-authkeys`.
144
+- sshd's `AuthorizedKeysCommandUser` is set to `shithub-ssh` (NOT root).
145
+- The `command=` directive in the AKC line forces `shithubd ssh-shell <user_id>` — a user can't bypass it via `~/.ssh/authorized_keys` because the AKC is the only key source we trust.
146
+
147
+Failure mode for missing config: any unset cfg value (`cfg.DB.URL`, `cfg.Storage.ReposRoot`) surfaces as `shithub: server misconfigured` in the user's git client. The operator reads the structured slog line on the daemon side.
148
+
149
+## Pitfalls / what to remember
150
+
151
+- **Strict command parsing is the security boundary.** Don't relax the regex without writing a fresh fixture set. Shell injection attempts are caught here, not later.
152
+- **Path validation is the second boundary.** `..`, absolute paths, and embedded slashes are caught BEFORE we ever construct a filesystem path.
153
+- **`syscall.Exec` doesn't run defers.** Anything that needs cleanup runs explicitly before the exec call. Today that's just the pool; if you add a tempfile or a flock, close it first.
154
+- **Stdin/stdout flow directly to git after exec.** sshd doesn't insert a layer — what git writes is what the client reads. No buffering hooks possible after this point; if you need to inspect bytes, do it before exec.
155
+- **No environment leak.** We replace the entire env with the curated `SHITHUB_*` set + `PATH`. `SSH_CONNECTION`, `SSH_ORIGINAL_COMMAND`, etc. are not propagated to git.
156
+- **Concurrent pushes serialize on `refs/...lock`.** Multiple SSH connections pushing to the same ref take turns; different refs go in parallel. This is git's own behavior, not ours.
157
+
158
+## Open follow-ups (deferred)
159
+
160
+- **SSH certificates / CA-signed user keys** are post-MVP — today it's strictly key-by-key.
161
+- **`git-lfs` over SSH** is post-MVP; the dispatcher rejects `git-lfs-authenticate` (and anything else) as unknown.
162
+- **Pull-mirror / federation** is post-MVP.
163
+- **S14** wires post-receive hooks; this dispatcher is the source of the env vars they read.
164
+- **S15** refactors the inline owner-only check into `policy.Can`. Both this dispatcher and the HTTP handler swap atomically.
165
+
166
+## Related docs
167
+
168
+- `docs/internal/git-http.md` — the parallel HTTP transport, same auth/permission shape.
169
+- `docs/internal/repo-create.md` — repo creation flow whose output we serve.
170
+- `docs/internal/storage.md` — `RepoFS` layout that produces the bare-repo path.