tenseleyflow/shithub / 11f0fda

S14: docs/internal/{hooks,worker}.md

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 11f0fdaf85bfc8192a65ff31a0305145d37476f8
Parents: aa9ab45
Tree: e17b934

2 changed files

Status  File                     +    -
A       docs/internal/hooks.md   105  0
A       docs/internal/worker.md  96   0
docs/internal/hooks.md (added)
@@ -0,0 +1,105 @@
+# Git hooks
+
+Bare-repo git hooks are how the push pipeline (S14) gets called.
+shithub installs **pre-receive** (gates) and **post-receive** (enqueue)
+on every repo. Both are tiny shell shims that exec into the same
+`shithubd` binary the web/SSH/worker layers run.
+
+## Why shell shims and not direct binary symlinks
+
+git invokes hooks with stdin piped, a particular cwd, and a controlled
+env. The shim normalizes the entry point: `exec /path/to/shithubd hook
+<name>`. Hooks aren't symlinked because (a) macOS git treats some
+symlinks oddly under hardened runtime, and (b) the shim uses `exec`,
+so signals and exit codes propagate cleanly.
+
+The shim is regenerated on every `hooks.Install` call; there's no
+manual editing path, by design. If the binary moves (deploy upgrade,
+versioned install path), run `shithubd hooks reinstall --all` to point
+every repo at the new location.
+
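A minimal sketch of how such a shim could be rendered, assuming the script is nothing more than a shebang plus the `exec` line quoted above. The names `renderShim` and `shellQuote` are hypothetical and the real `hooks.Install` may produce a different script.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for POSIX sh and escapes any
// single quote already inside it.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// renderShim returns the text of a hook shim that execs the binary.
// Hypothetical helper: the script shithub actually writes is not shown
// in this document, so this layout is an assumption.
func renderShim(binPath, hookName string) string {
	return fmt.Sprintf("#!/bin/sh\nexec %s hook %s\n", shellQuote(binPath), hookName)
}

func main() {
	// The install path here is made up for the demo.
	fmt.Print(renderShim("/opt/shithub/bin/shithubd", "pre-receive"))
}
```

Because the script ends in `exec`, the shell process is replaced by `shithubd`, so git sees the binary's exit code directly.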
+## Hook execution flow
+
+```
+git push  ──▶  receive-pack
+              │
+              ▼
+         pre-receive  (one process per push)
+              │ stdin: "<old> <new> <ref>" lines
+              │ env:   SHITHUB_USER_ID / _USERNAME / _REPO_ID / _REPO_FULL_NAME
+              │        SHITHUB_PROTOCOL / _REMOTE_IP / _REQUEST_ID
+              ▼
+         exec shithubd hook pre-receive
+              │ exit 0 → accept
+              │ exit ≠ 0 → reject; stderr is shown to the pusher
+              ▼
+       (push proceeds)
+              │
+              ▼
+         post-receive  (one process per push)
+              │ stdin: same shape
+              ▼
+         exec shithubd hook post-receive
+              │ INSERT push_events row per ref
+              │ INSERT job (push:process)
+              │ NOTIFY shithub_jobs
+              │ exit 0
+              ▼
+       worker picks it up via LISTEN
+```
+
+## pre-receive contract
+
+The hook implements the **minimum gates** described in S14. The full
+branch protection engine is S20.
+
+It re-checks the following from the DB even though the env carries them
+(env can be stale on a long-lived SSH session):
+
+* User not suspended (`users.suspended_at IS NULL`).
+* Repo not archived (`repos.is_archived = false`).
+* Repo not soft-deleted (`repos.deleted_at IS NULL`).
+
+Failures emit a `shithub: ...` line on stderr that git surfaces directly
+to the pusher's terminal. Latency budget: <100ms p99.
+
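The three gates can be sketched as a pure function over the values read back from the DB. The struct, the function name, and the exact message wording are assumptions for illustration; only the column names and the `shithub:` prefix come from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// gateState holds the three facts pre-receive re-reads from the DB.
// Hypothetical type; the field comments name the documented columns.
type gateState struct {
	UserSuspendedAt *time.Time // users.suspended_at
	RepoArchived    bool       // repos.is_archived
	RepoDeletedAt   *time.Time // repos.deleted_at
}

// checkGates returns nil when the push may proceed, or an error whose
// text is meant for the pusher's terminal. Message wording is invented.
func checkGates(s gateState) error {
	switch {
	case s.UserSuspendedAt != nil:
		return errors.New("shithub: account suspended")
	case s.RepoArchived:
		return errors.New("shithub: repository is archived")
	case s.RepoDeletedAt != nil:
		return errors.New("shithub: repository not found")
	}
	return nil
}

func main() {
	now := time.Now()
	fmt.Println(checkGates(gateState{}))
	fmt.Println(checkGates(gateState{UserSuspendedAt: &now}))
}
```

Keeping the decision separate from the query makes the gate order easy to unit-test without a database.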
+## post-receive contract
+
+* Reads stdin, ignores empty/malformed lines.
+* For each `<old> <new> <ref>` line: INSERT a `push_events` row, then
+  enqueue a `push:process` job carrying the new event id.
+* Issues a single `NOTIFY shithub_jobs` per push (workers wake once
+  the transaction commits).
+* Exits 0 even on internal errors. The push has already landed and the
+  pipeline is async: partial enqueue failures surface in the worker
+  logs and a backstop reconciler (post-MVP) can re-process orphaned
+  events.
+
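The stdin handling in the first two bullets might look like the sketch below. It assumes "malformed" means any line that does not split into exactly three whitespace-separated fields; `refUpdate` and both function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// refUpdate is one "<old> <new> <ref>" line from git.
type refUpdate struct{ Old, New, Ref string }

// parseRefUpdates reads hook stdin and skips empty or malformed lines.
func parseRefUpdates(r io.Reader) ([]refUpdate, error) {
	var out []refUpdate
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) != 3 {
			continue // empty or malformed: ignore, as the contract says
		}
		out = append(out, refUpdate{Old: f[0], New: f[1], Ref: f[2]})
	}
	return out, sc.Err()
}

// parseRefUpdateLines is a string convenience wrapper for demos and tests.
func parseRefUpdateLines(input string) []refUpdate {
	updates, _ := parseRefUpdates(strings.NewReader(input))
	return updates
}

func main() {
	in := "aaa bbb refs/heads/main\n\nbogus line\nccc ddd refs/tags/v1\n"
	for _, u := range parseRefUpdateLines(in) {
		fmt.Printf("%s: %s -> %s\n", u.Ref, u.Old, u.New)
	}
}
```

In the real hook the reader would be `os.Stdin`, and each parsed update would feed one `push_events` insert.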
+## Installation
+
+* On repo create (`internal/repos/create.go::Create`): runs
+  `hooks.Install` after `RepoFS.InitBare` succeeds. The S11 plumbing
+  initial commit does *not* fire hooks. That is intentional: the
+  contract is that hooks fire on user-driven pushes only.
+* On deploy: `shithubd hooks reinstall --all` walks every active repo
+  via the DB and re-installs. Single repos: `--repo owner/name`.
+* The binary path baked into the shim is `os.Executable()` of the
+  running shithubd at install time, resolved through `filepath.Abs`.
+  Test fixtures that don't exercise hooks pass `ShithubdPath: ""` to
+  `repos.Create.Deps` to skip installation.
+
+## Failure modes worth knowing
+
+* **`SHITHUB_*` env not set** (e.g. someone manually triggered a push
+  via a path that bypassed S12/S13): pre-receive returns the "missing
+  context" error. post-receive logs a warning and exits 0, so the push
+  still lands.
+* **DB unreachable from hook**: pre-receive returns "server error";
+  the user sees a generic message, the push aborts. post-receive logs
+  and exits 0; the backstop reconciler will pick up the unprocessed
+  push on next opportunity.
+* **Binary path drift between install and execution**: the shim's
+  hard-coded path no longer exists. git will report "hook execution
+  failed" and abort the push. Operators recover with `hooks reinstall`.
+* **Stale shim from previous shithubd version**: `Install` is
+  idempotent and overwrites; re-running on a deploy is the right move.
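
The "missing context" case in the first bullet could be detected roughly as follows. Which variables are mandatory is an assumption (here only the user and repo ids), and `missingContext` is a hypothetical helper; the lookup function is injected so that the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredEnv lists the variables this sketch treats as mandatory.
// Assumption: the document does not say which SHITHUB_* values are required.
var requiredEnv = []string{"SHITHUB_USER_ID", "SHITHUB_REPO_ID"}

// missingContext returns the required variables that lookup reports as
// unset or empty. Pass os.LookupEnv in production code.
func missingContext(lookup func(string) (string, bool)) []string {
	var missing []string
	for _, k := range requiredEnv {
		if v, ok := lookup(k); !ok || v == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	if m := missingContext(os.LookupEnv); len(m) > 0 {
		// pre-receive would reject here; post-receive would only log.
		fmt.Fprintf(os.Stderr, "shithub: missing context (%s)\n", strings.Join(m, ", "))
		return
	}
	fmt.Println("context ok")
}
```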
docs/internal/worker.md (added)
@@ -0,0 +1,96 @@
+# Worker
+
+The worker pool drains the Postgres-backed job queue introduced in S14.
+One binary serves the web API, the SSH dispatcher, the hooks, and the
+worker; `shithubd worker` boots a long-running process whose only job
+is to dequeue and run.
+
+## Architecture
+
+```
+post-receive  ──INSERT push_event + INSERT job + NOTIFY──▶  Postgres jobs
+                                                                  │
+                                                                  ▼
+                                                         shithubd worker
+                                                  ┌──────────────────────┐
+                                                  │ N goroutines          │
+                                                  │ ClaimJob (FOR UPDATE  │
+                                                  │   SKIP LOCKED)        │
+                                                  │ → handler             │
+                                                  │ → MarkCompleted /     │
+                                                  │   Reschedule /        │
+                                                  │   MarkFailed          │
+                                                  └──────────────────────┘
+```
+
+The pool also runs a dedicated LISTEN goroutine that holds one Postgres
+connection on `LISTEN shithub_jobs`. A NOTIFY from any process (typically
+the post-receive hook) wakes idle workers in <1s without polling. The
+backstop poll (every 5s by default) covers dropped notifications.
+
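The wake-up logic described above (notification first, backstop poll as a fallback) can be sketched with plain channels. This is not the pool's actual code: `waitForWork` and the `wake` type are hypothetical, and the LISTEN goroutine is assumed to signal idle workers over a channel.

```go
package main

import (
	"fmt"
	"time"
)

// wake names the reason an idle worker stopped waiting.
type wake string

const (
	wakeNotify   wake = "notify"
	wakePoll     wake = "poll"
	wakeShutdown wake = "shutdown"
)

// waitForWork blocks until shutdown is requested, a notification
// arrives, or the backstop poll interval elapses.
func waitForWork(done, notify <-chan struct{}, poll time.Duration) wake {
	t := time.NewTimer(poll)
	defer t.Stop()
	select {
	case <-done:
		return wakeShutdown
	case <-notify:
		return wakeNotify
	case <-t.C:
		return wakePoll
	}
}

func main() {
	done := make(chan struct{})
	notify := make(chan struct{}, 1)

	notify <- struct{}{}                                        // a NOTIFY arrived
	fmt.Println(waitForWork(done, notify, 5*time.Second))       // notify
	fmt.Println(waitForWork(done, notify, 10*time.Millisecond)) // poll
	close(done)                                                 // SIGTERM equivalent
	fmt.Println(waitForWork(done, notify, 5*time.Second))       // shutdown
}
```

After either a `notify` or a `poll` wake the worker would attempt a claim, so a dropped notification costs at most one poll interval of latency.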
+## Job kinds shipped in S14
+
+| Kind                   | Trigger                     | Idempotent on              |
+| ---------------------- | --------------------------- | -------------------------- |
+| `push:process`         | post-receive hook per ref   | `push_events.processed_at` |
+| `repo:size_recalc`     | enqueued by `push:process`  | overwrite-last-wins        |
+| `jobs:purge_completed` | future cron / manual ad-hoc | always safe to re-run      |
+
+Adding a new kind: write the handler in `internal/worker/jobs/<kind>.go`,
+add the `Kind` constant to `internal/worker/types.go`, register it in
+`cmd/shithubd/worker.go`. The dispatch loop is generic, so no other code
+needs to change.
+
+## Failure handling
+
+* **Transient errors** (handler returns a non-nil non-poison error):
+  reschedule with `Backoff(attempts) ± 20% jitter`, where `Backoff` is
+  `30s * 2^(attempts-1)` capped at 1h.
+* **Poison errors** (handler returns an error wrapping `worker.ErrPoison`):
+  jump to `MarkJobFailed` immediately. Use this for malformed payloads or
+  references to vanished rows.
+* **Panics**: caught by `safeRun` and treated as a transient error so a
+  buggy handler can't take down a worker goroutine.
+* **Stuck locks**: `ClaimJob` treats a lock whose `locked_at` is older
+  than 5 minutes as expired, so rows held by a worker that died mid-job
+  become claimable again on the next claim cycle.
+* **Max attempts**: `jobs.max_attempts` (default 5). Past the limit the
+  row is marked failed (`failed_at` is set). Failed rows surface in the
+  S34 admin panel for poison-job triage.
+
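The backoff formula in the first bullet works out as in this sketch. It assumes `attempts` is 1-based and that the jitter is a uniform scale factor in [0.8, 1.2); `WithJitter` is a hypothetical helper name and takes the random draw as a parameter so the result is reproducible.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 30 * time.Second
	maxDelay  = time.Hour
)

// Backoff returns 30s * 2^(attempts-1), capped at 1h.
// Values of attempts below 1 are treated as 1.
func Backoff(attempts int) time.Duration {
	d := baseDelay
	for i := 1; i < attempts; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay // stop doubling once the cap is reached
		}
	}
	return d
}

// WithJitter scales d by a factor in [0.8, 1.2) given u in [0, 1).
func WithJitter(d time.Duration, u float64) time.Duration {
	return time.Duration(float64(d) * (0.8 + 0.4*u))
}

func main() {
	for attempts := 1; attempts <= 8; attempts++ {
		base := Backoff(attempts)
		fmt.Printf("attempt %d: base %v, jittered %v\n",
			attempts, base, WithJitter(base, rand.Float64()))
	}
}
```

Under this reading the base delays run 30s, 1m, 2m, 4m, 8m, 16m, 32m, and the 1h cap first applies at attempt 8, so with the default `max_attempts` of 5 the cap is never reached.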
+## Concurrency contract
+
+* `FOR UPDATE SKIP LOCKED` on the inner SELECT means N concurrent workers
+  on the same kind can claim N distinct rows in one round. Each row is
+  held by at most one worker at a time across the cluster.
+* Workers hold a row's lock only while their handler runs. The lock is
+  released atomically by `MarkJobCompleted` / `RescheduleJob` /
+  `MarkJobFailed`.
+* Job handlers MUST be idempotent. A worker process killed mid-handler
+  leaves the row locked until the 5-minute stale-lock window passes,
+  at which point another worker may pick it up and re-run. Guard the
+  side-effects (e.g. `processed_at IS NULL` checks).
+
+## Metrics
+
+All exported through the standard `/metrics` registry:
+
+* `shithub_worker_jobs_processed_total{kind,outcome}`: outcome ∈
+  `ok | retry | failed | poison`.
+* `shithub_worker_job_duration_seconds{kind}`: handler latency
+  histogram.
+* `shithub_worker_in_flight{kind}`: gauge of currently-running handlers.
+
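One plausible way the failure-handling rules map onto the `outcome` label is sketched below. The mapping is an assumption, in particular that `failed` means the attempt limit was reached and that `attempts` counts the run that just finished. `ErrPoison` here stands in for `worker.ErrPoison`, and this `safeRun` is a simplified guess at the real one.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPoison is a local stand-in for worker.ErrPoison.
var ErrPoison = errors.New("poison job")

// safeRun calls h and converts a panic into an ordinary error, so the
// caller can treat it like any other transient failure.
func safeRun(h func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panic: %v", r)
		}
	}()
	return h()
}

// outcome picks the metric label for one finished run (assumed mapping).
func outcome(err error, attempts, maxAttempts int) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrPoison):
		return "poison"
	case attempts >= maxAttempts:
		return "failed"
	default:
		return "retry"
	}
}

func main() {
	bad := func() error { return fmt.Errorf("bad payload: %w", ErrPoison) }
	boom := func() error { panic("nil map write") }

	fmt.Println(outcome(safeRun(bad), 1, 5))  // poison
	fmt.Println(outcome(safeRun(boom), 1, 5)) // retry
	fmt.Println(outcome(safeRun(boom), 5, 5)) // failed
}
```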
+## Operational notes
+
+* Default pool size is 4. Override via `--workers <n>` on the CLI or
+  `SHITHUB_WORKERS=<n>` in the environment.
+* Graceful shutdown: SIGINT/SIGTERM cancels the root context. Workers
+  stop pulling new jobs and let in-flight handlers finish (bounded by
+  `JobTimeout`, default 5 minutes). The LISTEN goroutine drops its
+  conn cleanly.
+* The Postgres connection pool is sized to `Workers + 2` (one for
+  LISTEN, one slack for enqueues during shutdown).
+* `LISTEN/NOTIFY` semantics: when the hook calls `pg_notify` inside a
+  transaction, the notification is delivered *only* after commit, so
+  a rolled-back insert never wakes workers for nothing.