tenseleyflow/shithub / d7d47d4

docs: add runner deploy runbook

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: d7d47d432987f58beb5377b1981e34fe00fe3105
Parents: 7441984
Tree: 0d227dd

3 changed files

| Status | File | + | - |
| --- | --- | --- | --- |
| M | docs/internal/index.md | 2 | 0 |
| M | docs/internal/runbooks/actions-runner.md | 3 | 0 |
| A | docs/internal/runbooks/runner-deploy.md | 130 | 0 |
docs/internal/index.md (modified)

@@ -64,6 +64,8 @@ site.
 ## Operations
 
 - [deploy.md](./deploy.md) — Ansible playbook + topology.
+- [runbooks/runner-deploy.md](./runbooks/runner-deploy.md) —
+  Actions runner host deployment.
 - [runbooks/](./runbooks/) — incident, backup, restore, upgrade,
   rollback, plus rotation procedures.
 
docs/internal/runbooks/actions-runner.md (modified)

@@ -4,6 +4,9 @@ This runbook validates the runner-facing Actions path. `shithubd-runner`
 now claims jobs and executes containerized `run:` steps through Docker or
 Podman. The curl flow below remains useful for token/replay debugging.
 
+For host provisioning and the systemd/Ansible path, see
+[runner-deploy.md](./runner-deploy.md).
+
 Prereqs:
 
 - Database migrations are current through `0053_runner_jwt_used.sql`.
docs/internal/runbooks/runner-deploy.md (added)

@@ -0,0 +1,130 @@
+# Actions runner deploy runbook
+
+This runbook owns the S41d deployment path for `shithubd-runner`: the
+Nix-built default image, systemd unit, and Ansible role. The smoke flow
+for an already-installed runner lives in [actions-runner.md](./actions-runner.md).
+
+## Prereqs
+
+- The app database is migrated through S41d and the web API has
+  `auth.totp_key_b64` configured so job JWTs can be minted.
+- Docker is installed on the runner host and the `docker` group exists.
+  S41e narrows the sandbox; S41d runner hosts must be treated as trusted.
+- `bin/shithubd-runner` exists locally. `make build` builds both
+  `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
+- The default image has been loaded or published. Build it with:
+
+```sh
+nix build ./deploy/runner-images#runnerImage
+docker load < result
+```
+
+The committed `deploy/runner-images/flake.lock` pins the nixpkgs input.
+Update it deliberately when changing the default image toolchain.
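When the toolchain does need to change, one deliberate update pass might look like this (standard Nix flake commands; only the paths already shown above are assumed):

```sh
cd deploy/runner-images
nix flake update        # rewrite flake.lock for the pinned inputs
git diff flake.lock     # review the pin bump before committing
nix build .#runnerImage
docker load < result
```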
+
+Publishing to GHCR is manual through `.github/workflows/runner-image.yml`
+because forks may not control the upstream `ghcr.io/shithub` namespace.
+Leave the workflow's `image` input blank to publish under the current
+repository's package namespace, or set it explicitly for upstream
+publishing.
+
+## Register
+
+Run this once from a host that can reach the production database config:
+
+```sh
+shithubd admin runner register \
+  --name prod-runner-1 \
+  --labels self-hosted,linux,ubuntu-latest \
+  --capacity 1
+```
+
+Store the printed token in ansible-vault or the deployment secret store.
+Only the token hash is stored in Postgres; the raw token cannot be
+recovered later.
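One concrete way to follow the ansible-vault suggestion, assuming the `shithub_runner_token` variable name the role's inventory uses (the command itself is standard Ansible):

```sh
# Emit a !vault-encrypted value to paste into group_vars; requires a
# configured vault password source.
ansible-vault encrypt_string --name shithub_runner_token 'PASTE_RAW_TOKEN'
```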
+
+## Inventory
+
+Enable the role explicitly. The default is disabled so ordinary app
+deploys do not start a runner by accident.
+
+```ini
+[shithub:vars]
+shithub_runner_enabled=true
+shithub_runner_token=REPLACE_ME
+shithub_runner_labels=self-hosted,linux,ubuntu-latest
+shithub_runner_capacity=1
+shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
+```
+
+The role writes non-secret config to
+`/etc/shithubd-runner/config.toml` and the registration token to
+`/etc/shithubd-runner/runner.env` with mode `0600`.
+Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`;
+the systemd unit grants runner writes only to that subtree.
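The unit itself ships at `deploy/systemd/shithubd-runner.service` and is not reproduced in this runbook; a hedged sketch of what such a unit plausibly contains, where every directive is illustrative apart from the user, env file, binary path, and write-restricted subtree named above:

```ini
# Illustrative sketch only; the real unit lives in deploy/systemd/.
[Unit]
Description=shithub Actions runner
After=network-online.target docker.service
Wants=network-online.target

[Service]
User=shithub-runner
EnvironmentFile=/etc/shithubd-runner/runner.env
ExecStart=/usr/local/bin/shithubd-runner
Restart=on-failure
# The runbook says writes are confined to the workspace subtree
ProtectSystem=strict
ReadWritePaths=/var/lib/shithubd-runner

[Install]
WantedBy=multi-user.target
```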
+
+## Deploy
+
+For the runner role only:
+
+```sh
+make build
+cd deploy/ansible
+ansible-playbook -i inventory/production site.yml -t shithubd-runner
+```
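A dry run first is cheap insurance on a shared inventory; `--check --diff` are stock ansible-playbook flags:

```sh
# Show what the runner role would change without applying anything
ansible-playbook -i inventory/production site.yml -t shithubd-runner --check --diff
```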
+
+The role:
+
+- creates the `shithub-runner` system user and joins it to `docker`
+- uploads `/usr/local/bin/shithubd-runner`
+- renders `/etc/shithubd-runner/config.toml` and `runner.env`
+- installs `deploy/systemd/shithubd-runner.service`
+- pulls the configured runner image
+- enables and starts `shithubd-runner`
+
+## Verify
+
+On the runner host:
+
+```sh
+systemctl status shithubd-runner
+journalctl -u shithubd-runner -n 100 --no-pager
+```
+
+Then push a workflow with a simple `run:` step:
+
+```yaml
+name: ci
+on: push
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - run: bash -c "echo hello && exit 0"
+```
+
+Expected state:
+
+- a runner heartbeat claims the queued job within one idle poll interval
+- the step emits SQL log chunks during execution
+- `workflow:finalize_step` uploads
+  `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
+- the job and check run complete with conclusion `success`
+
+Repeat with `exit 1`; the check should complete with conclusion
+`failure`.
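The `success`/`failure` conclusions track the step's exit status; that mapping can be sanity-checked locally with plain bash, no runner involved:

```sh
bash -c "echo hello && exit 0"; echo "exit=$?"   # exit=0, the success case
bash -c "echo hello && exit 1"; echo "exit=$?"   # exit=1, the failure case
```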
+
+## Rollback
+
+Stop the runner first so it does not claim new jobs:
+
+```sh
+systemctl stop shithubd-runner
+systemctl disable shithubd-runner
+```
+
+If the binary itself is bad, copy a prior archived binary from
+`/var/lib/shithubd-runner/binaries/` back to
+`/usr/local/bin/shithubd-runner` and restart the unit. Jobs already
+claimed by the stopped runner remain visible in the database; S41g adds
+operator cancel/re-run controls.
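The restore step, spelled out (paths from this runbook; the archived filename layout under `binaries/` is whatever the deploy role left behind, shown as a placeholder):

```sh
ls -lt /var/lib/shithubd-runner/binaries/      # pick a known-good build
cp /var/lib/shithubd-runner/binaries/<known-good> /usr/local/bin/shithubd-runner
systemctl restart shithubd-runner
```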