tenseleyflow/shithub / 1945c79

docs/actions: document runner token handoff and label diagnostics (S41j)

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1945c79588965a214a1b7af80984dc333e4b8999
Parents: fdc8b27
Tree: e2350d1

7 changed files

Status  File                                     +   -
M       deploy/doctl/README.md                   16  0
M       docs/internal/actions-ga-readiness.md    1   1
M       docs/internal/actions-runner-api.md      10  3
M       docs/internal/runbooks/actions-runner.md 9   6
M       docs/internal/runbooks/actions.md        22  7
M       docs/internal/runbooks/runner-deploy.md  7   3
M       docs/public/user/actions.md              8   1

deploy/doctl/README.md modified
@@ -40,6 +40,22 @@ refuses `0.0.0.0/0` and `::/0` for SSH.
 
 Replace the generated token placeholders with per-host values from
 `shithubd admin runner register`, preferably through ansible-vault or host_vars.
+Generate one token per runner host:
+
+```sh
+shithubd admin runner register \
+  --name actions-runner-1 \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
+  --capacity 1 \
+  --output json
+```
+
+Store the returned `token` in inventory/vault, not in shell history. Rotate by
+registering a replacement token, deploying it to the host, confirming heartbeat,
+then revoking the old runner token.
+Use `--expires-in` only when that rotation is automated before the token
+expires.
+
 Then run:
 
 ```sh
docs/internal/actions-ga-readiness.md modified
@@ -79,7 +79,7 @@ go test -trimpath ./internal/actions/... ./internal/auth/runnerjwt ./internal/ru
 
 For a production-like runner host, manually verify:
 
-1. Register a new runner with `self-hosted,linux,ubuntu-latest`.
+1. Register a new runner with `self-hosted,linux,ubuntu-latest,x64`.
 2. Trigger `.shithub/workflows/checkout-canary.yml` on trunk.
 3. Confirm the run appears in the Actions tab and the check run completes.
 4. Confirm step logs stream while the job is running and finalize to object
docs/internal/actions-runner-api.md modified
@@ -11,11 +11,18 @@ short-lived per-job JWTs.
 Operators register a runner with:
 
 ```sh
-shithubd admin runner register --name runner-1 --labels self-hosted,linux,ubuntu-latest
+shithubd admin runner register \
+  --name runner-1 \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
+  --capacity 1 \
+  --output json
 ```
 
 The command inserts `workflow_runners`, stores only a SHA-256 hash in
-`runner_tokens`, and prints the 32-byte hex token once.
+`runner_tokens`, and returns the raw 32-byte hex token once.
+`--expires-in` is optional and should only be used when the deployment rotates
+the runner token before it expires, because the runner uses that same token for
+heartbeat authentication.
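
Editor's illustration (not part of the commit): the hunk above says the server keeps only a SHA-256 hash and hands back the raw 32-byte hex token once. A minimal sketch of that pattern, under stated assumptions, might look like the following. The helper names `new_token` and `hash_token` are hypothetical, and hashing the hex string (rather than the raw bytes) is an assumption; the real `shithubd` implementation may differ.

```sh
# Generate 32 random bytes and print them as 64 lowercase hex characters.
new_token() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

# Print the SHA-256 hex digest of a token.
# Assumption: the digest covers the hex string, not the decoded bytes.
hash_token() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

token=$(new_token)
echo "raw token (shown once): $token"
echo "stored hash:            $(hash_token "$token")"
```

Because only the digest is persisted in this scheme, a lost token cannot be recovered and has to be replaced by registering a new one.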
 
 `POST /api/v1/runners/heartbeat` accepts:
 
@@ -70,7 +77,7 @@ runner API endpoints.
 Request body:
 
 ```json
-{"labels":["ubuntu-latest","linux"],"capacity":1}
+{"labels":["self-hosted","linux","ubuntu-latest","x64"],"capacity":1}
 ```
 
 Returns 204 when no matching job is claimable. Returns 200 with
docs/internal/runbooks/actions-runner.md modified
@@ -31,11 +31,12 @@ Register a runner:
 ```sh
 shithubd admin runner register \
   --name runner-1 \
-  --labels self-hosted,linux,ubuntu-latest \
-  --capacity 1
+  --labels self-hosted,linux,ubuntu-latest,x64 \
+  --capacity 1 \
+  --output json
 ```
 
-Save the printed token:
+Save the returned token:
 
 ```sh
 export RUNNER_TOKEN='<printed-token>'
@@ -48,7 +49,7 @@ Run the binary:
 shithubd-runner run \
   --server-url "$BASE" \
   --token "$RUNNER_TOKEN" \
-  --labels self-hosted,linux,ubuntu-latest \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
   --workspace-root /var/lib/shithubd-runner/workspaces \
   --network shithub-actions \
   --dns-servers 172.30.0.1
@@ -62,7 +63,7 @@ base_url = "https://shithub.example"
 
 [runner]
 token = "<printed-token>"
-labels = ["self-hosted", "linux", "ubuntu-latest"]
+labels = ["self-hosted", "linux", "ubuntu-latest", "x64"]
 capacity = 1
 poll_interval = "5s"
 workspace_root = "/var/lib/shithubd-runner/workspaces"
@@ -93,6 +94,8 @@ dns_servers = ["172.30.0.1"]
 The config path defaults to `/etc/shithubd-runner/config.toml`.
 Environment variables use the `SHITHUB_RUNNER_` prefix, for example
 `SHITHUB_RUNNER_TOKEN` or `SHITHUB_RUNNER_SERVER__BASE_URL`.
+Use `--expires-in` only for tokens that your automation rotates before expiry;
+the runner presents its registration token on every heartbeat.
 
 The Ansible runner role creates the `shithub-actions` bridge, runs the
 allowlist resolver at `172.30.0.1`, and installs firewall rules that
@@ -108,7 +111,7 @@ Claim a job:
 curl -fsS "$BASE/api/v1/runners/heartbeat" \
   -H "Authorization: Bearer $RUNNER_TOKEN" \
   -H "Content-Type: application/json" \
-  -d '{"labels":["self-hosted","linux","ubuntu-latest"],"capacity":1}' \
+  -d '{"labels":["self-hosted","linux","ubuntu-latest","x64"],"capacity":1}' \
   | tee /tmp/shithub-claim.json
 ```
 
docs/internal/runbooks/actions.md modified
@@ -39,12 +39,17 @@ reserved aliases and fail until artifact transfer is wired end to end.
 ```sh
 shithubd admin runner register \
   --name smoke-runner-1 \
-  --labels self-hosted,linux,ubuntu-latest \
-  --capacity 1
+  --labels self-hosted,linux,ubuntu-latest,x64 \
+  --capacity 1 \
+  --output json
 ```
 
-3. Start `shithubd-runner` with the printed token. For production hosts, use
-   the Ansible/systemd path in [runner-deploy.md](./runner-deploy.md).
+3. Start `shithubd-runner` with the returned token. For production hosts, use
+   one token per host and store it in ansible-vault, host vars, or the deployment
+   secret store. The role writes `/etc/shithubd-runner/config.toml` with
+   restrictive permissions. Use `--expires-in` only when the automation rotates
+   the runner token before that deadline; the current runner uses the
+   registration token for every heartbeat.
 4. Push a `run:`-only workflow:
 
 ```yaml
@@ -155,6 +160,13 @@ On the app host, inspect runner registration and heartbeat state:
 shithubd admin actions runner list
 ```
 
+Inspect queued jobs by requested `runs-on` label:
+
+```sh
+shithubd admin runner queue
+shithubd admin runner queue --output json
+```
+
 Important metrics:
 
 - `shithub_actions_queue_depth{resource="runs|jobs"}`
@@ -198,7 +210,7 @@ Useful knobs:
 
 - `SHITHUB_ACTIONS_VUS=50` controls concurrent virtual users.
 - `SHITHUB_ACTIONS_DURATION=10m` controls the steady-state window.
-- `SHITHUB_RUNNER_LABELS=self-hosted,linux,ubuntu-latest` sets heartbeat
+- `SHITHUB_RUNNER_LABELS=self-hosted,linux,ubuntu-latest,x64` sets heartbeat
   labels.
 - `SHITHUB_RUNNER_CAPACITY=17` keeps three runners near the 50-concurrent
   target.
@@ -240,8 +252,11 @@ active container, and reports terminal `cancelled`.
 - **Run never appears:** confirm the workflow file is under
   `.shithub/workflows/`, parse it with `shithubd admin actions parse <file>`,
   and verify the trigger event matches `on:`.
-- **Run stays queued:** confirm a runner is registered with matching labels and
-  capacity, then inspect runner journal output and heartbeat metrics.
+- **Run stays queued:** open the run page to see the requested runner labels,
+  then run `shithubd admin runner queue` and confirm a live runner is registered
+  with matching labels and capacity. Unsupported hosted labels such as
+  `windows-latest` and `macos-latest` intentionally remain queued until an
+  operator registers matching runners.
 - **Step logs buffer:** verify the Caddy route above and confirm the SSE route
   is still mounted outside compression and short timeouts.
 - **`actions/checkout@v4` fails:** confirm the job is still running, the repo
docs/internal/runbooks/runner-deploy.md modified
@@ -97,12 +97,15 @@ Run this once from a host that can reach the production database config:
 shithubd admin runner register \
   --name prod-runner-1 \
   --labels self-hosted,linux,ubuntu-latest,x64 \
-  --capacity 1
+  --capacity 1 \
+  --output json
 ```
 
-Store the printed token in ansible-vault or the deployment secret store.
+Store the returned `token` in ansible-vault or the deployment secret store.
 Only the token hash is stored in Postgres; the raw token cannot be
 recovered later.
+Use `--expires-in` only when the deployment rotates the token before expiry,
+because the runner presents the registration token on every heartbeat.
 
 For a generated DigitalOcean inventory, register one token per runner host and
 use the host name as the runner name:
@@ -111,7 +114,8 @@ use the host name as the runner name:
 shithubd admin runner register \
   --name shithub-runner-shared-linux-1 \
   --labels self-hosted,linux,ubuntu-latest,x64 \
-  --capacity 1
+  --capacity 1 \
+  --output json
 ```
 
 ## Inventory
docs/public/user/actions.md modified
@@ -38,7 +38,12 @@ a check run on matching pull requests.
 
 `runs-on: ubuntu-latest` is a runner label, not a promise that shithub downloads
 a hosted Ubuntu image for you. The site operator decides which image a matching
-runner uses. On shithub.sh, use the labels published by the instance operator.
+runner uses. On shithub.sh, the shared Linux pool advertises
+`self-hosted`, `linux`, `ubuntu-latest`, and `x64`.
+
+If a run stays queued, the run page shows the requested label set, for example
+`Waiting for runner with labels: windows-latest`. That means no currently
+registered runner can claim the job.
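
Editor's illustration (not part of the commit): to see why a job requesting `windows-latest` would stay queued against the shared Linux pool, the sketch below treats a job as claimable only when every requested label appears in the runner's advertised set. That subset rule is an assumption about how label matching works here, not something the diff confirms, and `labels_subset` is a hypothetical helper name.

```sh
# Return 0 when every comma-separated label in $1 also appears in $2.
# Assumption: matching is an exact, case-sensitive subset check.
labels_subset() {
  requested=$1
  advertised=",$2,"
  old_ifs=$IFS
  IFS=,
  for label in $requested; do
    case "$advertised" in
      *",$label,"*) ;;                      # label is advertised, keep going
      *) IFS=$old_ifs; return 1 ;;          # missing label, job not claimable
    esac
  done
  IFS=$old_ifs
  return 0
}

pool="self-hosted,linux,ubuntu-latest,x64"
for want in "ubuntu-latest" "linux,x64" "windows-latest"; do
  if labels_subset "$want" "$pool"; then
    echo "$want: claimable by the shared pool"
  else
    echo "$want: Waiting for runner with labels: $want"
  fi
done
```

Under that assumption, `ubuntu-latest` and `linux,x64` are claimable, while `windows-latest` waits until an operator registers a runner that advertises it.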
 
 ## Current limit
 
@@ -99,6 +104,8 @@ Most simple CI files need three edits:
 2. Keep `actions/checkout@v4`, but replace marketplace and artifact `uses:`
    actions with equivalent `run:` commands for now.
 3. Confirm `runs-on:` matches a label registered by your shithub operator.
+   The default shithub.sh shared label for ordinary Linux CI is
+   `ubuntu-latest`.
 
 Marketplace actions, Docker actions, composite actions, hosted runner images,
 matrix expansion, service containers, submodules, LFS, and artifact transfer