tenseleyflow/shithub / 1945c79

docs/actions: document runner token handoff and label diagnostics (S41j)

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1945c79588965a214a1b7af80984dc333e4b8999
Parents: fdc8b27
Tree: e2350d1

7 changed files

Status  File                                      +   -
M       deploy/doctl/README.md                    16  0
M       docs/internal/actions-ga-readiness.md     1   1
M       docs/internal/actions-runner-api.md       10  3
M       docs/internal/runbooks/actions-runner.md  9   6
M       docs/internal/runbooks/actions.md         22  7
M       docs/internal/runbooks/runner-deploy.md   7   3
M       docs/public/user/actions.md               8   1
deploy/doctl/README.md (modified)
@@ -40,6 +40,22 @@ refuses `0.0.0.0/0` and `::/0` for SSH.
 
 Replace the generated token placeholders with per-host values from
 `shithubd admin runner register`, preferably through ansible-vault or host_vars.
+Generate one token per runner host:
+
+```sh
+shithubd admin runner register \
+  --name actions-runner-1 \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
+  --capacity 1 \
+  --output json
+```
+
+Store the returned `token` in inventory/vault, not in shell history. Rotate by
+registering a replacement token, deploying it to the host, confirming heartbeat,
+then revoking the old runner token.
+Use `--expires-in` only when that rotation is automated before the token
+expires.
+
 Then run:
 
 ```sh
docs/internal/actions-ga-readiness.md (modified)
@@ -79,7 +79,7 @@ go test -trimpath ./internal/actions/... ./internal/auth/runnerjwt ./internal/ru
 
 For a production-like runner host, manually verify:
 
-1. Register a new runner with `self-hosted,linux,ubuntu-latest`.
+1. Register a new runner with `self-hosted,linux,ubuntu-latest,x64`.
 2. Trigger `.shithub/workflows/checkout-canary.yml` on trunk.
 3. Confirm the run appears in the Actions tab and the check run completes.
 4. Confirm step logs stream while the job is running and finalize to object
docs/internal/actions-runner-api.md (modified)
@@ -11,11 +11,18 @@ short-lived per-job JWTs.
 Operators register a runner with:
 
 ```sh
-shithubd admin runner register --name runner-1 --labels self-hosted,linux,ubuntu-latest
+shithubd admin runner register \
+  --name runner-1 \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
+  --capacity 1 \
+  --output json
 ```
 
 The command inserts `workflow_runners`, stores only a SHA-256 hash in
-`runner_tokens`, and prints the 32-byte hex token once.
+`runner_tokens`, and returns the raw 32-byte hex token once.
+`--expires-in` is optional and should only be used when the deployment rotates
+the runner token before it expires, because the runner uses that same token for
+heartbeat authentication.
 
 `POST /api/v1/runners/heartbeat` accepts:
 
@@ -70,7 +77,7 @@ runner API endpoints.
 Request body:
 
 ```json
-{"labels":["ubuntu-latest","linux"],"capacity":1}
+{"labels":["self-hosted","linux","ubuntu-latest","x64"],"capacity":1}
 ```
 
 Returns 204 when no matching job is claimable. Returns 200 with
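
The docs changed above say the register command returns the raw token once and that it should stay out of shell history. A minimal sketch of one way to capture it, assuming the `--output json` result is a single JSON object with a `token` string field (the diff names the field but does not show the full shape). The `shithubd` shell function is a hypothetical stand-in that prints a made-up 64-character hex token so the sketch runs without a server, and the temporary file is only an illustrative destination.

```sh
#!/bin/sh
set -eu

# Hypothetical stand-in for the real binary so the sketch runs anywhere.
# The JSON shape is an assumption; only the `token` field is named in the docs.
shithubd() {
  printf '{"name":"runner-1","token":"%s"}\n' \
    "00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff"
}

# Make every file created below owner-readable only.
umask 077
token_file="$(mktemp)"

# Pipe the JSON straight into the file so the token never appears on the
# command line or in shell history. Where jq is installed, `jq -r .token`
# is a sturdier replacement for the sed expression.
shithubd admin runner register \
  --name runner-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --capacity 1 \
  --output json \
  | sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([0-9a-f]*\)".*/\1/p' \
  > "$token_file"

# Report only the length, never the value.
echo "stored token of $(tr -d '\n' < "$token_file" | wc -c | tr -d ' ') hex characters"
```

From there the value can be moved into ansible-vault or host_vars and the temporary file removed.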
docs/internal/runbooks/actions-runner.md (modified)
@@ -31,11 +31,12 @@ Register a runner:
 ```sh
 shithubd admin runner register \
   --name runner-1 \
-  --labels self-hosted,linux,ubuntu-latest \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
-  --capacity 1
+  --capacity 1 \
+  --output json
 ```
 
-Save the printed token:
+Save the returned token:
 
 ```sh
 export RUNNER_TOKEN='<printed-token>'
@@ -48,7 +49,7 @@ Run the binary:
 shithubd-runner run \
   --server-url "$BASE" \
   --token "$RUNNER_TOKEN" \
-  --labels self-hosted,linux,ubuntu-latest \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
   --workspace-root /var/lib/shithubd-runner/workspaces \
   --network shithub-actions \
   --dns-servers 172.30.0.1
@@ -62,7 +63,7 @@ base_url = "https://shithub.example"
 
 [runner]
 token = "<printed-token>"
-labels = ["self-hosted", "linux", "ubuntu-latest"]
+labels = ["self-hosted", "linux", "ubuntu-latest", "x64"]
 capacity = 1
 poll_interval = "5s"
 workspace_root = "/var/lib/shithubd-runner/workspaces"
@@ -93,6 +94,8 @@ dns_servers = ["172.30.0.1"]
 The config path defaults to `/etc/shithubd-runner/config.toml`.
 Environment variables use the `SHITHUB_RUNNER_` prefix, for example
 `SHITHUB_RUNNER_TOKEN` or `SHITHUB_RUNNER_SERVER__BASE_URL`.
+Use `--expires-in` only for tokens that your automation rotates before expiry;
+the runner presents its registration token on every heartbeat.
 
 The Ansible runner role creates the `shithub-actions` bridge, runs the
 allowlist resolver at `172.30.0.1`, and installs firewall rules that
@@ -108,7 +111,7 @@ Claim a job:
 curl -fsS "$BASE/api/v1/runners/heartbeat" \
   -H "Authorization: Bearer $RUNNER_TOKEN" \
   -H "Content-Type: application/json" \
-  -d '{"labels":["self-hosted","linux","ubuntu-latest"],"capacity":1}' \
+  -d '{"labels":["self-hosted","linux","ubuntu-latest","x64"],"capacity":1}' \
   | tee /tmp/shithub-claim.json
 ```
 
docs/internal/runbooks/actions.md (modified)
@@ -39,12 +39,17 @@ reserved aliases and fail until artifact transfer is wired end to end.
 ```sh
 shithubd admin runner register \
   --name smoke-runner-1 \
-  --labels self-hosted,linux,ubuntu-latest \
+  --labels self-hosted,linux,ubuntu-latest,x64 \
-  --capacity 1
+  --capacity 1 \
+  --output json
 ```
 
-3. Start `shithubd-runner` with the printed token. For production hosts, use
+3. Start `shithubd-runner` with the returned token. For production hosts, use
-   the Ansible/systemd path in [runner-deploy.md](./runner-deploy.md).
+   one token per host and store it in ansible-vault, host vars, or the deployment
+   secret store. The role writes `/etc/shithubd-runner/config.toml` with
+   restrictive permissions. Use `--expires-in` only when the automation rotates
+   the runner token before that deadline; the current runner uses the
+   registration token for every heartbeat.
 4. Push a `run:`-only workflow:
 
 ```yaml
@@ -155,6 +160,13 @@ On the app host, inspect runner registration and heartbeat state:
 shithubd admin actions runner list
 ```
 
+Inspect queued jobs by requested `runs-on` label:
+
+```sh
+shithubd admin runner queue
+shithubd admin runner queue --output json
+```
+
 Important metrics:
 
 - `shithub_actions_queue_depth{resource="runs|jobs"}`
@@ -198,7 +210,7 @@ Useful knobs:
 
 - `SHITHUB_ACTIONS_VUS=50` controls concurrent virtual users.
 - `SHITHUB_ACTIONS_DURATION=10m` controls the steady-state window.
-- `SHITHUB_RUNNER_LABELS=self-hosted,linux,ubuntu-latest` sets heartbeat
+- `SHITHUB_RUNNER_LABELS=self-hosted,linux,ubuntu-latest,x64` sets heartbeat
   labels.
 - `SHITHUB_RUNNER_CAPACITY=17` keeps three runners near the 50-concurrent
   target.
@@ -240,8 +252,11 @@ active container, and reports terminal `cancelled`.
 - **Run never appears:** confirm the workflow file is under
   `.shithub/workflows/`, parse it with `shithubd admin actions parse <file>`,
   and verify the trigger event matches `on:`.
-- **Run stays queued:** confirm a runner is registered with matching labels and
+- **Run stays queued:** open the run page to see the requested runner labels,
-  capacity, then inspect runner journal output and heartbeat metrics.
+  then run `shithubd admin runner queue` and confirm a live runner is registered
+  with matching labels and capacity. Unsupported hosted labels such as
+  `windows-latest` and `macos-latest` intentionally remain queued until an
+  operator registers matching runners.
 - **Step logs buffer:** verify the Caddy route above and confirm the SSE route
   is still mounted outside compression and short timeouts.
 - **`actions/checkout@v4` fails:** confirm the job is still running, the repo
docs/internal/runbooks/runner-deploy.md (modified)
@@ -97,12 +97,15 @@ Run this once from a host that can reach the production database config:
 shithubd admin runner register \
   --name prod-runner-1 \
   --labels self-hosted,linux,ubuntu-latest,x64 \
-  --capacity 1
+  --capacity 1 \
+  --output json
 ```
 
-Store the printed token in ansible-vault or the deployment secret store.
+Store the returned `token` in ansible-vault or the deployment secret store.
 Only the token hash is stored in Postgres; the raw token cannot be
 recovered later.
+Use `--expires-in` only when the deployment rotates the token before expiry,
+because the runner presents the registration token on every heartbeat.
 
 For a generated DigitalOcean inventory, register one token per runner host and
 use the host name as the runner name:
@@ -111,7 +114,8 @@ use the host name as the runner name:
 shithubd admin runner register \
   --name shithub-runner-shared-linux-1 \
   --labels self-hosted,linux,ubuntu-latest,x64 \
-  --capacity 1
+  --capacity 1 \
+  --output json
 ```
 
 ## Inventory
docs/public/user/actions.md (modified)
@@ -38,7 +38,12 @@ a check run on matching pull requests.
 
 `runs-on: ubuntu-latest` is a runner label, not a promise that shithub downloads
 a hosted Ubuntu image for you. The site operator decides which image a matching
-runner uses. On shithub.sh, use the labels published by the instance operator.
+runner uses. On shithub.sh, the shared Linux pool advertises
+`self-hosted`, `linux`, `ubuntu-latest`, and `x64`.
+
+If a run stays queued, the run page shows the requested label set, for example
+`Waiting for runner with labels: windows-latest`. That means no currently
+registered runner can claim the job.
 
 ## Current limit
 
@@ -99,6 +104,8 @@ Most simple CI files need three edits:
 2. Keep `actions/checkout@v4`, but replace marketplace and artifact `uses:`
    actions with equivalent `run:` commands for now.
 3. Confirm `runs-on:` matches a label registered by your shithub operator.
+   The default shithub.sh shared label for ordinary Linux CI is
+   `ubuntu-latest`.
 
 Marketplace actions, Docker actions, composite actions, hosted runner images,
 matrix expansion, service containers, submodules, LFS, and artifact transfer
 matrix expansion, service containers, submodules, LFS, and artifact transfer