
Actions runner smoke runbook

This runbook validates the runner-facing Actions path. shithubd-runner now claims jobs, performs scoped actions/checkout@v4, and executes containerized run: steps through Docker or Podman. The curl flow below remains useful for token/replay debugging.

For host provisioning and the systemd/Ansible path, see runner-deploy.md.

Prereqs:

  • Database migrations are current through 0067_runner_pool_ops.sql.
  • SHITHUB_TOTP_KEY or auth.totp_key_b64 is set on the web process.
  • Object storage is configured if testing artifact upload.
  • Docker or Podman is installed on the runner host.
  • A repo has a workflow under .shithub/workflows/*.yml with runs-on: ubuntu-latest, and a push/dispatch has enqueued a run. Checkout and run: steps are executable; artifact aliases remain reserved until artifact transfer lands.

runs-on is a runner-label selector, not a hard-coded image name. A workflow that says runs-on: ubuntu-latest can be claimed by any runner advertising the ubuntu-latest label. The container image is selected by the runner host's engine.default_image setting; the reproducible Nix-built image is the default, but operators can point it at another OCI image when they need closer Ubuntu parity.
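The selector rule above can be sketched as a small shell check. This is an illustration only: labels_satisfy is a hypothetical helper, and the rule that a job requesting several labels needs a runner advertising all of them is an assumption, since this runbook only describes the single-label case.

```sh
# labels_satisfy RUNNER_LABELS REQUESTED_LABELS
# Both arguments are comma-separated lists. Returns 0 when every requested
# label appears in the runner list (assumed "all labels must match" rule).
labels_satisfy() {
  runner_labels=",$1,"
  old_ifs=$IFS
  IFS=,
  for want in $2; do
    case "$runner_labels" in
      *",$want,"*) ;;                      # this label is advertised
      *) IFS=$old_ifs; return 1 ;;         # a requested label is missing
    esac
  done
  IFS=$old_ifs
  return 0
}
```

With the label set used in this runbook, labels_satisfy self-hosted,linux,ubuntu-latest,x64 ubuntu-latest succeeds, while a windows-latest request fails, which is the situation where a job stays queued.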

Register a runner:

shithubd admin runner register \
  --name runner-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --capacity 1 \
  --output json

Save the returned token:

export RUNNER_TOKEN='<printed-token>'
export BASE='https://shithub.example'

Run the binary:

shithubd-runner run \
  --server-url "$BASE" \
  --token "$RUNNER_TOKEN" \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --workspace-root /var/lib/shithubd-runner/workspaces \
  --network shithub-actions \
  --dns-servers 172.30.0.1

Pool Operations

List pool state:

shithubd admin runner list --output json
shithubd admin runner queue --output json

list includes labels, capacity, active job count, last heartbeat, host name, runner version, drain state, and revoke state. queue groups queued jobs by requested runs-on label so unsupported labels are visible without querying Postgres.

Drain a runner before host maintenance:

shithubd admin runner drain --id 7 --reason 'kernel update'

The runner keeps heartbeating and can finish already claimed jobs, but new heartbeat claims return 204 until it is undrained:

shithubd admin runner undrain --id 7

Rotate a registration token after a config-management change:

shithubd admin runner rotate-token --id 7 --expires-in 24h --output json

Update the runner host config with the printed token, restart shithubd-runner, then verify heartbeats with runner list.

Mark stale runners offline:

shithubd admin runner cleanup-stale --older-than 2m

Hard-revoke a compromised runner:

shithubd admin runner revoke --id 7 --reason 'host compromise'

Revocation records revoked_at, marks the runner offline, revokes every registration token for that runner, and causes existing job API JWTs from that runner to fail. Use drain for routine maintenance; use revoke when the token or host may be compromised.

Destroy/recreate with DigitalOcean:

doctl compute droplet list --tag-name shithub-actions-runner
doctl compute droplet delete <droplet-id>

After recreation, register a fresh runner token and let the provisioning role write the new config. Do not reuse a token from a revoked runner.

Emergency controls:

  • Stop all new claims: disable Actions at site level or drain every runner with shithubd admin runner list --output json followed by shithubd admin runner drain --id <id>.
  • Pause one repo/org: set the repo/org Actions policy to disabled so the claim query leaves its queued jobs untouched.
  • Cancel active work for a repo: use the repo Actions UI or job cancel API for each queued/running job.
  • Revoke all runner tokens: iterate shithubd admin runner revoke --id <id> over every non-revoked runner.
  • Fence the pool: block or destroy droplets tagged shithub-actions-runner in DigitalOcean, then rotate or revoke the affected runner tokens before allowing replacement hosts to connect.
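The "drain every runner" control can be scripted. A minimal sketch, assuming runner ids arrive one per line on stdin; drain_ids is a hypothetical helper, and the jq filter in the trailing comment guesses at the JSON shape of runner list, which this runbook does not specify.

```sh
# drain_ids REASON
# Reads one runner id per line on stdin and drains each runner.
# Stops at the first failure so a partial drain is noticed.
drain_ids() {
  reason=$1
  while IFS= read -r id; do
    [ -n "$id" ] || continue               # skip blank lines
    # </dev/null keeps the CLI from consuming the remaining ids on stdin.
    shithubd admin runner drain --id "$id" --reason "$reason" </dev/null || return 1
  done
}

# Example pipeline. ASSUMPTION: the list output is a JSON array of objects
# that each carry an "id" field; adjust the filter to the real output.
# shithubd admin runner list --output json | jq -r '.[].id' | drain_ids 'emergency stop'
```

Reading ids from stdin keeps the helper independent of the list format, so only the jq filter needs to change if the output shape differs.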

Equivalent config file for the shithubd-runner run flags shown earlier:

[server]
base_url = "https://shithub.example"

[runner]
token = "<printed-token>"
labels = ["self-hosted", "linux", "ubuntu-latest", "x64"]
capacity = 1
poll_interval = "5s"
workspace_root = "/var/lib/shithubd-runner/workspaces"
workspace_ttl = "24h"
network_allowlist = [
  "api.github.com",
  "auth.docker.io",
  "codeload.github.com",
  "github.com",
  "objects.githubusercontent.com",
  "production.cloudflare.docker.com",
  "registry-1.docker.io",
  "*.githubusercontent.com",
]

[engine]
kind = "docker"
default_image = "ghcr.io/shithub/runner-nix:1.0"
network = "shithub-actions"
memory = "2g"
cpus = "2"
seccomp_profile = "/etc/shithubd-runner/seccomp.json"
user = "65534:65534"
pids_limit = 512
dns_servers = ["172.30.0.1"]

The config path defaults to /etc/shithubd-runner/config.toml. Environment variables use the SHITHUB_RUNNER_ prefix, for example SHITHUB_RUNNER_TOKEN or SHITHUB_RUNNER_SERVER__BASE_URL. Use --expires-in only for tokens that your automation rotates before expiry: the runner presents its registration token on every heartbeat, so heartbeats start failing once an unrotated token expires.

Arbitrary Repository Smoke

Use this checklist after provisioning a shared runner pool or changing runner labels. The purpose is to prove that ordinary repositories can use the pool without repo-specific labels.

Pick at least two repositories:

  • mfwolffe/scratch, the historical dogfood repo;
  • one additional public repository;
  • one private repository if one is available for the operator account.

In each repository, commit this file as .shithub/workflows/smoke.yml on trunk:

name: Smoke
on:
  push:
    branches: [trunk]
jobs:
  green:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify checkout
        run: test -f README.md || test -f readme.md || pwd
      - name: Smoke
        run: printf 'shithub actions smoke passed\n'

Expected results for each repo:

  • the repo heading shows a green check after the push;
  • the Actions run page shows Triggered via push on refs/heads/trunk;
  • the green job is completed with conclusion success;
  • the checkout step logs the scoped repository URL for that repo;
  • the smoke step log contains shithub actions smoke passed;
  • the check run attached to the commit agrees with the workflow run state;
  • the downloaded or raw archived step log matches the in-page step log.

Confirm the pool is shared:

shithubd admin runner list --output json
shithubd admin runner queue --output json

The same online runner labels should satisfy both repositories. The smoke workflow must use runs-on: ubuntu-latest; do not add per-repo labels for this test.

Unsupported-label negative test:

name: Unsupported runner label
on:
  workflow_dispatch:
jobs:
  nope:
    runs-on: windows-latest
    steps:
      - run: echo should-not-run

Trigger it manually. The run should stay queued and the run page should say Waiting for runner with labels: windows-latest. shithubd admin runner queue --output json should show one queued windows-latest job with zero matching runners. Cancel the run after confirming the diagnostic.

Untrusted-PR secret negative test:

  1. Add a repo secret named S41J_SECRET_SMOKE with any non-production value.
  2. Open an untrusted pull request whose workflow prints whether that secret is present.
  3. Before approval, confirm the claimed job contains no injected secrets and logs do not contain the secret value.
  4. Only after explicit approval should the run be allowed through the trusted secret path.

Runner-outage negative test:

  1. Drain every shared runner with shithubd admin runner drain --id <id>.
  2. Push the smoke workflow to a test repo.
  3. Confirm the run stays queued with the requested ubuntu-latest label visible.
  4. Undrain one runner and confirm the same queued job is claimed and completes.

This smoke is considered passing only when scratch and the second repository both complete from the same shared label set and the negative cases produce clear queued/secret-denied behavior.

The Ansible runner role creates the shithub-actions bridge, runs the allowlist resolver at 172.30.0.1, and installs firewall rules that reject direct-IP egress from step containers. If you run the binary without the role, provision equivalent network controls before pointing workflows at the runner.

Curl token smoke

Claim a job:

curl -fsS "$BASE/api/v1/runners/heartbeat" \
  -H "Authorization: Bearer $RUNNER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"labels":["self-hosted","linux","ubuntu-latest","x64"],"capacity":1,"host_name":"curl-smoke","version":"manual"}' \
  | tee /tmp/shithub-claim.json

Extract the job token and id:

export JOB_ID="$(jq -r '.job.id' /tmp/shithub-claim.json)"
export JOB_TOKEN="$(jq -r '.token' /tmp/shithub-claim.json)"
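If the heartbeat returns no job (a drained runner gets 204, as noted under Pool Operations), the claim file is empty and jq -r yields an empty string or the literal null. A small guard avoids sending later requests with blank values; claim_is_usable is a hypothetical helper sketched for this purpose.

```sh
# claim_is_usable JOB_ID JOB_TOKEN
# Returns 0 only when both values look like real data. jq -r prints nothing
# for an empty input file and the word "null" for a missing field.
claim_is_usable() {
  for value in "$1" "$2"; do
    case "$value" in
      ''|null) return 1 ;;
    esac
  done
  return 0
}

# Example:
# claim_is_usable "$JOB_ID" "$JOB_TOKEN" || echo "no job claimed; is a run queued?" >&2
```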

Append a log chunk:

curl -fsS "$BASE/api/v1/jobs/$JOB_ID/logs" \
  -H "Authorization: Bearer $JOB_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"seq\":0,\"chunk\":\"$(printf 'hello from curl\n' | base64)\"}" \
  | tee /tmp/shithub-log.json

export JOB_TOKEN="$(jq -r '.next_token' /tmp/shithub-log.json)"
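Each log response carries a next_token, so a multi-chunk upload has to thread the token through every call. The sketch below shows one way to do that; log_payload and send_log_line are hypothetical helpers, and incrementing seq by one per chunk is an assumption, since this runbook only shows seq 0.

```sh
# log_payload SEQ LINE
# Prints the JSON body for one log chunk. LINE is newline-terminated and
# base64-encoded; tr strips the line wrapping that some base64 builds add.
log_payload() {
  chunk=$(printf '%s\n' "$2" | base64 | tr -d '\n')
  printf '{"seq":%d,"chunk":"%s"}' "$1" "$chunk"
}

# send_log_line SEQ LINE
# Posts one chunk with the current JOB_TOKEN, then replaces JOB_TOKEN with
# the next_token from the response so the following call is authorized.
send_log_line() {
  resp=$(curl -fsS "$BASE/api/v1/jobs/$JOB_ID/logs" \
    -H "Authorization: Bearer $JOB_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(log_payload "$1" "$2")") || return 1
  JOB_TOKEN=$(printf '%s' "$resp" | jq -r '.next_token')
}

# Example (ASSUMED seq numbering, continuing after the chunk sent above):
# send_log_line 1 'second line' && send_log_line 2 'third line'
```

Because send_log_line assigns JOB_TOKEN in the calling shell, call it directly rather than inside a pipeline or subshell, otherwise the rotated token is lost.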

Complete the job:

curl -fsS "$BASE/api/v1/jobs/$JOB_ID/status" \
  -H "Authorization: Bearer $JOB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"status":"completed","conclusion":"success"}'

Cancel smoke: on a separate queued or running job (set JOB_ID to that job's id first), a PAT with repo write access can request cancellation:

curl -fsS "$BASE/api/v1/jobs/$JOB_ID/cancel" \
  -H "Authorization: Bearer $PAT_WITH_REPO_WRITE" \
  -X POST

If the job is still queued, it becomes cancelled immediately. If it is running, the next runner cancel check returns {"cancelled":true}, the runner kills the active container, and the terminal job status becomes cancelled.

Re-run smoke: after a workflow run has completed or been cancelled, a PAT with repo write access can enqueue a new run from the original workflow file at the original commit (RUN_ID is the id of the source run):

curl -fsS "$BASE/api/v1/runs/$RUN_ID/rerun" \
  -H "Authorization: Bearer $PAT_WITH_REPO_WRITE" \
  -X POST

The expected response includes a new run_id, the new run_index, and parent_run_id equal to the source run. Confirm the new row has the same head_sha as the source run.
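The response fields can be checked mechanically. A minimal sketch, assuming the response is a flat JSON object with exactly the field names listed above; rerun_response_ok is a hypothetical helper, and the head_sha comparison still has to be done separately because this runbook does not say where that value is exposed.

```sh
# rerun_response_ok SOURCE_RUN_ID
# Reads the rerun response JSON on stdin. Succeeds when parent_run_id equals
# the source run, run_id differs from it, and run_index is present.
# tostring makes the comparison work for numeric and string ids alike.
rerun_response_ok() {
  jq -e --arg src "$1" '
    (.parent_run_id | tostring) == $src
    and (.run_id | tostring) != $src
    and (.run_index != null)
  ' >/dev/null
}

# Example:
# curl -fsS "$BASE/api/v1/runs/$RUN_ID/rerun" \
#   -H "Authorization: Bearer $PAT_WITH_REPO_WRITE" -X POST \
#   | rerun_response_ok "$RUN_ID" && echo "rerun linked to source run"
```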

Timeout smoke: create a workflow with a short deadline and a long-running step:

jobs:
  timeout_probe:
    runs-on: ubuntu-latest
    timeout-minutes: 1
    steps:
      - run: sleep 600

Expected results:

  • The runner kills the active container shortly after the one-minute deadline.
  • The step row becomes status=completed, conclusion=timed_out.
  • The job row becomes status=completed, conclusion=timed_out.
  • /metrics increments shithub_actions_step_timeouts_total.
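To confirm the counter actually moved, read it before and after the timeout run. The helper below is a sketch under assumptions: metric_total is a hypothetical name, the input is Prometheus text exposition without trailing timestamps, and the location of /metrics relative to BASE is not stated in this runbook.

```sh
# metric_total NAME
# Reads Prometheus text exposition on stdin and prints the sum of all
# samples named NAME, across label sets. Exits 1 if NAME never appears.
metric_total() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    NF >= 2 {
      metric = $1
      brace = index(metric, "{")       # strip a {label="..."} suffix if present
      if (brace > 0) metric = substr(metric, 1, brace - 1)
      if (metric == name) { sum += $NF; found = 1 }
    }
    END {
      if (!found) exit 1
      print sum + 0
    }
  '
}

# Example (ASSUMES the metrics endpoint is reachable at $BASE/metrics):
# before=$(curl -fsS "$BASE/metrics" | metric_total shithub_actions_step_timeouts_total || echo 0)
# ... run the timeout workflow ...
# after=$(curl -fsS "$BASE/metrics" | metric_total shithub_actions_step_timeouts_total)
# [ "$after" -gt "$before" ] && echo "timeout counter incremented"
```

The fallback to 0 for the first reading covers servers that do not export a counter until it has been incremented once.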

Replay check: the token returned by the log call was already consumed by the completion request above, so presenting it again must fail with 401 because its jti is already present in runner_jwt_used.

curl -i "$BASE/api/v1/jobs/$JOB_ID/status" \
  -H "Authorization: Bearer $(jq -r '.next_token' /tmp/shithub-log.json)" \
  -H "Content-Type: application/json" \
  -d '{"status":"running"}'
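For scripted runs, the same request can be turned into a pass/fail check on the status code alone. This is a sketch: replay_is_rejected is a hypothetical helper that relies on the BASE and JOB_ID variables set earlier in this runbook.

```sh
# replay_is_rejected TOKEN
# Replays an already used job token and succeeds only if the server
# answers 401. The response body is discarded; only the code is inspected.
replay_is_rejected() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    "$BASE/api/v1/jobs/$JOB_ID/status" \
    -H "Authorization: Bearer $1" \
    -H "Content-Type: application/json" \
    -d '{"status":"running"}')
  if [ "$code" = 401 ]; then
    echo "replay rejected with 401"
  else
    echo "unexpected status $code for replayed token" >&2
    return 1
  fi
}

# Example:
# replay_is_rejected "$(jq -r '.next_token' /tmp/shithub-log.json)"
```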

Expected results for the curl flow:

  • workflow_jobs.status = completed and conclusion success.
  • The parent workflow_runs row rolls up to completed/success when all jobs are terminal.
  • The PR Checks tab shows the matching check run as success.
  • /metrics includes runner registration, heartbeat, JWT, job cancellation, log-scrub, step-timeout, retention, queue-depth by label, claim-latency, runner-online, runner-stale, runner-draining, and runner-revocation metrics.

Retention Sweep

The daily housekeeping timer enqueues workflow:cleanup at 03:30 UTC, after the 03:17 backup window:

systemctl list-timers shithubd-cron.timer
journalctl -u shithubd-cron.service -n 100

Manual smoke:

shithubd admin run-job workflow:cleanup

Expected behavior:

  • SQL log chunks older than 7 days for terminal steps are deleted.
  • Expired artifact rows are deleted only after their actions/runs/... objects are deleted from object storage.
  • Unpinned terminal workflow runs older than 365 days are pruned; pinned runs survive.
  • Consumed runner JWT rows older than 30 days are pruned.
  • /metrics exposes shithub_actions_runs_pruned_total{kind="chunks|blobs|runs|jwt_used"}.