tenseleyflow/shithub / 59d7d7a

docs: record actions timeout behavior

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 59d7d7aae491f8625ee6a76429cdea8e1521f6ca
Parents: f23d1f1
Tree: 8faf4a2

3 changed files

Status  File                                        +   -
M       docs/internal/actions-runner-api.md         7   0
M       docs/internal/actions-schema.md            18   0
M       docs/internal/runbooks/actions-runner.md   22   1
docs/internal/actions-runner-api.md (modified)
@@ -131,6 +131,12 @@ Completed jobs require a valid check conclusion. The handler updates
 `workflow_jobs`, rolls up `workflow_runs`, and best-effort updates the
 matching `check_runs` row created by the trigger pipeline.
 
+`timeout-minutes` is enforced by `shithubd-runner` as a whole-job
+deadline. When it expires, the runner kills the active container,
+reports the current step as `completed/timed_out`, and reports the job
+as `completed/timed_out`. The server treats that conclusion as a terminal
+failure for the workflow run rollup.
+
 When a runner reports `status:"cancelled"`, any still-open steps in the
 job are marked cancelled too. This keeps a killed job from leaving queued
 step rows that the UI would otherwise treat as live.
@@ -209,3 +215,4 @@ runner posts terminal job status `cancelled`.
 - `shithub_actions_jobs_cancelled_total{reason="user|concurrency|timeout"}`
 - `shithub_actions_log_scrub_replacements_total{location="server"}`
 - `shithub_actions_runs_pruned_total{kind="chunks|blobs|runs|jwt_used"}`
+- `shithub_actions_step_timeouts_total`
docs/internal/actions-schema.md (modified)
@@ -382,6 +382,24 @@ Other admin surfaces are scoped to later sub-sprints:
  `shithubd-cron.service`. Operators can run it manually with
  `shithubd admin run-job workflow:cleanup`.
 
+## Runner timeouts (S41g)
+
+`jobs.<key>.timeout-minutes` is enforced by `shithubd-runner` as a
+whole-job deadline. The parser stores the value in
+`workflow_jobs.timeout_minutes` with the GitHub-compatible default of
+360 minutes and a 1..4320 cap.
+
+When the deadline expires, the runner's Docker engine explicitly kills the
+active step container, emits a terminal step update with
+`status=completed` and `conclusion=timed_out`, and the runner reports
+the job itself as `completed/timed_out`. The server rolls the parent
+workflow run up to `timed_out` when all jobs are terminal. A timed-out
+step is not masked by `continue-on-error`; the job deadline always wins.
+
+The runner API increments `shithub_actions_step_timeouts_total` the
+first time a step reaches `conclusion=timed_out`. Duplicate terminal
+step-status retries do not increment the counter again.
+
 ## Retention cleanup (S41g)
 
 `workflow:cleanup` applies the durable Actions retention contract in
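The defaulting rule and the first-transition counter described above can be sketched together in Go. Everything here is hypothetical: the actual parser, step storage, and Prometheus counter are not shown in this commit. The sketch keeps only the first terminal update per step, and it rejects out-of-range `timeout-minutes` because the text does not say whether such values are rejected or clamped.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	defaultTimeoutMinutes = 360  // GitHub-compatible default
	maxTimeoutMinutes     = 4320 // upper end of the documented 1..4320 range
)

// normalizeTimeout is a hypothetical parser helper. A nil pointer means the
// workflow omitted timeout-minutes. Out-of-range values are rejected here;
// clamping them would be another reading of "cap".
func normalizeTimeout(v *int) (int, error) {
	if v == nil {
		return defaultTimeoutMinutes, nil
	}
	if *v < 1 || *v > maxTimeoutMinutes {
		return 0, fmt.Errorf("timeout-minutes %d outside 1..%d", *v, maxTimeoutMinutes)
	}
	return *v, nil
}

// stepStore is a hypothetical in-memory stand-in for step rows plus the
// step-timeout counter (a plain int here, not a Prometheus counter).
type stepStore struct {
	mu          sync.Mutex
	conclusions map[int64]string
	timeouts    int
}

func newStepStore() *stepStore {
	return &stepStore{conclusions: make(map[int64]string)}
}

// complete applies a terminal step update. Only the first terminal update
// for a step is recorded, so a retried timed_out report cannot increment
// the counter twice. It reports whether the update changed anything.
func (s *stepStore) complete(stepID int64, conclusion string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, done := s.conclusions[stepID]; done {
		return false
	}
	s.conclusions[stepID] = conclusion
	if conclusion == "timed_out" {
		s.timeouts++
	}
	return true
}

// Timeouts returns the current counter value.
func (s *stepStore) Timeouts() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.timeouts
}

func main() {
	minutes, _ := normalizeTimeout(nil)
	fmt.Println("default timeout-minutes:", minutes) // 360

	s := newStepStore()
	s.complete(42, "timed_out")
	s.complete(42, "timed_out") // retried terminal update for the same step
	fmt.Println("step timeouts:", s.Timeouts()) // 1
}
```

Keying the increment on the state transition, rather than on receipt of a `timed_out` update, is what makes runner retries safe to repeat.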
docs/internal/runbooks/actions-runner.md (modified)
@@ -167,6 +167,27 @@ Expected response includes a new `run_id`, the new `run_index`, and
 `parent_run_id` equal to the source run. Confirm the new row has the
 same `head_sha` as the source run.
 
+Timeout smoke: create a workflow with a short deadline and a long-running
+step:
+
+```yaml
+jobs:
+  timeout_probe:
+    runs-on: ubuntu-latest
+    timeout-minutes: 1
+    steps:
+      - run: sleep 600
+```
+
+Expected results:
+
+- The runner kills the active container shortly after the one-minute
+  deadline.
+- The step row becomes `status=completed`,
+  `conclusion=timed_out`.
+- The job row becomes `status=completed`, `conclusion=timed_out`.
+- `shithub_actions_step_timeouts_total` on `/metrics` increases by one.
+
 Replay check: reusing the log token after the log call must fail with
 401 because its `jti` is already present in `runner_jwt_used`.
 
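For the `/metrics` expectation in the timeout smoke, one way to check the counter is to capture the `/metrics` body before and after the run and compare the sample values. A minimal Go sketch, assuming `shithub_actions_step_timeouts_total` is exposed without labels (as it is listed) in the Prometheus text format; `counterValue` is a hypothetical helper and the sample bodies are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// counterValue returns the value of an unlabelled sample named name in a
// Prometheus text-format body, and false if no such sample is present.
// Labelled series and comment lines are skipped.
func counterValue(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	const metric = "shithub_actions_step_timeouts_total"
	// Made-up /metrics bodies captured before and after the smoke run.
	before := "# TYPE " + metric + " counter\n" + metric + " 3\n"
	after := "# TYPE " + metric + " counter\n" + metric + " 4\n"

	b, okB := counterValue(before, metric)
	a, okA := counterValue(after, metric)
	if !okB || !okA {
		fmt.Println("counter missing from /metrics")
		return
	}
	fmt.Println("timeouts recorded during smoke:", a-b)
}
```

Comparing a before/after delta, rather than an absolute value, keeps the check valid on a server that has already recorded earlier timeouts.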
@@ -184,7 +205,7 @@ Expected results:
  jobs are terminal.
 - The PR Checks tab shows the matching check run as success.
 - `/metrics` includes runner registration, heartbeat, JWT, job
-  cancellation, log-scrub, and retention counters.
+  cancellation, log-scrub, step-timeout, and retention counters.
 
 ## Retention Sweep