tenseleyflow/shithub / 097ff8f

docs/actions: document cancellation flow

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 097ff8f63862ff30d7ef53dcea0cebaeeb938811
Parents: 7da9698
Tree: bcca71b

3 changed files

Status  File                                      +   -
M       docs/internal/actions-runner-api.md       28  2
M       docs/internal/actions-schema.md           3   2
M       docs/internal/runbooks/actions-runner.md  16  1
docs/internal/actions-runner-api.md (modified)
@@ -126,6 +126,10 @@ Completed jobs require a valid check conclusion. The handler updates
 `workflow_jobs`, rolls up `workflow_runs`, and best-effort updates the
 matching `check_runs` row created by the trigger pipeline.
 
+When a runner reports `status:"cancelled"`, any still-open steps in the
+job are marked cancelled too. This keeps a killed job from leaving queued
+step rows that the UI would otherwise treat as live.
+
 S41d PR2 runner execution supports containerized `run:` steps with
 per-step log streaming and server-side log finalization. `uses:` aliases
 such as `actions/checkout@v4` and artifact upload/download remain
@@ -143,6 +147,24 @@ Auth: job JWT. Body:
 Creates a `workflow_artifacts` row and returns a pre-signed S3 PUT URL.
 The object key is `actions/runs/<run_id>/artifacts/<name>`.
 
+`POST /api/v1/jobs/{id}/cancel`
+
+Auth: PAT with `repo:write`, and the actor must have write permission on
+the repository that owns the job's workflow run. Browser UI forms use
+CSRF-protected repo routes that call the same lifecycle orchestrator.
+
+Queued jobs are made terminal immediately:
+
+- `workflow_jobs.status = cancelled`
+- `workflow_jobs.conclusion = cancelled`
+- `workflow_jobs.cancel_requested = true`
+- open steps for that job are marked cancelled
+
+Running jobs keep `status = running` and get
+`cancel_requested = true`. The runner sees this through
+`cancel-check`, kills the active container, then reports terminal
+`cancelled`.
+
 `POST /api/v1/jobs/{id}/cancel-check`
 
 Auth: job JWT. Returns:
@@ -151,12 +173,16 @@ Auth: job JWT. Returns:
 {"cancelled":false,"next_token":"..."}
 ```
 
-The boolean mirrors `workflow_jobs.cancel_requested`; the actual cancel
-request UI lands later in S41g.
+The boolean mirrors `workflow_jobs.cancel_requested`. `shithubd-runner`
+polls this endpoint during job execution, serializing it through the
+same single-use JWT chain as logs and status updates. On `cancelled:
+true`, the Docker engine runs `docker kill <active-container>` and the
+runner posts terminal job status `cancelled`.
 
 ## Metrics
 
 - `shithub_actions_runner_registrations_total`
 - `shithub_actions_runner_heartbeats_total{result="claimed|no_job"}`
 - `shithub_actions_runner_jwt_total{result="issued|rejected|replay"}`
+- `shithub_actions_jobs_cancelled_total{reason="user|concurrency|timeout"}`
 - `shithub_actions_log_scrub_replacements_total{location="server"}`
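
The `cancel-check` polling described in this file can be sketched in shell. This is a minimal illustration, not the `shithubd-runner` implementation: the helper names `cancel_requested`, `next_token`, and `poll_once` are hypothetical, the `Authorization: Bearer` header for the job JWT is an assumption, and so is the idea that each response's `next_token` authorizes the next call in the chain.

```sh
#!/bin/sh
# Hypothetical helper: read a cancel-check response body on stdin and
# succeed (exit 0) only when it carries "cancelled": true.
# A tolerant text match is used instead of a JSON parser; that is enough
# for the flat object shown in the docs.
cancel_requested() {
  tr -d ' \n\t' | grep -q '"cancelled":true'
}

# Hypothetical helper: print the next_token value from the same body.
next_token() {
  sed -n 's/.*"next_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sketch of one poll step. BASE, JOB_ID, and TOKEN are assumed to be set
# by the caller; curl only runs when this function is called.
# Returns 0 if cancellation was requested, 1 if not, 2 on request failure.
poll_once() {
  body=$(curl -fsS -X POST "$BASE/api/v1/jobs/$JOB_ID/cancel-check" \
    -H "Authorization: Bearer $TOKEN") || return 2
  # Assumption: the single-use token is rotated on every call.
  TOKEN=$(printf '%s\n' "$body" | next_token)
  printf '%s\n' "$body" | cancel_requested
}
```

A runner loop built on this would call `poll_once` between steps and, on a zero exit status, run `docker kill` on the active container before posting the terminal `cancelled` status.
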
docs/internal/actions-schema.md (modified)
@@ -369,8 +369,9 @@ Other admin surfaces are scoped to later sub-sprints:
 
 - S41c: `shithubd admin runner register --name <foo>` issues a
   registration token + writes a row to `workflow_runners`.
-- S41g: `shithubd admin actions cancel <run-id>` flips
-  `cancel_requested`.
+- S41g: `POST /api/v1/jobs/{id}/cancel` and the repository run-detail
+  UI request cancellation. Running jobs flip `cancel_requested`; queued
+  jobs are made terminal immediately.
 
 ## Trigger pipeline (S41b)
 
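
The queued-versus-running split in the S41g entry can be modelled as a small state function. This is only a sketch of the documented behaviour: `apply_cancel` is a hypothetical name, the output format is invented for illustration, and the handling of any status other than `queued` or `running` is an assumption, since the docs do not describe it.

```sh
#!/bin/sh
# Hypothetical model of a cancel request against one workflow_jobs row.
# Input:  current status ("queued" or "running").
# Output: "<status> <conclusion> <cancel_requested>" after the request,
#         with "-" standing for an unset conclusion.
apply_cancel() {
  case "$1" in
    queued)
      # Queued jobs are made terminal immediately.
      echo "cancelled cancelled true" ;;
    running)
      # Running jobs stay running; the runner finishes the cancel later.
      echo "running - true" ;;
    *)
      # Assumption: other statuses are left untouched and reported as an error.
      echo "$1 - unchanged"
      return 1 ;;
  esac
}
```

For example, `apply_cancel queued` prints `cancelled cancelled true`, while `apply_cancel running` prints `running - true`.
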
docs/internal/runbooks/actions-runner.md (modified)
@@ -139,6 +139,20 @@ curl -fsS "$BASE/api/v1/jobs/$JOB_ID/status" \
   -d '{"status":"completed","conclusion":"success"}'
 ```
 
+Cancel smoke: on a separate queued or running job, a repo-write PAT can
+request cancellation:
+
+```sh
+curl -fsS "$BASE/api/v1/jobs/$JOB_ID/cancel" \
+  -H "Authorization: Bearer $PAT_WITH_REPO_WRITE" \
+  -X POST
+```
+
+If the job is still queued, it becomes `cancelled` immediately. If it is
+running, the next runner cancel check returns `{"cancelled":true}`, the
+runner kills the active container, and the terminal job status becomes
+`cancelled`.
+
 Replay check: reusing the log token after the log call must fail with
 401 because its `jti` is already present in `runner_jwt_used`.
 
@@ -155,4 +169,5 @@ Expected results:
 - The parent `workflow_runs` row rolls up to completed/success when all
   jobs are terminal.
 - The PR Checks tab shows the matching check run as success.
-- `/metrics` includes runner registration, heartbeat, and JWT counters.
+- `/metrics` includes runner registration, heartbeat, JWT, and job
+  cancellation counters.
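
The last expected result can be checked mechanically. The sketch below assumes `/metrics` is served under `$BASE` in the usual Prometheus text format; `has_metric` is a hypothetical helper. A labelled counter may not appear until it has been incremented at least once, so run the check after the cancel smoke step.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the metrics text on stdin contains at
# least one sample line for the named metric. Sample lines start with the
# metric name followed by "{" (labels) or a space (no labels), so
# "# HELP" and "# TYPE" comment lines do not count.
has_metric() {
  grep -q "^$1[{ ]"
}
```

Usage, with `$BASE` as in the rest of the runbook:

```sh
curl -fsS "$BASE/metrics" | has_metric shithub_actions_jobs_cancelled_total
```
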