
# Actions runner API

The runner-facing HTTP surface lives in `internal/web/handlers/api/runners.go`. It is mounted under `/api/v1` in the CSRF-exempt API group, but it does not use PAT auth. Runners authenticate first with a long-lived registration token and then with short-lived per-job JWTs.

## Auth model

Operators register a runner with:

```sh
shithubd admin runner register --name runner-1 --labels self-hosted,linux,ubuntu-latest
```

The command inserts `workflow_runners`, stores only a SHA-256 hash in `runner_tokens`, and prints the 32-byte hex token once.

`POST /api/v1/runners/heartbeat` accepts:

```http
Authorization: Bearer <registration-token>
```

When a queued job matches the runner labels and capacity is available, the response includes a job payload and a 15-minute job JWT. That JWT has claims:

```json
{"sub":"runner:<id>","purpose":"api","job_id":1,"run_id":1,"repo_id":1,"exp":0,"jti":"..."}
```

The signing key is derived from `auth.totp_key_b64` with HKDF label `actions-runner-jwt-v1`; the raw TOTP/secretbox key is not used directly for JWT signing.

API-purpose job JWTs are single-use. Every job endpoint verifies the signature and expiry, checks that the path job belongs to the claimed runner/run, and then inserts `jti` into `runner_jwt_used`. A replay returns 401. To support multi-step runner flows, successful calls to in-flight job endpoints return `next_token` and `next_token_expires_at`.

Consumed JWT rows are retained for 30 days after token expiry, then pruned by the daily `workflow:cleanup` worker. This keeps the replay gate audit trail available for recent jobs without letting the table grow unbounded.

`shithubd-runner` consumes the same token chain: it claims with the registration token, marks the job `running` with the first API-purpose job JWT, then uses each returned `next_token` serially for log chunks, step-status updates, cancel checks, artifact upload requests, and finally the terminal job-status update. Reusing any consumed API-purpose job JWT is a replay and must fail with 401.

The heartbeat claim also returns `job.checkout_url` and `job.checkout_token` for `actions/checkout@v4`. The checkout token is a separate JWT with `purpose:"checkout"` and the same runner/job/run/repo scope. It is intentionally reusable while the job is `running`, because Git smart HTTP performs multiple Basic-authenticated requests during one checkout. The git HTTP handler accepts it only for `git-upload-pack`, only for the claimed repository, and only while the database still shows that the claimed runner is running the job. It is never accepted for pushes or runner API endpoints.

## Endpoints

`POST /api/v1/runners/heartbeat`

Request body:

```json
{"labels":["ubuntu-latest","linux"],"capacity":1}
```

Returns 204 when no matching job is claimable. Returns 200 with `token`, `expires_at`, and `job` when a job is claimed. Capacity is enforced server-side by counting current `workflow_jobs.status = 'running'` rows for the runner while holding a row lock on the runner. The job payload includes `checkout_url`, `checkout_token`, resolved `secrets`, and `mask_values`; repo secrets shadow org secrets with the same name. The server also stores an encrypted claim-time copy of the mask values on `workflow_job_secret_masks` so later log uploads are scrubbed against the secrets that were actually handed to the runner, even if an operator rotates or deletes a secret mid-job.

`POST /api/v1/jobs/{id}/logs`

Auth: job JWT. Body:

```json
{"seq":0,"chunk":"aGVsbG8K","step_id":123}
```

`step_id` is optional for the S41c curl smoke path; when omitted, the first step in the job receives the chunk. Chunks are base64-decoded, capped at 512 KiB raw, and appended to `workflow_step_log_chunks`. Duplicate `(step_id, seq)` inserts are accepted as idempotent retries. Before appending, the API re-scrubs exact secret values using the job's claim-time mask snapshot. It also reprocesses any possible secret prefix carried at the end of the prior chunk, so a runner cannot leak a secret by splitting it across two log calls.

`POST /api/v1/jobs/{id}/steps/{step_id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled|skipped`, with idempotent repeats of the target terminal state. Completed and skipped steps require a valid check conclusion; for cancelled steps, the conclusion defaults to `cancelled` when omitted. The endpoint always returns a `next_token`, because a completed step is not the end of the job.

When object storage is configured, terminal step updates enqueue `workflow:finalize_step`. The worker concatenates `workflow_step_log_chunks` in sequence order, uploads the log to `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`, stores that key and byte count on `workflow_steps`, then deletes the SQL chunks.

The repository Actions UI reads logs from the same two-stage storage model. While chunks remain in SQL, a step log page concatenates them in sequence order and renders a static snapshot. After finalization, the page reads `workflow_steps.log_object_key` from object storage and offers a short-lived signed download URL. Live tailing is intentionally separate and lands in the S41f SSE slice.

`POST /api/v1/jobs/{id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled`. Completed jobs require a valid check conclusion. The handler updates `workflow_jobs`, rolls up `workflow_runs`, and best-effort updates the matching `check_runs` row created by the trigger pipeline.

`timeout-minutes` is enforced by `shithubd-runner` as a whole-job deadline. When it expires, the runner kills the active container, reports the current step as `completed/timed_out`, and reports the job as `completed/timed_out`. The server treats that conclusion as terminal failure for the workflow run rollup.

When a runner reports `status:"cancelled"`, any still-open steps in the job are marked cancelled too. This keeps a killed job from leaving queued step rows that the UI would otherwise treat as live.

Runner execution supports host-side `actions/checkout@v4` followed by containerized `run:` steps with per-step log streaming and server-side log finalization. Artifact upload/download aliases remain reserved until the artifact transfer path lands.

`POST /api/v1/jobs/{id}/artifacts/upload`

Auth: job JWT. Body:

```json
{"name":"test-results.tgz","size_bytes":12345}
```

Creates a `workflow_artifacts` row and returns a pre-signed S3 PUT URL. The object key is `actions/runs/<run_id>/artifacts/<name>`.

`POST /api/v1/jobs/{id}/cancel`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the job's workflow run. Browser UI forms use CSRF-protected repo routes that call the same lifecycle orchestrator.

Queued jobs are made terminal immediately:

- `workflow_jobs.status = cancelled`
- `workflow_jobs.conclusion = cancelled`
- `workflow_jobs.cancel_requested = true`
- open steps for that job are marked cancelled

Running jobs keep `status = running` and get `cancel_requested = true`. The runner sees this through `cancel-check`, kills the active container, then reports terminal `cancelled`.

`POST /api/v1/runs/{id}/rerun`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the workflow run. Browser UI forms use CSRF-protected repo routes for the same operation.

Only terminal workflow runs are rerunnable. A re-run reads the original workflow file from the source run's `head_sha`, not from the current branch tip, then enqueues a new `workflow_runs` row with:

- the same `repo_id`, `workflow_file`, `head_sha`, `head_ref`, event, and event payload
- `actor_user_id` set to the user requesting the re-run
- `parent_run_id` set to the source run
- a fresh `trigger_event_id` in the `rerun:<source_run_id>:<random>` namespace

`POST /api/v1/jobs/{id}/cancel-check`

Auth: job JWT. Returns:

```json
{"cancelled":false,"next_token":"..."}
```

The boolean mirrors `workflow_jobs.cancel_requested`. `shithubd-runner` polls this endpoint during job execution, serializing it through the same single-use JWT chain as logs and status updates. On `cancelled: true`, the Docker engine runs `docker kill <active-container>` and the runner posts terminal job status `cancelled`.

## Metrics

- `shithub_actions_runner_registrations_total`
- `shithub_actions_runner_heartbeats_total{result="claimed|no_job"}`
- `shithub_actions_runner_jwt_total{result="issued|rejected|replay"}`
- `shithub_actions_jobs_cancelled_total{reason="user|concurrency|timeout"}`
- `shithub_actions_concurrency_queued_total`
- `shithub_actions_log_scrub_replacements_total{location="server"}`
- `shithub_actions_runs_pruned_total{kind="chunks|blobs|runs|jwt_used"}`
- `shithub_actions_step_timeouts_total`