
# Actions runner API

The runner-facing HTTP surface lives in `internal/web/handlers/api/runners.go`. It is mounted under `/api/v1` in the CSRF-exempt API group, but it does not use PAT auth. Runners authenticate first with a long-lived registration token and then with short-lived per-job JWTs.

## Auth model

Operators register a runner with:

```sh
shithubd admin runner register --name runner-1 --labels self-hosted,linux,ubuntu-latest
```

The command inserts a `workflow_runners` row, stores only a SHA-256 hash in `runner_tokens`, and prints the 32-byte hex token once.
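The register flow can be sketched as below. This is an illustrative sketch, not the actual implementation; in particular, whether the stored SHA-256 covers the raw bytes or the hex encoding is an assumption here.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newRegistrationToken generates 32 random bytes, returns them once as
// hex for the operator, and returns the SHA-256 hash that would be the
// only thing persisted in runner_tokens.
func newRegistrationToken() (token, storedHash string, err error) {
	raw := make([]byte, 32)
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	token = hex.EncodeToString(raw)
	// Assumption: the hash covers the hex form handed to the operator.
	sum := sha256.Sum256([]byte(token))
	storedHash = hex.EncodeToString(sum[:])
	return token, storedHash, nil
}

func main() {
	token, hash, err := newRegistrationToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("token (shown once):", token)
	fmt.Println("stored hash:", hash)
}
```

Verifying a presented token then means hashing it the same way and comparing against the stored hash; the plaintext never needs to be stored.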

`POST /api/v1/runners/heartbeat` accepts:

```http
Authorization: Bearer <registration-token>
```

When a queued job matches the runner labels and capacity is available, the response includes a job payload and a 15-minute job JWT. That JWT has claims:

```json
{"sub":"runner:<id>","job_id":1,"run_id":1,"repo_id":1,"exp":0,"jti":"..."}
```

The signing key is derived from `auth.totp_key_b64` with HKDF label `actions-runner-jwt-v1`; the raw TOTP/secretbox key is not used directly for JWT signing.
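A minimal sketch of that derivation, assuming standard HKDF-SHA256 (RFC 5869) with the documented label as the info string; the salt (zero-filled here) and output length are assumptions not stated above:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// deriveJWTKey derives a 32-byte signing key from the TOTP key via
// HKDF-SHA256, written out by hand so the two phases are visible.
func deriveJWTKey(totpKey []byte) []byte {
	// Extract: PRK = HMAC-SHA256(salt, IKM). Zero salt is an assumption.
	ext := hmac.New(sha256.New, make([]byte, sha256.Size))
	ext.Write(totpKey)
	prk := ext.Sum(nil)

	// Expand, one block: T(1) = HMAC-SHA256(PRK, info || 0x01).
	exp := hmac.New(sha256.New, prk)
	exp.Write([]byte("actions-runner-jwt-v1"))
	exp.Write([]byte{0x01})
	return exp.Sum(nil)
}

func main() {
	key := deriveJWTKey([]byte("example-totp-key"))
	fmt.Printf("derived %d-byte signing key: %x\n", len(key), key)
}
```

The point of the label is domain separation: rotating or reusing the base key elsewhere never yields the JWT signing key, and vice versa.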

Job JWTs are single-use. Every job endpoint verifies the signature and expiry, checks that the path job belongs to the claimed runner/run, and then inserts `jti` into `runner_jwt_used`. A replay returns 401. To support multi-step runner flows, successful in-flight job endpoints return `next_token` and `next_token_expires_at`.
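The replay gate can be sketched with an in-memory set standing in for the `runner_jwt_used` table (the real check is a SQL insert, not a map):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// replayGate records consumed jti values: the first consumption
// succeeds, any later use of the same jti is a replay.
type replayGate struct {
	mu   sync.Mutex
	used map[string]bool
}

var errReplay = errors.New("jwt replay: jti already consumed")

func (g *replayGate) consume(jti string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.used[jti] {
		return errReplay // the endpoint maps this to HTTP 401
	}
	g.used[jti] = true
	return nil
}

func main() {
	gate := &replayGate{used: map[string]bool{}}
	fmt.Println(gate.consume("abc")) // first use succeeds
	fmt.Println(gate.consume("abc")) // second use is rejected
}
```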

Consumed JWT rows are retained for 30 days after token expiry, then pruned by the daily `workflow:cleanup` worker. This keeps the replay-gate audit trail available for recent jobs without letting the table grow unbounded.

`shithubd-runner` consumes the same token chain: it claims with the registration token, marks the job `running` with the first job JWT, then uses each returned `next_token` serially for log chunks, step-status updates, cancel checks, artifact upload requests, and finally the terminal job-status update. Reusing any consumed job JWT is a replay and must fail with 401.
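The serial chaining rule can be sketched as a loop; `callJob` here is a hypothetical stand-in for one authenticated endpoint call, not the runner's actual client type:

```go
package main

import "fmt"

// callJob models one job endpoint call: it consumes the current
// single-use token and hands back the next_token for the next call.
type callJob func(token string) (nextToken string, err error)

// runChain captures the invariant: every step must use the token
// returned by the previous step, never an older one.
func runChain(first string, steps []callJob) error {
	token := first
	for _, step := range steps {
		next, err := step(token)
		if err != nil {
			return err
		}
		token = next
	}
	return nil
}

func main() {
	// Hypothetical server that chains token "t0" -> "t1" -> "t2" -> ...
	n := 0
	step := func(token string) (string, error) {
		want := fmt.Sprintf("t%d", n)
		if token != want {
			return "", fmt.Errorf("stale or replayed token %q", token)
		}
		n++
		return fmt.Sprintf("t%d", n), nil
	}
	err := runChain("t0", []callJob{step, step, step})
	fmt.Println("chain ok:", err == nil)
}
```

A runner that retries a request must therefore retry with the same not-yet-consumed token; falling back to an earlier token in the chain is indistinguishable from a replay.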

## Endpoints

`POST /api/v1/runners/heartbeat`

Request body:

```json
{"labels":["ubuntu-latest","linux"],"capacity":1}
```

Returns 204 when no matching job is claimable. Returns 200 with `token`, `expires_at`, and `job` when a job is claimed. Capacity is enforced server-side by counting current `workflow_jobs.status = 'running'` rows for the runner while holding a row lock on the runner. The job payload includes resolved `secrets` and `mask_values`; repo secrets shadow org secrets with the same name. The server also stores an encrypted claim-time copy of the mask values on `workflow_job_secret_masks` so later log uploads are scrubbed against the secrets that were actually handed to the runner, even if an operator rotates or deletes a secret mid-job.
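The shadowing rule can be sketched as a map merge; `resolveSecrets` is a hypothetical helper, not the server's actual function:

```go
package main

import "fmt"

// resolveSecrets merges org and repo secrets: start from the org set,
// then overlay the repo set so a repo secret wins when both define the
// same name. mask_values would be the values of the merged map.
func resolveSecrets(org, repo map[string]string) map[string]string {
	out := make(map[string]string, len(org)+len(repo))
	for k, v := range org {
		out[k] = v
	}
	for k, v := range repo { // repo shadows org on name collisions
		out[k] = v
	}
	return out
}

func main() {
	merged := resolveSecrets(
		map[string]string{"TOKEN": "org-value", "ORG_ONLY": "o"},
		map[string]string{"TOKEN": "repo-value"},
	)
	fmt.Println(merged["TOKEN"], merged["ORG_ONLY"]) // repo-value o
}
```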

`POST /api/v1/jobs/{id}/logs`

Auth: job JWT. Body:

```json
{"seq":0,"chunk":"aGVsbG8K","step_id":123}
```

`step_id` is optional for the S41c curl smoke path; when omitted, the first step in the job receives the chunk. Chunks are base64-decoded, capped at 512 KiB raw, and appended to `workflow_step_log_chunks`. Duplicate `(step_id, seq)` inserts are accepted as idempotent retries. Before append, the API re-scrubs exact secret values from the job's claim-time mask snapshot. It also reprocesses any possible secret prefix carried at the end of the prior chunk, so a runner cannot leak a secret by splitting it across two log calls.

`POST /api/v1/jobs/{id}/steps/{step_id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled|skipped`, with idempotent repeats of the target terminal state. Completed and skipped steps require a valid check conclusion; for cancelled steps, the conclusion defaults to `cancelled` when omitted. The endpoint always returns a `next_token` because a completed step is not the end of the job.
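That state machine can be sketched as a predicate; the function name is hypothetical:

```go
package main

import "fmt"

// validStepTransition encodes the documented rule:
// queued|running -> running|completed|cancelled|skipped, plus
// idempotent repeats of the same terminal state.
func validStepTransition(from, to string) bool {
	terminal := map[string]bool{"completed": true, "cancelled": true, "skipped": true}
	switch {
	case from == "queued" || from == "running":
		return to == "running" || terminal[to]
	case terminal[from]:
		return from == to // idempotent retry of the same terminal state
	default:
		return false
	}
}

func main() {
	fmt.Println(validStepTransition("queued", "running"))      // true
	fmt.Println(validStepTransition("completed", "completed")) // true (retry)
	fmt.Println(validStepTransition("completed", "cancelled")) // false
}
```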

When object storage is configured, terminal step updates enqueue `workflow:finalize_step`. The worker concatenates `workflow_step_log_chunks` in sequence order, uploads the log to `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`, stores that key and byte count on `workflow_steps`, then deletes the SQL chunks.

The repository Actions UI reads logs from the same two-stage storage model. While chunks remain in SQL, a step log page concatenates them in sequence order and renders a static snapshot. After finalization, the page reads `workflow_steps.log_object_key` from object storage and offers a short-lived signed download URL. Live tailing is intentionally separate and lands in the S41f SSE slice.

`POST /api/v1/jobs/{id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled`. Completed jobs require a valid check conclusion. The handler updates `workflow_jobs`, rolls up `workflow_runs`, and best-effort updates the matching `check_runs` row created by the trigger pipeline.

When a runner reports `status:"cancelled"`, any still-open steps in the job are marked cancelled too. This keeps a killed job from leaving queued step rows that the UI would otherwise treat as live.

As of S41d PR2, runner execution supports containerized `run:` steps with per-step log streaming and server-side log finalization. `uses:` aliases such as `actions/checkout@v4` and artifact upload/download remain reserved for the later S41d slices that add checkout metadata and artifact transfer.

`POST /api/v1/jobs/{id}/artifacts/upload`

Auth: job JWT. Body:

```json
{"name":"test-results.tgz","size_bytes":12345}
```

Creates a `workflow_artifacts` row and returns a pre-signed S3 PUT URL. The object key is `actions/runs/<run_id>/artifacts/<name>`.

`POST /api/v1/jobs/{id}/cancel`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the job's workflow run. Browser UI forms use CSRF-protected repo routes that call the same lifecycle orchestrator.

Queued jobs are made terminal immediately:

- `workflow_jobs.status = cancelled`
- `workflow_jobs.conclusion = cancelled`
- `workflow_jobs.cancel_requested = true`
- open steps for that job are marked cancelled

Running jobs keep `status = running` and get `cancel_requested = true`. The runner sees this through `cancel-check`, kills the active container, then reports terminal `cancelled`.

`POST /api/v1/runs/{id}/rerun`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the workflow run. Browser UI forms use CSRF-protected repo routes for the same operation.

Only terminal workflow runs are rerunnable. A re-run reads the original workflow file from the source run's `head_sha`, not from the current branch tip, then enqueues a new `workflow_runs` row with:

- the same `repo_id`, `workflow_file`, `head_sha`, `head_ref`, event, and event payload
- `actor_user_id` set to the user requesting the re-run
- `parent_run_id` set to the source run
- a fresh `trigger_event_id` in the `rerun:<source_run_id>:<random>` namespace
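Generating the namespaced trigger id can be sketched as follows; the 8-byte width of the random suffix is an assumption, the namespace shape is as documented:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// rerunTriggerEventID builds an id in the rerun:<source_run_id>:<random>
// namespace, so a re-run never collides with the source run's original
// trigger event and repeated re-runs never collide with each other.
func rerunTriggerEventID(sourceRunID int64) (string, error) {
	raw := make([]byte, 8) // suffix width is an assumption
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return fmt.Sprintf("rerun:%d:%s", sourceRunID, hex.EncodeToString(raw)), nil
}

func main() {
	id, err := rerunTriggerEventID(42)
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```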

`POST /api/v1/jobs/{id}/cancel-check`

Auth: job JWT. Returns:

```json
{"cancelled":false,"next_token":"..."}
```

The boolean mirrors `workflow_jobs.cancel_requested`. `shithubd-runner` polls this endpoint during job execution, serializing it through the same single-use JWT chain as logs and status updates. On `cancelled: true`, the Docker engine runs `docker kill <active-container>` and the runner posts terminal job status `cancelled`.
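The runner-side poll can be sketched with a stub for the HTTP call; `cancelCheck`, the interval, and the poll cap are all hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// cancelCheck models one POST .../cancel-check call: it consumes the
// current single-use token and returns the cancelled flag plus the
// next_token for the following poll.
type cancelCheck func(token string) (cancelled bool, nextToken string, err error)

// pollCancel polls with the chained token until cancellation is
// requested or maxPolls is reached; the caller then kills the
// container and reports terminal status with the latest token.
func pollCancel(token string, check cancelCheck, interval time.Duration, maxPolls int) (bool, string, error) {
	for i := 0; i < maxPolls; i++ {
		cancelled, next, err := check(token)
		if err != nil {
			return false, token, err
		}
		token = next // each poll consumes a token and chains forward
		if cancelled {
			return true, token, nil
		}
		time.Sleep(interval)
	}
	return false, token, nil
}

func main() {
	calls := 0
	check := func(token string) (bool, string, error) {
		calls++
		return calls >= 3, token + "'", nil // cancel on the third poll
	}
	cancelled, _, _ := pollCancel("t0", check, 0, 10)
	fmt.Println("cancelled:", cancelled, "after", calls, "polls")
}
```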

## Metrics

- `shithub_actions_runner_registrations_total`
- `shithub_actions_runner_heartbeats_total{result="claimed|no_job"}`
- `shithub_actions_runner_jwt_total{result="issued|rejected|replay"}`
- `shithub_actions_jobs_cancelled_total{reason="user|concurrency|timeout"}`
- `shithub_actions_log_scrub_replacements_total{location="server"}`
- `shithub_actions_runs_pruned_total{kind="chunks|blobs|runs|jwt_used"}`