
# Actions runner API

The runner-facing HTTP surface lives in internal/web/handlers/api/runners.go. It is mounted under /api/v1 in the CSRF-exempt API group, but it does not use PAT auth. Runners authenticate first with a long-lived registration token and then with short-lived per-job JWTs.

## Auth model

Operators register a runner with:

```sh
shithubd admin runner register \
  --name runner-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --capacity 1 \
  --output json
```

The command inserts `workflow_runners`, stores only a SHA-256 hash in `runner_tokens`, and returns the raw 32-byte hex token once. `--expires-in` is optional and should only be used when the deployment rotates the runner token before it expires, because the runner uses that same token for heartbeat authentication.
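
Hash-only token storage can be sketched as follows; the function names are illustrative, not the actual shithubd code:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// newRunnerToken returns the raw hex token (shown once to the operator)
// and the SHA-256 digest, which is all the server persists.
func newRunnerToken() (raw, storedHash string) {
	buf := make([]byte, 32)
	rand.Read(buf)
	raw = hex.EncodeToString(buf)
	sum := sha256.Sum256([]byte(raw))
	return raw, hex.EncodeToString(sum[:])
}

// verifyRunnerToken re-hashes the presented token and compares it to the
// stored digest in constant time.
func verifyRunnerToken(presented, storedHash string) bool {
	sum := sha256.Sum256([]byte(presented))
	got := []byte(hex.EncodeToString(sum[:]))
	return subtle.ConstantTimeCompare(got, []byte(storedHash)) == 1
}

func main() {
	raw, stored := newRunnerToken()
	fmt.Println(verifyRunnerToken(raw, stored))     // true
	fmt.Println(verifyRunnerToken("wrong", stored)) // false
}
```

A leaked database row therefore yields only the digest; the runner's config file is the only place the raw token exists.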

Operators can drain, undrain, rotate, and hard-revoke runners with `shithubd admin runner drain`, `undrain`, `rotate-token`, and `revoke`. Drained runners keep heartbeating and may finish already claimed jobs, but heartbeat claims return 204 until the runner is undrained. Hard revocation sets the runner offline, records `revoked_at`, revokes all registration tokens, and makes job API JWTs minted for that runner invalid. This is the token-compromise boundary: a host with an old config file cannot claim new jobs or update already claimed jobs after revocation lands in Postgres.

`POST /api/v1/runners/heartbeat` accepts:

```http
Authorization: Bearer <registration-token>
```

When a queued job matches the runner labels and capacity is available, the response includes a job payload and a 15-minute job JWT. That JWT has claims:

```json
{"sub":"runner:<id>","purpose":"api","job_id":1,"run_id":1,"repo_id":1,"exp":0,"jti":"..."}
```

The signing key is derived from `auth.totp_key_b64` with HKDF label `actions-runner-jwt-v1`; the raw TOTP/secretbox key is not used directly for JWT signing.

API-purpose job JWTs are single-use. Every job endpoint verifies the signature and expiry, checks that the path job belongs to the claimed runner/run, and then inserts `jti` into `runner_jwt_used`. A replay returns 401. To support multi-step runner flows, successful in-flight job endpoints return `next_token` and `next_token_expires_at`.
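
In Postgres the replay gate is an insert-once write keyed on `jti`; the sketch below models it with an in-memory set, which is illustrative rather than the real storage layer:

```go
package main

import (
	"fmt"
	"sync"
)

// jwtReplayGate mimics the runner_jwt_used table: the first consumer of a
// jti wins, and every later attempt is a replay.
type jwtReplayGate struct {
	mu   sync.Mutex
	used map[string]bool
}

func newJWTReplayGate() *jwtReplayGate {
	return &jwtReplayGate{used: make(map[string]bool)}
}

// consume returns true exactly once per jti, the in-memory analogue of an
// INSERT that fails on a unique-key conflict.
func (g *jwtReplayGate) consume(jti string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.used[jti] {
		return false // replay: the endpoint answers 401
	}
	g.used[jti] = true
	return true
}

func main() {
	gate := newJWTReplayGate()
	fmt.Println(gate.consume("abc")) // true: first use
	fmt.Println(gate.consume("abc")) // false: replay
}
```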

Consumed JWT rows are retained for 30 days after token expiry, then pruned by the daily `workflow:cleanup` worker. This keeps the replay gate audit trail available for recent jobs without letting the table grow unbounded.

`shithubd-runner` consumes the same token chain: it claims with the registration token, marks the job `running` with the first API-purpose job JWT, then uses each returned `next_token` serially for log chunks, step-status updates, cancel checks, artifact upload requests, and finally the terminal job-status update. Reusing any consumed API-purpose job JWT is a replay and must fail with 401.
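
Client-side, the chain is a simple fold: each call spends the current token and yields the next. A sketch with a stubbed transport; the endpoint names and the `callJobAPI` helper are hypothetical:

```go
package main

import "fmt"

// response mirrors the next_token contract of in-flight job endpoints.
type response struct {
	nextToken string
}

// callJobAPI stands in for one authenticated job endpoint call; a real
// client would POST with "Authorization: Bearer <token>".
func callJobAPI(endpoint, token string, seq int) response {
	return response{nextToken: fmt.Sprintf("jwt-%d", seq+1)}
}

// runChain spends the initial claim JWT, then threads each returned
// next_token through the remaining calls in order: no token is used twice.
func runChain(first string, endpoints []string) string {
	token := first
	for i, ep := range endpoints {
		token = callJobAPI(ep, token, i).nextToken
	}
	return token // would authenticate the terminal status update
}

func main() {
	final := runChain("jwt-0", []string{"status:running", "logs", "steps/1/status", "status:completed"})
	fmt.Println(final) // jwt-4
}
```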

The heartbeat claim also returns `job.checkout_url` and `job.checkout_token` for `actions/checkout@v4`. The checkout token is a separate JWT with `purpose:"checkout"` and the same runner/job/run/repo scope. It is intentionally reusable while the job is `running`, because Git smart HTTP performs multiple Basic-authenticated requests during one checkout. The git HTTP handler accepts it only for `git-upload-pack`, only for the claimed repository, and only while the database still shows that the claimed runner is running the job. It is never accepted for pushes or runner API endpoints.

## Endpoints

`POST /api/v1/runners/heartbeat`

Request body:

```json
{
  "labels": ["self-hosted", "linux", "ubuntu-latest", "x64"],
  "capacity": 1,
  "host_name": "runner-host-1",
  "version": "v0.1.0"
}
```

Returns 204 when no matching job is claimable. Returns 200 with `token`, `expires_at`, and `job` when a job is claimed. Capacity is enforced server-side by counting current `workflow_jobs.status = 'running'` rows for the runner while holding a row lock on the runner. Claiming also enforces the effective Actions policy for the repository: disabled repos, approval-pending runs, per-repo concurrent job caps, and per-owner/org concurrent job caps are not dispatchable. Approval simply sets `workflow_runs.approved_by_user_id`; the next heartbeat can claim the same queued jobs, so no duplicate run is created.

`host_name` and `version` are optional runner metadata. The server stores trimmed values up to 255 bytes for pool diagnostics and preserves the previous values when old runners omit them.

The job payload includes `checkout_url`, `checkout_token`, resolved `secrets`, and `mask_values`; repo secrets shadow org secrets with the same name. The server also stores an encrypted claim-time copy of the mask values on `workflow_job_secret_masks` so later log uploads are scrubbed against the secrets that were actually handed to the runner, even if an operator rotates or deletes a secret mid-job.
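
The shadowing rule for resolved secrets is a two-layer merge in which the repo layer wins; a minimal sketch, with the map shapes assumed:

```go
package main

import "fmt"

// resolveSecrets merges org and repo secrets; a repo secret shadows an
// org secret of the same name, matching the heartbeat payload semantics.
func resolveSecrets(org, repo map[string]string) map[string]string {
	out := make(map[string]string, len(org)+len(repo))
	for k, v := range org {
		out[k] = v
	}
	for k, v := range repo { // repo layer is applied last, so it wins
		out[k] = v
	}
	return out
}

func main() {
	org := map[string]string{"TOKEN": "org-value", "REGION": "eu"}
	repo := map[string]string{"TOKEN": "repo-value"}
	fmt.Println(resolveSecrets(org, repo)["TOKEN"]) // repo-value
}
```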

Pull request runs receive no org or repo secrets in v1, even after a maintainer approves dispatch. This is intentionally stricter than the approval gate until environments/protected deployment secrets exist.

`POST /api/v1/jobs/{id}/logs`

Auth: job JWT. Body:

```json
{"seq":0,"chunk":"aGVsbG8K","step_id":123}
```

`step_id` is optional for the S41c curl smoke path; when omitted the first step in the job receives the chunk. Chunks are base64-decoded, capped at 512 KiB raw, and appended to `workflow_step_log_chunks`. Duplicate `(step_id, seq)` inserts are accepted as idempotent retries. Before append, the API re-scrubs exact secret values from the job's claim-time mask snapshot. It also reprocesses any possible secret prefix carried at the end of the prior chunk, so a runner cannot leak a secret by splitting it across two log calls.

`POST /api/v1/jobs/{id}/steps/{step_id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled|skipped` with idempotent repeats of the target terminal state. Completed and skipped steps require a valid check conclusion; cancelled defaults to `cancelled` when omitted. The endpoint always returns a `next_token` because a completed step is not the end of the job.
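
The transition rule reduces to a small validator; this sketch covers the states only, not the conclusion checks:

```go
package main

import "fmt"

var terminal = map[string]bool{"completed": true, "cancelled": true, "skipped": true}

// stepTransitionOK implements queued|running -> running|completed|cancelled|skipped,
// plus idempotent repeats of the same terminal state.
func stepTransitionOK(from, to string) bool {
	if terminal[from] {
		return from == to // idempotent retry of the terminal report
	}
	switch from {
	case "queued", "running":
		return to == "running" || terminal[to]
	}
	return false
}

func main() {
	fmt.Println(stepTransitionOK("queued", "running"))      // true
	fmt.Println(stepTransitionOK("completed", "completed")) // true: idempotent repeat
	fmt.Println(stepTransitionOK("completed", "running"))   // false: terminal is final
}
```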

When object storage is configured, terminal step updates enqueue `workflow:finalize_step`. The worker concatenates `workflow_step_log_chunks` in sequence order, uploads the log to `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`, stores that key and byte count on `workflow_steps`, then deletes the SQL chunks.

The repository Actions UI reads logs from the same two-stage storage model. While chunks remain in SQL, a step log page concatenates them in sequence order and renders a static snapshot. After finalization, the page fetches the object named by `workflow_steps.log_object_key` from object storage and offers a short-lived signed download URL. Live tailing is intentionally separate and lands in the S41f SSE slice.

`POST /api/v1/jobs/{id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled`. Completed jobs require a valid check conclusion. The handler updates `workflow_jobs`, rolls up `workflow_runs`, and best-effort updates the matching `check_runs` row created by the trigger pipeline.

`timeout-minutes` is enforced by `shithubd-runner` as a whole-job deadline. When it expires, the runner kills the active container, reports the current step as `completed/timed_out`, and reports the job as `completed/timed_out`. The server treats that conclusion as terminal failure for the workflow run rollup.

When a runner reports `status:"cancelled"`, any still-open steps in the job are marked cancelled too. This keeps a killed job from leaving queued step rows that the UI would otherwise treat as live.

Runner execution supports host-side `actions/checkout@v4` followed by containerized `run:` steps with per-step log streaming and server-side log finalization. Artifact upload/download aliases remain reserved until the artifact transfer path lands.

`POST /api/v1/jobs/{id}/artifacts/upload`

Auth: job JWT. Body:

```json
{"name":"test-results.tgz","size_bytes":12345}
```

Creates a `workflow_artifacts` row and returns a pre-signed S3 PUT URL. The object key is `actions/runs/<run_id>/artifacts/<name>`.

`POST /api/v1/jobs/{id}/cancel`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the job's workflow run. Browser UI forms use CSRF-protected repo routes that call the same lifecycle orchestrator.

Queued jobs are made terminal immediately:

- `workflow_jobs.status = cancelled`
- `workflow_jobs.conclusion = cancelled`
- `workflow_jobs.cancel_requested = true`
- open steps for that job are marked cancelled

Running jobs keep `status = running` and get `cancel_requested = true`. The runner sees this through `cancel-check`, kills the active container, then reports terminal `cancelled`.

`POST /api/v1/runs/{id}/rerun`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the workflow run. Browser UI forms use CSRF-protected repo routes for the same operation.

Only terminal workflow runs are rerunnable. A re-run reads the original workflow file from the source run's `head_sha`, not from the current branch tip, then enqueues a new `workflow_runs` row with:

- the same `repo_id`, `workflow_file`, `head_sha`, `head_ref`, event, and event payload
- `actor_user_id` set to the user requesting the re-run
- `parent_run_id` set to the source run
- a fresh `trigger_event_id` in the `rerun:<source_run_id>:<random>` namespace

`POST /api/v1/jobs/{id}/cancel-check`

Auth: job JWT. Returns:

```json
{"cancelled":false,"next_token":"..."}
```

The boolean mirrors `workflow_jobs.cancel_requested`. `shithubd-runner` polls this endpoint during job execution, serializing it through the same single-use JWT chain as logs and status updates. On `cancelled: true`, the Docker engine runs `docker kill <active-container>` and the runner posts terminal job status `cancelled`.

## Metrics

- `shithub_actions_runner_registrations_total`
- `shithub_actions_runner_heartbeats_total{result="claimed|no_job"}`
- `shithub_actions_runner_jwt_total{result="issued|rejected|replay"}`
- `shithub_actions_queue_depth{resource="runs|jobs"}`
- `shithub_actions_active{resource="runs|jobs"}`
- `shithub_actions_runner_heartbeat_age_seconds{runner,status}`
- `shithub_actions_runner_capacity{runner,status}`
- `shithub_actions_runs_completed_total{event,conclusion}`
- `shithub_actions_run_duration_seconds{event,conclusion}`
- `shithub_actions_steps_completed_total{step_type,conclusion}`
- `shithub_actions_jobs_cancelled_total{reason="user|concurrency|timeout"}`
- `shithub_actions_concurrency_queued_total`
- `shithub_actions_log_scrub_replacements_total{location="server"}`
- `shithub_actions_log_chunks_total{location="server"}`
- `shithub_actions_log_chunk_bytes_total{location="server"}`
- `shithub_actions_runs_pruned_total{kind="chunks|blobs|runs|jwt_used"}`
- `shithub_actions_step_timeouts_total`
- `shithub_actions_storage_objects{kind="artifacts|step_logs|hot_log_chunks"}`
- `shithub_actions_storage_bytes{kind="artifacts|step_logs|hot_log_chunks"}`