# Actions runner API

The runner-facing HTTP surface lives in `internal/web/handlers/api/runners.go`. It is mounted under `/api/v1` in the CSRF-exempt API group, but it does not use PAT auth. Runners authenticate first with a long-lived registration token and then with short-lived per-job JWTs.

## Auth model

Operators register a runner with:

```sh
shithubd admin runner register \
  --name runner-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --capacity 1 \
  --output json
```

The command inserts `workflow_runners`, stores only a SHA-256 hash in `runner_tokens`, and returns the raw 32-byte hex token once. `--expires-in` is optional and should only be used when the deployment rotates the runner token before it expires, because the runner uses that same token for heartbeat authentication.

Operators can drain, undrain, rotate, and hard-revoke runners with `shithubd admin runner drain`, `undrain`, `rotate-token`, and `revoke`. Drained runners keep heartbeating and may finish already-claimed jobs, but heartbeat claims return 204 until the runner is undrained. Hard revocation sets the runner offline, records `revoked_at`, revokes all registration tokens, and invalidates job API JWTs minted for that runner. This is the token-compromise boundary: a host with an old config file cannot claim new jobs or update already-claimed jobs after the revocation lands in Postgres.

`POST /api/v1/runners/heartbeat` accepts:

```http
Authorization: Bearer <registration-token>
```

When a queued job matches the runner labels and capacity is available, the response includes a job payload and a 15-minute job JWT. That JWT has claims:

```json
{"sub":"runner:<runner_id>","purpose":"api","job_id":1,"run_id":1,"repo_id":1,"exp":0,"jti":"..."}
```

The signing key is derived from `auth.totp_key_b64` with HKDF label `actions-runner-jwt-v1`; the raw TOTP/secretbox key is not used directly for JWT signing (a derivation sketch appears at the end of this section).

API-purpose job JWTs are single-use. Every job endpoint verifies the signature and expiry, checks that the path job belongs to the claimed runner/run, and then inserts `jti` into `runner_jwt_used`. A replay returns 401. To support multi-step runner flows, successful in-flight job endpoints return `next_token` and `next_token_expires_at`. Consumed JWT rows are retained for 30 days after token expiry, then pruned by the daily `workflow:cleanup` worker. This keeps the replay-gate audit trail available for recent jobs without letting the table grow unbounded.

`shithubd-runner` consumes the same token chain: it claims with the registration token, marks the job `running` with the first API-purpose job JWT, then uses each returned `next_token` serially for log chunks, step-status updates, cancel checks, artifact upload requests, and finally the terminal job-status update. Reusing any consumed API-purpose job JWT is a replay and must fail with 401.

The heartbeat claim also returns `job.checkout_url` and `job.checkout_token` for `actions/checkout@v4`. The checkout token is a separate JWT with `purpose:"checkout"` and the same runner/job/run/repo scope. It is intentionally reusable while the job is `running`, because Git smart HTTP performs multiple Basic-authenticated requests during one checkout. The git HTTP handler accepts it only for `git-upload-pack`, only for the claimed repository, and only while the database still shows that the claimed runner is running the job. It is never accepted for pushes or runner API endpoints.
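To make the key-derivation step concrete, here is a minimal sketch using `golang.org/x/crypto/hkdf`. Only the `auth.totp_key_b64` source and the `actions-runner-jwt-v1` label come from the description above; the helper name, the absent salt, and the 32-byte HMAC-SHA256 output size are illustrative assumptions.

```go
package actionsauth

import (
	"crypto/sha256"
	"encoding/base64"
	"io"

	"golang.org/x/crypto/hkdf"
)

// deriveRunnerJWTKey is a hypothetical helper illustrating the derivation
// described above: the base64 root key from auth.totp_key_b64 is never used
// directly; an HKDF expansion under the label "actions-runner-jwt-v1"
// produces the JWT signing key.
func deriveRunnerJWTKey(totpKeyB64 string) ([]byte, error) {
	secret, err := base64.StdEncoding.DecodeString(totpKeyB64)
	if err != nil {
		return nil, err
	}
	// No salt; the label alone domain-separates this key from other
	// consumers of the same root secret. (Assumed detail.)
	r := hkdf.New(sha256.New, secret, nil, []byte("actions-runner-jwt-v1"))
	key := make([]byte, 32) // assumed HMAC-SHA256 key size
	if _, err := io.ReadFull(r, key); err != nil {
		return nil, err
	}
	return key, nil
}
```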
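The single-use gate amounts to one conditional insert per verified token. A minimal sketch, assuming a UNIQUE constraint on `runner_jwt_used.jti` and hypothetical column names (only the table name and the `jti` claim are from this document):

```go
package actionsauth

import (
	"context"
	"database/sql"
	"errors"
	"time"
)

// ErrReplayedToken is mapped to HTTP 401 by the caller.
var ErrReplayedToken = errors.New("job JWT already consumed")

// consumeJTI marks an API-purpose job JWT as used. The insert itself is the
// replay gate: a second call with the same jti inserts nothing and fails.
func consumeJTI(ctx context.Context, db *sql.DB, jti string, exp time.Time) error {
	res, err := db.ExecContext(ctx,
		`INSERT INTO runner_jwt_used (jti, expires_at)
		 VALUES ($1, $2)
		 ON CONFLICT (jti) DO NOTHING`, jti, exp)
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return ErrReplayedToken
	}
	return nil
}
```

Keeping an expiry column alongside the jti is what would let the daily `workflow:cleanup` worker prune rows 30 days after token expiry.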
## Endpoints

`POST /api/v1/runners/heartbeat`

Request body:

```json
{
  "labels": ["self-hosted", "linux", "ubuntu-latest", "x64"],
  "capacity": 1,
  "host_name": "runner-host-1",
  "version": "v0.1.0"
}
```

Returns 204 when no matching job is claimable. Returns 200 with `token`, `expires_at`, and `job` when a job is claimed. Capacity is enforced server-side by counting current `workflow_jobs.status = 'running'` rows for the runner while holding a row lock on the runner.

Claiming also enforces the effective Actions policy for the repository: disabled repos, approval-pending runs, per-repo concurrent job caps, and per-owner/org concurrent job caps are not dispatchable. Approval simply sets `workflow_runs.approved_by_user_id`; the next heartbeat can claim the same queued jobs, so no duplicate run is created.

`host_name` and `version` are optional runner metadata. The server stores trimmed values up to 255 bytes for pool diagnostics and preserves the previous values when old runners omit them.

The job payload includes `checkout_url`, `checkout_token`, resolved `secrets`, and `mask_values`; repo secrets shadow org secrets with the same name. The server also stores an encrypted claim-time copy of the mask values on `workflow_job_secret_masks` so later log uploads are scrubbed against the secrets that were actually handed to the runner, even if an operator rotates or deletes a secret mid-job.

Pull request runs receive no org or repo secrets in v1, even after a maintainer approves dispatch. This is intentionally stricter than the approval gate until environments/protected deployment secrets exist.

`POST /api/v1/jobs/{id}/logs`

Auth: job JWT. Body:

```json
{"seq":0,"chunk":"aGVsbG8K","step_id":123}
```

`step_id` is optional for the S41c curl smoke path; when omitted, the first step in the job receives the chunk. Chunks are base64-decoded, capped at 512 KiB raw, and appended to `workflow_step_log_chunks`. Duplicate `(step_id, seq)` inserts are accepted as idempotent retries.

Before append, the API re-scrubs exact secret values from the job's claim-time mask snapshot. It also reprocesses any possible secret prefix carried at the end of the prior chunk, so a runner cannot leak a secret by splitting it across two log calls (see the sketch at the end of this section).

`POST /api/v1/jobs/{id}/steps/{step_id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled|skipped`, with idempotent repeats of the target terminal state. Completed and skipped steps require a valid check conclusion; cancelled defaults to `cancelled` when omitted. The endpoint always returns a `next_token` because a completed step is not the end of the job.

When object storage is configured, terminal step updates enqueue `workflow:finalize_step`. The worker concatenates `workflow_step_log_chunks` in sequence order, uploads the log to `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`, stores that key and byte count on `workflow_steps`, then deletes the SQL chunks.

The repository Actions UI reads logs from the same two-stage storage model. While chunks remain in SQL, a step log page concatenates them in sequence order and renders a static snapshot. After finalization, the page reads `workflow_steps.log_object_key` from object storage and offers a short-lived signed download URL. Live tailing is intentionally separate and lands in the S41f SSE slice.
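The carry-based re-scrub referenced under the logs endpoint can be pictured as follows. This is a sketch, not the handler's code: the function shape, the `***` replacement text, and the byte-slice carry are assumptions; the invariant it illustrates (never emit a tail that could begin a secret) is from the description above.

```go
package logscrub

import "bytes"

// Scrub masks exact secret values and returns the scrubbed prefix plus a
// carry: the longest suffix of the buffer that is a proper prefix of some
// secret. The carry is withheld and re-scanned prepended to the next chunk,
// so a secret split across two log calls is still masked.
// Uses the Go 1.21 built-in min.
func Scrub(carry, chunk []byte, secrets [][]byte) (out, nextCarry []byte) {
	buf := append(append([]byte{}, carry...), chunk...)
	for _, s := range secrets {
		if len(s) > 0 {
			buf = bytes.ReplaceAll(buf, s, []byte("***"))
		}
	}
	// Withhold the longest tail that could begin a secret.
	hold := 0
	for _, s := range secrets {
		for n := min(len(s)-1, len(buf)); n > hold; n-- {
			if bytes.HasSuffix(buf, s[:n]) {
				hold = n
				break
			}
		}
	}
	return buf[:len(buf)-hold], buf[len(buf)-hold:]
}
```

Because the withheld tail is strictly shorter than any secret it prefixes, and full secrets were already replaced, the carry can never contain a complete secret, so deferring it to the next call is safe.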
`POST /api/v1/jobs/{id}/status`

Auth: job JWT. Body:

```json
{"status":"completed","conclusion":"success"}
```

Valid transitions are `queued|running -> running|completed|cancelled`. Completed jobs require a valid check conclusion. The handler updates `workflow_jobs`, rolls up `workflow_runs`, and best-effort updates the matching `check_runs` row created by the trigger pipeline.

`timeout-minutes` is enforced by `shithubd-runner` as a whole-job deadline. When it expires, the runner kills the active container, reports the current step as `completed/timed_out`, and reports the job as `completed/timed_out`. The server treats that conclusion as a terminal failure for the workflow run rollup.

When a runner reports `status:"cancelled"`, any still-open steps in the job are marked cancelled too. This keeps a killed job from leaving queued step rows that the UI would otherwise treat as live.

Runner execution supports host-side `actions/checkout@v4` followed by containerized `run:` steps with per-step log streaming and server-side log finalization. Artifact upload/download aliases remain reserved until the artifact transfer path lands.

`POST /api/v1/jobs/{id}/artifacts/upload`

Auth: job JWT. Body:

```json
{"name":"test-results.tgz","size_bytes":12345}
```

Creates a `workflow_artifacts` row and returns a pre-signed S3 PUT URL. The object key is `actions/runs/<run_id>/artifacts/<name>`.

`POST /api/v1/jobs/{id}/cancel`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the job's workflow run. Browser UI forms use CSRF-protected repo routes that call the same lifecycle orchestrator.

Queued jobs are made terminal immediately:

- `workflow_jobs.status = cancelled`
- `workflow_jobs.conclusion = cancelled`
- `workflow_jobs.cancel_requested = true`
- open steps for that job are marked cancelled

Running jobs keep `status = running` and get `cancel_requested = true`. The runner sees this through `cancel-check`, kills the active container, then reports terminal `cancelled`.

`POST /api/v1/runs/{id}/rerun`

Auth: PAT with `repo:write`, and the actor must have write permission on the repository that owns the workflow run. Browser UI forms use CSRF-protected repo routes for the same operation.

Only terminal workflow runs are rerunnable. A re-run reads the original workflow file from the source run's `head_sha`, not from the current branch tip, then enqueues a new `workflow_runs` row with:

- the same `repo_id`, `workflow_file`, `head_sha`, `head_ref`, event, and event payload
- `actor_user_id` set to the user requesting the re-run
- `parent_run_id` set to the source run
- a fresh `trigger_event_id` in the `rerun:<run_id>:<...>` namespace

`POST /api/v1/jobs/{id}/cancel-check`

Auth: job JWT. Returns:

```json
{"cancelled":false,"next_token":"..."}
```

The boolean mirrors `workflow_jobs.cancel_requested`. `shithubd-runner` polls this endpoint during job execution, serializing it through the same single-use JWT chain as logs and status updates. On `cancelled: true`, the runner issues `docker kill <container>` against the active container and posts terminal job status `cancelled`.
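Runner-side, the cancel-check loop and the token rotation it forces can be sketched like this. The poll interval, base URL handling, and `onCancel` hook are assumptions; the JSON fields and the one-token-per-call rule come from the endpoint description above.

```go
package runner

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// cancelCheckResponse mirrors the JSON shown above.
type cancelCheckResponse struct {
	Cancelled bool   `json:"cancelled"`
	NextToken string `json:"next_token"`
}

// pollCancel polls cancel-check with the current single-use job JWT. Every
// call consumes that JWT, so the loop must switch to the returned
// next_token; reusing a consumed token is a replay and fails with 401.
func pollCancel(ctx context.Context, baseURL string, jobID int64, token string, onCancel func()) error {
	t := time.NewTicker(10 * time.Second) // poll interval is an assumption
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
		}
		req, err := http.NewRequestWithContext(ctx, http.MethodPost,
			fmt.Sprintf("%s/api/v1/jobs/%d/cancel-check", baseURL, jobID), nil)
		if err != nil {
			return err
		}
		req.Header.Set("Authorization", "Bearer "+token)
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			return fmt.Errorf("cancel-check: unexpected status %d", resp.StatusCode)
		}
		var out cancelCheckResponse
		err = json.NewDecoder(resp.Body).Decode(&out)
		resp.Body.Close()
		if err != nil {
			return err
		}
		token = out.NextToken // rotate: the old JWT is now consumed
		if out.Cancelled {
			onCancel() // e.g. docker kill, then terminal cancelled status
			return nil
		}
	}
}
```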
## Metrics

- `shithub_actions_runner_registrations_total`
- `shithub_actions_runner_heartbeats_total{result="claimed|no_job"}`
- `shithub_actions_runner_jwt_total{result="issued|rejected|replay"}`
- `shithub_actions_queue_depth{resource="runs|jobs"}`
- `shithub_actions_active{resource="runs|jobs"}`
- `shithub_actions_runner_heartbeat_age_seconds{runner,status}`
- `shithub_actions_runner_capacity{runner,status}`
- `shithub_actions_runs_completed_total{event,conclusion}`
- `shithub_actions_run_duration_seconds{event,conclusion}`
- `shithub_actions_steps_completed_total{step_type,conclusion}`
- `shithub_actions_jobs_cancelled_total{reason="user|concurrency|timeout"}`
- `shithub_actions_concurrency_queued_total`
- `shithub_actions_log_scrub_replacements_total{location="server"}`
- `shithub_actions_log_chunks_total{location="server"}`
- `shithub_actions_log_chunk_bytes_total{location="server"}`
- `shithub_actions_runs_pruned_total{kind="chunks|blobs|runs|jwt_used"}`
- `shithub_actions_step_timeouts_total`
- `shithub_actions_storage_objects{kind="artifacts|step_logs|hot_log_chunks"}`
- `shithub_actions_storage_bytes{kind="artifacts|step_logs|hot_log_chunks"}`
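Assuming these series are exported with the usual `prometheus/client_golang` pattern (this document does not say how they are registered), the heartbeat counter and queue-depth gauge would look roughly like the sketch below; only the metric and label names come from the list above.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// Heartbeats by claim result ("claimed" or "no_job").
	heartbeats = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "shithub_actions_runner_heartbeats_total",
		Help: "Runner heartbeats by claim result.",
	}, []string{"result"})

	// Queued workflow runs and jobs ("runs" or "jobs").
	queueDepth = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "shithub_actions_queue_depth",
		Help: "Queued workflow runs and jobs.",
	}, []string{"resource"})
)

func init() {
	prometheus.MustRegister(heartbeats, queueDepth)
}

// Hypothetical call sites:
//   heartbeats.WithLabelValues("claimed").Inc()
//   queueDepth.WithLabelValues("jobs").Set(float64(n))
```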