# Actions runner deploy runbook

This runbook owns the S41d deployment path for `shithubd-runner`: the
Nix-built default image, systemd unit, and Ansible role. The smoke flow
for an already-installed runner lives in [actions-runner.md](./actions-runner.md).

## Prereqs

- The app database is migrated through S41d and the web API has
  `auth.totp_key_b64` configured so job JWTs can be minted.
- Docker is installed on the runner host and the `docker` group exists.
  S41e narrows the sandbox; S41d runner hosts must be treated as trusted.
- `bin/shithubd-runner` exists locally. `make build` builds both
  `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
- The default image has been loaded or published. Build it with:

  ```sh
  nix build ./deploy/runner-images#runnerImage
  docker load < result
  ```

  The committed `deploy/runner-images/flake.lock` pins the nixpkgs input.
  Update it deliberately when changing the default image toolchain.

  Publishing to GHCR is manual through `.github/workflows/runner-image.yml`
  because forks may not control the upstream `ghcr.io/shithub` namespace.
  Leave the workflow's `image` input blank to publish under the current
  repository's package namespace, or set it explicitly for upstream
  publishing.

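
Some of the prerequisites above can be spot-checked before deploying. The helpers below are a hypothetical preflight sketch, not part of the repository: `require_executable` checks a local file such as `bin/shithubd-runner`, `require_command` checks that a tool such as `docker` is on `PATH`, and `require_group` uses `getent` to confirm a group such as `docker` exists. They do not check migrations or `auth.totp_key_b64`.

```sh
#!/usr/bin/env bash
# Hypothetical preflight helpers for the prerequisites; not shipped with the repo.

# require_executable PATH: succeed if PATH is an executable regular file.
require_executable() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    return 0
  fi
  echo "missing or not executable: $1" >&2
  return 1
}

# require_command NAME: succeed if NAME is on PATH (e.g. docker, nix).
require_command() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "command not found: $1" >&2
  return 1
}

# require_group NAME: succeed if NAME resolves through NSS via getent (e.g. docker).
require_group() {
  if getent group "$1" >/dev/null; then
    return 0
  fi
  echo "group not found: $1" >&2
  return 1
}
```

For example, run `require_executable bin/shithubd-runner` after `make build`, and `require_command docker && require_group docker` on the runner host.
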
## Register

Run this once from a host that can reach the production database config:

```sh
shithubd admin runner register \
  --name prod-runner-1 \
  --labels self-hosted,linux,ubuntu-latest \
  --capacity 1
```

Store the printed token in ansible-vault or the deployment secret store.
Only the token hash is stored in Postgres; the raw token cannot be
recovered later.

## Inventory

Enable the role explicitly. The default is disabled so ordinary app
deploys do not start a runner by accident.

```ini
[shithub:vars]
shithub_runner_enabled=true
shithub_runner_token=REPLACE_ME
shithub_runner_labels=self-hosted,linux,ubuntu-latest
shithub_runner_capacity=1
shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
```

The role writes non-secret config to
`/etc/shithubd-runner/config.toml` and the registration token to
`/etc/shithubd-runner/runner.env` with mode `0600`.
Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`;
the systemd unit grants runner writes only to that subtree.

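
The two file-layout constraints above can be checked on the runner host after the role runs. This is a sketch assuming GNU coreutils `stat` (the `-c '%a'` form); `check_mode` and `check_under` are hypothetical helper names, and the workspace root value has to be passed in by hand because this runbook does not state the role's default.

```sh
#!/usr/bin/env bash
# Hypothetical post-deploy checks; assumes GNU stat (Linux).

# check_mode FILE MODE: succeed if FILE's permission bits equal MODE (e.g. 600).
check_mode() {
  local actual
  actual=$(stat -c '%a' "$1") || return 1
  if [ "$actual" != "$2" ]; then
    echo "$1 has mode $actual, expected $2" >&2
    return 1
  fi
}

# check_under PATH PREFIX: succeed if PATH is a strict subpath of PREFIX
# and has no ".." component that could escape it.
check_under() {
  case "/$1/" in
    */../*) echo "$1 contains '..'" >&2; return 1 ;;
  esac
  case "$1" in
    "$2"/?*) return 0 ;;
  esac
  echo "$1 is not under $2" >&2
  return 1
}
```

For example, `check_mode /etc/shithubd-runner/runner.env 600` and `check_under "$WORKSPACE_ROOT" /var/lib/shithubd-runner`. The prefix match requires a `/` after the prefix, so a sibling such as `/var/lib/shithubd-runner-old` is rejected.
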
## Deploy

For the runner role only:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/production site.yml -t shithubd-runner
```

The role:

- creates the `shithub-runner` system user and joins it to `docker`
- uploads `/usr/local/bin/shithubd-runner`
- renders `/etc/shithubd-runner/config.toml` and `runner.env`
- installs `deploy/systemd/shithubd-runner.service`
- pulls the configured runner image
- enables and starts `shithubd-runner`

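
Part of what the role does can be spot-checked on the runner host once the playbook finishes. In the hypothetical sketch below, `in_group` uses `id -nG` to confirm the `shithub-runner` user was joined to `docker`, and `verify_runner_install` only checks that the uploaded binary and rendered files exist. It takes an optional root directory (empty means `/`) so it can be tried against a scratch tree. It does not check the unit file location, the image pull, or the unit state.

```sh
#!/usr/bin/env bash
# Hypothetical spot-checks for what the shithubd-runner role installs.

# in_group USER GROUP: succeed if USER's group list (id -nG) contains GROUP.
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# verify_runner_install [ROOT]: check that the role's files exist under ROOT
# (default: the real filesystem). Prints each missing path; fails if any.
verify_runner_install() {
  local root=${1:-} missing=0 path
  for path in \
    /usr/local/bin/shithubd-runner \
    /etc/shithubd-runner/config.toml \
    /etc/shithubd-runner/runner.env
  do
    if [ ! -e "$root$path" ]; then
      echo "missing: $root$path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the runner host: `in_group shithub-runner docker && verify_runner_install`.
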
## Verify

On the runner host:

```sh
systemctl status shithubd-runner
journalctl -u shithubd-runner -n 100 --no-pager
```

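
Right after a deploy the unit may still be starting. A small polling sketch can wrap `systemctl is-active --quiet`, which exits 0 only when the unit is active. The helper name `retry` is hypothetical and not part of the repository.

```sh
#!/usr/bin/env bash
# retry ATTEMPTS DELAY CMD [ARGS...]: run CMD up to ATTEMPTS times,
# sleeping DELAY seconds between failures. Returns CMD's last exit status.
retry() {
  local attempts=$1 delay=$2 n=1 status
  shift 2
  while :; do
    "$@" && return 0
    status=$?
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return "$status"
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 10 3 systemctl is-active --quiet shithubd-runner && journalctl -u shithubd-runner -n 100 --no-pager` waits up to roughly 30 seconds before showing the log.
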
Then push a workflow with a simple `run:` step:

```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 0"
```

Expected state:

- a runner heartbeat claims the queued job within one idle poll interval
- the step emits SQL log chunks during execution
- `workflow:finalize_step` uploads
  `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
- the job and check run complete with conclusion `success`

Repeat with `exit 1`; the check should complete with conclusion
`failure`.

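
When a step log seems to be missing, the object key from the expected state can be assembled from the three IDs. This hypothetical helper only does string assembly; it does not talk to the object store, whose bucket and client this runbook does not name.

```sh
#!/usr/bin/env bash
# step_log_key RUN_ID JOB_ID STEP_ID: print the object key that
# workflow:finalize_step uploads a step log to, per the layout above.
step_log_key() {
  if [ "$#" -ne 3 ]; then
    echo "usage: step_log_key RUN_ID JOB_ID STEP_ID" >&2
    return 2
  fi
  printf 'actions/runs/%s/jobs/%s/steps/%s.log\n' "$1" "$2" "$3"
}
```

For example, `step_log_key 42 7 1` prints `actions/runs/42/jobs/7/steps/1.log`.
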
## Rollback

Stop the runner first so it does not claim new jobs:

```sh
systemctl stop shithubd-runner
systemctl disable shithubd-runner
```

If the binary itself is bad, copy a prior archived binary from
`/var/lib/shithubd-runner/binaries/` back to
`/usr/local/bin/shithubd-runner`, then re-enable and start the unit with
`systemctl enable --now shithubd-runner`. Jobs already claimed by the
stopped runner remain visible in the database; S41g adds operator
cancel/re-run controls.

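
The binary restore can be scripted so the copy always lands with an executable mode and the operator chooses the archive explicitly. This is a sketch assuming the entries in `/var/lib/shithubd-runner/binaries/` are plain executable files, since this runbook does not document their naming. `restore_runner_binary` is a hypothetical helper.

```sh
#!/usr/bin/env bash
# restore_runner_binary ARCHIVED_FILE [DEST]: copy a previously archived
# runner binary into place with mode 0755. Run only while the unit is stopped.
restore_runner_binary() {
  local src=$1 dest=${2:-/usr/local/bin/shithubd-runner}
  if [ ! -f "$src" ]; then
    echo "no such archived binary: $src" >&2
    return 1
  fi
  # install(1) writes a fresh copy with explicit permissions, ignoring umask.
  install -m 0755 "$src" "$dest"
}
```

List candidates newest first with `ls -lt /var/lib/shithubd-runner/binaries/`, run `restore_runner_binary /var/lib/shithubd-runner/binaries/<chosen-file>`, then `systemctl enable --now shithubd-runner`.
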