# Actions runner deploy runbook
This runbook owns the S41d deployment path for `shithubd-runner`: the
Nix-built default image, systemd unit, and Ansible role. The smoke flow
for an already-installed runner lives in [actions-runner.md](./actions-runner.md).
## Prereqs
- The app database is migrated through S41d and the web API has
  `auth.totp_key_b64` configured so job JWTs can be minted.
- Docker is installed on the runner host and the `docker` group exists.
  The runner process needs Docker socket access; treat the host itself
  as trusted even though individual step containers are sandboxed.
- `bin/shithubd-runner` exists locally. `make build` builds both
  `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
- The default image has been loaded or published. Build it with:
```sh
nix build ./deploy/runner-images#runnerImage
docker load < result
```
The committed `deploy/runner-images/flake.lock` pins the nixpkgs input.
Update it deliberately when changing the default image toolchain.
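A deliberate pin bump looks roughly like this (sketch only; review the
`flake.lock` diff before committing):

```sh
cd deploy/runner-images
nix flake update            # refresh the pinned nixpkgs input
nix build .#runnerImage     # rebuild the default image against the new pin
docker load < result
git diff flake.lock         # inspect the pin change before committing
```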
Publishing to GHCR is manual through `.github/workflows/runner-image.yml`
because forks may not control the upstream `ghcr.io/shithub` namespace.
Leave the workflow's `image` input blank to publish under the current
repository's package namespace, or set it explicitly for upstream
publishing.
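With the GitHub CLI available, the manual dispatch can be triggered from a
terminal. This is a sketch that assumes the dispatch input is named `image`
as described above:

```sh
# Publish under the current repository's package namespace (image input left blank).
gh workflow run runner-image.yml

# Publish explicitly to the upstream namespace.
gh workflow run runner-image.yml -f image=ghcr.io/shithub/runner-nix
```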
## Register
Run this once from a host that can reach the production database config:

```sh
shithubd admin runner register \
  --name prod-runner-1 \
  --labels self-hosted,linux,ubuntu-latest \
  --capacity 1
```
Store the printed token in ansible-vault or the deployment secret store. Only the token hash is stored in Postgres; the raw token cannot be recovered later.
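
One way to vault it, using the inventory variable name from the next
section; paste the output into the group vars in place of `REPLACE_ME`:

```sh
ansible-vault encrypt_string 'PASTE_RUNNER_TOKEN_HERE' --name 'shithub_runner_token'
```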
## Inventory
Enable the role explicitly. The default is disabled so ordinary app deploys do not start a runner by accident.

```ini
[shithub:vars]
shithub_runner_enabled=true
shithub_runner_token=REPLACE_ME
shithub_runner_labels=self-hosted,linux,ubuntu-latest
shithub_runner_capacity=1
shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
shithub_runner_seccomp_profile=/etc/shithubd-runner/seccomp.json
shithub_runner_container_user=65534:65534
shithub_runner_pids_limit=512
shithub_runner_dns_servers=172.30.0.1
```
The role writes non-secret config to
`/etc/shithubd-runner/config.toml` and the registration token to
`/etc/shithubd-runner/runner.env` with mode `0600`.
Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`;
the systemd unit grants runner writes only to that subtree.

`shithub_runner_network_allowlist` defaults to GitHub source/archive
hosts plus Docker Hub registry hosts. Override it when a runner must
fetch from an internal package registry. `shithub_runner_dns_servers`
is empty by default; set it only after a DNS allowlist resolver exists
on the runner network.
## Deploy
For the runner role only:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/production site.yml -t shithubd-runner
```
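Two standard Ansible variations that can help here; the host pattern is
illustrative:

```sh
# Preview what the role would change without applying anything.
ansible-playbook -i inventory/production site.yml -t shithubd-runner --check --diff

# Apply to a single runner host only.
ansible-playbook -i inventory/production site.yml -t shithubd-runner --limit prod-runner-1
```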
The role:

- creates the `shithub-runner` system user and joins it to `docker`
- uploads `/usr/local/bin/shithubd-runner`
- renders `/etc/shithubd-runner/config.toml` and `runner.env`
- renders `/etc/shithubd-runner/dnsmasq.conf` from the network
  allowlist for operators who run a local DNS allowlist resolver
- installs the pinned seccomp profile at
  `/etc/shithubd-runner/seccomp.json`
- installs `deploy/systemd/shithubd-runner.service`
- pulls the configured runner image
- enables and starts `shithubd-runner`
## Verify
On the runner host:

```sh
systemctl status shithubd-runner
journalctl -u shithubd-runner -n 100 --no-pager
```
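Optionally confirm the rendered files match what the Inventory section
promises; `runner.env` should show mode `600`:

```sh
stat -c '%a %U:%G %n' /etc/shithubd-runner/runner.env /etc/shithubd-runner/config.toml
```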
Then push a workflow with a simple `run:` step:
```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 0"
```
Expected state:
- a runner heartbeat claims the queued job within one idle poll interval
- the step emits SQL log chunks during execution
- `workflow:finalize_step` uploads
  `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
- the job and check run complete with conclusion `success`
Repeat with `exit 1`; the check should complete with conclusion
`failure`.
Sandbox smoke checks:

```yaml
name: sandbox-smoke
on: push
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - run: id -u
      - run: test "$(id -u)" = "65534"
      - run: if mkdir /etc/shithub-smoke 2>/dev/null; then exit 1; fi
      - run: if mount -t tmpfs tmpfs /mnt 2>/dev/null; then exit 1; fi
```
Expected state:

- the UID check prints `65534`
- writing under `/etc` fails because the root filesystem is read-only
- `mount` fails because the container does not have `CAP_SYS_ADMIN`
- step logs and systemd journal include the configured image, network,
  CPU/memory limits, PID limit, container user, and seccomp profile
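For orientation, the sandbox properties above roughly correspond to a
`docker run` invocation along these lines. This is an illustration
assembled from the settings in this runbook, not the engine's actual
command; the CPU/memory limits and job network are omitted here:

```sh
docker run --rm \
  --user 65534:65534 \
  --read-only \
  --pids-limit 512 \
  --security-opt seccomp=/etc/shithubd-runner/seccomp.json \
  --dns 172.30.0.1 \
  ghcr.io/shithub/runner-nix:1.0 \
  bash -c "id -u"
```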
## Network Allowlist
The runner config carries two separate network controls:

- `runner.network_allowlist`: the host patterns allowed by the
  operator's DNS allowlist resolver.
- `engine.dns_servers`: DNS servers passed to each step container with
  Docker `--dns`.
For a single-host deployment, create a dedicated Docker bridge for
Actions jobs, run dnsmasq bound to that bridge, render
`/etc/shithubd-runner/dnsmasq.conf`, and set
`shithub_runner_dns_servers` to the bridge address of that resolver.
The rendered dnsmasq config has no default upstream resolver; names not
matching the allowlist fail DNS resolution.
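A minimal sketch of that wiring; the network name, subnet, and explicit
bridge interface name are assumptions, and the dnsmasq flags may duplicate
settings already present in the rendered config:

```sh
# Dedicated bridge for Actions jobs; the resolver will listen on its gateway address.
docker network create --driver bridge \
  --subnet 172.30.0.0/24 --gateway 172.30.0.1 \
  --opt com.docker.network.bridge.name=shithub-actions0 \
  shithub-actions

# Run dnsmasq against the role-rendered allowlist config, bound to that address,
# with no fallback to the host's /etc/resolv.conf upstreams.
dnsmasq --conf-file=/etc/shithubd-runner/dnsmasq.conf \
  --listen-address=172.30.0.1 --bind-interfaces --no-resolv
```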
DNS filtering is not a complete egress boundary by itself. Block direct-IP egress from the Actions bridge with host firewall rules, and allow only DNS to the resolver plus established outbound connections opened by that resolver. Keep the runner on a separate host from web and database services.
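
One possible shape for those host rules, assuming the bridge from the
sketch above; this is illustrative, and how new forwarded connections to
allowlisted destinations get permitted (for example, an address set
populated from the resolver's answers) is a deployment decision not shown:

```sh
# DNS from job containers to the resolver on the bridge gateway is host-local
# traffic, so it is matched in INPUT rather than FORWARD.
iptables -A INPUT -i shithub-actions0 -d 172.30.0.1 -p udp --dport 53 -j ACCEPT
iptables -A INPUT -i shithub-actions0 -d 172.30.0.1 -p tcp --dport 53 -j ACCEPT
iptables -A INPUT -i shithub-actions0 -j DROP

# Allow return traffic for already-permitted flows, then drop everything else
# forwarded off the Actions bridge, which blocks direct-IP egress. An ACCEPT
# rule for allowlisted destinations belongs before the final DROP.
iptables -A FORWARD -i shithub-actions0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -i shithub-actions0 -j DROP
```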
## Rollback
Stop the runner first so it does not claim new jobs:

```sh
systemctl stop shithubd-runner
systemctl disable shithubd-runner
```
If the binary itself is bad, copy a prior archived binary from
`/var/lib/shithubd-runner/binaries/` back to
`/usr/local/bin/shithubd-runner` and restart the unit. Jobs already
claimed by the stopped runner remain visible in the database; S41g adds
operator cancel/re-run controls.
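A sketch of that binary rollback; the archived file name below is a
placeholder, since this runbook does not specify the archive naming scheme:

```sh
ls -lt /var/lib/shithubd-runner/binaries/
cp /var/lib/shithubd-runner/binaries/<previous-archived-binary> /usr/local/bin/shithubd-runner
systemctl restart shithubd-runner
```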