
# Actions runner deploy runbook

This runbook covers the S41d deployment path for `shithubd-runner`: the Nix-built default image, systemd unit, and Ansible role. The smoke flow for an already-installed runner lives in [actions-runner.md](./actions-runner.md).

## Prereqs

- The app database is migrated through S41d and the web API has `auth.totp_key_b64` configured so job JWTs can be minted.
- Docker is installed on the runner host and the `docker` group exists. S41e narrows the sandbox; S41d runner hosts must be treated as trusted.
- `bin/shithubd-runner` exists locally. `make build` builds both `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
- The default image has been loaded or published. Build it with:

```sh
nix build ./deploy/runner-images#runnerImage
docker load < result
```

The committed `deploy/runner-images/flake.lock` pins the nixpkgs input. Update it deliberately when changing the default image toolchain.
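A deliberate update might look like the following sketch; flag syntax varies across Nix versions, so check `nix flake update --help` for yours:

```sh
# Refresh the image flake's inputs, then rebuild and load the image to
# verify the new toolchain before committing the updated flake.lock.
nix flake update --flake ./deploy/runner-images
nix build ./deploy/runner-images#runnerImage
docker load < result
```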

Publishing to GHCR is manual through `.github/workflows/runner-image.yml` because forks may not control the upstream `ghcr.io/shithub` namespace. Leave the workflow's `image` input blank to publish under the current repository's package namespace, or set it explicitly for upstream publishing.

## Register

Run this once from a host that can reach the production database config:

```sh
shithubd admin runner register \
  --name prod-runner-1 \
  --labels self-hosted,linux,ubuntu-latest \
  --capacity 1
```

Store the printed token in ansible-vault or the deployment secret store. Only the token hash is stored in Postgres; the raw token cannot be recovered later.
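The unrecoverability follows from the server keeping only a one-way digest. The exact scheme below (plain SHA-256) is an assumed stand-in for illustration, not necessarily the server's actual algorithm:

```shell
# Demonstration only: SHA-256 is an assumed stand-in for whatever digest
# the server actually stores. The digest identifies the token on later
# heartbeats, but the raw token cannot be derived back from it.
token='rtok-example-not-a-real-token'
printf '%s' "$token" | sha256sum | cut -d' ' -f1
```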

## Inventory

Enable the role explicitly. The default is disabled so ordinary app deploys do not start a runner by accident.

```ini
[shithub:vars]
shithub_runner_enabled=true
shithub_runner_token=REPLACE_ME
shithub_runner_labels=self-hosted,linux,ubuntu-latest
shithub_runner_capacity=1
shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
```

The role writes non-secret config to `/etc/shithubd-runner/config.toml` and the registration token to `/etc/shithubd-runner/runner.env` with mode `0600`. Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`; the systemd unit grants the runner write access only to that subtree.
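As a rough sketch, the rendered non-secret file might look like this; the key names are hypothetical and the role's templates are the source of truth:

```toml
# /etc/shithubd-runner/config.toml — non-secret settings (hypothetical keys)
name = "prod-runner-1"
labels = ["self-hosted", "linux", "ubuntu-latest"]
capacity = 1
default_image = "ghcr.io/shithub/runner-nix:1.0"
workspace_root = "/var/lib/shithubd-runner/workspaces"
```

The token itself lives only in `runner.env` as an environment variable; its exact name is likewise defined by the role's template.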

## Deploy

For the runner role only:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/production site.yml -t shithubd-runner
```

The role:

- creates the `shithub-runner` system user and joins it to `docker`
- uploads `/usr/local/bin/shithubd-runner`
- renders `/etc/shithubd-runner/config.toml` and `runner.env`
- installs `deploy/systemd/shithubd-runner.service`
- pulls the configured runner image
- enables and starts `shithubd-runner`

## Verify

On the runner host:

```sh
systemctl status shithubd-runner
journalctl -u shithubd-runner -n 100 --no-pager
```

Then push a workflow with a simple `run:` step:

```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 0"
```

Expected state:

- a runner heartbeat claims the queued job within one idle poll interval
- the step emits SQL log chunks during execution
- `workflow:finalize_step` uploads `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
- the job and check run complete with conclusion `success`

Repeat with `exit 1`; the check should complete with conclusion `failure`.
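The failure path is the same workflow with the step flipped to a nonzero exit:

```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo boom && exit 1"
```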

## Rollback

Stop the runner first so it does not claim new jobs:

```sh
systemctl stop shithubd-runner
systemctl disable shithubd-runner
```

If the binary itself is bad, copy a prior archived binary from `/var/lib/shithubd-runner/binaries/` back to `/usr/local/bin/shithubd-runner` and restart the unit. Jobs already claimed by the stopped runner remain visible in the database; S41g adds operator cancel/re-run controls.
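Sketched as a command sequence on the runner host; the versioned filename below is illustrative, since the archive naming is set by the role — list the directory and pick the build you actually want:

```sh
ls /var/lib/shithubd-runner/binaries/
# restore a known-good build (filename here is an assumption)
install -m 0755 /var/lib/shithubd-runner/binaries/shithubd-runner-previous \
  /usr/local/bin/shithubd-runner
systemctl restart shithubd-runner
```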
