
# Actions runner deploy runbook

This runbook owns the S41d deployment path for `shithubd-runner`: the Nix-built default image, systemd unit, and Ansible role. The smoke flow for an already-installed runner lives in [actions-runner.md](./actions-runner.md).

## Prereqs

- The app database is migrated through S41d and the web API has `auth.totp_key_b64` configured so job JWTs can be minted.
- Docker is installed on the runner host and the `docker` group exists. The runner process needs Docker socket access; treat the host itself as trusted even though individual step containers are sandboxed.
- `bin/shithubd-runner` exists locally. `make build` builds both `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
- The default image has been loaded or published. Build it with:

```sh
nix build ./deploy/runner-images#runnerImage
docker load < result
```

The committed `deploy/runner-images/flake.lock` pins the nixpkgs input. Update it deliberately when changing the default image toolchain.
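A minimal update flow, assuming current flake-enabled Nix tooling:

```sh
# Refresh the pinned nixpkgs input, rebuild, and review the lock diff
# before committing the bump.
cd deploy/runner-images
nix flake update
nix build .#runnerImage
git diff flake.lock
```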

Publishing to GHCR is manual through `.github/workflows/runner-image.yml` because forks may not control the upstream `ghcr.io/shithub` namespace. Leave the workflow's `image` input blank to publish under the current repository's package namespace, or set it explicitly for upstream publishing.
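One way to trigger the publish, assuming the GitHub CLI is available; the `image` input is the workflow's own, but the value shown for it is a placeholder:

```sh
# Publish under the current repository's package namespace.
gh workflow run runner-image.yml

# Or target a namespace explicitly (placeholder value).
gh workflow run runner-image.yml -f image=ghcr.io/shithub/runner-nix
```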

## Register

Run this once from a host that can reach the production database config:

```sh
shithubd admin runner register \
  --name prod-runner-1 \
  --labels self-hosted,linux,ubuntu-latest \
  --capacity 1
```

Store the printed token in ansible-vault or the deployment secret store. Only the token hash is stored in Postgres; the raw token cannot be recovered later.
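For the ansible-vault route, `encrypt_string` keeps the raw token out of plain files; the variable name matches the Inventory section below:

```sh
# Reads the token from stdin so it never lands in shell history;
# paste the printed block into the vaulted group vars.
ansible-vault encrypt_string --stdin-name shithub_runner_token
```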

## Inventory

Enable the role explicitly. The default is disabled so ordinary app deploys do not start a runner by accident.

```ini
[shithub:vars]
shithub_runner_enabled=true
shithub_runner_token=REPLACE_ME
shithub_runner_labels=self-hosted,linux,ubuntu-latest
shithub_runner_capacity=1
shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
shithub_runner_seccomp_profile=/etc/shithubd-runner/seccomp.json
shithub_runner_container_user=65534:65534
shithub_runner_pids_limit=512
shithub_runner_dns_servers=172.30.0.1
```

The role writes non-secret config to `/etc/shithubd-runner/config.toml` and the registration token to `/etc/shithubd-runner/runner.env` with mode `0600`. Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`; the systemd unit grants the runner write access only to that subtree.
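For orientation, the rendered config has roughly the shape below. Only `runner.network_allowlist` and `engine.dns_servers` are named elsewhere in this runbook; the other keys are assumptions that mirror the inventory variables, so treat the rendered file as authoritative:

```toml
# Illustrative sketch, not the template's exact output.
[runner]
labels = ["self-hosted", "linux", "ubuntu-latest"]
capacity = 1
workspace_root = "/var/lib/shithubd-runner/workspaces"     # assumed key
network_allowlist = ["github.com", "codeload.github.com"]  # assumed defaults

[engine]
default_image = "ghcr.io/shithub/runner-nix:1.0"       # assumed key
dns_servers = ["172.30.0.1"]
seccomp_profile = "/etc/shithubd-runner/seccomp.json"  # assumed key
container_user = "65534:65534"                         # assumed key
pids_limit = 512                                       # assumed key
```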

`shithub_runner_network_allowlist` defaults to GitHub source/archive hosts plus Docker Hub registry hosts. Override it when a runner must fetch from an internal package registry. `shithub_runner_dns_servers` is empty by default; set it only after a DNS allowlist resolver exists on the runner network.

## Deploy

For the runner role only:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/production site.yml -t shithubd-runner
```

The role:

- creates the `shithub-runner` system user and joins it to `docker`
- uploads `/usr/local/bin/shithubd-runner`
- renders `/etc/shithubd-runner/config.toml` and `runner.env`
- renders `/etc/shithubd-runner/dnsmasq.conf` from the network allowlist for operators who run a local DNS allowlist resolver
- installs the pinned seccomp profile at `/etc/shithubd-runner/seccomp.json`
- installs `deploy/systemd/shithubd-runner.service` (hardening sketched below)
- pulls the configured runner image
- enables and starts `shithubd-runner`
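The committed unit is authoritative; as a sketch of the write restriction described under Inventory, the relevant hardening looks roughly like this (directives beyond `ReadWritePaths` are assumptions):

```ini
# Illustrative excerpt of deploy/systemd/shithubd-runner.service.
[Service]
User=shithub-runner
EnvironmentFile=/etc/shithubd-runner/runner.env
ExecStart=/usr/local/bin/shithubd-runner
# ProtectSystem=strict mounts the OS read-only for the service;
# ReadWritePaths re-grants writes only to the workspace subtree.
ProtectSystem=strict
ReadWritePaths=/var/lib/shithubd-runner
Restart=on-failure
```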

## Verify

On the runner host:

```sh
systemctl status shithubd-runner
journalctl -u shithubd-runner -n 100 --no-pager
```

Then push a workflow with a simple `run:` step:

```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 0"
```

Expected state:

- a runner heartbeat claims the queued job within one idle poll interval
- the step emits SQL log chunks during execution
- `workflow:finalize_step` uploads `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
- the job and check run complete with conclusion `success`

Repeat with `exit 1`; the check should complete with conclusion `failure`.

Sandbox smoke checks:

```yaml
name: sandbox-smoke
on: push
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - run: id -u
      - run: test "$(id -u)" = "65534"
      - run: if mkdir /etc/shithub-smoke 2>/dev/null; then exit 1; fi
      - run: if mount -t tmpfs tmpfs /mnt 2>/dev/null; then exit 1; fi
```

Expected state:

- the UID check prints `65534`
- writing under `/etc` fails because the root filesystem is read-only
- `mount` fails because the container does not have `CAP_SYS_ADMIN`
- step logs and systemd journal include the configured image, network, CPU/memory limits, PID limit, container user, and seccomp profile
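To reproduce a step's sandbox by hand, the settings above map onto standard Docker flags. This is a hand-written equivalent using the inventory values, not the runner's literal invocation (`--cap-drop ALL` is an assumption; `mount` fails without `CAP_SYS_ADMIN` either way):

```sh
docker run --rm \
  --user 65534:65534 \
  --read-only \
  --cap-drop ALL \
  --pids-limit 512 \
  --security-opt seccomp=/etc/shithubd-runner/seccomp.json \
  --dns 172.30.0.1 \
  ghcr.io/shithub/runner-nix:1.0 \
  id -u   # should print 65534
```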

## Network Allowlist

The runner config carries two separate network controls:

- `runner.network_allowlist`: the host patterns allowed by the operator's DNS allowlist resolver.
- `engine.dns_servers`: DNS servers passed to each step container with Docker `--dns`.

For a single-host deployment, create a dedicated Docker bridge for Actions jobs, run dnsmasq bound to that bridge, render `/etc/shithubd-runner/dnsmasq.conf`, and set `shithub_runner_dns_servers` to the bridge address of that resolver. The rendered dnsmasq config has no default upstream resolver; names not matching the allowlist fail DNS resolution.
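The rendered file is derived from the allowlist and looks roughly like this; the listen address matches the inventory example, and the upstream resolver addresses are placeholders:

```ini
# Illustrative dnsmasq allowlist config.
listen-address=172.30.0.1
bind-interfaces
no-resolv                        # no default upstream: unmatched names fail
server=/github.com/1.1.1.1       # forward allowlisted domains upstream
server=/codeload.github.com/1.1.1.1
server=/registry-1.docker.io/1.1.1.1
```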

DNS filtering is not a complete egress boundary by itself. Block direct-IP egress from the Actions bridge with host firewall rules, and allow only DNS to the resolver plus established outbound connections opened by that resolver. Keep the runner on a separate host from web and database services.
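One concrete pattern, with every name hypothetical (the `br-actions` bridge, the `actions-ok` set, and dnsmasq's `ipset=` lines are not part of the role): let dnsmasq populate an ipset with resolved allowlist IPs and default-deny everything else in `DOCKER-USER`:

```sh
# Hypothetical egress lockdown; adapt names to the host.
ipset create actions-ok hash:ip
# Mirror each allowlist entry in dnsmasq.conf, e.g.
#   ipset=/github.com/actions-ok
# so resolved IPs enter the set before the job connects.
# DOCKER-USER filters forwarded container traffic; -I prepends, so the
# DROP is inserted first and the ACCEPT lands above it.
iptables -I DOCKER-USER -i br-actions -j DROP
iptables -I DOCKER-USER -i br-actions -m set --match-set actions-ok dst -j ACCEPT
# DNS to a resolver bound on the bridge address itself traverses the
# INPUT chain, not DOCKER-USER, and needs its own accept rule there.
```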

## Rollback

Stop the runner first so it does not claim new jobs:

```sh
systemctl stop shithubd-runner
systemctl disable shithubd-runner
```

If the binary itself is bad, copy a prior archived binary from `/var/lib/shithubd-runner/binaries/` back to `/usr/local/bin/shithubd-runner` and restart the unit. Jobs already claimed by the stopped runner remain visible in the database; S41g adds operator cancel/re-run controls.
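A minimal restore, assuming the archive keeps prior versions under `binaries/` (names vary; list the directory and substitute a known-good file):

```sh
ls /var/lib/shithubd-runner/binaries/
# <known-good> is a placeholder for a file from the listing above.
install -m 0755 /var/lib/shithubd-runner/binaries/<known-good> /usr/local/bin/shithubd-runner
systemctl enable --now shithubd-runner
```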
