
# Actions runner deploy runbook

This runbook owns the S41d/S41j deployment path for `shithubd-runner`: the Nix-built default image, systemd unit, Ansible role, and DigitalOcean runner pool bootstrap. The smoke flow for an already-installed runner lives in [actions-runner.md](./actions-runner.md).

## Prereqs

- The app database is migrated through S41d and the web API has `auth.totp_key_b64` configured so job JWTs can be minted.
- Docker is installed on the runner host and the `docker` group exists. The runner process needs Docker socket access; treat the host itself as trusted even though individual step containers are sandboxed.
- For DigitalOcean runner pools: `doctl` and `jq` installed locally, authenticated with `doctl auth init`, and a named SSH key uploaded to the DigitalOcean account.
- Runner host SSH ingress must be restricted to operator or VPN CIDRs. Do not create runner droplets with SSH open to `0.0.0.0/0`.
- `bin/shithubd-runner` exists locally. `make build` builds both `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
- The default image has been loaded or published. Build it with:

```sh
nix build ./deploy/runner-images#runnerImage
docker load < result
```

The committed `deploy/runner-images/flake.lock` pins the nixpkgs input. Update it deliberately when changing the default image toolchain.

Publishing to GHCR is manual through `.github/workflows/runner-image.yml` because forks may not control the upstream `ghcr.io/shithub` namespace. Leave the workflow's `image` input blank to publish under the current repository's package namespace, or set it explicitly for upstream publishing.

## DigitalOcean pool bootstrap

S41j runner hosts are separate droplets tagged `shithub-actions-runner`. The app/database host must not double as the arbitrary-code runner host.

Validate the plan first:

```sh
SSH_KEY_NAME=macbook-pro \
SSH_ALLOWED_CIDRS=203.0.113.4/32 \
./deploy/doctl/provision-actions-runner-pool.sh --dry-run
```

Create the first shared Linux runner:

```sh
SSH_KEY_NAME=macbook-pro \
SSH_ALLOWED_CIDRS=203.0.113.4/32 \
./deploy/doctl/provision-actions-runner-pool.sh
```

Useful overrides:

```sh
POOL_NAME=shared-linux
PROJECT_NAME=shithub-prod
REGION=sfo3
SIZE=s-2vcpu-4gb
IMAGE=ubuntu-24-04-x64
COUNT=1
VPC_UUID=REPLACE_ME_OPTIONAL
```

The provisioner:

- creates or reuses the DigitalOcean project;
- creates or reuses a tag-targeted cloud firewall;
- refuses public SSH CIDRs;
- creates missing droplets named `shithub-runner-<pool>-N`;
- applies the `shithub-actions-runner` and pool tags;
- installs a no-secret cloud-init baseline that enables Docker;
- prints machine-readable JSON for operator records.

Generate an Ansible inventory from the DigitalOcean tag:

```sh
./deploy/doctl/generate-actions-runner-inventory.sh \
  --output deploy/ansible/inventory/actions-runners
```

The generated file contains per-host `shithub_runner_token` placeholders. Replace them with values from `shithubd admin runner register`, ideally through ansible-vault or host_vars rather than committing plaintext inventory.
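As a sketch, one way to keep the token out of plaintext inventory is a per-host host_vars file carrying a vaulted value. The file path and hostname below follow the generated inventory's naming; the vault payload is a placeholder, not a real ciphertext:

```yaml
# host_vars/shithub-runner-shared-linux-1.yml (illustrative sketch)
shithub_runner_token: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  REPLACE_WITH_OUTPUT_OF_ansible_vault_encrypt_string
```

`ansible-vault encrypt_string --stdin-name shithub_runner_token` emits a block in this shape that can be pasted into the file directly.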

## Register

Run this once from a host that can reach the production database config:

```sh
shithubd admin runner register \
  --name prod-runner-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --capacity 1 \
  --output json
```

Store the returned `token` in ansible-vault or the deployment secret store. Only the token hash is stored in Postgres; the raw token cannot be recovered later. Use `--expires-in` only when the deployment rotates the token before expiry, because the runner presents the registration token on every heartbeat.

For a generated DigitalOcean inventory, register one token per runner host and use the host name as the runner name:

```sh
shithubd admin runner register \
  --name shithub-runner-shared-linux-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 \
  --capacity 1 \
  --output json
```

## Inventory

Enable the role explicitly. The default is disabled so ordinary app deploys do not start a runner by accident.

```ini
[shithub:vars]
shithub_runner_enabled=true
shithub_runner_token=REPLACE_ME
shithub_runner_labels=self-hosted,linux,ubuntu-latest,x64
shithub_runner_capacity=1
shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
shithub_runner_seccomp_profile=/etc/shithubd-runner/seccomp.json
shithub_runner_container_user=65534:65534
shithub_runner_pids_limit=512
```

The role writes non-secret config to `/etc/shithubd-runner/config.toml` and the registration token to `/etc/shithubd-runner/runner.env` with mode `0600`. Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`; the systemd unit grants runner writes only to that subtree.
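For orientation, the rendered `config.toml` might look roughly like the sketch below. Only `runner.network_allowlist` and `engine.dns_servers` are keys named elsewhere in this runbook; the section layout and example values are otherwise illustrative assumptions, not the role's actual template:

```toml
# /etc/shithubd-runner/config.toml (illustrative sketch; real keys may differ)
[runner]
network_allowlist = ["github.com", "codeload.github.com", "registry-1.docker.io"]

[engine]
# Bridge resolver address of the shithub-actions Docker network.
dns_servers = ["172.30.0.1"]
```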

`shithub_runner_network_allowlist` defaults to GitHub source/archive hosts plus Docker Hub registry hosts. Override it when a runner must fetch from an internal package registry. The role creates the `shithub-actions` Docker bridge at `172.30.0.1/24`, runs dnsmasq on that bridge, and sets `engine.dns_servers` to the bridge resolver by default.
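For example, a runner that must also reach an internal registry could extend the allowlist in inventory. The internal hostname below is a placeholder, and the comma-separated form mirrors the other list-valued inventory variables above (an assumption about how the role parses this variable):

```ini
[shithub:vars]
shithub_runner_network_allowlist=github.com,codeload.github.com,registry-1.docker.io,registry.internal.example
```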

## Deploy

For the runner role only:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/production site.yml -t shithubd-runner
```

For the generated DigitalOcean runner inventory:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/actions-runners site.yml -t shithubd-runner
```

The role:

- creates the `shithub-runner` system user and joins it to `docker`
- uploads `/usr/local/bin/shithubd-runner`
- renders `/etc/shithubd-runner/config.toml` and `runner.env`
- creates the dedicated Actions Docker network and bridge
- renders `/etc/dnsmasq.d/shithubd-runner.conf` from the network allowlist and starts dnsmasq bound to the Actions bridge
- installs `shithub-runner-firewall.service`, which rejects direct-IP egress from step containers unless dnsmasq populated the destination in the allowlist ipset
- installs the pinned seccomp profile at `/etc/shithubd-runner/seccomp.json`
- installs `deploy/systemd/shithubd-runner.service`
- pulls the configured runner image
- enables and starts `shithubd-runner`

## Verify

On the runner host:

```sh
systemctl status shithubd-runner
journalctl -u shithubd-runner -n 100 --no-pager
```

Then push a workflow with a simple `run:` step:

```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 0"
```

Expected state:

- a runner heartbeat claims the queued job within one idle poll interval
- the step emits SQL log chunks during execution
- `workflow:finalize_step` uploads `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
- the job and check run complete with conclusion `success`

Repeat with `exit 1`; the check should complete with conclusion `failure`.

Sandbox smoke checks:

```yaml
name: sandbox-smoke
on: push
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - run: id -u
      - run: test "$(id -u)" = "65534"
      - run: if mkdir /etc/shithub-smoke 2>/dev/null; then exit 1; fi
      - run: if mount -t tmpfs tmpfs /mnt 2>/dev/null; then exit 1; fi
```

Expected state:

- the UID check prints `65534`
- a workflow-level request for `permissions: {shithub-runner-root: write}` still runs as `65534`; root opt-in is disabled in the shipped runner config until a trusted-workflow policy exists
- writing under `/etc` fails because the root filesystem is read-only
- `mount` fails because the container does not have `CAP_SYS_ADMIN`
- step logs and systemd journal include the configured image, network, CPU/memory limits, PID limit, container user, and seccomp profile

## Network Allowlist

The runner config carries two separate network controls:

- `runner.network_allowlist`: the host patterns allowed by the operator's DNS allowlist resolver.
- `engine.dns_servers`: DNS servers passed to each step container with Docker `--dns`.

For a single-host deployment, create a dedicated Docker bridge for Actions jobs, run dnsmasq bound to that bridge, and set `shithub_runner_dns_servers` to the bridge address of that resolver. The Ansible role now does this by default. The rendered dnsmasq config has no default upstream resolver; names not matching the allowlist fail DNS resolution.

The firewall service closes the direct-IP bypass: containers on the Actions subnet may send DNS only to the bridge resolver, and other egress is allowed only when the destination IP is present in the dnsmasq-populated ipset. Keep the runner on a separate host from web and database services.
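The mechanism behind the rendered `/etc/dnsmasq.d/shithubd-runner.conf` can be sketched with stock dnsmasq directives. The upstream resolver address and the ipset name below are placeholders, and the role's actual rendered file may differ in detail:

```
# Illustrative dnsmasq allowlist sketch, not the role's exact output.
# No default upstream resolver: names without an allowlist entry fail.
no-resolv
# Answer only on the Actions bridge address.
listen-address=172.30.0.1
# Forward allowlisted names to an upstream resolver (placeholder 1.1.1.1).
server=/github.com/1.1.1.1
# Add resolved addresses to the firewall's allowlist ipset (placeholder name).
ipset=/github.com/shithub-allow
```

One `server=`/`ipset=` pair per allowlist entry is the natural rendering; the firewall service then only has to match destination IPs against the ipset that dnsmasq populates.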

## Rollback

Stop the runner first so it does not claim new jobs:

```sh
systemctl stop shithubd-runner
systemctl disable shithubd-runner
```

If the binary itself is bad, copy a prior archived binary from `/var/lib/shithubd-runner/binaries/` back to `/usr/local/bin/shithubd-runner` and restart the unit. Jobs already claimed by the stopped runner remain visible in the database; S41g adds operator cancel/re-run controls.

For a DigitalOcean test runner, drain or revoke the runner in shithub before destroying the droplet. Then list and delete the specific test host:

```sh
doctl compute droplet list --tag-name shithub-actions-runner
doctl compute droplet delete shithub-runner-shared-linux-1 --force
```

If the pool firewall was created only for a disposable test and no runner droplets still use it:

```sh
doctl compute firewall list
doctl compute firewall delete <firewall-id> --force
```