# Actions runner deploy runbook

This runbook owns the S41d/S41j deployment path for `shithubd-runner`: the
Nix-built default image, systemd unit, Ansible role, and DigitalOcean runner
pool bootstrap. The smoke flow for an already-installed runner lives in
[actions-runner.md](./actions-runner.md).

## Prereqs

- The app database is migrated through S41d and the web API has
  `auth.totp_key_b64` configured so job JWTs can be minted.
- Docker is installed on the runner host and the `docker` group exists.
  The runner process needs Docker socket access; treat the host itself
  as trusted even though individual step containers are sandboxed.
- For DigitalOcean runner pools: `doctl` and `jq` installed locally,
  authenticated with `doctl auth init`, and a named SSH key uploaded to
  the DigitalOcean account.
- Runner host SSH ingress must be restricted to operator or VPN CIDRs.
  Do not create runner droplets with SSH open to `0.0.0.0/0`.
- `bin/shithubd-runner` exists locally. `make build` builds both
  `bin/shithubd` and `bin/shithubd-runner` with the same version ldflags.
- The default image has been loaded or published. Build it with:

```sh
nix build ./deploy/runner-images#runnerImage
docker load < result
```

The committed `deploy/runner-images/flake.lock` pins the nixpkgs input.
Update it deliberately when changing the default image toolchain.
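
A deliberate update, as a sketch, might look like:

```sh
cd deploy/runner-images
nix flake update          # refresh the pinned inputs, including nixpkgs
nix build .#runnerImage   # rebuild the default image against the new pin
docker load < result      # re-test locally before publishing
```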

Publishing to GHCR is manual through `.github/workflows/runner-image.yml`
because forks may not control the upstream `ghcr.io/shithub` namespace.
Leave the workflow's `image` input blank to publish under the current
repository's package namespace, or set it explicitly for upstream
publishing.
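
With the GitHub CLI available, one way to trigger that manual workflow is
`gh workflow run` (the tag shown is illustrative):

```sh
# Publish under the current repository's package namespace (image input blank).
gh workflow run runner-image.yml

# Or publish to an explicit upstream location.
gh workflow run runner-image.yml -f image=ghcr.io/shithub/runner-nix:1.0
```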

## DigitalOcean pool bootstrap

S41j runner hosts are separate droplets tagged `shithub-actions-runner`.
The app/database host must not double as the arbitrary-code runner host.

Validate the plan first:

```sh
SSH_KEY_NAME=macbook-pro \
SSH_ALLOWED_CIDRS=203.0.113.4/32 \
./deploy/doctl/provision-actions-runner-pool.sh --dry-run
```

Create the first shared Linux runner:

```sh
SSH_KEY_NAME=macbook-pro \
SSH_ALLOWED_CIDRS=203.0.113.4/32 \
./deploy/doctl/provision-actions-runner-pool.sh
```

Useful overrides:

```sh
POOL_NAME=shared-linux
PROJECT_NAME=shithub-prod
REGION=sfo3
SIZE=s-2vcpu-4gb
IMAGE=ubuntu-24-04-x64
COUNT=1
VPC_UUID=REPLACE_ME_OPTIONAL
```
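
Because the provisioner only creates droplets that are missing (see the list
below), re-running it with a larger `COUNT` grows the pool in place. For
example, to add a second shared Linux runner:

```sh
COUNT=2 \
SSH_KEY_NAME=macbook-pro \
SSH_ALLOWED_CIDRS=203.0.113.4/32 \
./deploy/doctl/provision-actions-runner-pool.sh
```
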
The provisioner:
- creates or reuses the DigitalOcean project;
- creates or reuses a tag-targeted cloud firewall;
- refuses public SSH CIDRs;
- creates missing droplets named `shithub-runner-<pool>-N`;
- applies the `shithub-actions-runner` and pool tags;
- installs a no-secret cloud-init baseline that enables Docker;
- prints machine-readable JSON for operator records.

Generate an Ansible inventory from the DigitalOcean tag:

```sh
./deploy/doctl/generate-actions-runner-inventory.sh \
--output deploy/ansible/inventory/actions-runners
```

The generated file contains per-host `shithub_runner_token` placeholders.
Replace them with values from `shithubd admin runner register`, ideally through
ansible-vault or host_vars rather than committing plaintext inventory.
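
A sketch of vaulting one host's token into host_vars (path and token value
illustrative):

```sh
# Produces a `shithub_runner_token: !vault |` block ready to append.
ansible-vault encrypt_string 'TOKEN_FROM_REGISTER' \
  --name shithub_runner_token \
  >> deploy/ansible/inventory/host_vars/shithub-runner-shared-linux-1.yml
```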

## Register

Run this once from a host that can reach the production database config:

```sh
shithubd admin runner register \
--name prod-runner-1 \
--labels self-hosted,linux,ubuntu-latest,x64 \
--capacity 1 \
--output json
```

Store the returned `token` in ansible-vault or the deployment secret store.
Only the token hash is stored in Postgres; the raw token cannot be
recovered later.
Use `--expires-in` only when the deployment rotates the token before expiry,
because the runner presents the registration token on every heartbeat.
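
Since the raw token is shown only once, capture it at registration time.
Assuming the JSON output carries a top-level `token` field:

```sh
shithubd admin runner register --name prod-runner-1 \
  --labels self-hosted,linux,ubuntu-latest,x64 --capacity 1 \
  --output json | jq -r .token
```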

For a generated DigitalOcean inventory, register one token per runner host and
use the host name as the runner name:

```sh
shithubd admin runner register \
--name shithub-runner-shared-linux-1 \
--labels self-hosted,linux,ubuntu-latest,x64 \
--capacity 1 \
--output json
```

## Inventory

Enable the role explicitly. The default is disabled so ordinary app
deploys do not start a runner by accident.

```ini
[shithub:vars]
shithub_runner_enabled=true
shithub_runner_token=REPLACE_ME
shithub_runner_labels=self-hosted,linux,ubuntu-latest,x64
shithub_runner_capacity=1
shithub_runner_default_image=ghcr.io/shithub/runner-nix:1.0
shithub_runner_seccomp_profile=/etc/shithubd-runner/seccomp.json
shithub_runner_container_user=65534:65534
shithub_runner_pids_limit=512
```

The role writes non-secret config to
`/etc/shithubd-runner/config.toml` and the registration token to
`/etc/shithubd-runner/runner.env` with mode `0600`.
Keep `shithub_runner_workspace_root` under `/var/lib/shithubd-runner`;
the systemd unit grants runner writes only to that subtree.
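
The shipped unit is authoritative, but the write restriction has the usual
systemd shape, sketched here with standard sandboxing directives:

```ini
# Illustrative fragment only; see deploy/systemd/shithubd-runner.service.
[Service]
ProtectSystem=strict
ReadWritePaths=/var/lib/shithubd-runner
```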

`shithub_runner_network_allowlist` defaults to GitHub source/archive
hosts plus Docker Hub registry hosts. Override it when a runner must
fetch from an internal package registry. The role creates the
`shithub-actions` Docker bridge at `172.30.0.1/24`, runs dnsmasq on
that bridge, and sets `engine.dns_servers` to the bridge resolver by
default.
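
For example, to also admit an internal registry, extend the allowlist in
inventory. The default host list lives in the role; the hosts below are
illustrative, assuming the same comma-separated format as
`shithub_runner_labels`:

```ini
shithub_runner_network_allowlist=github.com,codeload.github.com,registry-1.docker.io,auth.docker.io,registry.internal.example.com
```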

## Deploy

For the runner role only:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/production site.yml -t shithubd-runner
```

For the generated DigitalOcean runner inventory:

```sh
make build
cd deploy/ansible
ansible-playbook -i inventory/actions-runners site.yml -t shithubd-runner
```
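
Either run can be previewed with Ansible's check mode before applying, though
tasks that depend on earlier changes may not simulate cleanly:

```sh
ansible-playbook -i inventory/actions-runners site.yml -t shithubd-runner --check --diff
```
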
The role:
- creates the `shithub-runner` system user and joins it to `docker`
- uploads `/usr/local/bin/shithubd-runner`
- renders `/etc/shithubd-runner/config.toml` and `runner.env`
- creates the dedicated Actions Docker network and bridge
- renders `/etc/dnsmasq.d/shithubd-runner.conf` from the network
  allowlist and starts dnsmasq bound to the Actions bridge
- installs `shithub-runner-firewall.service`, which rejects direct-IP
  egress from step containers unless dnsmasq populated the destination
  in the allowlist ipset
- installs the pinned seccomp profile at
  `/etc/shithubd-runner/seccomp.json`
- installs `deploy/systemd/shithubd-runner.service`
- pulls the configured runner image
- enables and starts `shithubd-runner`

## Verify

On the runner host:

```sh
systemctl status shithubd-runner
journalctl -u shithubd-runner -n 100 --no-pager
```

Then push a workflow with a simple `run:` step:

```yaml
name: ci
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 0"
```

Expected state:

- a runner heartbeat claims the queued job within one idle poll interval
- the step emits SQL log chunks during execution
- `workflow:finalize_step` uploads
  `actions/runs/<run_id>/jobs/<job_id>/steps/<step_id>.log`
- the job and check run complete with conclusion `success`

Repeat with `exit 1`; the check should complete with conclusion
`failure`.
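
A minimal failing variant (workflow name illustrative):

```yaml
name: ci-fail
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: bash -c "echo hello && exit 1"
```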

Before declaring a shared pool available to arbitrary repositories, run the
[arbitrary repository smoke](./actions-runner.md#arbitrary-repository-smoke)
checklist against scratch and at least one additional repository. The single
repo check above proves the runner process works; the arbitrary-repo smoke
proves label routing, checkout scoping, archived logs, check runs, and negative
queue/secret cases across normal repositories.

Sandbox smoke checks:

```yaml
name: sandbox-smoke
on: push
jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - run: id -u
      - run: test "$(id -u)" = "65534"
      - run: if mkdir /etc/shithub-smoke 2>/dev/null; then exit 1; fi
      - run: if mount -t tmpfs tmpfs /mnt 2>/dev/null; then exit 1; fi
```

Expected state:

- the UID check prints `65534`
- a workflow-level request for `permissions: {shithub-runner-root: write}`
  still runs as `65534`; root opt-in is disabled in the shipped runner
  config until a trusted-workflow policy exists
- writing under `/etc` fails because the root filesystem is read-only
- `mount` fails because the container does not have `CAP_SYS_ADMIN`
- step logs and systemd journal include the configured image, network,
  CPU/memory limits, PID limit, container user, and seccomp profile

## Network Allowlist

The runner config carries two separate network controls:

- `runner.network_allowlist`: the host patterns allowed by the
  operator's DNS allowlist resolver.
- `engine.dns_servers`: DNS servers passed to each step container with
  Docker `--dns`.

For a single-host deployment, create a dedicated Docker bridge for
Actions jobs, run dnsmasq bound to that bridge, and set
`shithub_runner_dns_servers` to the bridge address of that resolver.
The Ansible role now does this by default. The rendered dnsmasq config
has no default upstream resolver; names not matching the allowlist fail
DNS resolution.

The firewall service closes the direct-IP bypass: containers on the
Actions subnet may send DNS only to the bridge resolver, and other
egress is allowed only when the destination IP is present in the
dnsmasq-populated ipset. Keep the runner on a separate host from web
and database services.
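
The rendered config and rules belong to the role; as a sketch of the
mechanism, dnsmasq can populate an ipset as it answers allowlisted queries,
and a reject rule covers everything else (interface, set, and upstream names
illustrative):

```sh
# dnsmasq shape: forward only allowlisted names and record the resolved
# IPs in an ipset; with no default upstream, other names fail to resolve.
#   server=/github.com/1.1.1.1
#   ipset=/github.com/shithub-allow

# Firewall shape: reject forwarded egress from the Actions bridge unless
# dnsmasq has already added the destination IP to the set.
iptables -A FORWARD -i shithub-actions \
  -m set ! --match-set shithub-allow dst -j REJECT
```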

## Rollback

Stop the runner first so it does not claim new jobs:

```sh
systemctl stop shithubd-runner
systemctl disable shithubd-runner
```

If the binary itself is bad, copy a prior archived binary from
`/var/lib/shithubd-runner/binaries/` back to
`/usr/local/bin/shithubd-runner` and restart the unit. Jobs already
claimed by the stopped runner remain visible in the database; S41g adds
operator cancel/re-run controls.
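
A sketch, assuming the archive keeps version-named copies:

```sh
ls /var/lib/shithubd-runner/binaries/
cp /var/lib/shithubd-runner/binaries/<previous-binary> /usr/local/bin/shithubd-runner
systemctl restart shithubd-runner
```
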
For a DigitalOcean test runner, drain or revoke the runner in shithub before destroying the droplet. Then list and delete the specific test host:

```sh
doctl compute droplet list --tag-name shithub-actions-runner
doctl compute droplet delete shithub-runner-shared-linux-1 --force
```
If the pool firewall was created only for a disposable test and no runner droplets still use it:

```sh
doctl compute firewall list
doctl compute firewall delete <firewall-id> --force
```