tenseleyflow/shithub / 9994c68


docs: record public runner readiness audit

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 9994c685bdf2c7bb88370a1e9931fd88aa37d71a
Parents: 85b6823
Tree: 6e02feb

4 changed files

Status  File                                        +    -
M       docs/internal/actions-ga-readiness.md       5    0
A       docs/internal/actions-public-runners.md     124  0
M       docs/internal/runbooks/actions-runner.md    13   0
A       scripts/audit-actions-public-runners.sh     73   0
docs/internal/actions-ga-readiness.md (modified)
@@ -4,6 +4,11 @@ This is the S41h-5 pre-GA packet for shithub Actions. It records what is ready,
 what remains intentionally deferred, and why shithub's full project CI is not
 yet moved from GitHub Actions to shithub Actions.
 
+For shared-pool public runner rollout, see
+[`actions-public-runners.md`](./actions-public-runners.md). S41h covers whether
+shithub can dogfood its own CI; S41j covers whether normal repositories can use
+the production runner pool safely.
+
 ## Current decision
 
 Do not move `.github/workflows/ci.yml` to `.shithub/workflows/ci.yml` in S41h.
docs/internal/actions-public-runners.md (added)
@@ -0,0 +1,124 @@
+# Shared Actions Runner Readiness
+
+This document is the S41j-6 readiness packet for letting normal repositories
+use the shithub.sh shared Actions pool. S41j is the safety and operations track.
+S41k is the Actions UI parity track; S41k improves the product surface but is
+not a blocker for controlled arbitrary-repo execution.
+
+## Current Decision
+
+Status: **controlled dogfood, not broad public GA**.
+
+The platform can run ordinary repositories that meet the Actions policy, runner
+label, and syntax constraints. Broad public shared-runner enablement should wait
+until this checklist has a deployed/manual pass and no open Critical or High
+findings.
+
+## Eligibility Contract
+
+A repository may use the shared pool when all of these are true:
+
+1. Site Actions policy allows runner dispatch. The site switch is a hard kill
+   switch and overrides repo/org policy.
+2. The repo or its owner org has Actions enabled, or inherits an enabled site
+   policy.
+3. The workflow lives under `.shithub/workflows/*.yml` and parses under the
+   supported v1 subset.
+4. The triggering actor can run Actions on that repo. Untrusted pull requests
+   queue in an approval-required state before runner dispatch.
+5. The job's `runs-on` labels match an online runner, normally
+   `ubuntu-latest` for the first shared Linux pool.
+6. Repo queued-run, repo concurrency, owner concurrency, and actor hourly caps
+   permit the run.
+7. The repo is not archived or deleted.
+
+For public shithub.sh rollout, operators should keep site caps conservative and
+raise them only after real queue/claim/host-cost data exists.
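The enqueue-time part of this contract can be sketched in Go, the repository's implementation language. This is a minimal illustration under stated assumptions, not shithub's `internal/actions/policy` code. `Trigger`, `Setting`, `Evaluate`, and every field name are hypothetical. The contract above does not say whether a repo-level setting beats an org-level one, so that precedence is also an assumption. Two behaviors do come from the text: the site switch is checked before any repo or org setting, and untrusted pull requests are held rather than dispatched. Runner label matching (item 5) is left out because unmatched jobs still queue and are filtered when a runner claims work.

```go
package main

import "fmt"

// Setting is a hypothetical repo- or org-level Actions toggle: explicitly on,
// explicitly off, or unset (inherit from the next level up).
type Setting int

const (
	Inherit Setting = iota
	On
	Off
)

// Decision is what happens to a matched workflow at enqueue time.
type Decision string

const (
	Dispatch         Decision = "dispatch"          // queue for runner claim
	ApprovalRequired Decision = "approval-required" // queue, but hold dispatch
	Skip             Decision = "skip"              // do not queue
)

// Trigger holds the facts the eligibility contract checks (hypothetical names).
type Trigger struct {
	SiteEnabled    bool    // actions_site_policy.actions_enabled
	Repo, Org      Setting // explicit repo and owner-org settings
	WorkflowValid  bool    // .shithub/workflows/*.yml within the v1 subset
	ActorCanRun    bool    // actor may run Actions on this repo
	UntrustedPR    bool    // untrusted pull request
	CapsPermit     bool    // queued-run, concurrency, and hourly caps allow it
	ArchivedOrGone bool    // repo archived or deleted
}

// effectiveEnabled checks the site kill switch first, then the first explicit
// setting (repo before org, an assumed order), then the enabled site default.
func effectiveEnabled(t Trigger) bool {
	if !t.SiteEnabled {
		return false // hard kill switch: an explicit repo/org "on" cannot override it
	}
	for _, s := range []Setting{t.Repo, t.Org} {
		switch s {
		case On:
			return true
		case Off:
			return false
		}
	}
	return true // nothing explicit: inherit the enabled site policy
}

// Evaluate applies the contract at enqueue time.
func Evaluate(t Trigger) Decision {
	if !effectiveEnabled(t) || t.ArchivedOrGone || !t.WorkflowValid ||
		!t.ActorCanRun || !t.CapsPermit {
		return Skip
	}
	if t.UntrustedPR {
		return ApprovalRequired
	}
	return Dispatch
}

func main() {
	ok := Trigger{SiteEnabled: true, WorkflowValid: true, ActorCanRun: true, CapsPermit: true}
	fmt.Println(Evaluate(ok)) // dispatch

	killed := ok
	killed.SiteEnabled = false
	killed.Repo = On
	fmt.Println(Evaluate(killed)) // skip

	held := ok
	held.UntrustedPR = true
	fmt.Println(Evaluate(held)) // approval-required
}
```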
+
+## Billing And Entitlements
+
+Billing is present, but Actions minute metering is not enforcement-ready yet.
+The current entitlement boundary includes:
+
+- `org.actions_minutes_quota`
+- `LimitOrgActionsMinutesQuota`
+
+`LimitOrgActionsMinutesQuota` intentionally reports no concrete number until
+usage accounting lands. Do not gate public shared-runner execution by scattered
+plan checks. When billing gates arrive, they must go through
+`internal/entitlements` and keep authorization separate from entitlement
+denials.
51
+
52
+Recommended rollout posture:
53
+
54
+- personal/public dogfood repos: allowed only under site policy and conservative
55
+  caps;
56
+- organization-level Actions secrets/variables: already Team-gated;
57
+- paid shared-runner minutes: defer until metering can record usage and enforce
58
+  limits consistently;
59
+- unpaid or past-due orgs: keep paid-only Actions configuration read-only, but
60
+  do not delete secrets, variables, or prior run history.
61
+
62
+## S41j-6 Findings
63
+
64
+| ID | Severity | Status | Finding | Resolution |
65
+| --- | --- | --- | --- | --- |
66
+| S41J6-H1 | High | Fixed in S41j-6 | Site Actions disable was not a hard kill switch; explicit repo/org enablement could still evaluate true and queued jobs could be claimed. | Effective policy and runner claim SQL now return false whenever `actions_site_policy.actions_enabled=false`. Tests cover enqueue-time policy and claim-time dispatch. |
67
+| S41J6-M1 | Medium | Open with compensating control | Actions minutes billing has an entitlement key but no usage accounting or numeric limits. | Do not market or sell metered Actions minutes yet. Use site/org/repo policy caps and runner capacity as the public-runner control until billing SP08 defines usage accounting. |
68
+| S41J6-M2 | Medium | Manual validation pending | The S41j-5 arbitrary-repo smoke must run on production after deploy. | Run the scratch plus second-repo checklist in `runbooks/actions-runner.md` before declaring broad availability. |
69
+
70
+No Critical findings are open in this packet.
71
+
72
+## Required Evidence Before Broad Enablement
73
+
74
+- `scripts/audit-actions-public-runners.sh` passes on the deployed commit.
75
+- Focused Go tests pass for site kill switch, repo/owner concurrency caps,
76
+  unsupported label diagnostics, token gates, and untrusted PR secret behavior.
77
+- Live smoke passes on `mfwolffe/scratch`.
78
+- Live smoke passes on at least one additional normal public repository with
79
+  `runs-on: ubuntu-latest`.
80
+- Unsupported-label workflow shows a queued diagnostic with zero matching
81
+  runners.
82
+- Untrusted pull request run receives no secrets or mask values before approval.
83
+- Drained and revoked runners do not claim or complete new work.
84
+- A job-container network bypass attempt cannot reach direct IP destinations or
85
+  the DigitalOcean metadata service unless explicitly allowlisted.
86
+
87
+## Operator Controls
88
+
89
+Emergency stop:
90
+
91
+```sql
92
+UPDATE actions_site_policy
93
+   SET actions_enabled = false,
94
+       updated_at = now()
95
+ WHERE id = true;
96
+```
97
+
98
+After this change, newly matched workflows should be skipped by policy and
99
+already queued jobs should not be claimed by runners. Keep this SQL in the
100
+incident runbook until a site-admin UI exists.
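According to the audit script in this commit, the claim-time kill switch lives in the runner-claim SQL (`internal/actions/queries/workflow_jobs.sql`). The Go sketch below only illustrates the order of filters a claim applies, using hypothetical types (`Runner`, `Job`, `Pool`, `ClaimNext`). The site switch and runner state come first, then `runs-on` labels, then the repo and owner concurrency caps. It assumes a job matches when the runner advertises every requested label. The sketch is in-memory and lacks the atomicity a real claim query needs when several runners poll at once.

```go
package main

import "fmt"

// Hypothetical stand-ins for the inputs a runner claim considers.
type Runner struct {
	Labels           []string
	Drained, Revoked bool
}

type Job struct {
	ID          int
	Repo, Owner string
	RunsOn      []string
}

type Pool struct {
	SiteEnabled        bool
	RunningByRepo      map[string]int
	RunningByOwner     map[string]int
	MaxRepoConcurrent  int // max_repo_concurrent_jobs
	MaxOwnerConcurrent int // max_owner_concurrent_jobs
}

// hasAll reports whether the runner advertises every runs-on label.
func hasAll(have, want []string) bool {
	set := make(map[string]bool, len(have))
	for _, l := range have {
		set[l] = true
	}
	for _, l := range want {
		if !set[l] {
			return false
		}
	}
	return true
}

// ClaimNext returns the first queued job this runner may take. Reading a
// missing key (or a nil map) yields 0, so untracked repos/owners count as idle.
func ClaimNext(r Runner, queue []Job, p Pool) (Job, bool) {
	if !p.SiteEnabled || r.Drained || r.Revoked {
		return Job{}, false // kill switch or ineligible runner: claim nothing
	}
	for _, j := range queue {
		switch {
		case !hasAll(r.Labels, j.RunsOn):
			continue // job stays queued with no matching runner
		case p.RunningByRepo[j.Repo] >= p.MaxRepoConcurrent:
			continue
		case p.RunningByOwner[j.Owner] >= p.MaxOwnerConcurrent:
			continue
		}
		return j, true
	}
	return Job{}, false
}

func main() {
	r := Runner{Labels: []string{"ubuntu-latest"}}
	p := Pool{
		SiteEnabled:        true,
		RunningByRepo:      map[string]int{"alice/busy": 2},
		MaxRepoConcurrent:  2,
		MaxOwnerConcurrent: 5,
	}
	queue := []Job{
		{ID: 1, Repo: "alice/gpu", Owner: "alice", RunsOn: []string{"gpu"}},
		{ID: 2, Repo: "alice/busy", Owner: "alice", RunsOn: []string{"ubuntu-latest"}},
		{ID: 3, Repo: "bob/app", Owner: "bob", RunsOn: []string{"ubuntu-latest"}},
	}
	j, ok := ClaimNext(r, queue, p)
	fmt.Println(j.ID, ok) // 3 true

	p.SiteEnabled = false
	_, ok = ClaimNext(r, queue, p)
	fmt.Println(ok) // false
}
```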
+
+Capacity limits:
+
+- `max_repo_queued_runs` bounds backlog.
+- `max_repo_concurrent_jobs` bounds active jobs for one repository.
+- `max_owner_concurrent_jobs` bounds active jobs across one user or org owner.
+- `actor_trigger_limit_per_hour` bounds trigger spam by a single actor.
+
+These are policy controls, not billing meters. They protect the shared pool
+while Actions minute accounting is still future work.
+
+## Relationship To S41k
+
+S41k should follow S41j because it covers UI parity:
+
+- Actions sidebar and management placeholders;
+- workflow-specific run pages;
+- run graph canvas;
+- log viewer and annotations;
+- caches, runners, and metrics pages.
+
+None of those replace S41j's security gates. S41k can make unsupported labels,
+queue state, runner health, and usage easier to see, but it should not be the
+first line of defense for arbitrary code execution.
docs/internal/runbooks/actions-runner.md (modified)
@@ -133,6 +133,19 @@ Emergency controls:
  `shithub-actions-runner` in DigitalOcean, then rotate or revoke the
  affected runner tokens before allowing replacement hosts to connect.
 
+The site-level disable path is a hard kill switch and overrides repo/org
+Actions policy. Until a site-admin UI exists, use:
+
+```sql
+UPDATE actions_site_policy
+   SET actions_enabled = false,
+       updated_at = now()
+ WHERE id = true;
+```
+
+The public shared-runner rollout criteria live in
+[`actions-public-runners.md`](../actions-public-runners.md).
+
 Equivalent config file:
 
 ```toml
scripts/audit-actions-public-runners.sh (added)
@@ -0,0 +1,73 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: AGPL-3.0-or-later
+
+set -eu
+
+ROOT="$(git rev-parse --show-toplevel)"
+cd "$ROOT"
+
+fail() {
+  printf 'audit-actions-public-runners: %s\n' "$*" >&2
+  exit 1
+}
+
+ok() {
+  printf 'ok: %s\n' "$*"
+}
+
+require_file() {
+  [ -f "$1" ] || fail "missing required file: $1"
+  ok "found $1"
+}
+
+require_grep() {
+  pattern="$1"
+  file="$2"
+  desc="$3"
+  rg -q -- "$pattern" "$file" || fail "$desc not found in $file"
+  ok "$desc"
+}
+
+require_file "docs/internal/actions-public-runners.md"
+require_file "docs/internal/runbooks/actions-runner.md"
+require_file "docs/internal/runbooks/runner-deploy.md"
+require_file "deploy/doctl/provision-actions-runner-pool.sh"
+require_file "deploy/doctl/generate-actions-runner-inventory.sh"
+require_file "deploy/runner-config/firewall.sh.j2"
+require_file "deploy/runner-config/dnsmasq.conf.j2"
+require_file "deploy/runner-config/seccomp.json"
+
+require_grep 'WHEN COALESCE\(sp\.actions_enabled, true\) = false THEN false' \
+  "internal/actions/queries/actions_policy.sql" \
+  "site kill switch in effective policy"
+require_grep 'WHEN COALESCE\(sp\.actions_enabled, true\) = false THEN false' \
+  "internal/actions/queries/workflow_jobs.sql" \
+  "site kill switch in runner claim"
+require_grep 'TestEvaluateTrigger_SiteDisableOverridesRepoEnable' \
+  "internal/actions/policy/policy_test.go" \
+  "enqueue-time site kill switch test"
+require_grep 'TestRunnerHeartbeatSiteDisableOverridesRepoEnable' \
+  "internal/web/handlers/api/runners_test.go" \
+  "claim-time site kill switch test"
+require_grep 'TestRunnerHeartbeatRespectsRepoConcurrencyCap' \
+  "internal/web/handlers/api/runners_test.go" \
+  "repo concurrency claim test"
+require_grep 'TestRunnerHeartbeatRespectsOwnerConcurrencyCap' \
+  "internal/web/handlers/api/runners_test.go" \
+  "owner concurrency claim test"
+
+require_grep '--cap-drop=ALL' "internal/runner/engine/docker.go" "cap drop in Docker engine"
+require_grep '--read-only' "internal/runner/engine/docker.go" "read-only rootfs in Docker engine"
+require_grep '--security-opt=no-new-privileges' "internal/runner/engine/docker.go" "no-new-privileges in Docker engine"
+require_grep 'seccomp=' "internal/runner/engine/docker.go" "seccomp profile in Docker engine"
+require_grep '--user' "internal/runner/engine/docker.go" "non-root container user in Docker engine"
+require_grep 'rejects direct-IP' "docs/internal/runbooks/runner-deploy.md" "direct-IP egress runbook note"
+require_grep '-j REJECT' "deploy/runner-config/firewall.sh.j2" "runner firewall default reject"
+require_grep 'Do not put runner tokens' "deploy/doctl/actions-runner-cloud-init.yaml" "no-secret cloud-init warning"
+
+require_grep 'FeatureOrgActionsMinutesQuota' "internal/entitlements/entitlements.go" "actions minutes entitlement key"
+require_grep 'LimitOrgActionsMinutesQuota' "internal/entitlements/entitlements.go" "actions minutes limit key"
+require_grep 'no concrete number until' "docs/internal/actions-public-runners.md" "billing-metering caveat"
+require_grep 'controlled dogfood, not broad public GA' "docs/internal/actions-public-runners.md" "public runner rollout status"
+
+ok "S41j-6 public runner readiness static audit complete"