tenseleyflow/shithub / 9994c68

docs: record public runner readiness audit

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA
9994c685bdf2c7bb88370a1e9931fd88aa37d71a
Parents
85b6823
Tree
6e02feb

4 changed files

Status  File                                        +    -
M       docs/internal/actions-ga-readiness.md       5    0
A       docs/internal/actions-public-runners.md     124  0
M       docs/internal/runbooks/actions-runner.md    13   0
A       scripts/audit-actions-public-runners.sh     73   0
docs/internal/actions-ga-readiness.md (modified)
@@ -4,6 +4,11 @@ This is the S41h-5 pre-GA packet for shithub Actions. It records what is ready,
 what remains intentionally deferred, and why shithub's full project CI is not
 yet moved from GitHub Actions to shithub Actions.
 
+For shared-pool public runner rollout, see
+[`actions-public-runners.md`](./actions-public-runners.md). S41h covers whether
+shithub can dogfood its own CI; S41j covers whether normal repositories can use
+the production runner pool safely.
+
 ## Current decision
 
 Do not move `.github/workflows/ci.yml` to `.shithub/workflows/ci.yml` in S41h.
docs/internal/actions-public-runners.md (added)
@@ -0,0 +1,124 @@
+# Shared Actions Runner Readiness
+
+This document is the S41j-6 readiness packet for letting normal repositories
+use the shithub.sh shared Actions pool. S41j is the safety and operations track.
+S41k is the Actions UI parity track; S41k improves the product surface but is
+not a blocker for controlled arbitrary-repo execution.
+
+## Current Decision
+
+Status: **controlled dogfood, not broad public GA**.
+
+The platform can run ordinary repositories that meet the Actions policy, runner
+label, and syntax constraints. Broad public shared-runner enablement should wait
+until this checklist has a deployed/manual pass and no open Critical or High
+findings.
+
+## Eligibility Contract
+
+A repository may use the shared pool when all of these are true:
+
+1. Site Actions policy allows runner dispatch. The site switch is a hard kill
+   switch and overrides repo/org policy.
+2. The repo or its owner org has Actions enabled, or inherits an enabled site
+   policy.
+3. The workflow lives under `.shithub/workflows/*.yml` and parses under the
+   supported v1 subset.
+4. The triggering actor can run Actions on that repo. Untrusted pull requests
+   queue in an approval-required state before runner dispatch.
+5. The job's `runs-on` labels match an online runner, normally
+   `ubuntu-latest` for the first shared Linux pool.
+6. Repo queued-run, repo concurrency, owner concurrency, and actor hourly caps
+   permit the run.
+7. The repo is not archived or deleted.
+
+For public shithub.sh rollout, operators should keep site caps conservative and
+raise them only after real queue/claim/host-cost data exists.
+
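The eligibility contract above reads as an ordered series of gates, with the site switch evaluated first. The following is a minimal sketch of that ordering, assuming each condition has already been reduced to `true` or `false`. The function name `shared_pool_eligibility`, its argument order, and the reason strings are hypothetical and are not taken from the shithub codebase. The approval-required state for untrusted pull requests is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the eligibility contract as ordered gates.
# Every argument is the string "true" or "false"; all names are illustrative.

shared_pool_eligibility() {
  local site_enabled="$1" repo_enabled="$2" workflow_valid="$3"
  local actor_allowed="$4" runner_online="$5" caps_ok="$6" repo_active="$7"

  # 1. Site switch first: it is a hard kill switch and overrides repo/org policy.
  [ "$site_enabled" = true ] || { echo "site policy disabled"; return 1; }
  # 2. Repo or owner org has Actions enabled, or inherits an enabled site policy.
  [ "$repo_enabled" = true ] || { echo "actions disabled for repo"; return 1; }
  # 3. Workflow lives under .shithub/workflows/*.yml and parses under the v1 subset.
  [ "$workflow_valid" = true ] || { echo "workflow unsupported"; return 1; }
  # 4. Triggering actor may run Actions on this repo.
  [ "$actor_allowed" = true ] || { echo "actor not permitted"; return 1; }
  # 5. runs-on labels match an online runner.
  [ "$runner_online" = true ] || { echo "no matching runner"; return 1; }
  # 6. Queue, concurrency, and hourly caps permit the run.
  [ "$caps_ok" = true ] || { echo "cap exceeded"; return 1; }
  # 7. Repo is neither archived nor deleted.
  [ "$repo_active" = true ] || { echo "repo archived or deleted"; return 1; }

  echo "eligible"
}

# Example: everything passes except the runner label match.
shared_pool_eligibility true true true true false true true || true
```

Returning the first failing reason rather than a bare boolean mirrors the document's emphasis on diagnostics, such as the queued diagnostic for unsupported labels, but the exact reasons a real implementation reports are an assumption here.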
+## Billing And Entitlements
+
+Billing is present, but Actions minute metering is not enforcement-ready yet.
+The current entitlement boundary includes:
+
+- `org.actions_minutes_quota`
+- `LimitOrgActionsMinutesQuota`
+
+`LimitOrgActionsMinutesQuota` intentionally reports no concrete number until
+usage accounting lands. Do not gate public shared-runner execution by scattered
+plan checks. When billing gates arrive, they must go through
+`internal/entitlements` and keep authorization separate from entitlement
+denials.
+
+Recommended rollout posture:
+
+- personal/public dogfood repos: allowed only under site policy and conservative
+  caps;
+- organization-level Actions secrets/variables: already Team-gated;
+- paid shared-runner minutes: defer until metering can record usage and enforce
+  limits consistently;
+- unpaid or past-due orgs: keep paid-only Actions configuration read-only, but
+  do not delete secrets, variables, or prior run history.
+
+## S41j-6 Findings
+
+| ID | Severity | Status | Finding | Resolution |
+| --- | --- | --- | --- | --- |
+| S41J6-H1 | High | Fixed in S41j-6 | Site Actions disable was not a hard kill switch; explicit repo/org enablement could still evaluate true and queued jobs could be claimed. | Effective policy and runner claim SQL now return false whenever `actions_site_policy.actions_enabled=false`. Tests cover enqueue-time policy and claim-time dispatch. |
+| S41J6-M1 | Medium | Open with compensating control | Actions minutes billing has an entitlement key but no usage accounting or numeric limits. | Do not market or sell metered Actions minutes yet. Use site/org/repo policy caps and runner capacity as the public-runner control until billing SP08 defines usage accounting. |
+| S41J6-M2 | Medium | Manual validation pending | The S41j-5 arbitrary-repo smoke must run on production after deploy. | Run the scratch plus second-repo checklist in `runbooks/actions-runner.md` before declaring broad availability. |
+
+No Critical findings are open in this packet.
+
+## Required Evidence Before Broad Enablement
+
+- `scripts/audit-actions-public-runners.sh` passes on the deployed commit.
+- Focused Go tests pass for site kill switch, repo/owner concurrency caps,
+  unsupported label diagnostics, token gates, and untrusted PR secret behavior.
+- Live smoke passes on `mfwolffe/scratch`.
+- Live smoke passes on at least one additional normal public repository with
+  `runs-on: ubuntu-latest`.
+- Unsupported-label workflow shows a queued diagnostic with zero matching
+  runners.
+- Untrusted pull request run receives no secrets or mask values before approval.
+- Drained and revoked runners do not claim or complete new work.
+- A job-container network bypass attempt cannot reach direct IP destinations or
+  the DigitalOcean metadata service unless explicitly allowlisted.
+
+## Operator Controls
+
+Emergency stop:
+
+```sql
+UPDATE actions_site_policy
+   SET actions_enabled = false,
+       updated_at = now()
+ WHERE id = true;
+```
+
+After this change, newly matched workflows should be skipped by policy and
+already queued jobs should not be claimed by runners. Keep this SQL in the
+incident runbook until a site-admin UI exists.
+
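Until a site-admin UI exists, an operator may want a wrapper that previews the emergency-stop statement before running it. The sketch below is one possible shape, not part of this commit: the function names, the `--apply` flag, and the `DATABASE_URL` variable are assumptions. It prints the statement by default and only pipes it to `psql` when `--apply` is given.

```shell
#!/usr/bin/env bash
# Hypothetical operator helper around the emergency-stop SQL shown above.
# DATABASE_URL and the --apply flag are assumptions of this sketch.

actions_kill_switch_sql() {
  cat <<'SQL'
UPDATE actions_site_policy
   SET actions_enabled = false,
       updated_at = now()
 WHERE id = true;
SQL
}

actions_kill_switch() {
  if [ "${1:-}" = "--apply" ]; then
    # Refuse to run without an explicit connection string.
    : "${DATABASE_URL:?set DATABASE_URL to the target database}"
    # ON_ERROR_STOP makes psql exit non-zero if the statement fails.
    actions_kill_switch_sql | psql -v ON_ERROR_STOP=1 "$DATABASE_URL"
  else
    # Dry run: show exactly what would be executed.
    actions_kill_switch_sql
  fi
}

# Dry run by default.
actions_kill_switch
```

Defaulting to a dry run keeps an accidental invocation harmless; whether that trade-off suits an incident runbook is a judgment call for the operators.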
+Capacity limits:
+
+- `max_repo_queued_runs` bounds backlog.
+- `max_repo_concurrent_jobs` bounds active jobs for one repository.
+- `max_owner_concurrent_jobs` bounds active jobs across one user or org owner.
+- `actor_trigger_limit_per_hour` bounds trigger spam by a single actor.
+
+These are policy controls, not billing meters. They protect the shared pool
+while Actions minute accounting is still future work.
+
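To make the four caps concrete, here is a minimal sketch of how they might gate a run. It assumes each cap is an exclusive upper bound (a run is admitted only while the current count is strictly below the cap) and collapses enqueue-time and claim-time checks into one function for brevity. The function name and argument order are hypothetical; only the cap names come from the document.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check current counts against the four policy caps.
# Assumption: a count must be strictly below its cap for one more run to fit.

caps_permit_run() {
  local queued="$1" max_queued="$2"
  local repo_active="$3" max_repo="$4"
  local owner_active="$5" max_owner="$6"
  local actor_triggers="$7" max_triggers="$8"

  # On failure, print the name of the first cap that blocks the run.
  [ "$queued" -lt "$max_queued" ] || { echo "max_repo_queued_runs"; return 1; }
  [ "$repo_active" -lt "$max_repo" ] || { echo "max_repo_concurrent_jobs"; return 1; }
  [ "$owner_active" -lt "$max_owner" ] || { echo "max_owner_concurrent_jobs"; return 1; }
  [ "$actor_triggers" -lt "$max_triggers" ] || { echo "actor_trigger_limit_per_hour"; return 1; }

  echo "ok"
}

# Repo already has 2 of 2 concurrent jobs running, so that cap blocks the run.
caps_permit_run 3 5 2 2 1 4 0 10 || true
```

How the real policy treats unset or zero caps is not stated in the document, so this sketch leaves those cases out.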
+## Relationship To S41k
+
+S41k should follow S41j because it is UI parity:
+
+- Actions sidebar and management placeholders;
+- workflow-specific run pages;
+- run graph canvas;
+- log viewer and annotations;
+- caches, runners, and metrics pages.
+
+None of those replace S41j's security gates. S41k can make unsupported labels,
+queue state, runner health, and usage easier to see, but it should not be the
+first line of defense for arbitrary code execution.
docs/internal/runbooks/actions-runner.md (modified)
@@ -133,6 +133,19 @@ Emergency controls:
   `shithub-actions-runner` in DigitalOcean, then rotate or revoke the
   affected runner tokens before allowing replacement hosts to connect.
 
+The site-level disable path is a hard kill switch and overrides repo/org
+Actions policy. Until a site-admin UI exists, use:
+
+```sql
+UPDATE actions_site_policy
+   SET actions_enabled = false,
+       updated_at = now()
+ WHERE id = true;
+```
+
+The public shared-runner rollout criteria live in
+[`actions-public-runners.md`](../actions-public-runners.md).
+
 Equivalent config file:
 
 ```toml
scripts/audit-actions-public-runners.sh (added)
@@ -0,0 +1,73 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: AGPL-3.0-or-later
+
+set -eu
+
+ROOT="$(git rev-parse --show-toplevel)"
+cd "$ROOT"
+
+fail() {
+  printf 'audit-actions-public-runners: %s\n' "$*" >&2
+  exit 1
+}
+
+ok() {
+  printf 'ok: %s\n' "$*"
+}
+
+require_file() {
+  [ -f "$1" ] || fail "missing required file: $1"
+  ok "found $1"
+}
+
+require_grep() {
+  pattern="$1"
+  file="$2"
+  desc="$3"
+  rg -q -- "$pattern" "$file" || fail "$desc not found in $file"
+  ok "$desc"
+}
+
+require_file "docs/internal/actions-public-runners.md"
+require_file "docs/internal/runbooks/actions-runner.md"
+require_file "docs/internal/runbooks/runner-deploy.md"
+require_file "deploy/doctl/provision-actions-runner-pool.sh"
+require_file "deploy/doctl/generate-actions-runner-inventory.sh"
+require_file "deploy/runner-config/firewall.sh.j2"
+require_file "deploy/runner-config/dnsmasq.conf.j2"
+require_file "deploy/runner-config/seccomp.json"
+
+require_grep 'WHEN COALESCE\(sp\.actions_enabled, true\) = false THEN false' \
+  "internal/actions/queries/actions_policy.sql" \
+  "site kill switch in effective policy"
+require_grep 'WHEN COALESCE\(sp\.actions_enabled, true\) = false THEN false' \
+  "internal/actions/queries/workflow_jobs.sql" \
+  "site kill switch in runner claim"
+require_grep 'TestEvaluateTrigger_SiteDisableOverridesRepoEnable' \
+  "internal/actions/policy/policy_test.go" \
+  "enqueue-time site kill switch test"
+require_grep 'TestRunnerHeartbeatSiteDisableOverridesRepoEnable' \
+  "internal/web/handlers/api/runners_test.go" \
+  "claim-time site kill switch test"
+require_grep 'TestRunnerHeartbeatRespectsRepoConcurrencyCap' \
+  "internal/web/handlers/api/runners_test.go" \
+  "repo concurrency claim test"
+require_grep 'TestRunnerHeartbeatRespectsOwnerConcurrencyCap' \
+  "internal/web/handlers/api/runners_test.go" \
+  "owner concurrency claim test"
+
+require_grep '--cap-drop=ALL' "internal/runner/engine/docker.go" "cap drop in Docker engine"
+require_grep '--read-only' "internal/runner/engine/docker.go" "read-only rootfs in Docker engine"
+require_grep '--security-opt=no-new-privileges' "internal/runner/engine/docker.go" "no-new-privileges in Docker engine"
+require_grep 'seccomp=' "internal/runner/engine/docker.go" "seccomp profile in Docker engine"
+require_grep '--user' "internal/runner/engine/docker.go" "non-root container user in Docker engine"
+require_grep 'rejects direct-IP' "docs/internal/runbooks/runner-deploy.md" "direct-IP egress runbook note"
+require_grep '-j REJECT' "deploy/runner-config/firewall.sh.j2" "runner firewall default reject"
+require_grep 'Do not put runner tokens' "deploy/doctl/actions-runner-cloud-init.yaml" "no-secret cloud-init warning"
+
+require_grep 'FeatureOrgActionsMinutesQuota' "internal/entitlements/entitlements.go" "actions minutes entitlement key"
+require_grep 'LimitOrgActionsMinutesQuota' "internal/entitlements/entitlements.go" "actions minutes limit key"
+require_grep 'no concrete number until' "docs/internal/actions-public-runners.md" "billing-metering caveat"
+require_grep 'controlled dogfood, not broad public GA' "docs/internal/actions-public-runners.md" "public runner rollout status"
+
+ok "S41j-6 public runner readiness static audit complete"
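The audit script depends on `rg` (ripgrep) being on `PATH`; on a host without it, every `require_grep` call would fail regardless of file contents. The sketch below shows one way a search helper could fall back to `grep -E`. The helper name `search_quiet` is hypothetical and not part of this commit. It relies on the patterns used above being valid in both ripgrep's default regex syntax and POSIX extended regular expressions, which holds for the literal strings and the `\(` / `\.` escapes in this script but is not guaranteed for arbitrary patterns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: quiet pattern search that prefers rg and falls back to
# grep -E. Returns 0 when the pattern is found in the file, non-zero otherwise.

search_quiet() {
  local pattern="$1" file="$2"
  if command -v rg >/dev/null 2>&1; then
    # "--" stops option parsing so patterns like '--cap-drop=ALL' are safe.
    rg -q -- "$pattern" "$file"
  else
    grep -Eq -- "$pattern" "$file"
  fi
}

# Usage inside the audit script would replace the direct rg call, e.g.:
#   search_quiet "$pattern" "$file" || fail "$desc not found in $file"
```

Keeping the `--` separator in both branches matters because several audited patterns begin with dashes and would otherwise be parsed as command-line options.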