@@ -0,0 +1,124 @@ |
| 1 | +# Shared Actions Runner Readiness |
| 2 | + |
| 3 | +This document is the S41j-6 readiness packet for letting normal repositories |
| 4 | +use the shithub.sh shared Actions pool. S41j is the safety and operations track. |
| 5 | +S41k is the Actions UI parity track; S41k improves the product surface but is |
| 6 | +not a blocker for controlled arbitrary-repo execution. |
| 7 | + |
| 8 | +## Current Decision |
| 9 | + |
| 10 | +Status: **controlled dogfood, not broad public GA**. |
| 11 | + |
| 12 | +The platform can run ordinary repositories that meet the Actions policy, runner |
| 13 | +label, and syntax constraints. Broad public shared-runner enablement should wait |
| 14 | +until this checklist has passed manually against the deployed build and no |
| 15 | +Critical or High findings remain open. |
| 16 | + |
| 17 | +## Eligibility Contract |
| 18 | + |
| 19 | +A repository may use the shared pool when all of these are true: |
| 20 | + |
| 21 | +1. Site Actions policy allows runner dispatch. The site switch is a hard kill |
| 22 | + switch and overrides repo/org policy. |
| 23 | +2. The repo or its owner org has Actions enabled, or inherits an enabled site |
| 24 | + policy. |
| 25 | +3. The workflow lives under `.shithub/workflows/*.yml` and parses under the |
| 26 | + supported v1 subset. |
| 27 | +4. The triggering actor can run Actions on that repo. Untrusted pull requests |
| 28 | + queue in an approval-required state before runner dispatch. |
| 29 | +5. The job's `runs-on` labels match an online runner, normally |
| 30 | + `ubuntu-latest` for the first shared Linux pool. |
| 31 | +6. Repo queued-run, repo concurrency, owner concurrency, and actor hourly caps |
| 32 | + permit the run. |
| 33 | +7. The repo is not archived or deleted. |
| 34 | + |
| 35 | +For public shithub.sh rollout, operators should keep site caps conservative and |
| 36 | +raise them only after real queue/claim/host-cost data exists. |
| 37 | + |
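The policy part of the contract (items 1 and 2) can be sketched as a small resolution function. This is a minimal illustration, not the project's implementation: the `Policy` type, the `EffectiveActionsEnabled` name, and the rule that an explicit repo setting is consulted before the org setting are all assumptions. The only behavior taken from this document is that a site-level disable wins over everything else.

```go
package main

import "fmt"

// Policy is a hypothetical tri-state: a repo or org can explicitly enable,
// explicitly disable, or inherit from the level above it.
type Policy int

const (
	Inherit Policy = iota
	Enabled
	Disabled
)

// EffectiveActionsEnabled resolves site, org, and repo policy into one answer.
// The site switch is checked first so that explicit repo/org enablement can
// never override a site-level disable.
func EffectiveActionsEnabled(siteEnabled bool, org, repo Policy) bool {
	if !siteEnabled {
		return false // hard kill switch
	}
	// Assumption: the most specific explicit setting wins (repo, then org).
	for _, p := range []Policy{repo, org} {
		switch p {
		case Enabled:
			return true
		case Disabled:
			return false
		}
	}
	// Nothing explicit below the site level: inherit the enabled site policy.
	return true
}

func main() {
	fmt.Println(EffectiveActionsEnabled(false, Enabled, Enabled))
	fmt.Println(EffectiveActionsEnabled(true, Inherit, Inherit))
	fmt.Println(EffectiveActionsEnabled(true, Enabled, Disabled))
}
```

Checking the site switch before anything else keeps the kill-switch guarantee independent of how repo and org precedence is eventually defined.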
| 38 | +## Billing And Entitlements |
| 39 | + |
| 40 | +Billing is present, but Actions minute metering is not enforcement-ready yet. |
| 41 | +The current entitlement boundary includes: |
| 42 | + |
| 43 | +- `org.actions_minutes_quota` |
| 44 | +- `LimitOrgActionsMinutesQuota` |
| 45 | + |
| 46 | +`LimitOrgActionsMinutesQuota` intentionally reports no concrete number until |
| 47 | +usage accounting lands. Do not gate public shared-runner execution on scattered |
| 48 | +plan checks. When billing gates arrive, they must go through |
| 49 | +`internal/entitlements` and keep authorization separate from entitlement |
| 50 | +denials. |
| 51 | + |
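One way to keep authorization separate from entitlement denials is to return distinct errors for each, so callers can answer "you may not do this" differently from "your plan does not include this". The sketch below is hypothetical throughout: the error values, the `CheckOrgSecretWrite` function, and its boolean inputs are invented for illustration and are not the API of `internal/entitlements`. It uses the Team gate on organization-level secrets as the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors. Keeping them distinct lets a handler map
// authorization failures and plan denials to different responses.
var (
	ErrNotAuthorized = errors.New("actor is not authorized for this action")
	ErrNotEntitled   = errors.New("plan does not include this feature")
)

// CheckOrgSecretWrite is an illustrative gate for writing an org-level
// Actions secret. Authorization is evaluated first, so an unauthorized
// actor learns nothing about the org's plan.
func CheckOrgSecretWrite(actorCanAdmin, orgHasTeamPlan bool) error {
	if !actorCanAdmin {
		return ErrNotAuthorized
	}
	if !orgHasTeamPlan {
		return ErrNotEntitled
	}
	return nil
}

func main() {
	err := CheckOrgSecretWrite(true, false)
	switch {
	case errors.Is(err, ErrNotAuthorized):
		fmt.Println("deny: permission problem")
	case errors.Is(err, ErrNotEntitled):
		fmt.Println("deny: plan upgrade required")
	default:
		fmt.Println("allow")
	}
}
```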
| 52 | +Recommended rollout posture: |
| 53 | + |
| 54 | +- personal/public dogfood repos: allowed only under site policy and conservative |
| 55 | + caps; |
| 56 | +- organization-level Actions secrets/variables: already Team-gated; |
| 57 | +- paid shared-runner minutes: defer until metering can record usage and enforce |
| 58 | + limits consistently; |
| 59 | +- unpaid or past-due orgs: keep paid-only Actions configuration read-only, but |
| 60 | + do not delete secrets, variables, or prior run history. |
| 61 | + |
| 62 | +## S41j-6 Findings |
| 63 | + |
| 64 | +| ID | Severity | Status | Finding | Resolution | |
| 65 | +| --- | --- | --- | --- | --- | |
| 66 | +| S41J6-H1 | High | Fixed in S41j-6 | Site Actions disable was not a hard kill switch; explicit repo/org enablement could still evaluate true and queued jobs could be claimed. | Effective policy and runner claim SQL now return false whenever `actions_site_policy.actions_enabled=false`. Tests cover enqueue-time policy and claim-time dispatch. | |
| 67 | +| S41J6-M1 | Medium | Open with compensating control | Actions minutes billing has an entitlement key but no usage accounting or numeric limits. | Do not market or sell metered Actions minutes yet. Use site/org/repo policy caps and runner capacity as the public-runner control until billing SP08 defines usage accounting. | |
| 68 | +| S41J6-M2 | Medium | Manual validation pending | The S41j-5 arbitrary-repo smoke must run on production after deploy. | Run the scratch plus second-repo checklist in `runbooks/actions-runner.md` before declaring broad availability. | |
| 69 | + |
| 70 | +No Critical findings are open in this packet. |
| 71 | + |
| 72 | +## Required Evidence Before Broad Enablement |
| 73 | + |
| 74 | +- `scripts/audit-actions-public-runners.sh` passes on the deployed commit. |
| 75 | +- Focused Go tests pass for site kill switch, repo/owner concurrency caps, |
| 76 | + unsupported label diagnostics, token gates, and untrusted PR secret behavior. |
| 77 | +- Live smoke passes on `mfwolffe/scratch`. |
| 78 | +- Live smoke passes on at least one additional normal public repository with |
| 79 | + `runs-on: ubuntu-latest`. |
| 80 | +- Unsupported-label workflow shows a queued diagnostic with zero matching |
| 81 | + runners. |
| 82 | +- Untrusted pull request run receives no secrets or mask values before approval. |
| 83 | +- Drained and revoked runners do not claim or complete new work. |
| 84 | +- A job-container network bypass attempt cannot reach direct IP destinations or |
| 85 | + the DigitalOcean metadata service unless explicitly allowlisted. |
| 86 | + |
| 87 | +## Operator Controls |
| 88 | + |
| 89 | +Emergency stop: |
| 90 | + |
| 91 | +```sql |
| 92 | +UPDATE actions_site_policy |
| 93 | + SET actions_enabled = false, |
| 94 | + updated_at = now() |
| 95 | + WHERE id = true; |
| 96 | +``` |
| 97 | + |
| 98 | +After this change, newly matched workflows should be skipped by policy and |
| 99 | +already queued jobs should not be claimed by runners. Keep this SQL in the |
| 100 | +incident runbook until a site-admin UI exists. |
| 101 | + |
| 102 | +Capacity limits: |
| 103 | + |
| 104 | +- `max_repo_queued_runs` bounds backlog. |
| 105 | +- `max_repo_concurrent_jobs` bounds active jobs for one repository. |
| 106 | +- `max_owner_concurrent_jobs` bounds active jobs across one user or org owner. |
| 107 | +- `actor_trigger_limit_per_hour` bounds trigger spam by a single actor. |
| 108 | + |
| 109 | +These are policy controls, not billing meters. They protect the shared pool |
| 110 | +while Actions minute accounting is still future work. |
| 111 | + |
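The four caps above can be read as a single admission check. The sketch below assumes hypothetical `Caps` and `Usage` structs and treats each cap as inclusive (usage at or above the cap blocks the run). It also collapses enqueue-time and claim-time checks into one function for brevity, which the real system may not do. The numbers in `main` are illustrative only.

```go
package main

import "fmt"

// Caps mirrors the four limits named above. The struct itself is hypothetical.
type Caps struct {
	MaxRepoQueuedRuns        int
	MaxRepoConcurrentJobs    int
	MaxOwnerConcurrentJobs   int
	ActorTriggerLimitPerHour int
}

// Usage is a hypothetical snapshot of the current counters for one repo,
// its owner, and the triggering actor.
type Usage struct {
	RepoQueuedRuns        int
	RepoActiveJobs        int
	OwnerActiveJobs       int
	ActorTriggersThisHour int
}

// FirstExceededCap returns the name of the first cap that is already full,
// or "" when the run fits under every cap.
// Assumption: a cap of N admits at most N items, so usage >= N blocks.
func FirstExceededCap(c Caps, u Usage) string {
	switch {
	case u.ActorTriggersThisHour >= c.ActorTriggerLimitPerHour:
		return "actor_trigger_limit_per_hour"
	case u.RepoQueuedRuns >= c.MaxRepoQueuedRuns:
		return "max_repo_queued_runs"
	case u.RepoActiveJobs >= c.MaxRepoConcurrentJobs:
		return "max_repo_concurrent_jobs"
	case u.OwnerActiveJobs >= c.MaxOwnerConcurrentJobs:
		return "max_owner_concurrent_jobs"
	}
	return ""
}

func main() {
	// Illustrative values, not recommended settings.
	caps := Caps{
		MaxRepoQueuedRuns:        20,
		MaxRepoConcurrentJobs:    2,
		MaxOwnerConcurrentJobs:   4,
		ActorTriggerLimitPerHour: 30,
	}
	fmt.Printf("%q\n", FirstExceededCap(caps, Usage{RepoActiveJobs: 1}))
	fmt.Printf("%q\n", FirstExceededCap(caps, Usage{RepoActiveJobs: 2}))
}
```

Returning the name of the blocking cap, rather than a bare boolean, gives operators a direct diagnostic when a run is held back.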
| 112 | +## Relationship To S41k |
| 113 | + |
| 114 | +S41k should follow S41j because it is UI parity work: |
| 115 | + |
| 116 | +- Actions sidebar and management placeholders; |
| 117 | +- workflow-specific run pages; |
| 118 | +- run graph canvas; |
| 119 | +- log viewer and annotations; |
| 120 | +- caches, runners, and metrics pages. |
| 121 | + |
| 122 | +None of those replace S41j's security gates. S41k can make unsupported labels, |
| 123 | +queue state, runner health, and usage easier to see, but it should not be the |
| 124 | +first line of defense for arbitrary code execution. |