# Actions GA readiness and dogfood decision

This is the S41h-5 pre-GA packet for shithub Actions. It records what is ready, what remains intentionally deferred, and why shithub's full project CI is not yet moved from GitHub Actions to shithub Actions.

For shared-pool public runner rollout, see
[`actions-public-runners.md`](./actions-public-runners.md). S41h covers whether
shithub can dogfood its own CI; S41j covers whether normal repositories can use
the production runner pool safely.

## Current decision

Do not move `.github/workflows/ci.yml` to `.shithub/workflows/ci.yml` in S41h.

The v1 runner is useful and dogfoodable, but it is not a drop-in GitHub Actions
runner. The current project CI uses:

- `actions/setup-go@v5`
- `golangci/golangci-lint-action@v8`
- GitHub-hosted runner caches and tool installation semantics

shithub Actions v1 intentionally accepts only:

- `actions/checkout@v4`
- `shithub/upload-artifact@v1`
- `shithub/download-artifact@v1`
- ordinary `run:` steps

The committed dogfood path is therefore
`.shithub/workflows/checkout-canary.yml`. That canary proves the self-hosted
trigger pipeline, runner claim, scoped checkout credential, containerized
`run:` execution, log path, check-run sync, and Actions UI without pretending
marketplace-action parity exists.
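
The accepted-step list above can be checked locally before a workflow is copied under `.shithub/workflows/`. The following is a minimal sketch, not the shithub parser: the function names `is_allowed_uses` and `list_unsupported_uses` are hypothetical, and the `sed` expression only handles simple unquoted `uses:` lines.

```sh
#!/bin/sh
# Hypothetical pre-flight helper; it is NOT part of shithub and does not
# replace the parse-time rejection performed by the server.

# is_allowed_uses REF: succeed only for the uses: references that
# shithub Actions v1 accepts according to the list above.
is_allowed_uses() {
  case "$1" in
    actions/checkout@v4|shithub/upload-artifact@v1|shithub/download-artifact@v1)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# list_unsupported_uses FILE: print every uses: reference in FILE that is
# not on the allowlist. Prints nothing when all references are accepted.
list_unsupported_uses() {
  # Capture the value after "uses:", skipping leading spaces and the
  # list dash, and stopping at whitespace or a trailing comment.
  # Quoted values are not handled in this sketch.
  sed -n 's/^[[:space:]-]*uses:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" |
  while IFS= read -r ref; do
    is_allowed_uses "$ref" || printf '%s\n' "$ref"
  done
}
```

Run against the current `.github/workflows/ci.yml`, a helper like this would be expected to print the setup and lint actions named above, which is the gap the promotion criteria address.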

## Promotion criteria for full CI

Move the project's full CI to shithub Actions only after all criteria below are
true:

1. The default runner image or a first-party setup step provides Go, git, bash,
   `golangci-lint`, and any build tools required by `make ci`.
2. The workflow can be expressed with `actions/checkout@v4` plus `run:` steps,
   or shithub grows first-party equivalents for the missing setup/cache steps.
3. Required status checks on protected branches point at shithub check runs and
   preserve the same merge gate strength as the current GitHub-hosted CI.
4. Production deploy remains gated separately from untrusted pull-request code.
   Deploy secrets must not be available to fork/PR workflows.
5. The checkout canary and a production-like `make ci` workflow both pass on a
   trusted runner host.
6. The Actions load harness completes without queue starvation, runner
   deadlock, or log p99 above five seconds.
7. The pre-GA audit below has no open Critical or High findings.
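
The five-second p99 bound in the load-harness criterion above can be checked mechanically once latency samples are available. The sketch below assumes a plain text file with one latency in seconds per line; that format, and the function names `p99` and `within_budget`, are assumptions for illustration, since this document does not describe the harness output.

```sh
#!/bin/sh
# Hypothetical helpers; the sample-file format is an assumption.

# p99 FILE: print the nearest-rank 99th percentile of the numeric
# samples in FILE (one value per line). Fails if FILE has no samples.
p99() {
  sort -n "$1" | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # Nearest-rank index: ceil(NR * 99 / 100), using integer inputs
      # to avoid floating-point overshoot.
      rank = int((NR * 99 + 99) / 100)
      print v[rank]
    }'
}

# within_budget FILE: succeed only if the p99 is at most five seconds.
within_budget() {
  value=$(p99 "$1") || return 1
  awk -v p="$value" 'BEGIN { exit !(p <= 5) }'
}
```

With 100 samples the nearest-rank method picks the 99th smallest value, so a single slow outlier does not fail the check but two do.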

## Evidence matrix

| Area | Evidence | Notes |
|---|---|---|
| Workflow parser and dialect | `internal/actions/workflow`, `docs/internal/actions-schema.md`, `docs/public/user/actions.md` | Unknown YAML keys and unsupported `uses:` aliases are rejected at parse time. |
| Trigger idempotency | `internal/actions/trigger`, `workflow_runs.trigger_event_id` | Retries and admin replays do not duplicate runs for the same triggering event. |
| Secrets and variables | `internal/actions/secrets`, `internal/actions/variables`, `workflow_job_secret_masks` | Secrets are AEAD-encrypted at rest; claim-time mask snapshots preserve log masking after rotation. |
| Runner JWT replay gate | `internal/auth/runnerjwt`, `runner_jwt_used`, `internal/web/handlers/api/runners.go` | Job API JWTs are short-lived and single-use. Checkout JWTs are separate and scoped to read-only git fetch. |
| Checkout credential scope | `internal/web/handlers/githttp` tests | Checkout credentials permit `git-upload-pack` for the claimed repo only and reject pushes. |
| Sandbox controls | `internal/runner/engine`, `deploy/runner-config`, `deploy/ansible/roles/shithubd-runner` | Container defaults drop caps, run as uid 65534, use read-only rootfs, pids/memory/CPU limits, seccomp, and no-new-privileges. |
| Network egress controls | `deploy/runner-config/dnsmasq.conf.j2`, `deploy/runner-config/firewall.sh.j2` | Runner role closes direct-IP egress by combining an allowlist resolver with ipset firewall rules. |
| Log safety | `internal/runner/scrub`, server-side runner log path, `shithub_actions_log_scrub_replacements_total` | Runner scrubs before send; server re-scrubs claim-time mask values and handles chunk-boundary secrets. |
| Lifecycle controls | `internal/actions/lifecycle`, repo Actions handlers, runner cancel-check | Cancel, re-run, timeout, concurrency, and retention paths are covered by focused tests and runbooks. |
| Observability | `deploy/monitoring/grafana/dashboards/actions.json`, `deploy/monitoring/prometheus/rules.yml` | Dashboard and alerts cover stale runners, queue depth, p99 regression, and scrubber health. |
| Operator docs | `docs/internal/runbooks/actions.md`, `docs/internal/runbooks/runner-deploy.md` | A fresh operator has register, deploy, smoke, emergency cancel, load, and incident procedures. |
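
For readers who want a concrete picture of the sandbox controls row, the sketch below prints Docker-style flags that correspond to each listed default. It is illustrative only: the document does not say which container runtime the runner engine uses, and the limit values, the seccomp profile path, the choice to drop all capabilities, and the function name `sandbox_flags` are all assumptions.

```sh
#!/bin/sh
# Hypothetical illustration; the real defaults live in the runner engine
# and deploy configuration named in the table, not in this function.

# sandbox_flags: print one Docker-style flag per line for each sandbox
# default listed in the matrix. Numeric limits and the seccomp profile
# path are placeholders, not the project's actual values.
sandbox_flags() {
  printf '%s\n' \
    '--cap-drop=ALL' \
    '--user=65534' \
    '--read-only' \
    '--pids-limit=256' \
    '--memory=2g' \
    '--cpus=2' \
    '--security-opt=no-new-privileges' \
    '--security-opt=seccomp=/path/to/seccomp.json'
}
```

On a Docker-compatible host these could be spliced into a manual smoke run, for example `docker run --rm $(sandbox_flags) IMAGE id -u`, which should report 65534 if the uid default is in effect.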

## Pre-GA audit checklist

Run the local static packet first:

```sh
make audit-actions-ga
```

Then run focused tests:

```sh
go test -trimpath ./internal/actions/... ./internal/auth/runnerjwt ./internal/runner/... ./internal/web/handlers/api ./internal/web/handlers/repo ./internal/web/handlers/githttp
```

For a production-like runner host, manually verify:

1. Register a new runner with `self-hosted,linux,ubuntu-latest,x64`.
2. Trigger `.shithub/workflows/checkout-canary.yml` on trunk.
3. Confirm the run appears in the Actions tab and the check run completes.
4. Confirm step logs stream while the job is running and finalize to object
   storage after completion.
5. Run the sandbox smoke from `docs/internal/runbooks/runner-deploy.md`.
6. Re-test the S41e network fix: direct-IP egress and workflow-supplied
   resolvers must be blocked by the runner bridge firewall.
7. Echo a controlled test secret and confirm logs show `***`, not plaintext.
8. Reuse a consumed job JWT and confirm the API returns 401.
9. Run `bench/k6/actions-load.js` against a pre-seeded queue.
10. Verify the Grafana Actions dashboard is populated and Prometheus alert
    expressions parse in the monitoring stack.
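
The secret-masking check in the list above is easy to get wrong by eye, so it can help to run it against a saved copy of the step log. This is a minimal sketch under stated assumptions: the log has already been downloaded to a local file, the test secret is a single line, and `assert_secret_masked` is a hypothetical name rather than an existing tool.

```sh
#!/bin/sh
# Hypothetical audit helper. Use only with a throwaway test secret,
# because the value is passed as a command-line argument.

# assert_secret_masked SECRET LOGFILE: fail if SECRET appears in
# plaintext, or if no *** mask appears at all.
assert_secret_masked() {
  secret=$1
  log=$2
  # -F matches the secret as a fixed string, so regex metacharacters
  # inside it are taken literally.
  if grep -F -q -e "$secret" "$log"; then
    echo "FAIL: plaintext secret found in $log" >&2
    return 1
  fi
  # A log with no mask usually means the echo step never ran.
  if ! grep -F -q -e '***' "$log"; then
    echo "FAIL: no *** mask found in $log" >&2
    return 1
  fi
  echo "ok: secret masked in $log"
}
```

Requiring the `***` marker as well as the absence of plaintext guards against a false pass when the workflow step that echoes the secret was skipped.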

## Accepted S41h deferrals

These are not S41h GA blockers because they were explicitly parked from v1 or
depend on the post-GA Nix/tooling work:

- Full marketplace-action compatibility.
- `actions/setup-go`, `golangci-lint-action`, cache actions, Docker actions,
  and composite actions.
- Matrix builds and reusable workflows.
- Hosted runner image provisioning by workflow authors. `runs-on` is a label
  selector; operators own the backing image.
- Full project CI migration from `.github/workflows/ci.yml` to
  `.shithub/workflows/ci.yml`.

## Next sprint hook

S41i should close the toolchain gap without weakening v1's security boundary:

- keep marketplace `uses:` rejected;
- provide a reproducible runner image or Nix engine path that can run `make ci`
  from first-party `run:` steps;
- keep deploy secrets out of untrusted PR workflows.