# Actions GA readiness and dogfood decision

This is the S41h-5 pre-GA packet for shithub Actions. It records what is ready,
what remains intentionally deferred, and why shithub's full project CI has not
yet moved from GitHub Actions to shithub Actions.

## Current decision

Do not move `.github/workflows/ci.yml` to `.shithub/workflows/ci.yml` in S41h.

The v1 runner is useful and dogfoodable, but it is not a drop-in GitHub Actions
runner. The current project CI uses:

- `actions/setup-go@v5`
- `golangci/golangci-lint-action@v8`
- GitHub-hosted runner caches and tool installation semantics

shithub Actions v1 intentionally accepts only:

- `actions/checkout@v4`
- `shithub/upload-artifact@v1`
- `shithub/download-artifact@v1`
- ordinary `run:` steps
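
A workflow author can approximate this allowlist before pushing. The sketch below is a hypothetical helper (`check_uses` is not part of shithub) that greps `uses:` lines instead of parsing YAML, so it is only a rough pre-flight; the authoritative rejection still happens at parse time in `internal/actions/workflow`.

```sh
# check_uses FILE
# Hypothetical pre-flight: print every `uses:` reference outside the v1
# allowlist and return non-zero if any are found. Line-oriented only; flow-style
# YAML and multi-line values are not handled.
check_uses() {
  [ -r "$1" ] || return 2
  # Keep the value after `uses:` (with or without a leading "- "), then strip
  # trailing comments, quote characters, and trailing whitespace.
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*uses:[[:space:]]*//p' "$1" |
    sed 's/[[:space:]]*#.*$//; s/["'\'']//g; s/[[:space:]]*$//' |
    (
      status=0
      while IFS= read -r ref; do
        case $ref in
          actions/checkout@v4 | shithub/upload-artifact@v1 | shithub/download-artifact@v1) ;;
          *)
            printf 'unsupported uses: %s\n' "$ref"
            status=1
            ;;
        esac
      done
      exit "$status"
    )
}
```

Run against the current `.github/workflows/ci.yml`, a check like this would be expected to flag `actions/setup-go@v5` and `golangci/golangci-lint-action@v8`, which is exactly the incompatibility described above.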

The committed dogfood path is therefore
`.shithub/workflows/checkout-canary.yml`. That canary proves the self-hosted
trigger pipeline, runner claim, scoped checkout credential, containerized
`run:` execution, log path, check-run sync, and Actions UI without pretending
marketplace-action parity exists.

## Promotion criteria for full CI

Move the project's full CI to shithub Actions only after all of the following
criteria hold:

1. The default runner image or a first-party setup step provides Go, git, bash,
   `golangci-lint`, and any build tools required by `make ci`.
2. The workflow can be expressed with `actions/checkout@v4` plus `run:` steps,
   or shithub grows first-party equivalents for the missing setup/cache steps.
3. Required status checks on protected branches point at shithub check runs and
   preserve the same merge-gate strength as the current GitHub-hosted CI.
4. Production deploy remains gated separately from untrusted pull-request code.
   Deploy secrets must not be available to fork/PR workflows.
5. The checkout canary and a production-like `make ci` workflow both pass on a
   trusted runner host.
6. The Actions load harness completes without queue starvation, runner
   deadlock, or a p99 log latency above five seconds.
7. The pre-GA audit below has no open Critical or High findings.
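
Criteria 1 and 6 are mechanical enough to script. The sketch below is hypothetical: `require_tools` and `p99_within` are illustrative names, the p99 gate assumes the load harness can export one log-latency sample in seconds per line, and it uses the nearest-rank definition of p99 (the ceil(0.99 * N)-th smallest sample), which may differ from how the Prometheus rules compute it.

```sh
# require_tools TOOL...
# Hypothetical preflight for criterion 1: report every tool missing from PATH.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing tool: %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# p99_within FILE [LIMIT_SECONDS]
# Hypothetical gate for criterion 6: FILE holds one latency sample (seconds)
# per line. Prints the nearest-rank p99 and fails if it exceeds LIMIT (default 5).
p99_within() {
  sort -n "$1" | awk -v limit="${2:-5}" '
    { v[NR] = $1 + 0 }
    END {
      if (NR == 0) exit 2
      # Nearest rank ceil(0.99 * NR), computed with integer arithmetic.
      r = int((99 * NR + 99) / 100)
      printf "p99=%s\n", v[r]
      if (v[r] > limit + 0) exit 1
      exit 0
    }'
}
```

On a candidate runner image, `require_tools go git bash golangci-lint make` should print nothing and succeed; `make` is listed only because criterion 1 names `make ci`.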

## Evidence matrix

| Area | Evidence | Notes |
|---|---|---|
| Workflow parser and dialect | `internal/actions/workflow`, `docs/internal/actions-schema.md`, `docs/public/user/actions.md` | Unknown YAML keys and unsupported `uses:` aliases are rejected at parse time. |
| Trigger idempotency | `internal/actions/trigger`, `workflow_runs.trigger_event_id` | Retries and admin replays do not duplicate runs for the same triggering event. |
| Secrets and variables | `internal/actions/secrets`, `internal/actions/variables`, `workflow_job_secret_masks` | Secrets are AEAD-encrypted at rest; claim-time mask snapshots preserve log masking after rotation. |
| Runner JWT replay gate | `internal/auth/runnerjwt`, `runner_jwt_used`, `internal/web/handlers/api/runners.go` | Job API JWTs are short-lived and single-use. Checkout JWTs are separate and scoped to read-only git fetch. |
| Checkout credential scope | `internal/web/handlers/githttp` tests | Checkout credentials permit `git-upload-pack` for the claimed repo only and reject pushes. |
| Sandbox controls | `internal/runner/engine`, `deploy/runner-config`, `deploy/ansible/roles/shithubd-runner` | Container defaults drop capabilities, run as uid 65534, use a read-only rootfs, and apply pids/memory/CPU limits, seccomp, and no-new-privileges. |
| Network egress controls | `deploy/runner-config/dnsmasq.conf.j2`, `deploy/runner-config/firewall.sh.j2` | The runner role closes direct-IP egress by combining an allowlist resolver with ipset firewall rules. |
| Log safety | `internal/runner/scrub`, server-side runner log path, `shithub_actions_log_scrub_replacements_total` | Runner scrubs before send; server re-scrubs claim-time mask values and handles chunk-boundary secrets. |
| Lifecycle controls | `internal/actions/lifecycle`, repo Actions handlers, runner cancel-check | Cancel, re-run, timeout, concurrency, and retention paths are covered by focused tests and runbooks. |
| Observability | `deploy/monitoring/grafana/dashboards/actions.json`, `deploy/monitoring/prometheus/rules.yml` | Dashboard and alerts cover stale runners, queue depth, p99 regression, and scrubber health. |
| Operator docs | `docs/internal/runbooks/actions.md`, `docs/internal/runbooks/runner-deploy.md` | A fresh operator has register, deploy, smoke, emergency cancel, load, and incident procedures. |
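
The Log safety row notes that the server handles chunk-boundary secrets. The problem: if a secret is split across two log chunks, scrubbing each chunk on its own never sees the whole value. A common fix is to hold back the last `len(secret) - 1` characters of each chunk and scrub them again together with the next chunk. The sketch below illustrates that technique under simplifying assumptions (one secret, one chunk per input line, `***` as the mask); `scrub_stream` is a hypothetical name and this is not the code in `internal/runner/scrub`.

```sh
# scrub_stream SECRET < chunks
# Hypothetical sketch: each input line is one log chunk (the newlines are not
# part of the logical stream); output is the reassembled, masked stream.
scrub_stream() {
  [ -n "$1" ] || return 2
  # Pass the secret via the environment so awk does not interpret backslash
  # escapes in it, as it would for a -v assignment.
  SECRET=$1 awk '
    BEGIN { s = ENVIRON["SECRET"]; keep = length(s) - 1; carry = "" }
    {
      buf = carry $0
      out = ""
      # Literal (non-regex) replacement of every occurrence of the secret.
      while ((i = index(buf, s)) > 0) {
        out = out substr(buf, 1, i - 1) "***"
        buf = substr(buf, i + length(s))
      }
      buf = out buf
      # Emit all but the last keep characters: a secret that straddles the
      # next chunk boundary must start inside this held-back tail.
      n = length(buf)
      if (n > keep) {
        printf "%s", substr(buf, 1, n - keep)
        carry = substr(buf, n - keep + 1)
      } else {
        carry = buf
      }
    }
    END { printf "%s\n", carry }'
}
```

For example, the chunks `token=ab` and `cd12 done` reassemble to `token=*** done` when the secret is `abcd12`, whereas per-chunk scrubbing would leak it. Matching uses awk's `index()`, so the secret is compared literally rather than as a regular expression.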

## Pre-GA audit checklist

Run the local static packet first:

```sh
make audit-actions-ga
```

Then run focused tests:

```sh
go test -trimpath ./internal/actions/... ./internal/auth/runnerjwt ./internal/runner/... ./internal/web/handlers/api ./internal/web/handlers/repo ./internal/web/handlers/githttp
```

For a production-like runner host, manually verify:

1. Register a new runner with the labels `self-hosted,linux,ubuntu-latest`.
2. Trigger `.shithub/workflows/checkout-canary.yml` on trunk.
3. Confirm the run appears in the Actions tab and the check run completes.
4. Confirm step logs stream while the job is running and finalize to object
   storage after completion.
5. Run the sandbox smoke from `docs/internal/runbooks/runner-deploy.md`.
6. Re-test the S41e network fix: direct-IP egress and workflow-supplied
   resolvers must be blocked by the runner bridge firewall.
7. Echo a controlled test secret and confirm logs show `***`, not plaintext.
8. Reuse a consumed job JWT and confirm the API returns 401.
9. Run `bench/k6/actions-load.js` against a pre-seeded queue.
10. Verify the Grafana Actions dashboard is populated and Prometheus alert
    expressions parse in the monitoring stack.
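
Step 7 is easy to script once the job log has been saved locally. `assert_masked` is a hypothetical helper, and how the log file is obtained (Actions UI, API, or the finalized object-storage copy) is left open here:

```sh
# assert_masked LOG_FILE SECRET
# Hypothetical helper for manual step 7: fail if the saved log contains the
# plaintext secret, or if it contains no `***` mask at all.
assert_masked() {
  [ -n "$2" ] || return 2
  if grep -F -q -e "$2" "$1"; then
    echo "plaintext secret found in $1" >&2
    return 1
  fi
  if ! grep -F -q -e '***' "$1"; then
    echo "no *** mask found in $1" >&2
    return 1
  fi
}
```

For example, after echoing a canary secret in a `run:` step, `assert_masked job.log "$CANARY"` should succeed; `job.log` and `CANARY` are placeholder names.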

## Accepted S41h deferrals

These are not S41h GA blockers because they were explicitly parked from v1 or
depend on the post-GA Nix/tooling work:

- Full marketplace-action compatibility.
- `actions/setup-go`, `golangci-lint-action`, cache actions, Docker actions,
  and composite actions.
- Matrix builds and reusable workflows.
- Hosted runner image provisioning by workflow authors. `runs-on` is a label
  selector; operators own the backing image.
- Full project CI migration from `.github/workflows/ci.yml` to
  `.shithub/workflows/ci.yml`.
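
Because `runs-on` is only a label selector, choosing a runner is a set-membership question. This document does not spell out shithub's exact matching rule, so the sketch below assumes GitHub-style semantics, where a runner qualifies only if it carries every requested label, and compares labels exactly and case-sensitively. `labels_match` is a hypothetical name:

```sh
# labels_match RUNNER_LABELS RUNS_ON
# Hypothetical: succeed if every comma-separated label in RUNS_ON appears in
# RUNNER_LABELS. Runs in a subshell so the IFS and `set -f` changes stay local.
labels_match() (
  have=",$1,"
  IFS=,
  set -f # split on commas only; never glob-expand label text
  for want in $2; do
    case $have in
      *",$want,"*) ;;
      *) exit 1 ;;
    esac
  done
)
```

Under that assumption, a runner registered with `self-hosted,linux,ubuntu-latest` (step 1 of the manual checklist) satisfies `runs-on: ubuntu-latest`, while a runner labeled only `self-hosted,linux` does not.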

## Next sprint hook

S41i should close the toolchain gap without weakening v1's security boundary:

- keep marketplace `uses:` rejected;
- provide a reproducible runner image or Nix engine path that can run `make ci`
  from first-party `run:` steps;
- keep deploy secrets out of untrusted PR workflows.