
# Actions GA readiness and dogfood decision

This is the S41h-5 pre-GA packet for shithub Actions. It records what is ready,
what remains intentionally deferred, and why shithub's full project CI is not
yet moved from GitHub Actions to shithub Actions.

For shared-pool public runner rollout, see
[`actions-public-runners.md`](./actions-public-runners.md). S41h covers whether
shithub can dogfood its own CI; S41j covers whether normal repositories can use
the production runner pool safely.

## Current decision

Do not move `.github/workflows/ci.yml` to `.shithub/workflows/ci.yml` in S41h.

The v1 runner is useful and dogfoodable, but it is not a drop-in GitHub Actions
runner. The current project CI uses:

- `actions/setup-go@v5`
- `golangci/golangci-lint-action@v8`
- GitHub-hosted runner caches and tool installation semantics

shithub Actions v1 intentionally accepts only:

- `actions/checkout@v4`
- `shithub/upload-artifact@v1`
- `shithub/download-artifact@v1`
- ordinary `run:` steps

The committed dogfood path is therefore
`.shithub/workflows/checkout-canary.yml`. That canary proves the self-hosted
trigger pipeline, runner claim, scoped checkout credential, containerized
`run:` execution, log path, check-run sync, and Actions UI without pretending
marketplace-action parity exists.
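
A minimal sketch of the parse-time `uses:` check implied by the v1 allowlist, assuming exact matching on the full `owner/name@ref` string. `checkUses` and its error text are hypothetical; the real check lives in `internal/actions/workflow` and may be structured differently.

```go
package main

import "fmt"

// allowedUses mirrors the v1 allowlist above. Matching the full
// "owner/name@ref" string exactly is an assumption of this sketch.
var allowedUses = map[string]bool{
	"actions/checkout@v4":          true,
	"shithub/upload-artifact@v1":   true,
	"shithub/download-artifact@v1": true,
}

// checkUses (hypothetical) rejects any uses: value outside the allowlist.
// An empty value stands for an ordinary run: step and is always accepted.
func checkUses(uses string) error {
	if uses == "" || allowedUses[uses] {
		return nil
	}
	return fmt.Errorf("unsupported uses: %q", uses)
}

func main() {
	for _, u := range []string{"actions/checkout@v4", "actions/setup-go@v5", ""} {
		if err := checkUses(u); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("accepted: %q\n", u)
	}
}
```

Under this rule `actions/setup-go@v5` and `golangci/golangci-lint-action@v8` fail validation before any runner claims the job, which is why the current CI cannot move as-is.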

## Promotion criteria for full CI

Move the project's full CI to shithub Actions only after all criteria below are
true:

1. The default runner image or a first-party setup step provides Go, git, bash,
   `golangci-lint`, and any build tools required by `make ci`.
2. The workflow can be expressed with `actions/checkout@v4` plus `run:` steps,
   or shithub grows first-party equivalents for the missing setup/cache steps.
3. Required status checks on protected branches point at shithub check runs and
   preserve the same merge gate strength as the current GitHub-hosted CI.
4. Production deploy remains gated separately from untrusted pull-request code.
   Deploy secrets must not be available to fork/PR workflows.
5. The checkout canary and a production-like `make ci` workflow both pass on a
   trusted runner host.
6. The Actions load harness completes without queue starvation, runner
   deadlock, or log p99 above five seconds.
7. The pre-GA audit below has no open Critical or High findings.

## Evidence matrix

| Area | Evidence | Notes |
|---|---|---|
| Workflow parser and dialect | `internal/actions/workflow`, `docs/internal/actions-schema.md`, `docs/public/user/actions.md` | Unknown YAML keys and unsupported `uses:` aliases are rejected at parse time. |
| Trigger idempotency | `internal/actions/trigger`, `workflow_runs.trigger_event_id` | Retries and admin replays do not duplicate runs for the same triggering event. |
| Secrets and variables | `internal/actions/secrets`, `internal/actions/variables`, `workflow_job_secret_masks` | Secrets are AEAD-encrypted at rest; claim-time mask snapshots preserve log masking after rotation. |
| Runner JWT replay gate | `internal/auth/runnerjwt`, `runner_jwt_used`, `internal/web/handlers/api/runners.go` | Job API JWTs are short-lived and single-use. Checkout JWTs are separate and scoped to read-only git fetch. |
| Checkout credential scope | `internal/web/handlers/githttp` tests | Checkout credentials permit `git-upload-pack` for the claimed repo only and reject pushes. |
| Sandbox controls | `internal/runner/engine`, `deploy/runner-config`, `deploy/ansible/roles/shithubd-runner` | Container defaults drop capabilities, run as uid 65534, and apply a read-only rootfs, pids/memory/CPU limits, seccomp, and no-new-privileges. |
| Network egress controls | `deploy/runner-config/dnsmasq.conf.j2`, `deploy/runner-config/firewall.sh.j2` | Runner role closes direct-IP egress by combining an allowlist resolver with ipset firewall rules. |
| Log safety | `internal/runner/scrub`, server-side runner log path, `shithub_actions_log_scrub_replacements_total` | Runner scrubs before send; server re-scrubs claim-time mask values and handles chunk-boundary secrets. |
| Lifecycle controls | `internal/actions/lifecycle`, repo Actions handlers, runner cancel-check | Cancel, re-run, timeout, concurrency, and retention paths are covered by focused tests and runbooks. |
| Observability | `deploy/monitoring/grafana/dashboards/actions.json`, `deploy/monitoring/prometheus/rules.yml` | Dashboard and alerts cover stale runners, queue depth, p99 regression, and scrubber health. |
| Operator docs | `docs/internal/runbooks/actions.md`, `docs/internal/runbooks/runner-deploy.md` | A new operator has procedures for registration, deployment, smoke tests, emergency cancel, load tests, and incident response. |

## Pre-GA audit checklist

Run the local static packet first:

```sh
make audit-actions-ga
```

Then run focused tests:

```sh
go test -trimpath ./internal/actions/... ./internal/auth/runnerjwt ./internal/runner/... ./internal/web/handlers/api ./internal/web/handlers/repo ./internal/web/handlers/githttp
```

For a production-like runner host, manually verify:

1. Register a new runner with `self-hosted,linux,ubuntu-latest,x64`.
2. Trigger `.shithub/workflows/checkout-canary.yml` on trunk.
3. Confirm the run appears in the Actions tab and the check run completes.
4. Confirm step logs stream while the job is running and finalize to object
   storage after completion.
5. Run the sandbox smoke from `docs/internal/runbooks/runner-deploy.md`.
6. Re-test the S41e network fix: direct-IP egress and workflow-supplied
   resolvers must be blocked by the runner bridge firewall.
7. Echo a controlled test secret and confirm logs show `***`, not plaintext.
8. Reuse a consumed job JWT and confirm the API returns 401.
9. Run `bench/k6/actions-load.js` against a pre-seeded queue.
10. Verify the Grafana Actions dashboard is populated and Prometheus alert
    expressions parse in the monitoring stack.

## Accepted S41h deferrals

These are not S41h GA blockers because they were explicitly parked from v1 or
depend on the post-GA Nix/tooling work:

- Full marketplace-action compatibility.
- `actions/setup-go`, `golangci-lint-action`, cache actions, Docker actions,
  and composite actions.
- Matrix builds and reusable workflows.
- Hosted runner image provisioning by workflow authors. `runs-on` is a label
  selector; operators own the backing image.
- Full project CI migration from `.github/workflows/ci.yml` to
  `.shithub/workflows/ci.yml`.
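
Because `runs-on` only selects among registered runners, the environment a job gets is whatever image the matching runner's operator provides. A minimal sketch of the selection rule, assuming a job matches a runner when every `runs-on` label is present in the runner's registered set (the usual GitHub Actions semantics for self-hosted runners) with exact, case-sensitive matching; `parseLabels` and `runnerMatches` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels splits a comma-separated label list, such as the one passed at
// runner registration, trimming spaces and skipping empty entries.
func parseLabels(csv string) map[string]bool {
	set := make(map[string]bool)
	for _, l := range strings.Split(csv, ",") {
		if l = strings.TrimSpace(l); l != "" {
			set[l] = true
		}
	}
	return set
}

// runnerMatches (hypothetical) reports whether a runner can take a job: every
// label requested by runs-on must appear in the runner's label set.
func runnerMatches(runsOn []string, runner map[string]bool) bool {
	for _, want := range runsOn {
		if !runner[want] {
			return false
		}
	}
	return true
}

func main() {
	runner := parseLabels("self-hosted,linux,ubuntu-latest,x64")
	fmt.Println(runnerMatches([]string{"ubuntu-latest"}, runner))        // true
	fmt.Println(runnerMatches([]string{"self-hosted", "arm64"}, runner)) // false
}
```

Under this rule, the checklist runner registered with `self-hosted,linux,ubuntu-latest,x64` satisfies a job that asks for `runs-on: ubuntu-latest`, even though no GitHub-hosted Ubuntu image is involved.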

## Next sprint hook

S41i should close the toolchain gap without weakening v1's security boundary:

- keep marketplace `uses:` rejected;
- provide a reproducible runner image or Nix engine path that can run `make ci`
  from first-party `run:` steps;
- keep deploy secrets out of untrusted PR workflows.
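
The last item restates promotion criterion 4. A minimal sketch of the filtering it implies, assuming secrets carry a deploy flag and that any `pull_request` run, fork or same-repo, counts as untrusted; the `Secret` and `Trigger` types and `secretsFor` are hypothetical and do not reflect how `internal/actions/secrets` models scope.

```go
package main

import "fmt"

// Secret pairs a stored secret's name with a deploy flag. The flag is a
// hypothetical scope marker for this sketch.
type Secret struct {
	Name   string
	Deploy bool
}

// Trigger holds the only run metadata this sketch needs.
type Trigger struct {
	Event    string // e.g. "push" or "pull_request"
	FromFork bool
}

// untrusted reports whether the run executes pull-request code, which this
// sketch treats as untrusted whether or not it comes from a fork.
func untrusted(t Trigger) bool {
	return t.Event == "pull_request" || t.FromFork
}

// secretsFor returns the secrets a run may receive: deploy secrets are
// withheld from untrusted runs, everything else passes through.
func secretsFor(t Trigger, all []Secret) []Secret {
	var out []Secret
	for _, s := range all {
		if s.Deploy && untrusted(t) {
			continue
		}
		out = append(out, s)
	}
	return out
}

func main() {
	all := []Secret{{Name: "EXAMPLE_TOKEN"}, {Name: "EXAMPLE_DEPLOY_KEY", Deploy: true}}
	fmt.Println(len(secretsFor(Trigger{Event: "push"}, all)))                         // 2
	fmt.Println(len(secretsFor(Trigger{Event: "pull_request", FromFork: true}, all))) // 1
}
```

Filtering on the server before secrets reach a runner keeps the decision out of workflow authors' hands: no workflow-level setting can opt a PR run back into deploy credentials.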
