tenseleyflow/shithub / 1538b65

actions: document GA dogfood readiness

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 1538b650d3c408d519d6ec42a4a9cd6067396079
Parents: 7b7e6dc
Tree: a1d148f

4 changed files

| Status | File | + | - |
|---|---|---|---|
| M | README.md | 3 | 2 |
| A | docs/internal/actions-ga-readiness.md | 117 | 0 |
| M | docs/internal/index.md | 2 | 1 |
| M | docs/internal/security-checklist.md | 16 | 0 |

README.md modified
@@ -12,7 +12,7 @@
 
 **Status: v0.1.0 launched — early days. Honest about WIP areas.**
 
-shithub is an attempt to recreate GitHub — the platform, the UI, the workflows — as faithfully as we can, as a self-hostable open-source forge. The goal is "you should barely notice you switched." We are not there yet. The core forge loop works end-to-end (see "What works today"); large surfaces (SSH transport, Actions/CI, GraphQL, Packages) are explicitly not shipped at v0.1.0.
+shithub is an attempt to recreate GitHub — the platform, the UI, the workflows — as faithfully as we can, as a self-hostable open-source forge. The goal is "you should barely notice you switched." We are not there yet. The core forge loop works end-to-end (see "What works today"); large surfaces such as GraphQL, Packages, Pages, Projects, Releases, and Gists are still explicitly outside the current shipped surface.
 
 The hosted instance is at **[shithub.sh](https://shithub.sh)**. The project's own source has migrated here from GitHub; this GitHub repo is a one-way mirror for the first 90 days post-launch as a recovery surface.
 
@@ -33,13 +33,14 @@ The core forge loop works end-to-end against the codebase you're reading:
 - **Organizations & teams** — create, member roles (member/owner), invitations, one-level team nesting, team grants on repos with max-of-sources policy aggregation.
 - **Repo settings** — General (description, topics, features, merge methods), Access (collaborators + team grants), Branches (protection rules), Danger (rename/transfer/archive/visibility/delete).
 - **Webhooks** — outbound delivery with HMAC-SHA256 signing, exponential backoff with jitter, auto-disable on persistent failure, SSRF defense (DNS resolve + IP block-list + dial-the-IP transport, no redirect-following), redelivery UI, ping events.
+- **Actions / CI v1** — `.shithub/workflows` parser, push/PR/schedule/dispatch triggers, per-repo/org secrets and variables, runner registration, single-use job JWTs, scoped checkout, containerized `run:` steps, live logs, cancel/re-run, retention, check-run sync, Atom feed, and monitoring. v1 intentionally supports `actions/checkout@v4` plus `run:` steps, not arbitrary marketplace actions.
 
 ## What doesn't work yet
 
 Pulled directly from the sprint plan we're working through:
 
 - **SSH git service** — HTTPS works; the SSH front-end is planned. Use HTTPS clone URLs for now.
-- **Actions / CI** — there is no CI runner. Status checks are wired into PR gates so a future runner can publish into them.
+- **Actions marketplace parity** — shithub Actions does not execute arbitrary `uses:` steps, matrix builds, service containers, composite actions, or hosted runner images. The project's full CI remains on GitHub Actions until the first-party runner image/Nix toolchain can run it without marketplace setup actions; `.shithub/workflows/checkout-canary.yml` is the current dogfood canary.
 - **Packages, Pages, Projects, Releases, Gists** — none of these surfaces exist yet.
 - **GraphQL API** — only the internal HTTP surface exists. There is no public REST or GraphQL API.
 - **Admin / site-admin surface** — there is no `/admin` UI. Operator tooling is via `shithubd` subcommands and SQL.
docs/internal/actions-ga-readiness.md added
@@ -0,0 +1,117 @@
+# Actions GA readiness and dogfood decision
+
+This is the S41h-5 pre-GA packet for shithub Actions. It records what is ready,
+what remains intentionally deferred, and why shithub's full project CI is not
+yet moved from GitHub Actions to shithub Actions.
+
+## Current decision
+
+Do not move `.github/workflows/ci.yml` to `.shithub/workflows/ci.yml` in S41h.
+
+The v1 runner is useful and dogfoodable, but it is not a drop-in GitHub Actions
+runner. The current project CI uses:
+
+- `actions/setup-go@v5`
+- `golangci/golangci-lint-action@v8`
+- GitHub-hosted runner caches and tool installation semantics
+
+shithub Actions v1 intentionally accepts only:
+
+- `actions/checkout@v4`
+- `shithub/upload-artifact@v1`
+- `shithub/download-artifact@v1`
+- ordinary `run:` steps
+
+The committed dogfood path is therefore
+`.shithub/workflows/checkout-canary.yml`. That canary proves the self-hosted
+trigger pipeline, runner claim, scoped checkout credential, containerized
+`run:` execution, log path, check-run sync, and Actions UI without pretending
+marketplace-action parity exists.
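
Illustration, not part of the commit: the step allowlist described above could be enforced with a check along the following lines. This is a minimal Go sketch under stated assumptions. `Step`, `allowedUses`, and `validateStep` are hypothetical names chosen for the example and are not taken from `internal/actions/workflow`; only the three allowed `uses:` values and the `run:` rule come from the text.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedUses mirrors the v1 allowlist quoted in the readiness doc.
// The map itself is an illustrative stand-in, not the project's code.
var allowedUses = map[string]bool{
	"actions/checkout@v4":          true,
	"shithub/upload-artifact@v1":   true,
	"shithub/download-artifact@v1": true,
}

// Step is a hypothetical, simplified view of one parsed workflow step.
type Step struct {
	Uses string
	Run  string
}

// validateStep accepts ordinary run: steps and allowlisted uses: steps,
// and rejects everything else before any job would be scheduled.
func validateStep(s Step) error {
	hasUses := strings.TrimSpace(s.Uses) != ""
	hasRun := strings.TrimSpace(s.Run) != ""
	switch {
	case hasUses && hasRun:
		return fmt.Errorf("step sets both uses: and run:")
	case hasRun:
		return nil
	case hasUses && allowedUses[s.Uses]:
		return nil
	case hasUses:
		return fmt.Errorf("unsupported uses: %q", s.Uses)
	default:
		return fmt.Errorf("step has neither uses: nor run:")
	}
}

func main() {
	steps := []Step{
		{Uses: "actions/checkout@v4"},
		{Run: "make ci"},
		{Uses: "actions/setup-go@v5"}, // used by the current GitHub CI, rejected here
	}
	for _, s := range steps {
		if err := validateStep(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted")
	}
}
```

Rejecting at validation time rather than at execution time matches the "rejected at parse time" wording in the evidence matrix: an unsupported workflow never reaches a runner.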
+
+## Promotion criteria for full CI
+
+Move the project's full CI to shithub Actions only after all criteria below are
+true:
+
+1. The default runner image or a first-party setup step provides Go, git, bash,
+   `golangci-lint`, and any build tools required by `make ci`.
+2. The workflow can be expressed with `actions/checkout@v4` plus `run:` steps,
+   or shithub grows first-party equivalents for the missing setup/cache steps.
+3. Required status checks on protected branches point at shithub check runs and
+   preserve the same merge gate strength as the current GitHub-hosted CI.
+4. Production deploy remains gated separately from untrusted pull-request code.
+   Deploy secrets must not be available to fork/PR workflows.
+5. The checkout canary and a production-like `make ci` workflow both pass on a
+   trusted runner host.
+6. The Actions load harness completes without queue starvation, runner
+   deadlock, or log p99 above five seconds.
+7. The pre-GA audit below has no open Critical or High findings.
+
+## Evidence matrix
+
+| Area | Evidence | Notes |
+|---|---|---|
+| Workflow parser and dialect | `internal/actions/workflow`, `docs/internal/actions-schema.md`, `docs/public/user/actions.md` | Unknown YAML keys and unsupported `uses:` aliases are rejected at parse time. |
+| Trigger idempotency | `internal/actions/trigger`, `workflow_runs.trigger_event_id` | Retries and admin replays do not duplicate runs for the same triggering event. |
+| Secrets and variables | `internal/actions/secrets`, `internal/actions/variables`, `workflow_job_secret_masks` | Secrets are AEAD-encrypted at rest; claim-time mask snapshots preserve log masking after rotation. |
+| Runner JWT replay gate | `internal/auth/runnerjwt`, `runner_jwt_used`, `internal/web/handlers/api/runners.go` | Job API JWTs are short-lived and single-use. Checkout JWTs are separate and scoped to read-only git fetch. |
+| Checkout credential scope | `internal/web/handlers/githttp` tests | Checkout credentials permit `git-upload-pack` for the claimed repo only and reject pushes. |
+| Sandbox controls | `internal/runner/engine`, `deploy/runner-config`, `deploy/ansible/roles/shithubd-runner` | Container defaults drop caps, run as uid 65534, use read-only rootfs, pids/memory/CPU limits, seccomp, and no-new-privileges. |
+| Network egress controls | `deploy/runner-config/dnsmasq.conf.j2`, `deploy/runner-config/firewall.sh.j2` | Runner role closes direct-IP egress by combining an allowlist resolver with ipset firewall rules. |
+| Log safety | `internal/runner/scrub`, server-side runner log path, `shithub_actions_log_scrub_replacements_total` | Runner scrubs before send; server re-scrubs claim-time mask values and handles chunk-boundary secrets. |
+| Lifecycle controls | `internal/actions/lifecycle`, repo Actions handlers, runner cancel-check | Cancel, re-run, timeout, concurrency, and retention paths are covered by focused tests and runbooks. |
+| Observability | `deploy/monitoring/grafana/dashboards/actions.json`, `deploy/monitoring/prometheus/rules.yml` | Dashboard and alerts cover stale runners, queue depth, p99 regression, and scrubber health. |
+| Operator docs | `docs/internal/runbooks/actions.md`, `docs/internal/runbooks/runner-deploy.md` | A fresh operator has register, deploy, smoke, emergency cancel, load, and incident procedures. |
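
Illustration, not part of the commit: the "Log safety" row mentions secrets that straddle a chunk boundary. One common way to handle that case is to hold back the last `len(longest secret) - 1` bytes of each chunk until more data arrives. The Go sketch below shows that technique only; `Scrubber`, `Feed`, and `Flush` are hypothetical names, and nothing here is claimed to match `internal/runner/scrub`. Overlapping or nested secret values are deliberately not handled beyond masking longer secrets first.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const mask = "***"

// Scrubber masks known secret values in a stream delivered chunk by chunk.
type Scrubber struct {
	secrets []string
	hold    int    // bytes withheld per chunk: longest secret length minus one
	tail    string // withheld bytes that may be the start of a split secret
}

// NewScrubber ignores empty values and orders secrets longest first so a
// secret that contains another one is masked as a whole.
func NewScrubber(secrets []string) *Scrubber {
	s := &Scrubber{}
	for _, sec := range secrets {
		if sec == "" {
			continue
		}
		s.secrets = append(s.secrets, sec)
		if n := len(sec) - 1; n > s.hold {
			s.hold = n
		}
	}
	sort.Slice(s.secrets, func(i, j int) bool {
		return len(s.secrets[i]) > len(s.secrets[j])
	})
	return s
}

func (s *Scrubber) scrub(text string) string {
	for _, sec := range s.secrets {
		text = strings.ReplaceAll(text, sec, mask)
	}
	return text
}

// Feed accepts the next chunk and returns the bytes that are safe to emit.
func (s *Scrubber) Feed(chunk string) string {
	buf := s.scrub(s.tail + chunk)
	if len(buf) <= s.hold {
		s.tail = buf
		return ""
	}
	cut := len(buf) - s.hold
	s.tail = buf[cut:]
	return buf[:cut]
}

// Flush emits whatever is still withheld once the stream has ended.
func (s *Scrubber) Flush() string {
	out := s.scrub(s.tail)
	s.tail = ""
	return out
}

func main() {
	s := NewScrubber([]string{"hunter2"})
	// The secret is split across two chunks: "hun" + "ter2".
	out := s.Feed("token=hun") + s.Feed("ter2 done") + s.Flush()
	fmt.Println(out) // token=*** done
}
```

The cost of this approach is a small emission delay: up to `hold` bytes stay buffered until the next chunk or the final flush.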
+
+## Pre-GA audit checklist
+
+Run the local static packet first:
+
+```sh
+make audit-actions-ga
+```
+
+Then run focused tests:
+
+```sh
+go test -trimpath ./internal/actions/... ./internal/auth/runnerjwt ./internal/runner/... ./internal/web/handlers/api ./internal/web/handlers/repo ./internal/web/handlers/githttp
+```
+
+For a production-like runner host, manually verify:
+
+1. Register a new runner with `self-hosted,linux,ubuntu-latest`.
+2. Trigger `.shithub/workflows/checkout-canary.yml` on trunk.
+3. Confirm the run appears in the Actions tab and the check run completes.
+4. Confirm step logs stream while the job is running and finalize to object
+   storage after completion.
+5. Run the sandbox smoke from `docs/internal/runbooks/runner-deploy.md`.
+6. Re-test the S41e network fix: direct-IP egress and workflow-supplied
+   resolvers must be blocked by the runner bridge firewall.
+7. Echo a controlled test secret and confirm logs show `***`, not plaintext.
+8. Reuse a consumed job JWT and confirm the API returns 401.
+9. Run `bench/k6/actions-load.js` against a pre-seeded queue.
+10. Verify the Grafana Actions dashboard is populated and Prometheus alert
+    expressions parse in the monitoring stack.
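
Illustration, not part of the commit: manual check 8 expects a reused job JWT to be answered with 401. The behaviour being verified can be modelled as a replay gate that records each token ID on first use. The Go sketch below is an assumption-laden model: it uses an in-memory map where the commit references a `runner_jwt_used` store, it takes the token ID and expiry as already-parsed inputs instead of verifying a real JWT, and `ReplayGate`, `Consume`, and `StatusFor` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// ReplayGate remembers which token IDs have already been consumed.
type ReplayGate struct {
	mu   sync.Mutex
	used map[string]time.Time // token ID -> expiry, kept so entries could be pruned later
}

func NewReplayGate() *ReplayGate {
	return &ReplayGate{used: make(map[string]time.Time)}
}

// Consume returns true exactly once per unexpired token ID.
func (g *ReplayGate) Consume(id string, expires, now time.Time) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !now.Before(expires) {
		return false // expired tokens are never accepted
	}
	if _, seen := g.used[id]; seen {
		return false // replay
	}
	g.used[id] = expires
	return true
}

// StatusFor maps the gate decision onto the HTTP status an API would return.
func (g *ReplayGate) StatusFor(id string, expires, now time.Time) int {
	if g.Consume(id, expires, now) {
		return http.StatusOK
	}
	return http.StatusUnauthorized
}

// replayDemo exercises first use, reuse, and an already-expired token.
func replayDemo() (first, second, expired int) {
	gate := NewReplayGate()
	now := time.Unix(1700000000, 0)
	expires := now.Add(5 * time.Minute)
	first = gate.StatusFor("job-123", expires, now)
	second = gate.StatusFor("job-123", expires, now)
	expired = gate.StatusFor("job-456", now.Add(-time.Second), now)
	return first, second, expired
}

func main() {
	first, second, expired := replayDemo()
	fmt.Println(first, second, expired) // 200 401 401
}
```

Checking and recording under one lock is the important design point: two concurrent requests carrying the same token cannot both pass. A database-backed version would need the equivalent atomicity, for example a unique constraint on the token ID.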
+
+## Accepted S41h deferrals
+
+These are not S41h GA blockers because they were explicitly parked from v1 or
+depend on the post-GA Nix/tooling work:
+
+- Full marketplace-action compatibility.
+- `actions/setup-go`, `golangci-lint-action`, cache actions, Docker actions,
+  and composite actions.
+- Matrix builds and reusable workflows.
+- Hosted runner image provisioning by workflow authors. `runs-on` is a label
+  selector; operators own the backing image.
+- Full project CI migration from `.github/workflows/ci.yml` to
+  `.shithub/workflows/ci.yml`.
+
+## Next sprint hook
+
+S41i should close the toolchain gap without weakening v1's security boundary:
+
+- keep marketplace `uses:` rejected;
+- provide a reproducible runner image or Nix engine path that can run `make ci`
+  from first-party `run:` steps;
+- keep deploy secrets out of untrusted PR workflows.
docs/internal/index.md modified
@@ -56,7 +56,8 @@ site.
 - [branch-protection.md](./branch-protection.md),
   [checks.md](./checks.md)
 - [actions-schema.md](./actions-schema.md),
-  [actions-runner-api.md](./actions-runner-api.md)
+  [actions-runner-api.md](./actions-runner-api.md),
+  [actions-ga-readiness.md](./actions-ga-readiness.md)
 - [orgs.md](./orgs.md), [teams.md](./teams.md)
 - [billing.md](./billing.md) — paid org product contract,
   entitlements, and Stripe integration guardrails.
docs/internal/security-checklist.md modified
@@ -93,6 +93,22 @@ document.
 | AEAD key rotation procedure documented | `docs/internal/2fa.md` | S06 |
 | Webhook secret-decryption failure auto-disables hook | `webhook.Deliver` + `AutoDisableWebhook` | S33 |
 
+## Actions / CI runner
+
+| Control | Enforced by | Sprint |
+|---|---|---|
+| Workflow parser rejects unsupported `uses:` aliases and unknown keys | `internal/actions/workflow` + parser tests | S41a |
+| Event-derived expressions are tainted and never spliced directly into shell text | `internal/actions/expr` + runner env-binding tests | S41a/S41d |
+| Actions secrets encrypted at rest | `internal/actions/secrets` + secretbox round-trip tests | S41c |
+| Claim-time secret mask snapshots survive later secret rotation/deletion | `workflow_job_secret_masks` + runner API log tests | S41e |
+| Runner job API JWTs are short-lived and single-use | `runner_jwt_used`, `runnerjwt`, runner API replay tests | S41c |
+| Checkout JWTs are separate, repo-scoped, and read-only | git HTTP checkout-token tests | S41h |
+| Runner step containers drop privileges and capabilities | Docker engine argv tests + runner deploy runbook | S41d/S41e |
+| Runner bridge blocks direct-IP egress and workflow-supplied DNS bypasses | `deploy/runner-config/{dnsmasq,firewall}.j2` + deploy runbook smoke | S41e |
+| Logs are scrubbed runner-side and server-side | `internal/runner/scrub`, server log path tests, scrub metrics | S41e |
+| Actions observability alerts cover stale runners, queue depth, p99 regression, and scrubber health | Prometheus rules + `make audit-actions-ga` | S41h |
+| Full project CI dogfood remains canary-only until marketplace/toolchain gaps close | `docs/internal/actions-ga-readiness.md` + `make audit-actions-ga` | S41h |
+
 ## Operator controls
 
 | Control | Enforced by | Sprint |
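
Illustration, not part of the commit: the new "Event-derived expressions are tainted and never spliced directly into shell text" control, together with its "env-binding tests" evidence, suggests that untrusted values reach the shell through environment variables rather than through string interpolation. The Go sketch below shows one way such binding could work. The `${{ ... }}` placeholder syntax, the `SHITHUB_EXPR_n` variable naming, and `bindExpressions` are all assumptions made for the example, not details confirmed by the diff.

```go
package main

import (
	"fmt"
	"regexp"
)

// exprPattern matches ${{ name }} placeholders in a run: script.
var exprPattern = regexp.MustCompile(`\$\{\{\s*([^}]+?)\s*\}\}`)

// bindExpressions replaces every placeholder with a reference to a fresh
// environment variable and returns the KEY=value entries that carry the raw
// values. The untrusted text never becomes part of the script source.
// Unknown expression names bind to the empty string.
func bindExpressions(script string, values map[string]string) (string, []string) {
	var env []string
	out := exprPattern.ReplaceAllStringFunc(script, func(m string) string {
		name := exprPattern.FindStringSubmatch(m)[1]
		key := fmt.Sprintf("SHITHUB_EXPR_%d", len(env))
		env = append(env, key+"="+values[name])
		return "${" + key + "}"
	})
	return out, env
}

func main() {
	script := `echo "Title: ${{ github.event.pull_request.title }}"`
	values := map[string]string{
		// A hostile pull-request title that would run a command if spliced in.
		"github.event.pull_request.title": "$(rm -rf /)",
	}
	bound, env := bindExpressions(script, values)
	fmt.Println(bound) // echo "Title: ${SHITHUB_EXPR_0}"
	fmt.Println(env)   // [SHITHUB_EXPR_0=$(rm -rf /)]
}
```

A POSIX shell does not re-evaluate the result of a parameter expansion as a command, so the hostile title is printed literally. If a workflow author leaves the placeholder unquoted, the expanded value is still subject to word splitting and globbing, though not to command substitution.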