Actions/CI — schema + workflow dialect (S41a)

The Actions/CI subsystem is shipping in eight sub-sprints (S41a through S41h, plus optional S41i Nix engine). This doc covers what S41a lays down: the SQL schema, the workflow YAML dialect, the expression evaluator, and the load-bearing taint contract every later sub-sprint depends on.

S41a is parser + schema only — no triggers, no runner, no UI. The goal is to land a frozen contract that S41b/c/d/e can build against without churning under them.

SQL schema

Actions migrations currently span 0042–0051 and 0053. Migration 0052 belongs to the repo source-remotes feature and was already deployed before the runner JWT replay table landed.

| Migration | Table / column | Purpose |
| --- | --- | --- |
| 0042 | `workflow_runs` | One row per triggered workflow execution |
| 0043 | `workflow_jobs` | Jobs within a run (one row per `jobs.<key>`) |
| 0044 | `workflow_steps` | Steps within a job (one row per `steps[i]`) |
| 0045 | `workflow_secrets` | Per-repo + per-org encrypted secrets |
| 0046 | `workflow_runners` | Registered runners + `runner_tokens` |
| 0047 | `workflow_step_log_chunks` | Hot-path append log buffer (concatenated to blob on finalize) |
| 0048 | `workflow_artifacts` | Per-run artifact metadata (90-day default expiry) |
| 0049 | `actions_variables` | Non-secret per-repo/org config (Forgejo parity) |
| 0050 | `workflow_steps.step_with` | Parsed `with:` inputs for magic `uses:` aliases |
| 0051 | `workflow_runs.trigger_event_id` | Trigger idempotency for retries/admin replays |
| 0053 | `runner_jwt_used` | Single-use replay gate for runner job JWTs |

A few load-bearing choices, called out so they're easy to spot in a later schema diff:

  • workflow_runs.run_index — per-repo monotonic counter. Each repo gets #1, #2, … so URLs like /{owner}/{repo}/actions/runs/42 are stable and human-friendly. Crib from Forgejo's actions_run.index.
  • workflow_runs.version — optimistic-lock counter. Mutators bump-and-check rather than SELECT … FOR UPDATE. Required for S41g's race between a cancel request and a state transition.
  • workflow_runs.concurrency_group — the concurrency-slot key, resolved at trigger time from the workflow's concurrency.group: expression. S41g's slot manager keys off this column.
  • workflow_runs.parent_run_id — for re-runs. The new run references the original; the UI shows a "re-ran from #N" link.
  • workflow_jobs.runner_id — FK added in 0046 (after the runners table exists). Nullable until claimed.
  • workflow_steps has a CHECK constraint enforcing (run_command IS NOT NULL) <> (uses_alias IS NOT NULL) — exactly one of run: or uses:. The uses_alias column is further CHECK-constrained to the three magic aliases we accept in v1.
  • workflow_secrets stores its value as bytea, ChaCha20Poly1305-sealed via internal/auth/secretbox. Key derivation uses cfg.Auth.TOTPKeyB64 (already an operator-managed root) + (owner, kind, name) salt so re-keying is per-row.
  • workflow_step_log_chunks.chunk is capped at 512 KB per row. The runner sends bigger payloads in pieces. (step_id, seq) is UNIQUE so duplicate sends are idempotent.
  • actions_variables — non-secret, plaintext, scoped exactly like secrets (per-repo or per-org, never both on the same row). Forgejo has the same split; we mirror it for parity.
  • runner_jwt_used — primary-keyed by JWT jti. Job endpoints insert into this table during auth; zero inserted rows means replay and the API returns 401. JWTs are HMAC-SHA256 and use an HKDF subkey derived from auth.totp_key_b64 with label actions-runner-jwt-v1.

The version and run_index patterns are the two pieces I'd point out to a future maintainer first. Both are cheap to add now and miserable to retrofit later.
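To make the bump-and-check pattern concrete, here is a minimal in-memory sketch. The `store` and `run` types and the `status`/`id` column names are hypothetical stand-ins, not the real schema or query layer; the SQL in the comment only shows the shape of statement the pattern implies.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrStaleVersion is returned when the row changed since it was read.
var ErrStaleVersion = errors.New("stale version")

// run is a hypothetical in-memory stand-in for a workflow_runs row.
type run struct {
	Status  string
	Version int64
}

// store mimics the database; the mutex stands in for row-level atomicity.
type store struct {
	mu   sync.Mutex
	runs map[int64]run
}

// updateStatus mirrors a statement of the shape:
//
//	UPDATE workflow_runs SET status = $1, version = version + 1
//	 WHERE id = $2 AND version = $3
//
// Zero affected rows means another writer won the race.
func (s *store) updateStatus(id int64, status string, seenVersion int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.runs[id]
	if !ok || r.Version != seenVersion {
		return ErrStaleVersion
	}
	s.runs[id] = run{Status: status, Version: r.Version + 1}
	return nil
}

func main() {
	s := &store{runs: map[int64]run{1: {Status: "running", Version: 7}}}
	// Both writers read version 7; only the first update lands.
	fmt.Println(s.updateStatus(1, "cancelled", 7))
	fmt.Println(s.updateStatus(1, "completed", 7))
}
```

The losing writer re-reads the row and decides again, which is the behaviour the cancel-versus-transition race needs.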

Workflow YAML dialect (v1)

We accept a strict subset of GitHub Actions YAML. The parser rejects unknown keys at parse time so workflow authors find their typos immediately instead of shipping a workflow that does nothing.

Top level

name: my-pipeline                         # optional human name
on: [push, pull_request]                  # or full-form (see below)
permissions: read-all                     # default if omitted
env: { GREETING: "hello" }                # workflow-level env
concurrency:                              # optional slot manager
  group: ${{ shithub.ref }}
  cancel-in-progress: true
jobs:
  <key>:                                  # 1+ entries
    runs-on: ubuntu-latest
    needs: [other-key]                    # optional dep edge
    if: ${{ shithub.actor == 'alice' }}   # optional gate
    timeout-minutes: 60                   # 1..4320, default 360
    permissions: { contents: read }       # narrow workflow perms
    env: { K: v }                         # job overlay
    steps:
      - name: ...
        id: ...
        if: ...
        run: echo hi                      # run XOR uses
        uses: actions/checkout@v4         # exactly one of three aliases
        working-directory: ...
        env: { ... }
        continue-on-error: false

Triggers (on:)

v1 supports four triggers — anything else is a parse error.

| Trigger | Surface |
| --- | --- |
| `push` | `branches:`, `tags:`, `paths:` (include + `!exclude` semantics) |
| `pull_request` | `types:` (opened/synchronize/reopened/...), `branches:`, `paths:` |
| `schedule` | one or more `- cron: <5-field-expr>` |
| `workflow_dispatch` | `inputs:` map (string/boolean/choice/environment) |

uses: allowlist

Exactly three aliases, no exceptions:

| Alias | What it does |
| --- | --- |
| `actions/checkout@v4` | Clones the repo into the workspace |
| `shithub/upload-artifact@v1` | Uploads files to `workflow_artifacts` |
| `shithub/download-artifact@v1` | Pulls artifacts back in a downstream job |

Any other uses: value (community actions, Docker images, composite actions) is an Error-severity diagnostic. The marketplace problem is explicitly out of scope for v1; revisit only if a real demand exists and we have an answer for supply-chain trust.

File-size + parser caps

  • 64 KB workflow file size cap (workflow.MaxWorkflowFileBytes). Files larger than this are rejected before YAML decode begins — defends against pathological inputs and gives operators a predictable upper bound on parser memory.
  • 100 aliases per document (workflow.MaxYAMLAliases). This is the billion-laughs guard: yaml.v3 doesn't expose a direct knob, so we count alias nodes during a tree walk and bail once the cap is exceeded.
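The two caps can be sketched as below. This is an illustration under assumptions: `node` is a stand-in for a decoded YAML node (the real walk runs over yaml.v3 nodes), 64 KB is taken as 64 × 1024 bytes, and both caps are treated as inclusive limits.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxWorkflowFileBytes = 64 * 1024 // stand-in for workflow.MaxWorkflowFileBytes
	maxYAMLAliases       = 100       // stand-in for workflow.MaxYAMLAliases
)

// node is a hypothetical, minimal model of a decoded YAML node.
type node struct {
	IsAlias  bool
	Children []*node
}

var (
	errTooLarge       = errors.New("workflow file exceeds size cap")
	errTooManyAliases = errors.New("workflow file exceeds alias cap")
)

// checkSize runs before any YAML decoding happens.
func checkSize(src []byte) error {
	if len(src) > maxWorkflowFileBytes {
		return errTooLarge
	}
	return nil
}

// countAliases walks the tree and stops as soon as the cap is exceeded.
func countAliases(n *node, seen *int) error {
	if n == nil {
		return nil
	}
	if n.IsAlias {
		*seen++
		if *seen > maxYAMLAliases {
			return errTooManyAliases
		}
	}
	for _, c := range n.Children {
		if err := countAliases(c, seen); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	root := &node{}
	for i := 0; i < 101; i++ {
		root.Children = append(root.Children, &node{IsAlias: true})
	}
	seen := 0
	fmt.Println(checkSize(make([]byte, 70*1024))) // over the size cap
	fmt.Println(countAliases(root, &seen))        // 101 aliases, over the alias cap
}
```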

${{ github.* }} alias

The dialect is intentionally rebranded to ${{ shithub.* }}. Authors who paste GHA workflows in unmodified will see their ${{ github.* }} references continue to work because the evaluator rewrites path[0] from github to shithub at the top of evalRef before taint computation, dispatch, and error rendering.

The alias is intentionally scope-narrow: only fields that exist in our shithub.* namespace (run_id, sha, ref, actor, event) route through. GHA fields we don't expose in v1 — event_name, repository, run_number, workspace, etc. — error with the canonical unknown shithub field "X" message. Slightly confusing for a GHA-flavored author but keeps the v1 namespace surface tight.

The alias preserves the load-bearing taint flag: github.event.X taints exactly like shithub.event.X. TestEval_GithubAliasIsTainted pins this contract.

Migration to strict-compat (drop the alias entirely) later is a one-PR flip; moving the other direction is much harder.

This is a deliberate decision recorded in the campaign plan.
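A rough sketch of the alias rewrite follows. `normalizeRef` is a hypothetical helper name (the real logic sits at the top of evalRef), and the field set is the one listed in this section. The point it illustrates is ordering: the rewrite happens first, so every later stage only sees the canonical namespace.

```go
package main

import "fmt"

// shithubFields is the v1 field set named in this doc.
var shithubFields = map[string]bool{
	"run_id": true, "sha": true, "ref": true, "actor": true, "event": true,
}

// normalizeRef rewrites a leading "github" to "shithub", then validates the
// second path element. Taint computation, dispatch, and error text would all
// run on the returned path.
func normalizeRef(path []string) ([]string, error) {
	if len(path) == 0 {
		return nil, fmt.Errorf("empty reference")
	}
	out := append([]string(nil), path...) // do not mutate the caller's slice
	if out[0] == "github" {
		out[0] = "shithub"
	}
	if out[0] == "shithub" {
		field := ""
		if len(out) >= 2 {
			field = out[1]
		}
		if !shithubFields[field] {
			return nil, fmt.Errorf("unknown shithub field %q", field)
		}
	}
	return out, nil
}

func main() {
	fmt.Println(normalizeRef([]string{"github", "sha"}))
	fmt.Println(normalizeRef([]string{"github", "event_name"}))
}
```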

Expression evaluator

${{ … }} expressions are parsed into a tiny AST and evaluated by internal/actions/expr. The surface is intentionally minimal:

Allowed namespaces

| Namespace | Source | Tainted? |
| --- | --- | --- |
| `secrets.X` | workflow_secrets | no (operator-controlled) |
| `vars.X` | actions_variables | no (operator-controlled) |
| `env.X` | workflow file | no for literal text (workflow author's text); inherits the taint of the resolved env value otherwise |
| `shithub.run_id` | dispatch context | no |
| `shithub.sha` | dispatch context | no |
| `shithub.ref` | dispatch context | no |
| `shithub.actor` | dispatch context | no (resolved username) |
| `shithub.event.*` | trigger payload | yes, always |

runner.*, steps.*, needs.*, matrix.*, inputs.* are all parse-time errors. They're parked for v2 and the parser's allowlist-closed posture means a future PR can't widen this accidentally without a clearly visible diff.

Allowed functions

contains(haystack, needle), startsWith(s, prefix), endsWith(s, suffix), plus the four job-status predicates success(), failure(), cancelled(), always(). That's the whole list. fromJSON, hashFiles, toJSON, format, and friends are explicitly rejected — they each carry footgun risk (parser DoS, FS access, side-channel injection) that we don't want to take on in v1.

Missing-value semantics

| Reference | Missing → ? |
| --- | --- |
| `secrets.NOT_BOUND` | error (loud; workflow won't run) |
| `vars.MISSING` | empty string (GHA parity) |
| `env.MISSING` | empty string (GHA parity) |
| `shithub.event.deeply.missing` | null but still tainted |

The "missing event path → null but tainted" case is a defence-in-depth choice: even if the path doesn't resolve, the result still came from the event payload, and we'd rather over-flag than under.
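The missing-value rules can be sketched as a single lookup function. `Value`, `Context`, and `lookup` here are simplified, hypothetical stand-ins for the evaluator's types; env taint inheritance and the non-event shithub fields are left out to keep the example short.

```go
package main

import "fmt"

// Value is a simplified stand-in for expr.Value; a nil V models null.
type Value struct {
	V       any
	Tainted bool
}

// Context is a hypothetical bag of resolved inputs.
type Context struct {
	Secrets, Vars, Env map[string]string
	Event              map[string]any
}

func lookup(ctx Context, path []string) (Value, error) {
	if len(path) < 2 {
		return Value{}, fmt.Errorf("reference %v needs a namespace and a name", path)
	}
	switch {
	case path[0] == "secrets":
		s, ok := ctx.Secrets[path[1]]
		if !ok {
			return Value{}, fmt.Errorf("secret %q is not bound", path[1])
		}
		return Value{V: s}, nil
	case path[0] == "vars":
		return Value{V: ctx.Vars[path[1]]}, nil // missing key yields ""
	case path[0] == "env":
		return Value{V: ctx.Env[path[1]]}, nil // missing key yields ""
	case path[0] == "shithub" && path[1] == "event":
		var cur any = ctx.Event
		for _, key := range path[2:] {
			m, ok := cur.(map[string]any)
			if !ok {
				return Value{Tainted: true}, nil // null, still tainted
			}
			if cur, ok = m[key]; !ok {
				return Value{Tainted: true}, nil // null, still tainted
			}
		}
		return Value{V: cur, Tainted: true}, nil
	}
	return Value{}, fmt.Errorf("namespace %q is not modelled in this sketch", path[0])
}

func main() {
	ctx := Context{Event: map[string]any{"ref": "refs/heads/main"}}
	fmt.Println(lookup(ctx, []string{"secrets", "NOT_BOUND"}))
	fmt.Println(lookup(ctx, []string{"shithub", "event", "deeply", "missing"}))
}
```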

Taint contract — the load-bearing piece

This is the contract every later sub-sprint hangs off. Get it wrong and we have an injection-shaped hole in the runner.

Where the flag lives

The taint flag lives on expr.Value (the evaluator-produced value), not workflow.Value (the parser-produced value). Two different structs share the name Value because they live in different packages, but they have different jobs:

  • workflow.Value carries the raw source string the parser read out of the YAML (an env entry, a with: input, a concurrency group expression). At parse time we don't know what the ${{ … }} body will resolve to, so there's nothing to taint yet.
  • expr.Value is what the evaluator returns when it resolves a reference at runtime. This struct carries Tainted bool. The runner's exec layer (S41d) consumes that flag.

Pre-L5 the parser-side struct also had a Tainted bool field plus a Tainted() constructor — both unused, both confusing because they suggested two sources of truth. Dropped in S41a-L5 cleanup.

Propagation

Every expr.Value carries a Tainted bool. Set true iff the value transitively depends on shithub.event.*. Operators control secrets, vars, and the rest of shithub.*. Authors control the workflow file, including its env blocks. Only the event payload is attacker-controlled: a PR title, a commit message, a branch name from a fork. Those values must never be interpolated into a shell string.

Propagation rules:

  • Reading shithub.event.X → Tainted: true (always, including missing-path null results).
  • Reading any other namespace → Tainted: false, except env.X preserves the taint of the resolved env value. This closes the escape where an event-derived value is first assigned to env and then interpolated through ${{ env.X }}.
  • Binary op (==, !=, &&, ||) → tainted if either operand is.
  • Unary op (!) → tainted iff its operand is.
  • Function call (contains, startsWith, endsWith) → tainted if any argument is.
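The propagation rules above can be sketched with a cut-down value type. `Value` and the helper names are hypothetical simplifications of expr.Value and the evaluator's operators; type coercion and truthiness are not modelled, and payloads are assumed to be strings or bools.

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a simplified stand-in for expr.Value.
type Value struct {
	V       any
	Tainted bool
}

func eventValue(v any) Value { return Value{V: v, Tainted: true} } // from shithub.event.*
func plainValue(v any) Value { return Value{V: v} }                // any other namespace

// equal models ==: the result is tainted if either operand is.
func equal(a, b Value) Value {
	return Value{V: a.V == b.V, Tainted: a.Tainted || b.Tainted}
}

// not models unary !: the result is tainted iff the operand is.
func not(a Value) Value {
	b, _ := a.V.(bool)
	return Value{V: !b, Tainted: a.Tainted}
}

// startsWith models the function-call rule: tainted if any argument is.
func startsWith(s, prefix Value) Value {
	str, _ := s.V.(string)
	p, _ := prefix.V.(string)
	return Value{V: strings.HasPrefix(str, p), Tainted: s.Tainted || prefix.Tainted}
}

func main() {
	ref := eventValue("refs/heads/main")
	fmt.Printf("%+v\n", startsWith(ref, plainValue("refs/heads/")))
	fmt.Printf("%+v\n", not(equal(plainValue("a"), plainValue("b"))))
}
```

Note that even a boolean result stays tainted: the flag tracks where the value came from, not whether its current shape looks dangerous.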

The runner consumes Tainted and refuses to interpolate tainted values into shell strings. Instead, tainted values are bound to runner-owned SHITHUB_INPUT_xx envvars and the shell source only references those placeholders. The author writes:

- run: echo "PR title was: ${{ shithub.event.pull_request.title }}"

The runner sees a tainted reference; it compiles the step to:

SHITHUB_INPUT_0="$user_pr_title" exec sh -c 'echo "PR title was: $SHITHUB_INPUT_0"'

…where $user_pr_title is set via Go's cmd.Env, never inserted into the shell source string. Backticks, $(), ;, && — none of those work as command-injection vectors when the value reaches the shell as environment data instead of syntax.
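A minimal sketch of that rendering step, assuming the run: string has already been split into literal text and evaluated expression results. The `part` type and `render` function are hypothetical, not the real internal/runner/exec API. The sketch emits `${SHITHUB_INPUT_n}` with braces (rather than the bare `$SHITHUB_INPUT_n` form shown above) so a placeholder followed directly by identifier characters still expands correctly; that is a choice made for this example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// part is one piece of a run: string, either literal text or the evaluated
// result of a ${{ … }} body.
type part struct {
	Text    string
	Expr    bool
	Tainted bool
}

// render returns the shell source plus extra environment entries. Tainted
// values never enter the source; they are bound to SHITHUB_INPUT_n variables.
func render(parts []part) (script string, env []string) {
	var b strings.Builder
	for _, p := range parts {
		if p.Expr && p.Tainted {
			name := fmt.Sprintf("SHITHUB_INPUT_%d", len(env))
			env = append(env, name+"="+p.Text)
			b.WriteString("${" + name + "}")
			continue
		}
		b.WriteString(p.Text)
	}
	return b.String(), env
}

func main() {
	script, env := render([]part{
		{Text: `echo "PR title was: `},
		{Text: "$(echo pwned); `id`", Expr: true, Tainted: true},
		{Text: `"`},
	})
	cmd := exec.Command("sh", "-c", script)
	cmd.Env = append(os.Environ(), env...) // the value travels as data only
	out, err := cmd.CombinedOutput()
	fmt.Printf("%s(err=%v)\n", out, err)
}
```

Because the shell does not re-parse the result of a parameter expansion as commands, the hostile title is echoed verbatim instead of executed. An unquoted placeholder would still be subject to word splitting and globbing, so quoting in the author's script still matters for correctness.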

The shared renderer lives in internal/runner/exec, so future engines consume the same injection boundary instead of reimplementing it. The runner claim payload includes workflow_runs.event_payload; without that field, the runner cannot evaluate and taint ${{ shithub.event.* }} references.

Tests for this contract live in internal/actions/expr/eval_test.go, internal/runner/exec/render_test.go, and internal/runner/engine/docker_test.go. Do not weaken them in a later PR without an audit-checkpoint review — they're explicitly load-bearing for S41e's threat model.

Runner log chunks pass through internal/runner/scrub before they are posted to the API. It masks exact secret values and preserves enough tail bytes between chunks to catch a secret split across chunk boundaries. S41e wires resolved workflow secrets into the runner claim payload and mask set, then applies the same exact-value scrub again in the runner API before persisting chunks. The server path also carries a possible secret-prefix tail from the prior persisted chunk, so a runner that bypasses client-side scrubbing cannot leak a secret by splitting it across adjacent log POSTs.
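One way to get the cross-chunk behaviour is to hold back the longest suffix of the stream that could still be the start of a secret. The sketch below takes that approach; it is an assumption about the technique, not a description of internal/runner/scrub, and the `scrubber` type and `***` mask are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const mask = "***"

// scrubber masks exact secret values in a stream of chunks.
type scrubber struct {
	secrets []string
	tail    string // held-back bytes that might be the start of a secret
}

func newScrubber(secrets []string) *scrubber {
	s := &scrubber{}
	for _, sec := range secrets {
		if sec != "" {
			s.secrets = append(s.secrets, sec)
		}
	}
	// Longest first, so a secret that contains another is masked whole.
	sort.Slice(s.secrets, func(i, j int) bool { return len(s.secrets[i]) > len(s.secrets[j]) })
	return s
}

// Write returns the part of the stream that is safe to emit now.
func (s *scrubber) Write(chunk string) string {
	buf := s.tail + chunk
	for _, sec := range s.secrets {
		buf = strings.ReplaceAll(buf, sec, mask)
	}
	hold := s.holdLen(buf)
	s.tail = buf[len(buf)-hold:]
	return buf[:len(buf)-hold]
}

// holdLen is the length of the longest suffix of buf that is a proper
// prefix of some secret.
func (s *scrubber) holdLen(buf string) int {
	best := 0
	for _, sec := range s.secrets {
		limit := len(sec) - 1
		if limit > len(buf) {
			limit = len(buf)
		}
		for k := limit; k > best; k-- {
			if strings.HasPrefix(sec, buf[len(buf)-k:]) {
				best = k
				break
			}
		}
	}
	return best
}

// Flush emits whatever is still held back at end of stream.
func (s *scrubber) Flush() string {
	out := s.tail
	s.tail = ""
	return out
}

func main() {
	s := newScrubber([]string{"hunter2"})
	fmt.Print(s.Write("token=hun"), s.Write("ter2 done"), s.Flush(), "\n")
}
```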

shithub.event payload schema (v1)

The event payload is the most user-facing part of the contract: once authors write workflows that template against shithub.event.X, schema changes are breaking. The v1 schema is pinned and labelled v1. Any addition is fine; renames and removals require a major bump.

The schema is enforced by typed constructors in the internal/actions/event package — one per trigger. S41b's pipeline calls these to build payloads; the function signatures pin the field set so adding a key requires editing the constructor in a visible diff. This is the same closed-door discipline as the expression evaluator's namespace allowlist.

| Trigger | Constructor | Top-level keys |
| --- | --- | --- |
| `push` | `event.Push` | `ref`, `before`, `after`, `head_commit{message,id,author}` |
| `pull_request` | `event.PullRequest` | `action`, `number`, `pull_request{title,head{ref,sha},base{ref,sha},user{login}}` |
| `schedule` | `event.Schedule` | (empty map: cron fired; the cron expression is on the `workflow_runs` row) |
| `workflow_dispatch` | `event.WorkflowDispatch` | `inputs{<name>: <stringified>}` |

Anything not in this table doesn't exist in v1. Accessing it returns null+tainted (the missing-path semantics above).
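As an illustration of how a typed constructor pins the field set, here is a sketch for the push payload. The signature is hypothetical (the real event.Push may take a struct or different parameter types), and `author` is modelled as a plain string, which is an assumption.

```go
package main

import "fmt"

// Push builds a v1-shaped push payload. Because every key is spelled out
// here, adding or renaming a field shows up as a diff in this function.
func Push(ref, before, after, commitID, commitMessage, commitAuthor string) map[string]any {
	return map[string]any{
		"ref":    ref,
		"before": before,
		"after":  after,
		"head_commit": map[string]any{
			"id":      commitID,
			"message": commitMessage,
			"author":  commitAuthor,
		},
	}
}

func main() {
	p := Push("refs/heads/main", "aaa111", "bbb222", "bbb222", "fix: typo", "alice")
	fmt.Println(p["ref"], p["head_commit"])
}
```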

Adding a field: edit the constructor in internal/actions/event/, add a row to this doc, and update the corresponding *_FlowsThroughEvaluator test in event_test.go so the new path is exercised end-to-end. Reviewer-required note in the commit message — same standard as a new evaluator function.

Renaming or removing: that's a v1→v2 break. Don't.

Operator surface

shithubd admin actions parse <file> reads a workflow off disk, runs the parser, and dumps diagnostics + a canonical JSON rendering of the parsed AST. Useful for:

  • debugging "why is my workflow not picking up changes" reports
  • validating a workflow file before committing it
  • producing a stable AST snapshot for inclusion in bug reports

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | clean parse, no Error-severity diagnostics |
| 1 | file unreadable, oversized, or YAML malformed |
| 2 | parse produced Error-severity diagnostics |
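The exit-code mapping is small enough to sketch directly. The `diagnostic` and `severity` types are hypothetical stand-ins for whatever the parser actually returns.

```go
package main

import (
	"errors"
	"fmt"
)

type severity int

const (
	severityWarning severity = iota
	severityError
)

// diagnostic is a hypothetical stand-in for a parser diagnostic.
type diagnostic struct {
	Severity severity
	Message  string
}

var errUnreadable = errors.New("file unreadable")

// exitCode maps a load error and the parser diagnostics to the CLI exit code.
func exitCode(loadErr error, diags []diagnostic) int {
	if loadErr != nil {
		return 1 // unreadable, oversized, or malformed YAML
	}
	for _, d := range diags {
		if d.Severity == severityError {
			return 2
		}
	}
	return 0 // warnings alone still count as a clean parse
}

func main() {
	fmt.Println(exitCode(nil, []diagnostic{{severityWarning, "unused env"}}))
	fmt.Println(exitCode(nil, []diagnostic{{severityError, "unknown key"}}))
	fmt.Println(exitCode(errUnreadable, nil))
}
```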

Other admin surfaces are scoped to later sub-sprints:

  • S41c: shithubd admin runner register --name <foo> issues a registration token + writes a row to workflow_runners.
  • S41g: shithubd admin actions cancel <run-id> flips cancel_requested.

Trigger pipeline (S41b)

Three layers between a triggering event and a queued workflow_run:

caller (push_process / pulls.Create / pr_jobs.PRSynchronize / dispatch HTTP)
    │
    └─► worker.Enqueue(KindWorkflowTrigger, JobPayload)
            │
            └─► trigger.Handler picks up:
                  Discover .shithub/workflows/*.yml at HEAD SHA
                  Parse each (skip + log on Error diagnostics)
                  Match each against trigger.Event
                  Enqueue each match
                        │
                        └─► trigger.Enqueue (one tx):
                              INSERT workflow_runs (ON CONFLICT DO NOTHING)
                              INSERT workflow_jobs per parsed job
                              INSERT workflow_steps per parsed step
                              (commit)
                              checks.Create per job (post-tx, idempotent
                                via ExternalID 'workflow_run:<id>:job:<key>')

Idempotency on the triggering event

We use a per-event idempotency key rather than a UNIQUE constraint on (repo_id, head_sha). Each caller constructs a stable trigger_event_id from its triggering event's identity:

| Caller | trigger_event_id format |
| --- | --- |
| push_process | `push:<push_event_id>` |
| pulls.Create | `pr_opened:<pr_id>:<head_sha>` |
| pr_jobs.PRSynchronize | `pr_synchronize:<pr_id>:<head_sha>` |
| dispatch HTTP | `dispatch:<file>:<sha>:<8-byte-random-hex>` |
| schedule sweep (S41b-2) | `schedule:<workflow_id>:<window_start_unix>` |

Migration 0051 adds workflow_runs.trigger_event_id (text NOT NULL DEFAULT '') with a partial UNIQUE on (repo_id, workflow_file, trigger_event_id) WHERE trigger_event_id <> ''. The trigger handler does INSERT … ON CONFLICT DO NOTHING so:

  • Worker retries (the same push_process replay) → no duplicate runs.
  • Admin replays via shithubd admin run-job workflow:trigger ... → no duplicate runs.
  • Re-runs (the future "Re-run" button) explicitly construct a NEW trigger_event_id (rerun:<original_run_id>:<request_uuid>) and chain back via parent_run_id. History is preserved, no collision.

Each caller's collision-free namespace is short-lived and human-debuggable: a Postgres operator can grep workflow_runs.trigger_event_id to see exactly which triggering event produced a given run.

Filter evaluation

trigger.Match(workflow, event) is a pure function (no I/O, no DB). For each event kind:

  • push: branch vs tag classified from the ref; only the matching filter list applies (a branches: filter rejects tag pushes and vice versa). paths: (when set) requires at least one changed path to match. Empty filter = match-all.
  • pull_request: types: defaults to [opened, synchronize, reopened] when omitted (GHA parity). branches: applies to the base ref. paths: as for push.
  • schedule: requires the workflow to declare the cron expression that fired. The sweep is the source of truth for which cron fires; we just gate on declaration. Avoids interpreting cron semantics in two places.
  • workflow_dispatch: matches whenever the workflow declares on.workflow_dispatch.

Glob semantics in branches:/tags:/paths:: minimatch subset with * (single segment), ** (any), /** end-anchor (optional trailing path), **/ start-anchor, and !exclude (last-match-wins, exclusion-only list implies include-all).
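A compact sketch of these glob rules, written from the description above rather than from the real matcher. All function names are hypothetical. `**` is treated as "zero or more whole segments", which gives the optional-trailing-path and start-anchor behaviour for free.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSegment matches one path segment; '*' matches any run of characters
// (a segment never contains '/').
func matchSegment(pat, s string) bool {
	if !strings.Contains(pat, "*") {
		return pat == s
	}
	parts := strings.Split(pat, "*")
	if !strings.HasPrefix(s, parts[0]) {
		return false
	}
	s = s[len(parts[0]):]
	for _, mid := range parts[1 : len(parts)-1] {
		i := strings.Index(s, mid)
		if i < 0 {
			return false
		}
		s = s[i+len(mid):]
	}
	return strings.HasSuffix(s, parts[len(parts)-1])
}

// matchSegments handles '**' as zero or more whole segments.
func matchSegments(pat, path []string) bool {
	if len(pat) == 0 {
		return len(path) == 0
	}
	if pat[0] == "**" {
		for i := 0; i <= len(path); i++ {
			if matchSegments(pat[1:], path[i:]) {
				return true
			}
		}
		return false
	}
	return len(path) > 0 && matchSegment(pat[0], path[0]) && matchSegments(pat[1:], path[1:])
}

func matchGlob(pattern, name string) bool {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

// matchList applies last-match-wins; a list with only !excludes (or an
// empty list) starts from include-all.
func matchList(patterns []string, name string) bool {
	matched := true
	for _, p := range patterns {
		if !strings.HasPrefix(p, "!") {
			matched = false // at least one include: start from exclude-all
			break
		}
	}
	for _, p := range patterns {
		if matchGlob(strings.TrimPrefix(p, "!"), name) {
			matched = !strings.HasPrefix(p, "!")
		}
	}
	return matched
}

func main() {
	filter := []string{"docs/**", "!docs/internal/**"}
	fmt.Println(matchList(filter, "docs/guide/intro.md"))
	fmt.Println(matchList(filter, "docs/internal/notes.md"))
}
```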

Collaborator gate

Per the S41b spec's "external-PR support is parked" decision: PR triggers (both opened and synchronize) only fire when the PR's author is the repo's owning user. Conservative — drops legitimate non-owner collaborators in the org-repo case. Expanding the gate requires plumbing policy.Can into the worker context, which we defer to S41g where the lifecycle work touches that surface anyway.

Operator surface

  • POST /{owner}/{repo}/actions/workflows/{file}/dispatches. Body: {"ref": "...", "inputs": {"key": "value"}}; both fields are optional, and ref defaults to the repo's default branch. Returns 204 No Content on success. The handler calls trigger.Enqueue synchronously, with no discovery step because the file is named in the URL. Auth: requires repo write.
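Request decoding for the dispatch endpoint might look roughly like this. `dispatchRequest` and `decodeDispatch` are hypothetical names; treating input values as strings and accepting an empty body are both assumptions drawn from the "both optional" wording.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// dispatchRequest mirrors the documented JSON body.
type dispatchRequest struct {
	Ref    string            `json:"ref"`
	Inputs map[string]string `json:"inputs"`
}

// decodeDispatch parses the body and fills in the default branch when ref
// is omitted. An empty body is treated as "no ref, no inputs".
func decodeDispatch(body []byte, defaultBranch string) (dispatchRequest, error) {
	var req dispatchRequest
	if len(bytes.TrimSpace(body)) > 0 {
		if err := json.Unmarshal(body, &req); err != nil {
			return dispatchRequest{}, err
		}
	}
	if req.Ref == "" {
		req.Ref = defaultBranch
	}
	return req, nil
}

func main() {
	req, err := decodeDispatch([]byte(`{"inputs": {"target": "staging"}}`), "main")
	fmt.Println(req.Ref, req.Inputs, err)
}
```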

What S41b deliberately doesn't do

  • Run jobs. S41c adds runner claim/status APIs; S41d adds the actual shithubd-runner execution binary.
  • Schedule sweep. Cron-driven triggers split into S41b-2 to keep this PR reviewable; the trigger pipeline accepts schedule events, but no caller produces them yet. S41b-2 adds the sweep + the robfig/cron/v3 dep + shithubd-cron.service wiring.
  • External-PR triggers. Conservative collaborator gate above.
  • workflow_run webhook events. S41h adds the webhook event family + atom feed.

Secrets + variables settings surface (S41c)

S41c wires the previously schema-only workflow_secrets and actions_variables tables into repo/org settings.

Repository routes are gated through policy.ActionRepoSettingsActions (repo:settings:actions, admin role minimum):

  • GET /{owner}/{repo}/settings/secrets/actions
  • POST /{owner}/{repo}/settings/secrets/actions
  • POST /{owner}/{repo}/settings/secrets/actions/{name}/delete
  • GET /{owner}/{repo}/settings/variables/actions
  • POST /{owner}/{repo}/settings/variables/actions
  • POST /{owner}/{repo}/settings/variables/actions/{name}/delete

Organization routes follow the existing org-settings prefix and are owner-only:

  • GET /organizations/{org}/settings/secrets/actions
  • POST /organizations/{org}/settings/secrets/actions
  • POST /organizations/{org}/settings/secrets/actions/{name}/delete
  • GET /organizations/{org}/settings/variables/actions
  • POST /organizations/{org}/settings/variables/actions
  • POST /organizations/{org}/settings/variables/actions/{name}/delete

Secrets are sealed through internal/auth/secretbox using the operator-managed Auth.TOTPKeyB64 root key. Secret list pages render names/metadata only; the plaintext value is accepted once on create or rotation and never rendered back. Variables are non-secret plaintext configuration, so settings pages render their values. Both stores use the same name grammar as the database constraints: ^[A-Za-z_][A-Za-z0-9_]*$, 1-100 characters. Variables additionally enforce the 4096-character value cap in Go before hitting the DB constraint.
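The name grammar and value cap translate to a short validator. This is a sketch with hypothetical function names; counting the 4096-character cap in runes rather than bytes is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

const (
	maxNameLen          = 100
	maxVariableValueLen = 4096
)

// nameRE is the grammar quoted above.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validateName applies to both secret and variable names.
func validateName(name string) error {
	if len(name) < 1 || len(name) > maxNameLen {
		return fmt.Errorf("name must be 1-%d characters", maxNameLen)
	}
	if !nameRE.MatchString(name) {
		return fmt.Errorf("name %q must match %s", name, nameRE)
	}
	return nil
}

// validateVariable adds the value cap that only variables enforce in Go.
func validateVariable(name, value string) error {
	if err := validateName(name); err != nil {
		return err
	}
	if utf8.RuneCountInString(value) > maxVariableValueLen {
		return fmt.Errorf("value exceeds %d characters", maxVariableValueLen)
	}
	return nil
}

func main() {
	fmt.Println(validateName("DEPLOY_KEY"))
	fmt.Println(validateName("1BAD"))
	fmt.Println(validateVariable("REGION", "eu-west-1"))
}
```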

What S41a deliberately doesn't do

  • No trigger pipeline. domain_events aren't matched against on: yet — that's S41b.
  • No runner. S41c/S41d add runner claim APIs and the execution binary.
  • No UI. The Actions tab still renders the placeholder — S41f.
  • No secret encryption helpers wired to anything writable — S41c.
  • No JWT issuance, no runner registration flow — S41c.
  • No log streaming, no SSE — S41d/f.
  • No execution sandbox, no scrubbing, no injection guards enforced at the runner — S41d/e (the parser-side taint contract is the foundation those depend on, not a substitute).

Why these choices, in two paragraphs

The schema work is front-loaded so later sub-sprints don't ripple a migration through every PR. version (optimistic locking) and run_index (per-repo monotonic) are the two columns I'd flag to a new maintainer immediately — both are nearly free to add up front and painful to retrofit. The split between hot-path log chunks (Postgres) and finalized blob (Spaces) is shaped after Forgejo's log path; we pick the boring well-trodden answer over the clever one because log throughput is the failure mode that bites first.

The taint contract is the security-load-bearing piece. Every later sub-sprint trusts that the Tainted flag is set correctly here, in the parser/evaluator, and never re-derived downstream. The narrow allowlist of namespaces and functions exists exactly so a future PR that adds, say, fromJSON has to do it knowingly — by widening the allowlist in a visible diff, with a reviewer-required note, rather than by accident. The ${{ github.* }} alias is a pragmatic concession to copy-paste users; the rebrand to ${{ shithub.* }} is the canonical form so future divergence isn't awkward.

See also

  • internal/actions/workflow/parse.go — the parser
  • internal/actions/expr/eval.go — the evaluator
  • internal/migrationsfs/migrations/0042..0049_*.sql — the schema
  • tests/fixtures/workflows/*.yml — canonical input shapes
  • internal/actions/workflow/parse_test.go — fixture-driven tests
  • internal/actions/expr/eval_test.go — taint-contract tests
  • .refs/forgejo/services/actions/ — reference architecture
  • Campaign plan in conversation memory (humble-cooking-bunny)
View source
1 # Actions/CI — schema + workflow dialect (S41a)
2
3 The Actions/CI subsystem is shipping in eight sub-sprints (S41a through
4 S41h, plus optional S41i Nix engine). This doc covers what S41a lays
5 down: the SQL schema, the workflow YAML dialect, the expression
6 evaluator, and the load-bearing taint contract every later sub-sprint
7 depends on.
8
9 S41a is parser + schema only — no triggers, no runner, no UI. The
10 goal is to land a frozen contract that S41b/c/d/e can build against
11 without churning under them.
12
13 ## SQL schema
14
15 Actions migrations currently span 0042–0051 and 0053. Migration 0052 belongs to
16 the repo source-remotes feature and was already deployed before the runner JWT
17 replay table landed.
18
19 | # | Table | Purpose |
20 | ----- | --------------------------- | ------------------------------------------------------------- |
21 | 0042 | `workflow_runs` | One row per triggered workflow execution |
22 | 0043 | `workflow_jobs` | Jobs within a run (one row per `jobs.<key>`) |
23 | 0044 | `workflow_steps` | Steps within a job (one row per `steps[i]`) |
24 | 0045 | `workflow_secrets` | Per-repo + per-org encrypted secrets |
25 | 0046 | `workflow_runners` | Registered runners + `runner_tokens` |
26 | 0047 | `workflow_step_log_chunks` | Hot-path append log buffer (concatenated to blob on finalize) |
27 | 0048 | `workflow_artifacts` | Per-run artifact metadata (90-day default expiry) |
28 | 0049 | `actions_variables` | Non-secret per-repo/org config (Forgejo parity) |
29 | 0050 | `workflow_steps.step_with` | Parsed `with:` inputs for magic `uses:` aliases |
30 | 0051 | `workflow_runs.trigger_event_id` | Trigger idempotency for retries/admin replays |
31 | 0053 | `runner_jwt_used` | Single-use replay gate for runner job JWTs |
32
33 A few load-bearing choices, called out so they're easy to spot in a
34 later schema diff:
35
36 - **`workflow_runs.run_index`** — per-repo monotonic counter. Each
37 repo gets `#1`, `#2`, … so URLs like
38 `/{owner}/{repo}/actions/runs/42` are stable and human-friendly.
39 Crib from Forgejo's `actions_run.index`.
40 - **`workflow_runs.version`** — optimistic-lock counter. Mutators
41 bump-and-check rather than `SELECT … FOR UPDATE`. Required for
42 S41g's race between a cancel request and a state transition.
43 - **`workflow_runs.concurrency_group`** — the concurrency-slot key,
44 resolved at trigger time from the workflow's `concurrency.group:`
45 expression. S41g's slot manager keys off this column.
46 - **`workflow_runs.parent_run_id`** — for re-runs. The new run
47 references the original; the UI shows a "re-ran from #N" link.
48 - **`workflow_jobs.runner_id`** — FK added in 0046 (after the
49 runners table exists). Nullable until claimed.
50 - **`workflow_steps`** has a CHECK constraint enforcing
51 `(run_command IS NOT NULL) <> (uses_alias IS NOT NULL)` — exactly
52 one of `run:` or `uses:`. The `uses_alias` column is further
53 CHECK-constrained to the three magic aliases we accept in v1.
54 - **`workflow_secrets`** owns its value as `bytea` ChaCha20Poly1305-
55 sealed via `internal/auth/secretbox`. Key derivation uses
56 `cfg.Auth.TOTPKeyB64` (already an operator-managed root) +
57 `(owner, kind, name)` salt so re-keying is per-row.
58 - **`workflow_step_log_chunks.chunk`** is capped at 512 KB per row.
59 The runner sends bigger payloads in pieces. `(step_id, seq)` is
60 UNIQUE so duplicate sends are idempotent.
61 - **`actions_variables`** — non-secret, plaintext, scoped exactly
62 like secrets (per-repo or per-org, never both on the same row).
63 Forgejo has the same split; we mirror it for parity.
64 - **`runner_jwt_used`** — primary-keyed by JWT `jti`. Job endpoints
65 insert into this table during auth; zero inserted rows means replay
66 and the API returns 401. JWTs are HMAC-SHA256 and use an HKDF
67 subkey derived from `auth.totp_key_b64` with label
68 `actions-runner-jwt-v1`.
69
70 The `version` and `run_index` patterns are the two pieces I'd point
71 out to a future maintainer first. Both are cheap to add now and
72 miserable to retrofit later.
73
74 ## Workflow YAML dialect (v1)
75
76 We accept a strict subset of GitHub Actions YAML. The parser rejects
77 unknown keys at parse time so workflow authors find their typos
78 immediately instead of shipping a workflow that does nothing.
79
80 ### Top level
81
82 ```yaml
83 name: my-pipeline # optional human name
84 on: [push, pull_request] # or full-form (see below)
85 permissions: read-all # default if omitted
86 env: { GREETING: "hello" } # workflow-level env
87 concurrency: # optional slot manager
88 group: ${{ shithub.ref }}
89 cancel-in-progress: true
90 jobs:
91 <key>: # 1+ entries
92 runs-on: ubuntu-latest
93 needs: [other-key] # optional dep edge
94 if: ${{ shithub.actor == 'alice' }} # optional gate
95 timeout-minutes: 60 # 1..4320, default 360
96 permissions: { contents: read } # narrow workflow perms
97 env: { K: v } # job overlay
98 steps:
99 - name: ...
100 id: ...
101 if: ...
102 run: echo hi # run XOR uses
103 uses: actions/checkout@v4 # exactly one of three aliases
104 working-directory: ...
105 env: { ... }
106 continue-on-error: false
107 ```
108
109 ### Triggers (`on:`)
110
111 v1 supports four triggers — anything else is a parse error.
112
113 | Trigger | Surface |
114 | ------------------- | ---------------------------------------------------------------- |
115 | `push` | `branches:`, `tags:`, `paths:` (include + `!exclude` semantics) |
116 | `pull_request` | `types:` (opened/synchronize/reopened/...), `branches:`, `paths:` |
117 | `schedule` | one or more `- cron: <5-field-expr>` |
118 | `workflow_dispatch` | `inputs:` map (string/boolean/choice/environment) |
119
120 ### `uses:` allowlist
121
122 Exactly three aliases, no exceptions:
123
124 | Alias | What it does |
125 | -------------------------------- | ----------------------------------------- |
126 | `actions/checkout@v4` | Clones the repo into the workspace |
127 | `shithub/upload-artifact@v1` | Uploads files to `workflow_artifacts` |
128 | `shithub/download-artifact@v1` | Pulls artifacts back in a downstream job |
129
130 Any other `uses:` value (community actions, Docker images, composite
131 actions) is an Error-severity diagnostic. The marketplace problem is
132 explicitly out of scope for v1; revisit only if a real demand exists
133 and we have an answer for supply-chain trust.
134
135 ### File-size + parser caps
136
137 - **64 KB** workflow file size cap (`workflow.MaxWorkflowFileBytes`).
138 Files larger than this are rejected before YAML decode begins —
139 defends against pathological inputs and gives operators a
140 predictable upper bound on parser memory.
141 - **100 anchors** per document (`workflow.MaxYAMLAliases`) — the
142 billion-laughs guard. yaml.v3 doesn't expose a direct knob; we
143 count alias nodes during a tree walk and bail.
144
145 ### `${{ github.* }}` alias
146
147 The dialect is intentionally rebranded to `${{ shithub.* }}`.
148 Authors who paste GHA workflows in unmodified will see their
149 `${{ github.* }}` references continue to work because the evaluator
150 rewrites `path[0]` from `github` to `shithub` at the top of `evalRef`
151 before taint computation, dispatch, and error rendering.
152
153 The alias is intentionally **scope-narrow**: only fields that exist
154 in our `shithub.*` namespace (`run_id`, `sha`, `ref`, `actor`,
155 `event`) route through. GHA fields we don't expose in v1 —
156 `event_name`, `repository`, `run_number`, `workspace`, etc. — error
157 with the canonical `unknown shithub field "X"` message. Slightly
158 confusing for a GHA-flavored author but keeps the v1 namespace
159 surface tight.
160
161 The alias preserves the load-bearing taint flag: `github.event.X`
162 taints exactly like `shithub.event.X`. `TestEval_GithubAliasIsTainted`
163 pins this contract.
164
165 Migration to strict-compat (drop the alias entirely) later is a
166 one-PR flip; moving the other direction is much harder.
167
168 This is a deliberate decision recorded in the campaign plan.
169
## Expression evaluator

`${{ … }}` expressions are parsed into a tiny AST and evaluated by
`internal/actions/expr`. The surface is intentionally minimal:

### Allowed namespaces

| Namespace         | Source            | Tainted?                                 |
| ----------------- | ----------------- | ---------------------------------------- |
| `secrets.X`       | workflow_secrets  | no (operator-controlled)                 |
| `vars.X`          | actions_variables | no (operator-controlled)                 |
| `env.X`           | workflow file     | no, unless the resolved value is tainted |
| `shithub.run_id`  | dispatch context  | no                                       |
| `shithub.sha`     | dispatch context  | no                                       |
| `shithub.ref`     | dispatch context  | no                                       |
| `shithub.actor`   | dispatch context  | no (resolved username)                   |
| `shithub.event.*` | trigger payload   | **yes, always**                          |

`runner.*`, `steps.*`, `needs.*`, `matrix.*`, `inputs.*` are all
parse-time errors. They're parked for v2, and the parser's
allowlist-closed posture means a future PR can't widen this
accidentally without a clearly visible diff.

### Allowed functions

`contains(haystack, needle)`, `startsWith(s, prefix)`,
`endsWith(s, suffix)`, plus the four job-status predicates
`success()`, `failure()`, `cancelled()`, `always()`. That's the
whole list. `fromJSON`, `hashFiles`, `toJSON`, `format`, and
friends are explicitly rejected: they each carry footgun risk
(parser DoS, FS access, side-channel injection) that we don't want
to take on in v1.

### Missing-value semantics

| Reference                      | Missing → ?                      |
| ------------------------------ | -------------------------------- |
| `secrets.NOT_BOUND`            | error (loud: workflow won't run) |
| `vars.MISSING`                 | empty string (GHA parity)        |
| `env.MISSING`                  | empty string (GHA parity)        |
| `shithub.event.deeply.missing` | null **but still tainted**       |

The "missing event path → null but tainted" case is a
defence-in-depth choice: even if the path doesn't resolve, the result
still came from the event payload, and we'd rather over-flag than
under.

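The missing-value table above can be read as four small lookups. The sketch below is illustrative only: the `Context` struct, its method names, and `ErrSecretNotBound` are hypothetical, and event leaves are stringified with `fmt.Sprint` purely to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a stand-in for expr.Value.
type Value struct {
	S       string
	Null    bool
	Tainted bool
}

// Context is a hypothetical evaluation context for this sketch.
type Context struct {
	Secrets map[string]string
	Vars    map[string]string
	Env     map[string]Value // env values keep whatever taint they were bound with
	Event   map[string]any
}

var ErrSecretNotBound = errors.New("secret not bound")

// Secret fails loudly when the name is not bound.
func (c Context) Secret(name string) (Value, error) {
	s, ok := c.Secrets[name]
	if !ok {
		return Value{}, fmt.Errorf("%w: %s", ErrSecretNotBound, name)
	}
	return Value{S: s}, nil
}

// Var and EnvVar return the empty string for a missing name.
func (c Context) Var(name string) Value    { return Value{S: c.Vars[name]} }
func (c Context) EnvVar(name string) Value { return c.Env[name] }

// EventPath walks the payload; a missing path yields null, still tainted.
func (c Context) EventPath(path ...string) Value {
	var cur any = c.Event
	for _, seg := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return Value{Null: true, Tainted: true}
		}
		if cur, ok = m[seg]; !ok {
			return Value{Null: true, Tainted: true}
		}
	}
	return Value{S: fmt.Sprint(cur), Tainted: true}
}

func main() {
	c := Context{
		Event: map[string]any{"pull_request": map[string]any{"title": "fix typo"}},
	}
	_, err := c.Secret("NOT_BOUND")
	fmt.Println("secret:", err)
	fmt.Printf("var:    %+v\n", c.Var("MISSING"))
	fmt.Printf("event:  %+v\n", c.EventPath("deeply", "missing"))
	fmt.Printf("event:  %+v\n", c.EventPath("pull_request", "title"))
}
```
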
## Taint contract: the load-bearing piece

This is the contract every later sub-sprint hangs off. Get it wrong
and we have an injection-shaped hole in the runner.

### Where the flag lives

The taint flag lives on `expr.Value` (the evaluator-produced value),
not `workflow.Value` (the parser-produced value). Two different
structs share the name `Value` because they live in different
packages, but they have different jobs:

- **`workflow.Value`** carries the raw source string the parser read
  out of the YAML (an env entry, a `with:` input, a concurrency
  group expression). At parse time we don't know what the
  `${{ … }}` body will resolve to, so there's nothing to taint yet.
- **`expr.Value`** is what the evaluator returns when it resolves a
  reference at runtime. *This* struct carries `Tainted bool`. The
  runner's exec layer (S41d) consumes that flag.

Pre-L5 the parser-side struct also had a `Tainted bool` field plus a
`Tainted()` constructor. Both were unused, and both were confusing
because they suggested two sources of truth. They were dropped in the
S41a-L5 cleanup.

### Propagation

**Every `expr.Value` carries a `Tainted bool`.** It is set true iff
the value transitively depends on `shithub.event.*`. Operators control
secrets, vars, env, the rest of `shithub.*`. Authors control the
workflow file. Only the event payload is *attacker-controlled*: a
PR title, a commit message, a branch name from a fork. Those values
must never be interpolated into a shell string.

Propagation rules:

- Reading `shithub.event.X` → `Tainted: true` (always, including
  missing-path null results).
- Reading any other namespace → `Tainted: false`, except `env.X`
  preserves the taint of the resolved env value. This closes the
  escape where an event-derived value is first assigned to env and
  then interpolated through `${{ env.X }}`.
- Binary op (`==`, `!=`, `&&`, `||`) → tainted if either operand is.
- Unary op (`!`) → tainted iff its operand is.
- Function call (`contains`, `startsWith`, `endsWith`) → tainted
  if any argument is.

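The propagation rules above amount to OR-ing the operands' flags at every node. A minimal sketch follows, assuming a simplified string-only `Value` and helper names (`Eq`, `And`, `Not`, `Contains`) that are made up for the example rather than taken from the evaluator:

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a simplified stand-in for expr.Value (string-typed only).
type Value struct {
	S       string
	Tainted bool
}

// fromBool renders a boolean result and attaches the computed taint.
func fromBool(b, tainted bool) Value {
	if b {
		return Value{S: "true", Tainted: tainted}
	}
	return Value{S: "false", Tainted: tainted}
}

// truthy is an assumed truthiness rule for this sketch.
func truthy(v Value) bool { return v.S != "" && v.S != "false" }

// Binary ops: tainted if either operand is.
func Eq(a, b Value) Value  { return fromBool(a.S == b.S, a.Tainted || b.Tainted) }
func And(a, b Value) Value { return fromBool(truthy(a) && truthy(b), a.Tainted || b.Tainted) }

// Unary op: tainted iff its operand is.
func Not(a Value) Value { return fromBool(!truthy(a), a.Tainted) }

// Function call: tainted if any argument is.
func Contains(haystack, needle Value) Value {
	return fromBool(strings.Contains(haystack.S, needle.S), haystack.Tainted || needle.Tainted)
}

func main() {
	title := Value{S: "fix: handle nil", Tainted: true} // from shithub.event.*
	ref := Value{S: "refs/heads/main"}                  // from shithub.ref
	lit := Value{S: "fix"}                              // literal in the workflow file

	fmt.Printf("%+v\n", Contains(title, lit))
	fmt.Printf("%+v\n", Eq(ref, Value{S: "refs/heads/main"}))
	fmt.Printf("%+v\n", Not(Contains(title, lit)))
}
```

Note that even a boolean derived from a tainted string stays tainted; the flag records provenance, not how dangerous the resulting text looks.
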
The runner consumes `Tainted` and refuses to interpolate tainted
values into shell strings. Instead, tainted values are bound to
runner-owned `SHITHUB_INPUT_xx` environment variables and the shell
source only references those placeholders. The author writes:

```yaml
- run: |
    echo "PR title was: ${{ shithub.event.pull_request.title }}"
```

The runner sees a tainted reference; it compiles the step to:

```bash
SHITHUB_INPUT_0="$user_pr_title" exec sh -c 'echo "PR title was: $SHITHUB_INPUT_0"'
```

…where `$user_pr_title` is set via Go's `cmd.Env`, never inserted into
the shell source string. Backticks, `$()`, `;`, `&&`: none of those
work as command-injection vectors when the value reaches the shell as
environment data instead of syntax.

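The compile step described above can be sketched as a small renderer. This is a simplified illustration, not the code in `internal/runner/exec`: the regular expression, the `Render` and `Command` names, and the choice to inline untainted values directly are all assumptions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
)

// Value is a stand-in for expr.Value.
type Value struct {
	S       string
	Tainted bool
}

// exprRe finds ${{ … }} spans; a real renderer would use the expression parser.
var exprRe = regexp.MustCompile(`\$\{\{\s*(.*?)\s*\}\}`)

// Render rewrites src so tainted values only appear as $SHITHUB_INPUT_n
// placeholders; the values themselves are returned as KEY=VALUE env entries.
func Render(src string, resolve func(expr string) Value) (script string, env []string) {
	script = exprRe.ReplaceAllStringFunc(src, func(m string) string {
		v := resolve(exprRe.FindStringSubmatch(m)[1])
		if !v.Tainted {
			return v.S // assumption: untainted values are inlined as-is
		}
		name := fmt.Sprintf("SHITHUB_INPUT_%d", len(env))
		env = append(env, name+"="+v.S)
		return "$" + name
	})
	return script, env
}

// Command builds (but does not run) the shell invocation with the values
// passed through cmd.Env rather than through the script text.
func Command(script string, env []string) *exec.Cmd {
	cmd := exec.Command("sh", "-c", script)
	cmd.Env = append(os.Environ(), env...)
	return cmd
}

func main() {
	src := `echo "PR title was: ${{ shithub.event.pull_request.title }}"`
	script, env := Render(src, func(string) Value {
		return Value{S: "$(touch /tmp/pwned)", Tainted: true}
	})
	fmt.Println(script)
	fmt.Println(env)
	fmt.Println(Command(script, env).Args)
}
```

The hostile title ends up only in the environment list; the script text handed to `sh -c` contains the placeholder name and nothing attacker-supplied.
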
The shared renderer lives in `internal/runner/exec`, so future engines
consume the same injection boundary instead of reimplementing it. The
runner claim payload includes `workflow_runs.event_payload`; without
that field, the runner cannot evaluate and taint
`${{ shithub.event.* }}` references.

Tests for this contract live in `internal/actions/expr/eval_test.go`,
`internal/runner/exec/render_test.go`, and
`internal/runner/engine/docker_test.go`. **Do not** weaken them in a
later PR without an audit-checkpoint review; they're explicitly
load-bearing for S41e's threat model.

Runner log chunks pass through `internal/runner/scrub` before they are
posted to the API. It masks exact secret values and preserves enough
tail bytes between chunks to catch a secret split across chunk
boundaries. S41e wires resolved workflow secrets into the runner claim
payload and mask set, then applies the same exact-value scrub again in
the runner API before persisting chunks. The server path also carries a
possible secret-prefix tail from the prior persisted chunk, so a runner
that bypasses client-side scrubbing cannot leak a secret by splitting
it across adjacent log POSTs.

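One way to catch a secret split across chunk boundaries is to hold back the last few bytes of each chunk until the next one arrives. The sketch below takes the simplest version of that idea, always retaining `len(longest secret) - 1` bytes; it is not the implementation in `internal/runner/scrub`, and the `Scrubber` type, its methods, and the `***` mask are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Scrubber masks exact secret values in a stream of log chunks.
type Scrubber struct {
	secrets []string
	keep    int    // bytes held back between chunks
	carry   string // already-scrubbed tail of the previous chunk
}

func NewScrubber(secrets []string) *Scrubber {
	s := &Scrubber{}
	for _, sec := range secrets {
		if sec == "" {
			continue // an empty secret would match everywhere
		}
		s.secrets = append(s.secrets, sec)
		if len(sec)-1 > s.keep {
			s.keep = len(sec) - 1
		}
	}
	// Longest first, so a secret that contains another is masked whole.
	sort.Slice(s.secrets, func(i, j int) bool { return len(s.secrets[i]) > len(s.secrets[j]) })
	return s
}

// Write scrubs carry+chunk and returns everything except the held-back tail.
func (s *Scrubber) Write(chunk string) string {
	buf := s.carry + chunk
	for _, sec := range s.secrets {
		buf = strings.ReplaceAll(buf, sec, "***")
	}
	cut := len(buf) - s.keep
	if cut < 0 {
		cut = 0
	}
	s.carry = buf[cut:]
	return buf[:cut]
}

// Flush releases the held-back tail at end of stream.
func (s *Scrubber) Flush() string {
	out := s.carry
	s.carry = ""
	return out
}

func main() {
	s := NewScrubber([]string{"hunter2"})
	out := s.Write("token=hun") + s.Write("ter2 done") + s.Flush()
	fmt.Println(out)
}
```

Any secret that straddles a boundary has at most `len(secret) - 1` bytes in the earlier chunk, so it is always fully visible once the carry is joined to the next chunk. The cost is a small delay before the tail of each chunk is emitted.
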
## `shithub.event` payload schema (v1)

The event payload is the most user-facing part of the contract: once
authors write workflows that template against `shithub.event.X`,
schema changes are breaking. The v1 schema is pinned and labelled
`v1`. Any addition is fine; renames and removals require a major
bump.

The schema is enforced by **typed constructors** in the
`internal/actions/event` package, one per trigger. S41b's pipeline
calls these to build payloads; the function signatures pin the
field set so adding a key requires editing the constructor in a
visible diff. This is the same closed-door discipline as the
expression evaluator's namespace allowlist.

| Trigger             | Constructor              | Top-level keys                                                                    |
| ------------------- | ------------------------ | --------------------------------------------------------------------------------- |
| `push`              | `event.Push`             | `ref`, `before`, `after`, `head_commit{message,id,author}`                        |
| `pull_request`      | `event.PullRequest`      | `action`, `number`, `pull_request{title,head{ref,sha},base{ref,sha},user{login}}` |
| `schedule`          | `event.Schedule`         | (empty map: cron fired; the cron expression is on the `workflow_runs` row)        |
| `workflow_dispatch` | `event.WorkflowDispatch` | `inputs{<name>: <stringified>}`                                                   |

Anything not in this table doesn't exist in v1. Accessing it returns
null+tainted (the missing-path semantics above).

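To make the "signature pins the field set" idea concrete, here is a sketch of what a `push` constructor could look like. The real signature of `event.Push` is not shown in this doc, so the parameter list, the `Commit` struct, and the choice of a plain string for `author` are all assumptions.

```go
package main

import "fmt"

// Commit is a hypothetical input struct for the head commit.
type Commit struct {
	ID      string
	Message string
	Author  string
}

// Push builds the v1 push payload. Every key is spelled out here, so adding
// a field means changing this function (and its signature) in a visible diff.
func Push(ref, before, after string, head Commit) map[string]any {
	return map[string]any{
		"ref":    ref,
		"before": before,
		"after":  after,
		"head_commit": map[string]any{
			"message": head.Message,
			"id":      head.ID,
			"author":  head.Author,
		},
	}
}

func main() {
	p := Push("refs/heads/main", "1111111", "2222222", Commit{
		ID:      "2222222",
		Message: "fix: handle nil payload",
		Author:  "alice",
	})
	fmt.Println(p["ref"], p["head_commit"])
}
```

Callers cannot smuggle an extra key into the payload because there is no free-form map parameter to put it in.
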
**Adding a field**: edit the constructor in `internal/actions/event/`,
add a row to this doc, and update the corresponding `*_FlowsThroughEvaluator`
test in `event_test.go` so the new path is exercised end-to-end.
Put a reviewer-required note in the commit message, the same standard
as for a new evaluator function.

**Renaming or removing**: that's a v1→v2 break. Don't.

## Operator surface

`shithubd admin actions parse <file>` reads a workflow off disk,
runs the parser, and dumps diagnostics + a canonical JSON rendering
of the parsed AST. Useful for:

- debugging "why is my workflow not picking up changes" reports
- validating a workflow file before committing it
- producing a stable AST snapshot for inclusion in bug reports

Exit codes:

| Code | Meaning                                       |
| ---- | --------------------------------------------- |
| 0    | clean parse, no Error-severity diagnostics    |
| 1    | file unreadable, oversized, or YAML malformed |
| 2    | parse produced Error-severity diagnostics     |

Other admin surfaces are scoped to later sub-sprints:

- S41c: `shithubd admin runner register --name <foo>` issues a
  registration token + writes a row to `workflow_runners`.
- S41g: `shithubd admin actions cancel <run-id>` flips
  `cancel_requested`.

## Trigger pipeline (S41b)

Three layers between a triggering event and a queued `workflow_run`:

```
caller (push_process / pulls.Create / pr_jobs.PRSynchronize / dispatch HTTP)
  │
  └─► worker.Enqueue(KindWorkflowTrigger, JobPayload)
        │
        └─► trigger.Handler picks up:
              Discover .shithub/workflows/*.yml at HEAD SHA
              Parse each (skip + log on Error diagnostics)
              Match each against trigger.Event
              Enqueue each match
                │
                └─► trigger.Enqueue (one tx):
                      INSERT workflow_runs (ON CONFLICT DO NOTHING)
                      INSERT workflow_jobs per parsed job
                      INSERT workflow_steps per parsed step
                      (commit)
                      checks.Create per job (post-tx, idempotent
                      via ExternalID 'workflow_run:<id>:job:<key>')
```

### Idempotency on the triggering event

Idempotency keys off the identity of the triggering event, not a
UNIQUE on `(repo_id, head_sha)`; this is the robust pattern. Each
caller constructs a stable `trigger_event_id` from its triggering
event's identity:

| Caller                  | trigger_event_id format                      |
| ----------------------- | -------------------------------------------- |
| push_process            | `push:<push_event_id>`                       |
| pulls.Create            | `pr_opened:<pr_id>:<head_sha>`               |
| pr_jobs.PRSynchronize   | `pr_synchronize:<pr_id>:<head_sha>`          |
| dispatch HTTP           | `dispatch:<file>:<sha>:<8-byte-random-hex>`  |
| schedule sweep (S41b-2) | `schedule:<workflow_id>:<window_start_unix>` |

Migration 0051 adds `workflow_runs.trigger_event_id` (text NOT NULL
DEFAULT '') with a partial UNIQUE on
`(repo_id, workflow_file, trigger_event_id) WHERE trigger_event_id <> ''`.
The trigger handler does `INSERT … ON CONFLICT DO NOTHING` so:

- Worker retries (the same push_process replay) → no duplicate runs.
- Admin replays via `shithubd admin run-job workflow:trigger ...`
  → no duplicate runs.
- Re-runs (the future "Re-run" button) explicitly construct a NEW
  trigger_event_id (`rerun:<original_run_id>:<request_uuid>`) and
  chain back via `parent_run_id`. History is preserved, no
  collision.

Each caller's collision-free namespace is short-lived and
human-debuggable: a Postgres operator can grep
`workflow_runs.trigger_event_id` to see exactly which triggering
event produced a given run.

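The ID formats in the table above are simple enough to build with one helper per caller. The helpers below are hypothetical (the doc does not name the real ones), and the `int64` ID types are an assumption; only the string formats come from the table.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Deterministic IDs: a retry of the same event produces the same string,
// so the partial UNIQUE index turns the second INSERT into a no-op.
func PushID(pushEventID int64) string { return fmt.Sprintf("push:%d", pushEventID) }

func PROpenedID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_opened:%d:%s", prID, headSHA)
}

func PRSynchronizeID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_synchronize:%d:%s", prID, headSHA)
}

func ScheduleID(workflowID, windowStartUnix int64) string {
	return fmt.Sprintf("schedule:%d:%d", workflowID, windowStartUnix)
}

// DispatchID is deliberately non-deterministic: each manual dispatch is a
// distinct event, so 8 random bytes (16 hex characters) are appended.
func DispatchID(file, sha string) (string, error) {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return fmt.Sprintf("dispatch:%s:%s:%s", file, sha, hex.EncodeToString(b)), nil
}

func main() {
	fmt.Println(PushID(42))
	fmt.Println(PRSynchronizeID(7, "deadbeef"))
	fmt.Println(ScheduleID(3, 1700000000))
	id, err := DispatchID("ci.yml", "deadbeef")
	fmt.Println(id, err)
}
```
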
### Filter evaluation

`trigger.Match(workflow, event)` is a pure function (no I/O, no DB).
For each event kind:

- **push**: branch vs tag classified from the ref; only the matching
  filter list applies (a `branches:` filter rejects tag pushes and
  vice versa). `paths:` (when set) requires at least one changed
  path to match. Empty filter = match-all.
- **pull_request**: `types:` defaults to
  `[opened, synchronize, reopened]` when omitted (GHA parity).
  `branches:` applies to the **base** ref. `paths:` as for push.
- **schedule**: requires the workflow to declare the cron expression
  that fired. The sweep is the source of truth for which cron
  fires; we just gate on declaration. Avoids interpreting cron
  semantics in two places.
- **workflow_dispatch**: matches whenever the workflow declares
  `on.workflow_dispatch`.

Glob semantics in `branches:`/`tags:`/`paths:`: minimatch subset
with `*` (single segment), `**` (any), `/**` end-anchor (optional
trailing path), `**/` start-anchor, and `!exclude` (last-match-wins,
exclusion-only list implies include-all).

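The glob rules above can be sketched with a segment-wise matcher plus a last-match-wins loop. This is one reading of the listed semantics, not the matcher that `trigger.Match` uses: `Allowed` and `matchGlob` are hypothetical names, and single-segment matching is delegated to the standard library's `path.Match`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchGlob matches name against pattern segment by segment:
// "*" stays within one segment, "**" absorbs zero or more segments.
func matchGlob(pattern, name string) bool {
	return matchSegs(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

func matchSegs(pat, segs []string) bool {
	if len(pat) == 0 {
		return len(segs) == 0
	}
	if pat[0] == "**" {
		for i := 0; i <= len(segs); i++ {
			if matchSegs(pat[1:], segs[i:]) {
				return true
			}
		}
		return false
	}
	if len(segs) == 0 {
		return false
	}
	ok, err := path.Match(pat[0], segs[0])
	if err != nil || !ok {
		return false
	}
	return matchSegs(pat[1:], segs[1:])
}

// Allowed applies a filter list: an empty list matches everything, an
// exclusion-only list starts from include-all, and the last matching
// pattern decides the outcome.
func Allowed(patterns []string, name string) bool {
	if len(patterns) == 0 {
		return true
	}
	allowed := true // include-all until a positive pattern is seen
	for _, p := range patterns {
		if !strings.HasPrefix(p, "!") {
			allowed = false
			break
		}
	}
	for _, p := range patterns {
		if strings.HasPrefix(p, "!") {
			if matchGlob(p[1:], name) {
				allowed = false
			}
		} else if matchGlob(p, name) {
			allowed = true
		}
	}
	return allowed
}

func main() {
	fmt.Println(Allowed([]string{"main", "release/*"}, "release/1.0"))
	fmt.Println(Allowed([]string{"**", "!docs/**"}, "docs/guide/intro.md"))
	fmt.Println(Allowed([]string{"!docs/**"}, "src/main.go"))
}
```
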
### Collaborator gate

Per the S41b spec's "external-PR support is parked" decision: PR
triggers (both `opened` and `synchronize`) only fire when the PR's
author is the repo's owning user. This is conservative: it drops
legitimate non-owner collaborators in the org-repo case. Expanding the
gate requires plumbing `policy.Can` into the worker context, which we
defer to S41g where the lifecycle work touches that surface anyway.

### Operator surface

- `POST /{owner}/{repo}/actions/workflows/{file}/dispatches`
  Body: `{"ref": "...", "inputs": {"key": "value"}}` (both optional;
  ref defaults to the repo's default branch). Returns 204 No Content
  on success. Synchronous `trigger.Enqueue` (no discovery: the file is
  named in the URL). Auth: requires repo write.

### What S41b deliberately doesn't do

- Run jobs. S41c adds runner claim/status APIs; S41d adds the actual
  `shithubd-runner` execution binary.
- Schedule sweep. Cron-driven triggers split into S41b-2 to keep
  this PR reviewable; the trigger pipeline accepts schedule events,
  but no caller produces them yet. S41b-2 adds the sweep + the
  `robfig/cron/v3` dep + `shithubd-cron.service` wiring.
- External-PR triggers. Conservative collaborator gate above.
- `workflow_run` webhook events. S41h adds the webhook event family
  + atom feed.

## Secrets + variables settings surface (S41c)

S41c wires the previously schema-only `workflow_secrets` and
`actions_variables` tables into repo/org settings.

Repository routes are gated through
`policy.ActionRepoSettingsActions` (`repo:settings:actions`, admin
role minimum):

- `GET /{owner}/{repo}/settings/secrets/actions`
- `POST /{owner}/{repo}/settings/secrets/actions`
- `POST /{owner}/{repo}/settings/secrets/actions/{name}/delete`
- `GET /{owner}/{repo}/settings/variables/actions`
- `POST /{owner}/{repo}/settings/variables/actions`
- `POST /{owner}/{repo}/settings/variables/actions/{name}/delete`

Organization routes follow the existing org-settings prefix and are
owner-only:

- `GET /organizations/{org}/settings/secrets/actions`
- `POST /organizations/{org}/settings/secrets/actions`
- `POST /organizations/{org}/settings/secrets/actions/{name}/delete`
- `GET /organizations/{org}/settings/variables/actions`
- `POST /organizations/{org}/settings/variables/actions`
- `POST /organizations/{org}/settings/variables/actions/{name}/delete`

Secrets are sealed through `internal/auth/secretbox` using the
operator-managed `Auth.TOTPKeyB64` root key. Secret list pages render
names/metadata only; the plaintext value is accepted once on create or
rotation and never rendered back. Variables are non-secret plaintext
configuration, so settings pages render their values. Both stores use
the same name grammar as the database constraints:
`^[A-Za-z_][A-Za-z0-9_]*$`, 1-100 characters. Variables additionally
enforce the 4096-character value cap in Go before hitting the DB
constraint.

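The name grammar and value cap above translate directly into a pre-insert check. The function names below are hypothetical, and counting the cap in runes rather than bytes is an assumption about what "4096-character" means.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// nameRe is the grammar quoted above; the length bound is checked separately.
var nameRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

const (
	maxNameLen  = 100
	maxValueLen = 4096
)

// ValidateName applies the shared secret/variable name rules.
func ValidateName(name string) error {
	if len(name) < 1 || len(name) > maxNameLen {
		return fmt.Errorf("name must be 1-%d characters, got %d", maxNameLen, len(name))
	}
	if !nameRe.MatchString(name) {
		return fmt.Errorf("name %q must match %s", name, nameRe)
	}
	return nil
}

// ValidateVariable additionally enforces the value cap, so the caller gets
// a friendly error instead of a constraint violation from the database.
func ValidateVariable(name, value string) error {
	if err := ValidateName(name); err != nil {
		return err
	}
	if n := utf8.RuneCountInString(value); n > maxValueLen {
		return fmt.Errorf("value is %d characters, limit is %d", n, maxValueLen)
	}
	return nil
}

func main() {
	for _, name := range []string{"DEPLOY_KEY", "_private", "1BAD", "has-dash", ""} {
		fmt.Printf("%-12q %v\n", name, ValidateName(name))
	}
}
```
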
## What S41a deliberately doesn't do

- No trigger pipeline. `domain_events` aren't matched against `on:`
  yet; that's S41b.
- No runner. S41c/S41d add runner claim APIs and the execution binary.
- No UI. The Actions tab still renders the placeholder (S41f).
- No secret encryption helpers wired to anything writable (S41c).
- No JWT issuance, no runner registration flow (S41c).
- No log streaming, no SSE (S41d/f).
- No execution sandbox, no scrubbing, no injection guards
  *enforced at the runner*. Those land in S41d/e (the S41a taint
  contract is the foundation those depend on, not a substitute).

## Why these choices, in two paragraphs

The schema work is front-loaded so later sub-sprints don't ripple a
migration through every PR. `version` (optimistic locking) and
`run_index` (per-repo monotonic) are the two columns I'd flag to a
new maintainer immediately: both are nearly free to add up front
and painful to retrofit. The split between hot-path log chunks
(Postgres) and finalized blob (Spaces) is shaped after Forgejo's
log path; we pick the boring well-trodden answer over the clever
one because log throughput is the failure mode that bites first.

The taint contract is the security-load-bearing piece. Every later
sub-sprint trusts that the `Tainted` flag is set correctly here, in
the parser/evaluator, and never re-derived downstream. The narrow
allowlist of namespaces and functions exists exactly so a future PR
that adds, say, `fromJSON` has to do it knowingly, by widening the
allowlist in a visible diff with a reviewer-required note, rather
than by accident. The `${{ github.* }}` alias is a pragmatic
concession to copy-paste users; the rebrand to `${{ shithub.* }}`
is the canonical form so future divergence isn't awkward.

## See also

- `internal/actions/workflow/parse.go`: the parser
- `internal/actions/expr/eval.go`: the evaluator
- `internal/migrationsfs/migrations/0042..0049_*.sql`: the schema
- `tests/fixtures/workflows/*.yml`: canonical input shapes
- `internal/actions/workflow/parse_test.go`: fixture-driven tests
- `internal/actions/expr/eval_test.go`: taint-contract tests
- `.refs/forgejo/services/actions/`: reference architecture
- Campaign plan in conversation memory (humble-cooking-bunny)