
Actions/CI — schema + workflow dialect (S41a)

The Actions/CI subsystem is shipping in eight sub-sprints (S41a through S41h, plus optional S41i Nix engine). This doc covers what S41a lays down: the SQL schema, the workflow YAML dialect, the expression evaluator, and the load-bearing taint contract every later sub-sprint depends on.

S41a is parser + schema only — no triggers, no runner, no UI. The goal is to land a frozen contract that S41b/c/d/e can build against without churning under them.

SQL schema

Actions migrations currently span 0042–0051, 0053, and 0057. Migration 0052 belongs to the repo source-remotes feature, 0054 belongs to push event protocol tracking, 0055 belongs to the social feed, and 0056 belongs to user profile contribution settings.

| # | Table | Purpose |
| ---- | ---- | ---- |
| 0042 | `workflow_runs` | One row per triggered workflow execution |
| 0043 | `workflow_jobs` | Jobs within a run (one row per `jobs.<key>`) |
| 0044 | `workflow_steps` | Steps within a job (one row per `steps[i]`) |
| 0045 | `workflow_secrets` | Per-repo + per-org encrypted secrets |
| 0046 | `workflow_runners` | Registered runners + `runner_tokens` |
| 0047 | `workflow_step_log_chunks` | Hot-path append log buffer (concatenated to blob on finalize) |
| 0048 | `workflow_artifacts` | Per-run artifact metadata (90-day default expiry) |
| 0049 | `actions_variables` | Non-secret per-repo/org config (Forgejo parity) |
| 0050 | `workflow_steps.step_with` | Parsed `with:` inputs for magic `uses:` aliases |
| 0051 | `workflow_runs.trigger_event_id` | Trigger idempotency for retries/admin replays |
| 0053 | `runner_jwt_used` | Single-use replay gate for runner job JWTs |
| 0057 | `workflow_job_secret_masks` | Encrypted claim-time log mask snapshots per job |

A few load-bearing choices, called out so they're easy to spot in a later schema diff:

  • workflow_runs.run_index — per-repo monotonic counter. Each repo gets #1, #2, … so URLs like /{owner}/{repo}/actions/runs/42 are stable and human-friendly. Crib from Forgejo's actions_run.index.
  • workflow_runs.version — optimistic-lock counter. Mutators bump-and-check rather than SELECT … FOR UPDATE. Required for S41g's race between a cancel request and a state transition.
  • workflow_runs.concurrency_group — the concurrency-slot key, resolved at trigger time from the workflow's concurrency.group: expression. S41g's slot manager keys off this column.
  • workflow_runs.parent_run_id — for re-runs. The new run references the original; the UI shows a "re-ran from #N" link.
  • workflow_jobs.runner_id — FK added in 0046 (after the runners table exists). Nullable until claimed.
  • workflow_steps has a CHECK constraint enforcing (run_command IS NOT NULL) <> (uses_alias IS NOT NULL) — exactly one of run: or uses:. The uses_alias column is further CHECK-constrained to the three magic aliases we accept in v1.
  • workflow_secrets stores its value as bytea, ChaCha20Poly1305-sealed via internal/auth/secretbox. Key derivation uses cfg.Auth.TOTPKeyB64 (already an operator-managed root) + (owner, kind, name) salt so re-keying is per-row.
  • workflow_step_log_chunks.chunk is capped at 512 KB per row. The runner sends bigger payloads in pieces. (step_id, seq) is UNIQUE so duplicate sends are idempotent.
  • actions_variables — non-secret, plaintext, scoped exactly like secrets (per-repo or per-org, never both on the same row). Forgejo has the same split; we mirror it for parity.
  • runner_jwt_used — primary-keyed by JWT jti. Job endpoints insert into this table during auth; zero inserted rows means replay and the API returns 401. JWTs are HMAC-SHA256 and use an HKDF subkey derived from auth.totp_key_b64 with label actions-runner-jwt-v1.
  • workflow_job_secret_masks — one encrypted JSON array of exact secret values per claimed job. It snapshots the log scrub set at claim time, preventing a rotated or deleted secret from disappearing from server-side masking while the old value is still in a runner's job payload.

The version and run_index patterns are the two pieces I'd point out to a future maintainer first. Both are cheap to add now and miserable to retrofit later.

Workflow YAML dialect (v1)

We accept a strict subset of GitHub Actions YAML. The parser rejects unknown keys at parse time so workflow authors find their typos immediately instead of shipping a workflow that does nothing.

Top level

name: my-pipeline                         # optional human name
on: [push, pull_request]                  # or full-form (see below)
permissions: read-all                     # default if omitted
env: { GREETING: "hello" }                # workflow-level env
concurrency:                              # optional slot manager
  group: ${{ shithub.ref }}
  cancel-in-progress: true
jobs:
  <key>:                                  # 1+ entries
    runs-on: ubuntu-latest
    needs: [other-key]                    # optional dep edge
    if: ${{ shithub.actor == 'alice' }}   # optional gate
    timeout-minutes: 60                   # 1..4320, default 360
    permissions: { contents: read }       # narrow workflow perms
    env: { K: v }                         # job overlay
    steps:
      - name: ...
        id: ...
        if: ...
        run: echo hi                      # run XOR uses
        uses: actions/checkout@v4         # exactly one of three aliases
        working-directory: ...
        env: { ... }
        continue-on-error: false

Triggers (on:)

v1 supports four triggers — anything else is a parse error.

| Trigger | Surface |
| ---- | ---- |
| `push` | `branches:`, `tags:`, `paths:` (include + `!exclude` semantics) |
| `pull_request` | `types:` (opened/synchronize/reopened/...), `branches:`, `paths:` |
| `schedule` | one or more `- cron: <5-field-expr>` |
| `workflow_dispatch` | `inputs:` map (string/boolean/choice/environment) |

uses: allowlist

Exactly three aliases, no exceptions:

| Alias | What it does |
| ---- | ---- |
| `actions/checkout@v4` | Clones the repo into the workspace |
| `shithub/upload-artifact@v1` | Uploads files to `workflow_artifacts` |
| `shithub/download-artifact@v1` | Pulls artifacts back in a downstream job |

Any other uses: value (community actions, Docker images, composite actions) is an Error-severity diagnostic. The marketplace problem is explicitly out of scope for v1; revisit only if a real demand exists and we have an answer for supply-chain trust.

File-size + parser caps

  • 64 KB workflow file size cap (workflow.MaxWorkflowFileBytes). Files larger than this are rejected before YAML decode begins — defends against pathological inputs and gives operators a predictable upper bound on parser memory.
  • 100 aliases per document (workflow.MaxYAMLAliases): the billion-laughs guard. yaml.v3 doesn't expose a direct knob; we count alias nodes during a tree walk and bail once the cap is exceeded.

${{ github.* }} alias

The dialect is intentionally rebranded to ${{ shithub.* }}. Authors who paste GHA workflows in unmodified will see their ${{ github.* }} references continue to work because the evaluator rewrites path[0] from github to shithub at the top of evalRef before taint computation, dispatch, and error rendering.

The alias is intentionally scope-narrow: only fields that exist in our shithub.* namespace (run_id, sha, ref, actor, event) route through. GHA fields we don't expose in v1 — event_name, repository, run_number, workspace, etc. — error with the canonical unknown shithub field "X" message. Slightly confusing for a GHA-flavored author but keeps the v1 namespace surface tight.

The alias preserves the load-bearing taint flag: github.event.X taints exactly like shithub.event.X. TestEval_GithubAliasIsTainted pins this contract.
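A minimal sketch of the alias rewrite, assuming a cut-down value type and a hypothetical evalContext holder. Only the `unknown shithub field "X"` message comes from this doc; the other error strings and the helper names are illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

// value is a cut-down stand-in for expr.Value.
type value struct {
	Str     string
	Null    bool
	Tainted bool
}

// evalContext is a hypothetical holder for the dispatch context fields
// (run_id, sha, ref, actor) and the trigger payload.
type evalContext struct {
	fields map[string]string
	event  map[string]any
}

// evalRef resolves a shithub.* reference, accepting github.* as an alias.
func evalRef(ctx evalContext, path []string) (value, error) {
	if len(path) < 2 {
		return value{}, errors.New("reference needs a namespace and a field")
	}
	// Rewrite the alias first so taint, dispatch, and error text below
	// only ever see the canonical namespace.
	if path[0] == "github" {
		path = append([]string{"shithub"}, path[1:]...)
	}
	if path[0] != "shithub" {
		return value{}, fmt.Errorf("unknown namespace %q", path[0])
	}
	if path[1] == "event" {
		return lookupEvent(ctx.event, path[2:]), nil
	}
	if s, ok := ctx.fields[path[1]]; ok && len(path) == 2 {
		return value{Str: s}, nil
	}
	return value{}, fmt.Errorf("unknown shithub field %q", path[1])
}

// lookupEvent walks the payload. Every result is tainted, including
// the null returned for a path that does not resolve.
func lookupEvent(payload map[string]any, keys []string) value {
	var cur any = payload
	for _, k := range keys {
		m, ok := cur.(map[string]any)
		if !ok {
			return value{Null: true, Tainted: true}
		}
		next, found := m[k]
		if !found {
			return value{Null: true, Tainted: true}
		}
		cur = next
	}
	if cur == nil {
		return value{Null: true, Tainted: true}
	}
	return value{Str: fmt.Sprint(cur), Tainted: true}
}

func main() {
	ctx := evalContext{
		fields: map[string]string{"sha": "abc123"},
		event:  map[string]any{"pull_request": map[string]any{"title": "hi"}},
	}
	fmt.Println(evalRef(ctx, []string{"github", "event", "pull_request", "title"}))
	fmt.Println(evalRef(ctx, []string{"github", "event_name"}))
}
```

Because the rewrite happens before anything else, there is a single code path for taint and a single error message, whichever spelling the author used.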

Migration to strict-compat (drop the alias entirely) later is a one-PR flip; moving the other direction is much harder.

This is a deliberate decision recorded in the campaign plan.

Expression evaluator

${{ … }} expressions are parsed into a tiny AST and evaluated by internal/actions/expr. The surface is intentionally minimal:

Allowed namespaces

| Namespace | Source | Tainted? |
| ---- | ---- | ---- |
| `secrets.X` | workflow_secrets | no, but sensitive |
| `vars.X` | actions_variables | no (operator-controlled) |
| `env.X` | workflow file | no for literal author text; otherwise inherits the flags of the resolved value |
| `shithub.run_id` | dispatch context | no |
| `shithub.sha` | dispatch context | no |
| `shithub.ref` | dispatch context | no |
| `shithub.actor` | dispatch context | no (resolved username) |
| `shithub.event.*` | trigger payload | **yes, always** |

runner.*, steps.*, needs.*, matrix.*, inputs.* are all parse-time errors. They're parked for v2 and the parser's allowlist-closed posture means a future PR can't widen this accidentally without a clearly visible diff.

Allowed functions

contains(haystack, needle), startsWith(s, prefix), endsWith(s, suffix), plus the four job-status predicates success(), failure(), cancelled(), always(). That's the whole list. fromJSON, hashFiles, toJSON, format, and friends are explicitly rejected — they each carry footgun risk (parser DoS, FS access, side-channel injection) that we don't want to take on in v1.

Missing-value semantics

| Reference | Result when missing |
| ---- | ---- |
| `secrets.NOT_BOUND` | error (loud: the workflow won't run) |
| `vars.MISSING` | empty string (GHA parity) |
| `env.MISSING` | empty string (GHA parity) |
| `shithub.event.deeply.missing` | null but still tainted |

The "missing event path → null but tainted" case is a defence-in-depth choice: even if the path doesn't resolve, the result still came from the event payload, and we'd rather over-flag than under.

Taint contract — the load-bearing piece

This is the contract every later sub-sprint hangs off. Get it wrong and we have an injection-shaped hole in the runner.

Where the flag lives

The taint flag lives on expr.Value (the evaluator-produced value), not workflow.Value (the parser-produced value). Two different structs share the name Value because they live in different packages, but they have different jobs:

  • workflow.Value carries the raw source string the parser read out of the YAML (an env entry, a with: input, a concurrency group expression). At parse time we don't know what the ${{ … }} body will resolve to, so there's nothing to taint yet.
  • expr.Value is what the evaluator returns when it resolves a reference at runtime. This struct carries Tainted bool. The runner's exec layer (S41d) consumes that flag.

Pre-L5 the parser-side struct also had a Tainted bool field plus a Tainted() constructor — both unused, both confusing because they suggested two sources of truth. Dropped in S41a-L5 cleanup.

Propagation

Every expr.Value carries a Tainted bool, set to true iff the value transitively depends on shithub.event.*. Operators control secrets, vars, and the rest of shithub.*. Authors control the workflow file, including its env blocks. Only the event payload is attacker-controlled: a PR title, a commit message, a branch name from a fork. Those values must never be interpolated into a shell string.

Propagation rules:

  • Reading shithub.event.X → Tainted: true (always, including missing-path null results).
  • Reading secrets.X → Sensitive: true. Secrets are operator-controlled, so they are not tainted, but they must not appear in shell source strings or Docker argv.
  • Reading any other namespace → Tainted: false and Sensitive: false, except env.X preserves both flags of the resolved env value. This closes the escape where an event-derived or secret-derived value is first assigned to env and then interpolated through ${{ env.X }}.
  • Binary op (==, !=, &&, ||) → tainted or sensitive if either operand is.
  • Unary op (!) → tainted/sensitive iff its operand is.
  • Function call (contains, startsWith, endsWith) → tainted or sensitive if any argument is.
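The propagation rules above can be sketched as a single "inherit" helper applied by every operator and function. Value here is a cut-down stand-in for expr.Value, and the string-based truthiness is a simplification for this example only.

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a cut-down stand-in for expr.Value.
type Value struct {
	Str       string
	Tainted   bool
	Sensitive bool
}

// inherit ORs the flags of every input into the result, so a flag can
// be gained through an operation but never lost.
func inherit(out Value, inputs ...Value) Value {
	for _, in := range inputs {
		out.Tainted = out.Tainted || in.Tainted
		out.Sensitive = out.Sensitive || in.Sensitive
	}
	return out
}

// boolValue renders a boolean result as an unflagged Value.
func boolValue(b bool) Value {
	if b {
		return Value{Str: "true"}
	}
	return Value{Str: "false"}
}

// equal models the == operator.
func equal(a, b Value) Value {
	return inherit(boolValue(a.Str == b.Str), a, b)
}

// not models the unary ! operator.
func not(a Value) Value {
	return inherit(boolValue(a.Str != "true"), a)
}

// contains models contains(haystack, needle).
func contains(haystack, needle Value) Value {
	return inherit(boolValue(strings.Contains(haystack.Str, needle.Str)), haystack, needle)
}

func main() {
	title := Value{Str: "fix: typo", Tainted: true} // from shithub.event.*
	token := Value{Str: "s3cr3t", Sensitive: true}  // from secrets.*
	lit := Value{Str: "fix"}                        // literal in the workflow file

	fmt.Printf("%+v\n", contains(title, lit))
	fmt.Printf("%+v\n", not(equal(token, lit)))
}
```

Even a boolean derived from a tainted value stays tainted in this sketch, which matches the "transitively depends on" wording rather than trying to reason about how much attacker data survives.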

The runner consumes Tainted and Sensitive and refuses to interpolate either class into shell strings. Instead, those values are bound to runner-owned SHITHUB_INPUT_xx envvars and the shell source only references those placeholders. The author writes:

- run: echo "PR title was: ${{ shithub.event.pull_request.title }}"

The runner sees a tainted reference; it compiles the step to:

SHITHUB_INPUT_0="$user_pr_title" exec sh -c 'echo "PR title was: $SHITHUB_INPUT_0"'

…where $user_pr_title is set via Go's cmd.Env, never inserted into the shell source string or Docker CLI argv. Backticks, $(), ;, && — none of those work as command-injection vectors when the value reaches the shell as environment data instead of syntax.

The shared renderer lives in internal/runner/exec, so future engines consume the same injection boundary instead of reimplementing it. The runner claim payload includes workflow_runs.event_payload; without that field, the runner cannot evaluate and taint ${{ shithub.event.* }} references.

Tests for this contract live in internal/actions/expr/eval_test.go, internal/runner/exec/render_test.go, and internal/runner/engine/docker_test.go. Do not weaken them in a later PR without an audit-checkpoint review — they're explicitly load-bearing for S41e's threat model.

Runner log chunks pass through internal/runner/scrub before they are posted to the API. It masks exact secret values and preserves enough tail bytes between chunks to catch a secret split across chunk boundaries. S41e wires resolved workflow secrets into the runner claim payload and mask set, snapshots that mask set encrypted on the job, then applies the same exact-value scrub again in the runner API before persisting chunks. The server path also carries a possible secret-prefix tail from the prior persisted chunk, so a runner that bypasses client-side scrubbing cannot leak a secret by splitting it across adjacent log POSTs.
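The tail-carry idea can be illustrated with a small streaming masker. This is a sketch of the technique, not the internal/runner/scrub implementation: it holds back one byte fewer than the longest secret, so any secret prefix at the end of a chunk is still in the buffer when the next chunk arrives. The type and method names and the `***` mask are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const mask = "***"

// scrubber masks exact secret values in a stream of log chunks.
type scrubber struct {
	secrets []string
	maxLen  int    // length of the longest secret
	carry   string // held-back tail that may begin a secret
}

func newScrubber(secrets []string) *scrubber {
	s := &scrubber{}
	for _, sec := range secrets {
		if sec == "" {
			continue // an empty secret would match everywhere
		}
		s.secrets = append(s.secrets, sec)
		if len(sec) > s.maxLen {
			s.maxLen = len(sec)
		}
	}
	return s
}

// Write returns the part of the stream that is safe to emit now.
func (s *scrubber) Write(chunk string) string {
	buf := s.maskAll(s.carry + chunk)
	hold := s.maxLen - 1
	if hold < 0 {
		hold = 0
	}
	if hold > len(buf) {
		hold = len(buf)
	}
	s.carry = buf[len(buf)-hold:]
	return buf[:len(buf)-hold]
}

// Flush emits whatever is still held back at end of stream.
func (s *scrubber) Flush() string {
	out := s.maskAll(s.carry)
	s.carry = ""
	return out
}

// maskAll replaces every exact occurrence of every secret.
func (s *scrubber) maskAll(text string) string {
	for _, sec := range s.secrets {
		text = strings.ReplaceAll(text, sec, mask)
	}
	return text
}

func main() {
	s := newScrubber([]string{"hunter2"})
	// The secret is split across two chunks.
	fmt.Print(s.Write("pass=hun"), s.Write("ter2 done"), s.Flush(), "\n")
}
```

The sketch ignores secrets that overlap or contain one another; a production scrubber would need a defined order (for example longest first) for those cases.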

shithub.event payload schema (v1)

The event payload is the most user-facing part of the contract: once authors write workflows that template against shithub.event.X, schema changes are breaking. The v1 schema is pinned and labelled v1. Any addition is fine; renames and removals require a major bump.

The schema is enforced by typed constructors in the internal/actions/event package — one per trigger. S41b's pipeline calls these to build payloads; the function signatures pin the field set so adding a key requires editing the constructor in a visible diff. This is the same closed-door discipline as the expression evaluator's namespace allowlist.

| Trigger | Constructor | Top-level keys |
| ---- | ---- | ---- |
| `push` | `event.Push` | `ref`, `before`, `after`, `head_commit{message,id,author}` |
| `pull_request` | `event.PullRequest` | `action`, `number`, `pull_request{title,head{ref,sha},base{ref,sha},user{login}}` |
| `schedule` | `event.Schedule` | (empty map; the cron expression that fired is on the `workflow_runs` row) |
| `workflow_dispatch` | `event.WorkflowDispatch` | `inputs{<name>: <stringified>}` |

Anything not in this table doesn't exist in v1. Accessing it returns null+tainted (the missing-path semantics above).
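To illustrate how a typed constructor pins the field set, here is a sketch of what the push and schedule constructors could look like. The parameter lists are assumptions (the real signatures in internal/actions/event may differ), as is treating author as a plain string.

```go
package main

import "fmt"

// Push builds the v1 push payload. The parameter list is the schema:
// adding a key means changing this signature in a visible diff.
// Treating author as a plain string is an assumption of this sketch.
func Push(ref, before, after, commitID, commitMessage, commitAuthor string) map[string]any {
	return map[string]any{
		"ref":    ref,
		"before": before,
		"after":  after,
		"head_commit": map[string]any{
			"id":      commitID,
			"message": commitMessage,
			"author":  commitAuthor,
		},
	}
}

// Schedule builds the v1 schedule payload, which is deliberately empty.
func Schedule() map[string]any {
	return map[string]any{}
}

func main() {
	p := Push("refs/heads/main", "aaa", "bbb", "bbb", "fix typo", "alice")
	fmt.Println(p["ref"], p["head_commit"])
	fmt.Println(len(Schedule()))
}
```

Because callers cannot pass an arbitrary map, a stray key never reaches the evaluator, and schema growth always shows up as a signature change.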

Adding a field: edit the constructor in internal/actions/event/, add a row to this doc, and update the corresponding *_FlowsThroughEvaluator test in event_test.go so the new path is exercised end-to-end. Reviewer-required note in the commit message — same standard as a new evaluator function.

Renaming or removing: that's a v1→v2 break. Don't.

Operator surface

shithubd admin actions parse <file> reads a workflow off disk, runs the parser, and dumps diagnostics + a canonical JSON rendering of the parsed AST. Useful for:

  • debugging "why is my workflow not picking up changes" reports
  • validating a workflow file before committing it
  • producing a stable AST snapshot for inclusion in bug reports

Exit codes:

| Code | Meaning |
| ---- | ---- |
| 0 | clean parse, no Error-severity diagnostics |
| 1 | file unreadable, oversized, or YAML malformed |
| 2 | parse produced Error-severity diagnostics |

Other admin surfaces are scoped to later sub-sprints:

  • S41c: shithubd admin runner register --name <foo> issues a registration token + writes a row to workflow_runners.
  • S41g: shithubd admin actions cancel <run-id> flips cancel_requested.

Trigger pipeline (S41b)

Three layers between a triggering event and a queued workflow_run:

caller (push_process / pulls.Create / pr_jobs.PRSynchronize / dispatch HTTP)
    │
    └─► worker.Enqueue(KindWorkflowTrigger, JobPayload)
            │
            └─► trigger.Handler picks up:
                  Discover .shithub/workflows/*.yml at HEAD SHA
                  Parse each (skip + log on Error diagnostics)
                  Match each against trigger.Event
                  Enqueue each match
                        │
                        └─► trigger.Enqueue (one tx):
                              INSERT workflow_runs (ON CONFLICT DO NOTHING)
                              INSERT workflow_jobs per parsed job
                              INSERT workflow_steps per parsed step
                              (commit)
                              checks.Create per job (post-tx, idempotent
                                via ExternalID 'workflow_run:<id>:job:<key>')

Idempotency on the triggering event

Idempotency keys off the triggering event itself, which is more robust than a UNIQUE constraint on (repo_id, head_sha). Each caller constructs a stable trigger_event_id from its triggering event's identity:

| Caller | trigger_event_id format |
| ---- | ---- |
| push_process | `push:<push_event_id>` |
| pulls.Create | `pr_opened:<pr_id>:<head_sha>` |
| pr_jobs.PRSynchronize | `pr_synchronize:<pr_id>:<head_sha>` |
| dispatch HTTP | `dispatch:<file>:<sha>:<8-byte-random-hex>` |
| schedule sweep (S41b-2) | `schedule:<workflow_id>:<window_start_unix>` |

Migration 0051 adds workflow_runs.trigger_event_id (text NOT NULL DEFAULT '') with a partial UNIQUE on (repo_id, workflow_file, trigger_event_id) WHERE trigger_event_id <> ''. The trigger handler does INSERT … ON CONFLICT DO NOTHING so:

  • Worker retries (the same push_process replay) → no duplicate runs.
  • Admin replays via shithubd admin run-job workflow:trigger ... → no duplicate runs.
  • Re-runs (the future "Re-run" button) explicitly construct a NEW trigger_event_id (rerun:<original_run_id>:<request_uuid>) and chain back via parent_run_id. History is preserved, no collision.

Each caller's collision-free namespace is short-lived and human-debuggable: a Postgres operator can grep workflow_runs.trigger_event_id to see exactly which triggering event produced a given run.

Filter evaluation

trigger.Match(workflow, event) is a pure function (no I/O, no DB). For each event kind:

  • push: branch vs tag classified from the ref; only the matching filter list applies (a branches: filter rejects tag pushes and vice versa). paths: (when set) requires at least one changed path to match. Empty filter = match-all.
  • pull_request: types: defaults to [opened, synchronize, reopened] when omitted (GHA parity). branches: applies to the base ref. paths: as for push.
  • schedule: requires the workflow to declare the cron expression that fired. The sweep is the source of truth for which cron fires; we just gate on declaration. Avoids interpreting cron semantics in two places.
  • workflow_dispatch: matches whenever the workflow declares on.workflow_dispatch.

Glob semantics in branches:/tags:/paths:: minimatch subset with * (single segment), ** (any), /** end-anchor (optional trailing path), **/ start-anchor, and !exclude (last-match-wins, exclusion-only list implies include-all).
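A sketch of these glob semantics, assuming patterns and names are split on `/` and each non-`**` segment is delegated to the standard library's path.Match (which also accepts `?` and character classes, so it is slightly wider than the subset described). The function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchGlob reports whether name matches pattern, where * stays within
// one segment and ** spans zero or more whole segments.
func matchGlob(pattern, name string) bool {
	return matchSegments(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

func matchSegments(pat, segs []string) bool {
	if len(pat) == 0 {
		return len(segs) == 0
	}
	if pat[0] == "**" {
		// Try letting ** consume 0, 1, ... up to all remaining segments.
		for i := 0; i <= len(segs); i++ {
			if matchSegments(pat[1:], segs[i:]) {
				return true
			}
		}
		return false
	}
	if len(segs) == 0 {
		return false
	}
	ok, err := path.Match(pat[0], segs[0])
	if err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], segs[1:])
}

// matchFilter applies a filter list with last-match-wins semantics.
// An empty list matches everything, and a list made only of !excludes
// starts from "included".
func matchFilter(patterns []string, name string) bool {
	if len(patterns) == 0 {
		return true
	}
	matched := true
	for _, p := range patterns {
		if !strings.HasPrefix(p, "!") {
			matched = false // at least one include: start from "excluded"
			break
		}
	}
	for _, p := range patterns {
		negate := strings.HasPrefix(p, "!")
		if matchGlob(strings.TrimPrefix(p, "!"), name) {
			matched = !negate
		}
	}
	return matched
}

func main() {
	fmt.Println(matchGlob("docs/**", "docs"))
	fmt.Println(matchFilter([]string{"**", "!docs/**"}, "docs/guide.md"))
	fmt.Println(matchFilter([]string{"!docs/**"}, "src/main.go"))
}
```

For `paths:`, the caller would apply matchFilter to each changed path and accept the event if at least one path passes.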

Collaborator gate

Per the S41b spec's "external-PR support is parked" decision: PR triggers (both opened and synchronize) only fire when the PR's author is the repo's owning user. Conservative — drops legitimate non-owner collaborators in the org-repo case. Expanding the gate requires plumbing policy.Can into the worker context, which we defer to S41g where the lifecycle work touches that surface anyway.

Operator surface

  • POST /{owner}/{repo}/actions/workflows/{file}/dispatches Body: {"ref": "...", "inputs": {"key": "value"}} (both optional; ref defaults to the repo's default branch). Returns 204 No Content on success. Synchronous trigger.Enqueue (no discovery — file is named in the URL). Auth: requires repo write.

What S41b deliberately doesn't do

  • Run jobs. S41c adds runner claim/status APIs; S41d adds the actual shithubd-runner execution binary.
  • Schedule sweep. Cron-driven triggers split into S41b-2 to keep this PR reviewable; the trigger pipeline accepts schedule events, but no caller produces them yet. S41b-2 adds the sweep + the robfig/cron/v3 dep + shithubd-cron.service wiring.
  • External-PR triggers. Conservative collaborator gate above.
  • workflow_run webhook events. S41h adds the webhook event family + atom feed.

Secrets + variables settings surface (S41c)

S41c wires the previously schema-only workflow_secrets and actions_variables tables into repo/org settings.

Repository routes are gated through policy.ActionRepoSettingsActions (repo:settings:actions, admin role minimum):

  • GET /{owner}/{repo}/settings/secrets/actions
  • POST /{owner}/{repo}/settings/secrets/actions
  • POST /{owner}/{repo}/settings/secrets/actions/{name}/delete
  • GET /{owner}/{repo}/settings/variables/actions
  • POST /{owner}/{repo}/settings/variables/actions
  • POST /{owner}/{repo}/settings/variables/actions/{name}/delete

Organization routes follow the existing org-settings prefix and are owner-only:

  • GET /organizations/{org}/settings/secrets/actions
  • POST /organizations/{org}/settings/secrets/actions
  • POST /organizations/{org}/settings/secrets/actions/{name}/delete
  • GET /organizations/{org}/settings/variables/actions
  • POST /organizations/{org}/settings/variables/actions
  • POST /organizations/{org}/settings/variables/actions/{name}/delete

Secrets are sealed through internal/auth/secretbox using the operator-managed Auth.TOTPKeyB64 root key. Secret list pages render names/metadata only; the plaintext value is accepted once on create or rotation and never rendered back. Variables are non-secret plaintext configuration, so settings pages render their values. Both stores use the same name grammar as the database constraints: ^[A-Za-z_][A-Za-z0-9_]*$, 1-100 characters. Variables additionally enforce the 4096-character value cap in Go before hitting the DB constraint.
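The name grammar and value cap are simple enough to sketch. This example assumes the 4096-character cap counts Unicode code points rather than bytes, which the text does not specify; the function and constant names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"unicode/utf8"
)

const (
	maxNameLen          = 100
	maxVariableValueLen = 4096
)

// namePattern is the grammar quoted in the text.
var namePattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validateName applies the shared secret/variable name rules.
func validateName(name string) error {
	if len(name) < 1 || len(name) > maxNameLen {
		return errors.New("name must be 1-100 characters")
	}
	if !namePattern.MatchString(name) {
		return errors.New("name must match ^[A-Za-z_][A-Za-z0-9_]*$")
	}
	return nil
}

// validateVariable adds the value cap that applies to variables only.
// Counting runes rather than bytes is an assumption of this sketch.
func validateVariable(name, value string) error {
	if err := validateName(name); err != nil {
		return err
	}
	if utf8.RuneCountInString(value) > maxVariableValueLen {
		return fmt.Errorf("value exceeds %d characters", maxVariableValueLen)
	}
	return nil
}

func main() {
	fmt.Println(validateName("DEPLOY_TOKEN"))
	fmt.Println(validateName("1BAD"))
	fmt.Println(validateVariable("REGION", "eu-west-1"))
}
```

Checking in Go before the insert lets the settings page return a field-level message instead of surfacing a raw constraint violation.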

What S41a deliberately doesn't do

  • No trigger pipeline. domain_events aren't matched against on: yet — that's S41b.
  • No runner. S41c/S41d add runner claim APIs and the execution binary.
  • No UI. The Actions tab still renders the placeholder — S41f.
  • No secret encryption helpers wired to anything writable — S41c.
  • No JWT issuance, no runner registration flow — S41c.
  • No log streaming, no SSE — S41d/f.
  • No execution sandbox, no scrubbing, no injection guards enforced at the runner — S41d/e (the parser-side taint contract is the foundation those depend on, not a substitute).

Why these choices, in two paragraphs

The schema work is front-loaded so later sub-sprints don't ripple a migration through every PR. version (optimistic locking) and run_index (per-repo monotonic) are the two columns I'd flag to a new maintainer immediately — both are nearly free to add up front and painful to retrofit. The split between hot-path log chunks (Postgres) and finalized blob (Spaces) is shaped after Forgejo's log path; we pick the boring well-trodden answer over the clever one because log throughput is the failure mode that bites first.

The taint contract is the security-load-bearing piece. Every later sub-sprint trusts that the Tainted flag is set correctly here, in the parser/evaluator, and never re-derived downstream. The narrow allowlist of namespaces and functions exists exactly so a future PR that adds, say, fromJSON has to do it knowingly — by widening the allowlist in a visible diff, with a reviewer-required note, rather than by accident. The ${{ github.* }} alias is a pragmatic concession to copy-paste users; the rebrand to ${{ shithub.* }} is the canonical form so future divergence isn't awkward.

See also

  • internal/actions/workflow/parse.go — the parser
  • internal/actions/expr/eval.go — the evaluator
  • internal/migrationsfs/migrations/0042..0049_*.sql — the schema
  • tests/fixtures/workflows/*.yml — canonical input shapes
  • internal/actions/workflow/parse_test.go — fixture-driven tests
  • internal/actions/expr/eval_test.go — taint-contract tests
  • .refs/forgejo/services/actions/ — reference architecture
  • Campaign plan in conversation memory (humble-cooking-bunny)
View source
1 # Actions/CI — schema + workflow dialect (S41a)
2
3 The Actions/CI subsystem is shipping in eight sub-sprints (S41a through
4 S41h, plus optional S41i Nix engine). This doc covers what S41a lays
5 down: the SQL schema, the workflow YAML dialect, the expression
6 evaluator, and the load-bearing taint contract every later sub-sprint
7 depends on.
8
9 S41a is parser + schema only — no triggers, no runner, no UI. The
10 goal is to land a frozen contract that S41b/c/d/e can build against
11 without churning under them.
12
13 ## SQL schema
14
15 Actions migrations currently span 0042–0051, 0053, and 0057. Migration
16 0052 belongs to the repo source-remotes feature, 0054 belongs to push
17 event protocol tracking, 0055 belongs to the social feed, and 0056
18 belongs to user profile contribution settings.
19
20 | # | Table | Purpose |
21 | ----- | --------------------------- | ------------------------------------------------------------- |
22 | 0042 | `workflow_runs` | One row per triggered workflow execution |
23 | 0043 | `workflow_jobs` | Jobs within a run (one row per `jobs.<key>`) |
24 | 0044 | `workflow_steps` | Steps within a job (one row per `steps[i]`) |
25 | 0045 | `workflow_secrets` | Per-repo + per-org encrypted secrets |
26 | 0046 | `workflow_runners` | Registered runners + `runner_tokens` |
27 | 0047 | `workflow_step_log_chunks` | Hot-path append log buffer (concatenated to blob on finalize) |
28 | 0048 | `workflow_artifacts` | Per-run artifact metadata (90-day default expiry) |
29 | 0049 | `actions_variables` | Non-secret per-repo/org config (Forgejo parity) |
30 | 0050 | `workflow_steps.step_with` | Parsed `with:` inputs for magic `uses:` aliases |
31 | 0051 | `workflow_runs.trigger_event_id` | Trigger idempotency for retries/admin replays |
32 | 0053 | `runner_jwt_used` | Single-use replay gate for runner job JWTs |
33 | 0057 | `workflow_job_secret_masks` | Encrypted claim-time log mask snapshots per job |
34
35 A few load-bearing choices, called out so they're easy to spot in a
36 later schema diff:
37
38 - **`workflow_runs.run_index`** — per-repo monotonic counter. Each
39 repo gets `#1`, `#2`, … so URLs like
40 `/{owner}/{repo}/actions/runs/42` are stable and human-friendly.
41 Crib from Forgejo's `actions_run.index`.
42 - **`workflow_runs.version`** — optimistic-lock counter. Mutators
43 bump-and-check rather than `SELECT … FOR UPDATE`. Required for
44 S41g's race between a cancel request and a state transition.
45 - **`workflow_runs.concurrency_group`** — the concurrency-slot key,
46 resolved at trigger time from the workflow's `concurrency.group:`
47 expression. S41g's slot manager keys off this column.
48 - **`workflow_runs.parent_run_id`** — for re-runs. The new run
49 references the original; the UI shows a "re-ran from #N" link.
50 - **`workflow_jobs.runner_id`** — FK added in 0046 (after the
51 runners table exists). Nullable until claimed.
52 - **`workflow_steps`** has a CHECK constraint enforcing
53 `(run_command IS NOT NULL) <> (uses_alias IS NOT NULL)` — exactly
54 one of `run:` or `uses:`. The `uses_alias` column is further
55 CHECK-constrained to the three magic aliases we accept in v1.
56 - **`workflow_secrets`** owns its value as `bytea` ChaCha20Poly1305-
57 sealed via `internal/auth/secretbox`. Key derivation uses
58 `cfg.Auth.TOTPKeyB64` (already an operator-managed root) +
59 `(owner, kind, name)` salt so re-keying is per-row.
60 - **`workflow_step_log_chunks.chunk`** is capped at 512 KB per row.
61 The runner sends bigger payloads in pieces. `(step_id, seq)` is
62 UNIQUE so duplicate sends are idempotent.
63 - **`actions_variables`** — non-secret, plaintext, scoped exactly
64 like secrets (per-repo or per-org, never both on the same row).
65 Forgejo has the same split; we mirror it for parity.
66 - **`runner_jwt_used`** — primary-keyed by JWT `jti`. Job endpoints
67 insert into this table during auth; zero inserted rows means replay
68 and the API returns 401. JWTs are HMAC-SHA256 and use an HKDF
69 subkey derived from `auth.totp_key_b64` with label
70 `actions-runner-jwt-v1`.
71 - **`workflow_job_secret_masks`** — one encrypted JSON array of exact
72 secret values per claimed job. It snapshots the log scrub set at
73 claim time, preventing a rotated or deleted secret from disappearing
74 from server-side masking while the old value is still in a runner's
75 job payload.
76
77 The `version` and `run_index` patterns are the two pieces I'd point
78 out to a future maintainer first. Both are cheap to add now and
79 miserable to retrofit later.
80
81 ## Workflow YAML dialect (v1)
82
83 We accept a strict subset of GitHub Actions YAML. The parser rejects
84 unknown keys at parse time so workflow authors find their typos
85 immediately instead of shipping a workflow that does nothing.
86
87 ### Top level
88
89 ```yaml
90 name: my-pipeline # optional human name
91 on: [push, pull_request] # or full-form (see below)
92 permissions: read-all # default if omitted
93 env: { GREETING: "hello" } # workflow-level env
94 concurrency: # optional slot manager
95 group: ${{ shithub.ref }}
96 cancel-in-progress: true
97 jobs:
98 <key>: # 1+ entries
99 runs-on: ubuntu-latest
100 needs: [other-key] # optional dep edge
101 if: ${{ shithub.actor == 'alice' }} # optional gate
102 timeout-minutes: 60 # 1..4320, default 360
103 permissions: { contents: read } # narrow workflow perms
104 env: { K: v } # job overlay
105 steps:
106 - name: ...
107 id: ...
108 if: ...
109 run: echo hi # run XOR uses
110 uses: actions/checkout@v4 # exactly one of three aliases
111 working-directory: ...
112 env: { ... }
113 continue-on-error: false
114 ```
115
116 ### Triggers (`on:`)
117
118 v1 supports four triggers — anything else is a parse error.
119
120 | Trigger | Surface |
121 | ------------------- | ---------------------------------------------------------------- |
122 | `push` | `branches:`, `tags:`, `paths:` (include + `!exclude` semantics) |
123 | `pull_request` | `types:` (opened/synchronize/reopened/...), `branches:`, `paths:` |
124 | `schedule` | one or more `- cron: <5-field-expr>` |
125 | `workflow_dispatch` | `inputs:` map (string/boolean/choice/environment) |
126
127 ### `uses:` allowlist
128
129 Exactly three aliases, no exceptions:
130
131 | Alias | What it does |
132 | -------------------------------- | ----------------------------------------- |
133 | `actions/checkout@v4` | Clones the repo into the workspace |
134 | `shithub/upload-artifact@v1` | Uploads files to `workflow_artifacts` |
135 | `shithub/download-artifact@v1` | Pulls artifacts back in a downstream job |
136
137 Any other `uses:` value (community actions, Docker images, composite
138 actions) is an Error-severity diagnostic. The marketplace problem is
139 explicitly out of scope for v1; revisit only if a real demand exists
140 and we have an answer for supply-chain trust.
141
142 ### File-size + parser caps
143
144 - **64 KB** workflow file size cap (`workflow.MaxWorkflowFileBytes`).
145 Files larger than this are rejected before YAML decode begins —
146 defends against pathological inputs and gives operators a
147 predictable upper bound on parser memory.
148 - **100 anchors** per document (`workflow.MaxYAMLAliases`) — the
149 billion-laughs guard. yaml.v3 doesn't expose a direct knob; we
150 count alias nodes during a tree walk and bail.
151
152 ### `${{ github.* }}` alias
153
154 The dialect is intentionally rebranded to `${{ shithub.* }}`.
155 Authors who paste GHA workflows in unmodified will see their
156 `${{ github.* }}` references continue to work because the evaluator
157 rewrites `path[0]` from `github` to `shithub` at the top of `evalRef`
158 before taint computation, dispatch, and error rendering.
159
160 The alias is intentionally **scope-narrow**: only fields that exist
161 in our `shithub.*` namespace (`run_id`, `sha`, `ref`, `actor`,
162 `event`) route through. GHA fields we don't expose in v1 —
163 `event_name`, `repository`, `run_number`, `workspace`, etc. — error
164 with the canonical `unknown shithub field "X"` message. Slightly
165 confusing for a GHA-flavored author but keeps the v1 namespace
166 surface tight.
167
168 The alias preserves the load-bearing taint flag: `github.event.X`
169 taints exactly like `shithub.event.X`. `TestEval_GithubAliasIsTainted`
170 pins this contract.
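
The alias behaviour can be sketched roughly as follows. This is an illustration, not the real `evalRef`: `canonicalRoot`, `isTainted`, and `checkField` are hypothetical helper names, and the field set is the one listed above.

```go
package main

import "fmt"

// Fields exposed under shithub.* in v1, per the list above.
var shithubFields = map[string]bool{
	"run_id": true, "sha": true, "ref": true, "actor": true, "event": true,
}

// canonicalRoot rewrites a leading "github" segment to "shithub" and
// returns a new slice, leaving the caller's path untouched.
func canonicalRoot(path []string) []string {
	if len(path) > 0 && path[0] == "github" {
		return append([]string{"shithub"}, path[1:]...)
	}
	return path
}

// isTainted is computed on the canonical path, so github.event.X and
// shithub.event.X get the same answer.
func isTainted(path []string) bool {
	p := canonicalRoot(path)
	return len(p) >= 2 && p[0] == "shithub" && p[1] == "event"
}

// checkField rejects fields outside the v1 namespace. Because the
// rewrite happens first, the error always names "shithub".
func checkField(path []string) error {
	p := canonicalRoot(path)
	if len(p) >= 2 && p[0] == "shithub" && !shithubFields[p[1]] {
		return fmt.Errorf("unknown shithub field %q", p[1])
	}
	return nil
}

func main() {
	fmt.Println(canonicalRoot([]string{"github", "event", "pull_request", "title"}))
	fmt.Println(isTainted([]string{"github", "event", "pull_request", "title"})) // true
	fmt.Println(checkField([]string{"github", "event_name"}))                    // unknown shithub field "event_name"
}
```

Doing the rewrite once, before anything else looks at the path, is what keeps taint computation and error rendering from needing their own alias handling.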
171
172 Migration to strict-compat (drop the alias entirely) later is a
173 one-PR flip; moving the other direction is much harder.
174
175 This is a deliberate decision recorded in the campaign plan.
176
177 ## Expression evaluator
178
179 `${{ … }}` expressions are parsed into a tiny AST and evaluated by
180 `internal/actions/expr`. The surface is intentionally minimal:
181
182 ### Allowed namespaces
183
| Namespace        | Source            | Tainted?                                 |
| ---------------- | ----------------- | ---------------------------------------- |
| `secrets.X`      | workflow_secrets  | no, but sensitive                        |
| `vars.X`         | actions_variables | no (operator-controlled)                 |
| `env.X`          | workflow file     | inherits the flags of the resolved value |
| `shithub.run_id` | dispatch context  | no                                       |
| `shithub.sha`    | dispatch context  | no                                       |
| `shithub.ref`    | dispatch context  | no                                       |
| `shithub.actor`  | dispatch context  | no (resolved username)                   |
| `shithub.event.*`| trigger payload   | **yes, always**                          |
194
195 `runner.*`, `steps.*`, `needs.*`, `matrix.*`, `inputs.*` are all
196 parse-time errors. They're parked for v2 and the parser's
197 allowlist-closed posture means a future PR can't widen this
198 accidentally without a clearly visible diff.
199
200 ### Allowed functions
201
202 `contains(haystack, needle)`, `startsWith(s, prefix)`,
203 `endsWith(s, suffix)`, plus the four job-status predicates
204 `success()`, `failure()`, `cancelled()`, `always()`. That's the
205 whole list. `fromJSON`, `hashFiles`, `toJSON`, `format`, and
206 friends are explicitly rejected — they each carry footgun risk
207 (parser DoS, FS access, side-channel injection) that we don't want
208 to take on in v1.
209
210 ### Missing-value semantics
211
212 | Reference | Missing → ? |
213 | -------------------------------- | ------------------------------------ |
214 | `secrets.NOT_BOUND` | error (loud — workflow won't run) |
215 | `vars.MISSING` | empty string (GHA parity) |
216 | `env.MISSING` | empty string (GHA parity) |
217 | `shithub.event.deeply.missing` | null **but still tainted** |
218
219 The "missing event path → null but tainted" case is a defence-in-
220 depth choice: even if the path doesn't resolve, the result still
221 came from the event payload, and we'd rather over-flag than under.
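
The table above can be read as a small resolver. The sketch below is an assumption-laden illustration: `Value`, `Context`, and `resolve` are simplified stand-ins for whatever `internal/actions/expr` actually defines, and `env.X` flag inheritance is left out to keep the focus on missing values.

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a simplified stand-in for expr.Value. A nil V models null.
type Value struct {
	V         any
	Tainted   bool
	Sensitive bool
}

// Context is a hypothetical evaluation context.
type Context struct {
	Secrets map[string]string
	Vars    map[string]string
	Env     map[string]string
	Event   map[string]any
}

func resolve(ctx Context, ref string) (Value, error) {
	path := strings.Split(ref, ".")
	switch {
	case len(path) == 2 && path[0] == "secrets":
		s, ok := ctx.Secrets[path[1]]
		if !ok {
			// Loud failure: an unbound secret stops the workflow.
			return Value{}, fmt.Errorf("secret %q is not bound", path[1])
		}
		return Value{V: s, Sensitive: true}, nil
	case len(path) == 2 && path[0] == "vars":
		return Value{V: ctx.Vars[path[1]]}, nil // missing key yields ""
	case len(path) == 2 && path[0] == "env":
		return Value{V: ctx.Env[path[1]]}, nil // missing key yields ""
	case len(path) >= 2 && path[0] == "shithub" && path[1] == "event":
		var cur any = ctx.Event
		for _, key := range path[2:] {
			m, ok := cur.(map[string]any)
			if !ok {
				// Path ran off the payload: null, but still tainted.
				return Value{Tainted: true}, nil
			}
			cur = m[key]
		}
		return Value{V: cur, Tainted: true}, nil
	}
	return Value{}, fmt.Errorf("unsupported reference %q", ref)
}

func main() {
	ctx := Context{Event: map[string]any{"ref": "refs/heads/main"}}
	_, err := resolve(ctx, "secrets.NOT_BOUND")
	fmt.Println(err)
	v, _ := resolve(ctx, "shithub.event.deeply.missing")
	fmt.Println(v.V == nil, v.Tainted) // true true
}
```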
222
223 ## Taint contract — the load-bearing piece
224
225 This is the contract every later sub-sprint hangs off. Get it wrong
226 and we have an injection-shaped hole in the runner.
227
228 ### Where the flag lives
229
230 The taint flag lives on `expr.Value` (the evaluator-produced value),
231 not `workflow.Value` (the parser-produced value). Two different
232 structs share the name `Value` because they live in different
233 packages, but they have different jobs:
234
235 - **`workflow.Value`** carries the raw source string the parser read
236 out of the YAML (an env entry, a `with:` input, a concurrency
237 group expression). At parse time we don't know what the
238 `${{ … }}` body will resolve to, so there's nothing to taint yet.
239 - **`expr.Value`** is what the evaluator returns when it resolves a
240 reference at runtime. *This* struct carries `Tainted bool`. The
241 runner's exec layer (S41d) consumes that flag.
242
243 Pre-L5 the parser-side struct also had a `Tainted bool` field plus a
244 `Tainted()` constructor — both unused, both confusing because they
245 suggested two sources of truth. Dropped in S41a-L5 cleanup.
246
247 ### Propagation
248
**Every `expr.Value` carries a `Tainted bool`.** Set true iff the
value transitively depends on `shithub.event.*`. Operators control
secrets, vars, and the rest of `shithub.*`. Authors control the
workflow file, including its `env:` entries. Only the event payload
is *attacker-controlled*: a PR title, a commit message, a branch name
from a fork. Those values must never be interpolated into a shell string.
255
256 Propagation rules:
257
- Reading `shithub.event.X` → `Tainted: true` (always, including
  missing-path null results).
- Reading `secrets.X` → `Sensitive: true`. Secrets are
  operator-controlled, so they are not tainted, but they must not
  appear in shell source strings or Docker argv.
263 - Reading any other namespace → `Tainted: false` and
264 `Sensitive: false`, except `env.X` preserves both flags of the
265 resolved env value. This closes the escape where an event-derived or
266 secret-derived value is first assigned to env and then interpolated
267 through `${{ env.X }}`.
268 - Binary op (`==`, `!=`, `&&`, `||`) → tainted or sensitive if either
269 operand is.
270 - Unary op (`!`) → tainted/sensitive iff its operand is.
271 - Function call (`contains`, `startsWith`, `endsWith`) → tainted or
272 sensitive if any argument is.
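
The propagation rules above amount to "flags are sticky across every operator". A minimal sketch, assuming a simplified `Value` struct and hypothetical helper names (`inherit`, `equal`, `not`, `contains`); the real evaluator's types and comparison semantics may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// Value is a simplified stand-in for expr.Value.
type Value struct {
	V         any
	Tainted   bool
	Sensitive bool
}

// inherit ORs the flags of every operand into the result, so a flag
// can be gained along the way but never dropped.
func inherit(out Value, operands ...Value) Value {
	for _, o := range operands {
		out.Tainted = out.Tainted || o.Tainted
		out.Sensitive = out.Sensitive || o.Sensitive
	}
	return out
}

// equal models a binary op. Comparing printed forms is a sketch-level
// simplification, not the evaluator's real equality.
func equal(a, b Value) Value {
	return inherit(Value{V: fmt.Sprint(a.V) == fmt.Sprint(b.V)}, a, b)
}

// not models the unary op.
func not(x Value) Value {
	b, _ := x.V.(bool)
	return inherit(Value{V: !b}, x)
}

// contains models a function call.
func contains(haystack, needle Value) Value {
	h, _ := haystack.V.(string)
	n, _ := needle.V.(string)
	return inherit(Value{V: strings.Contains(h, n)}, haystack, needle)
}

func main() {
	title := Value{V: "fix: bug", Tainted: true} // from shithub.event.*
	literal := Value{V: "fix"}
	r := contains(title, literal)
	fmt.Println(r.V, r.Tainted, r.Sensitive) // true true false
}
```

Note that even a boolean result stays tainted. That looks over-cautious, but it keeps the rule mechanical: nothing downstream has to reason about which result types are "safe".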
273
274 The runner consumes `Tainted` and `Sensitive` and refuses to interpolate
275 either class into shell strings. Instead, those values are bound to
276 runner-owned `SHITHUB_INPUT_xx` envvars and the shell source only
277 references those placeholders. The author writes:
278
```yaml
- run: echo "PR title was: ${{ shithub.event.pull_request.title }}"
```
282
283 The runner sees a tainted reference; it compiles the step to:
284
```bash
SHITHUB_INPUT_0="$user_pr_title" exec sh -c 'echo "PR title was: $SHITHUB_INPUT_0"'
```
288
…where `$user_pr_title` stands for the raw event value. The runner
sets it as `SHITHUB_INPUT_0` through Go's `cmd.Env` and never inserts
it into the shell source string or Docker CLI argv. Backticks, `$()`,
`;`, `&&`: none of those work as command-injection vectors when the
value reaches the shell as environment data instead of syntax.
293
294 The shared renderer lives in `internal/runner/exec`, so future engines
295 consume the same injection boundary instead of reimplementing it. The
296 runner claim payload includes `workflow_runs.event_payload`; without
297 that field, the runner cannot evaluate and taint
298 `${{ shithub.event.* }}` references.
299
300 Tests for this contract live in `internal/actions/expr/eval_test.go`,
301 `internal/runner/exec/render_test.go`, and
302 `internal/runner/engine/docker_test.go`. **Do not** weaken them in a
303 later PR without an audit-checkpoint review — they're explicitly
304 load-bearing for S41e's threat model.
305
306 Runner log chunks pass through `internal/runner/scrub` before they are
307 posted to the API. It masks exact secret values and preserves enough
308 tail bytes between chunks to catch a secret split across chunk
309 boundaries. S41e wires resolved workflow secrets into the runner claim
310 payload and mask set, snapshots that mask set encrypted on the job, then
311 applies the same exact-value scrub again in the runner API before
312 persisting chunks. The server path also carries a possible secret-prefix
313 tail from the prior persisted chunk, so a runner that bypasses
314 client-side scrubbing cannot leak a secret by splitting it across
315 adjacent log POSTs.
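
The chunk-boundary handling can be illustrated with a small streaming scrubber. This is a sketch of the idea only, not the `internal/runner/scrub` implementation: the `scrubber` type, the `***` mask, and the method names are assumptions. It holds back the longest suffix of the buffered output that could still be the start of a secret and prepends it to the next chunk.

```go
package main

import (
	"fmt"
	"strings"
)

const mask = "***" // assumed mask text

// scrubber masks exact secret values in a stream of log chunks.
type scrubber struct {
	secrets []string
	carry   string // held-back tail that may be the start of a secret
}

// write returns the portion of carry+chunk that is safe to emit now.
func (s *scrubber) write(chunk string) string {
	buf := s.carry + chunk
	for _, sec := range s.secrets {
		if sec == "" {
			continue // replacing "" would insert the mask everywhere
		}
		buf = strings.ReplaceAll(buf, sec, mask)
	}
	// Hold back the longest suffix that is a proper prefix of a secret.
	keep := 0
	for _, sec := range s.secrets {
		for n := len(sec) - 1; n > keep; n-- {
			if strings.HasSuffix(buf, sec[:n]) {
				keep = n
				break
			}
		}
	}
	s.carry = buf[len(buf)-keep:]
	return buf[:len(buf)-keep]
}

// flush emits whatever is still held back once the stream ends.
func (s *scrubber) flush() string {
	out := s.carry
	s.carry = ""
	return out
}

func main() {
	s := &scrubber{secrets: []string{"hunter2"}}
	out := s.write("token=hun") + s.write("ter2 done") + s.flush()
	fmt.Println(out) // token=*** done
}
```

A held-back tail that turns out not to be a secret (for example `hun` followed by `gry`) is simply emitted with the next chunk, so the only cost is a small delay.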
316
317 ## `shithub.event` payload schema (v1)
318
319 The event payload is the most user-facing part of the contract: once
320 authors write workflows that template against `shithub.event.X`,
321 schema changes are breaking. The v1 schema is pinned and labelled
322 `v1`. Any addition is fine; renames and removals require a major
323 bump.
324
325 The schema is enforced by **typed constructors** in the
326 `internal/actions/event` package — one per trigger. S41b's pipeline
327 calls these to build payloads; the function signatures pin the
328 field set so adding a key requires editing the constructor in a
329 visible diff. This is the same closed-door discipline as the
330 expression evaluator's namespace allowlist.
331
332 | Trigger | Constructor | Top-level keys |
333 | ------------------- | ----------------------- | --------------------------------------------------------------------------------- |
334 | `push` | `event.Push` | `ref`, `before`, `after`, `head_commit{message,id,author}` |
335 | `pull_request` | `event.PullRequest` | `action`, `number`, `pull_request{title,head{ref,sha},base{ref,sha},user{login}}` |
336 | `schedule` | `event.Schedule` | (empty map — cron fired; cron expression is on the `workflow_runs` row) |
337 | `workflow_dispatch` | `event.WorkflowDispatch`| `inputs{<name>: <stringified>}` |
338
339 Anything not in this table doesn't exist in v1. Accessing it returns
340 null+tainted (the missing-path semantics above).
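
To make "the function signature pins the field set" concrete, here is a sketch of what such constructors could look like. The parameter lists and the `HeadCommit` struct are assumptions; only the constructor names and key names come from the table above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HeadCommit is a hypothetical parameter struct for the push payload.
type HeadCommit struct {
	Message string
	ID      string
	Author  string
}

// Push builds the v1 push payload. Adding a key means editing this
// function, which is what makes schema growth visible in review.
func Push(ref, before, after string, head HeadCommit) map[string]any {
	return map[string]any{
		"ref":    ref,
		"before": before,
		"after":  after,
		"head_commit": map[string]any{
			"message": head.Message,
			"id":      head.ID,
			"author":  head.Author,
		},
	}
}

// Schedule carries no event data in v1.
func Schedule() map[string]any {
	return map[string]any{}
}

// WorkflowDispatch takes already-stringified inputs.
func WorkflowDispatch(inputs map[string]string) map[string]any {
	in := make(map[string]any, len(inputs))
	for k, v := range inputs {
		in[k] = v
	}
	return map[string]any{"inputs": in}
}

func main() {
	payload := Push("refs/heads/main", "aaa111", "bbb222",
		HeadCommit{Message: "fix typo", ID: "bbb222", Author: "alice"})
	out, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```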
341
342 **Adding a field**: edit the constructor in `internal/actions/event/`,
343 add a row to this doc, and update the corresponding `*_FlowsThroughEvaluator`
344 test in `event_test.go` so the new path is exercised end-to-end.
345 Reviewer-required note in the commit message — same standard as a
346 new evaluator function.
347
348 **Renaming or removing**: that's a v1→v2 break. Don't.
349
350 ## Operator surface
351
352 `shithubd admin actions parse <file>` reads a workflow off disk,
353 runs the parser, and dumps diagnostics + a canonical JSON rendering
354 of the parsed AST. Useful for:
355
356 - debugging "why is my workflow not picking up changes" reports
357 - validating a workflow file before committing it
358 - producing a stable AST snapshot for inclusion in bug reports
359
360 Exit codes:
361
362 | Code | Meaning |
363 | ---- | --------------------------------------------- |
364 | 0 | clean parse, no Error-severity diagnostics |
365 | 1 | file unreadable, oversized, or YAML malformed |
366 | 2 | parse produced Error-severity diagnostics |
367
368 Other admin surfaces are scoped to later sub-sprints:
369
370 - S41c: `shithubd admin runner register --name <foo>` issues a
371 registration token + writes a row to `workflow_runners`.
372 - S41g: `shithubd admin actions cancel <run-id>` flips
373 `cancel_requested`.
374
375 ## Trigger pipeline (S41b)
376
377 Three layers between a triggering event and a queued `workflow_run`:
378
```
caller (push_process / pulls.Create / pr_jobs.PRSynchronize / dispatch HTTP)

  └─► worker.Enqueue(KindWorkflowTrigger, JobPayload)

        └─► trigger.Handler picks up:
              Discover .shithub/workflows/*.yml at HEAD SHA
              Parse each (skip + log on Error diagnostics)
              Match each against trigger.Event
              Enqueue each match

                └─► trigger.Enqueue (one tx):
                      INSERT workflow_runs (ON CONFLICT DO NOTHING)
                      INSERT workflow_jobs per parsed job
                      INSERT workflow_steps per parsed step
                      (commit)
                      checks.Create per job (post-tx, idempotent
                      via ExternalID 'workflow_run:<id>:job:<key>')
```
398
399 ### Idempotency on the triggering event
400
Idempotency is keyed on the triggering event's identity, which is more
robust than a UNIQUE on `(repo_id, head_sha)`. Each caller constructs
a stable `trigger_event_id` from its triggering event's identity:
404
405 | Caller | trigger_event_id format |
406 | ------------------- | ------------------------------------------------ |
407 | push_process | `push:<push_event_id>` |
408 | pulls.Create | `pr_opened:<pr_id>:<head_sha>` |
409 | pr_jobs.PRSynchronize | `pr_synchronize:<pr_id>:<head_sha>` |
410 | dispatch HTTP | `dispatch:<file>:<sha>:<8-byte-random-hex>` |
411 | schedule sweep (S41b-2) | `schedule:<workflow_id>:<window_start_unix>` |
412
413 Migration 0051 adds `workflow_runs.trigger_event_id` (text NOT NULL
414 DEFAULT '') with a partial UNIQUE on
415 `(repo_id, workflow_file, trigger_event_id) WHERE trigger_event_id <> ''`.
416 The trigger handler does `INSERT … ON CONFLICT DO NOTHING` so:
417
418 - Worker retries (the same push_process replay) → no duplicate runs.
419 - Admin replays via `shithubd admin run-job workflow:trigger ...`
420 → no duplicate runs.
421 - Re-runs (the future "Re-run" button) explicitly construct a NEW
422 trigger_event_id (`rerun:<original_run_id>:<request_uuid>`) and
423 chain back via `parent_run_id`. History is preserved, no
424 collision.
425
426 Each caller's collision-free namespace is short-lived and
427 human-debuggable: a Postgres operator can grep
428 `workflow_runs.trigger_event_id` to see exactly which triggering
429 event produced a given run.
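
The ID formats in the table above could be produced by helpers along these lines. The function names and the `int64` ID types are assumptions, and "8-byte-random-hex" is read here as 8 random bytes rendered as 16 hex characters.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

func pushID(pushEventID int64) string {
	return fmt.Sprintf("push:%d", pushEventID)
}

func prOpenedID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_opened:%d:%s", prID, headSHA)
}

func prSynchronizeID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_synchronize:%d:%s", prID, headSHA)
}

func scheduleID(workflowID, windowStartUnix int64) string {
	return fmt.Sprintf("schedule:%d:%d", workflowID, windowStartUnix)
}

// dispatchID is deliberately not stable: every manual dispatch is a
// new event, so a random suffix keeps two dispatches of the same file
// at the same SHA from colliding on the partial UNIQUE index.
func dispatchID(file, sha string) (string, error) {
	var nonce [8]byte
	if _, err := rand.Read(nonce[:]); err != nil {
		return "", err
	}
	return fmt.Sprintf("dispatch:%s:%s:%s", file, sha, hex.EncodeToString(nonce[:])), nil
}

func main() {
	fmt.Println(pushID(9001))                 // push:9001
	fmt.Println(prOpenedID(17, "abc123"))     // pr_opened:17:abc123
	fmt.Println(scheduleID(4, 1700000000))    // schedule:4:1700000000
	id, err := dispatchID("ci.yml", "abc123") // suffix differs per call
	fmt.Println(id, err)
}
```

The deterministic formats are what make a retried `INSERT … ON CONFLICT DO NOTHING` a no-op: the replayed job computes the same key and hits the unique index.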
430
431 ### Filter evaluation
432
433 `trigger.Match(workflow, event)` is a pure function (no I/O, no DB).
434 For each event kind:
435
436 - **push**: branch vs tag classified from the ref; only the matching
437 filter list applies (a `branches:` filter rejects tag pushes and
438 vice versa). `paths:` (when set) requires at least one changed
439 path to match. Empty filter = match-all.
440 - **pull_request**: `types:` defaults to
441 `[opened, synchronize, reopened]` when omitted (GHA parity).
442 `branches:` applies to the **base** ref. `paths:` as for push.
443 - **schedule**: requires the workflow to declare the cron expression
444 that fired. The sweep is the source of truth for which cron
445 fires; we just gate on declaration. Avoids interpreting cron
446 semantics in two places.
447 - **workflow_dispatch**: matches whenever the workflow declares
448 `on.workflow_dispatch`.
449
450 Glob semantics in `branches:`/`tags:`/`paths:`: minimatch subset
451 with `*` (single segment), `**` (any), `/**` end-anchor (optional
452 trailing path), `**/` start-anchor, and `!exclude` (last-match-wins,
453 exclusion-only list implies include-all).
454
455 ### Collaborator gate
456
457 Per the S41b spec's "external-PR support is parked" decision: PR
458 triggers (both `opened` and `synchronize`) only fire when the PR's
459 author is the repo's owning user. Conservative — drops legitimate
460 non-owner collaborators in the org-repo case. Expanding the gate
461 requires plumbing `policy.Can` into the worker context, which we
462 defer to S41g where the lifecycle work touches that surface anyway.
463
464 ### Operator surface
465
466 - `POST /{owner}/{repo}/actions/workflows/{file}/dispatches`
467 Body: `{"ref": "...", "inputs": {"key": "value"}}` (both optional;
468 ref defaults to the repo's default branch). Returns 204 No Content
469 on success. Synchronous trigger.Enqueue (no discovery — file is
470 named in the URL). Auth: requires repo write.
471
472 ### What S41b deliberately doesn't do
473
474 - Run jobs. S41c adds runner claim/status APIs; S41d adds the actual
475 `shithubd-runner` execution binary.
476 - Schedule sweep. Cron-driven triggers split into S41b-2 to keep
477 this PR reviewable; the trigger pipeline accepts schedule events,
478 but no caller produces them yet. S41b-2 adds the sweep + the
479 `robfig/cron/v3` dep + `shithubd-cron.service` wiring.
480 - External-PR triggers. Conservative collaborator gate above.
481 - `workflow_run` webhook events. S41h adds the webhook event family
482 + atom feed.
483
484 ## Secrets + variables settings surface (S41c)
485
486 S41c wires the previously schema-only `workflow_secrets` and
487 `actions_variables` tables into repo/org settings.
488
489 Repository routes are gated through
490 `policy.ActionRepoSettingsActions` (`repo:settings:actions`, admin
491 role minimum):
492
493 - `GET /{owner}/{repo}/settings/secrets/actions`
494 - `POST /{owner}/{repo}/settings/secrets/actions`
495 - `POST /{owner}/{repo}/settings/secrets/actions/{name}/delete`
496 - `GET /{owner}/{repo}/settings/variables/actions`
497 - `POST /{owner}/{repo}/settings/variables/actions`
498 - `POST /{owner}/{repo}/settings/variables/actions/{name}/delete`
499
500 Organization routes follow the existing org-settings prefix and are
501 owner-only:
502
503 - `GET /organizations/{org}/settings/secrets/actions`
504 - `POST /organizations/{org}/settings/secrets/actions`
505 - `POST /organizations/{org}/settings/secrets/actions/{name}/delete`
506 - `GET /organizations/{org}/settings/variables/actions`
507 - `POST /organizations/{org}/settings/variables/actions`
508 - `POST /organizations/{org}/settings/variables/actions/{name}/delete`
509
510 Secrets are sealed through `internal/auth/secretbox` using the
511 operator-managed `Auth.TOTPKeyB64` root key. Secret list pages render
512 names/metadata only; the plaintext value is accepted once on create or
513 rotation and never rendered back. Variables are non-secret plaintext
514 configuration, so settings pages render their values. Both stores use
515 the same name grammar as the database constraints:
516 `^[A-Za-z_][A-Za-z0-9_]*$`, 1-100 characters. Variables additionally
517 enforce the 4096-character value cap in Go before hitting the DB
518 constraint.
519
520 ## What S41a deliberately doesn't do
521
522 - No trigger pipeline. `domain_events` aren't matched against `on:`
523 yet — that's S41b.
524 - No runner. S41c/S41d add runner claim APIs and the execution binary.
525 - No UI. The Actions tab still renders the placeholder — S41f.
526 - No secret encryption helpers wired to anything writable — S41c.
527 - No JWT issuance, no runner registration flow — S41c.
528 - No log streaming, no SSE — S41d/f.
529 - No execution sandbox, no scrubbing, no injection guards
530 *enforced at the runner* — S41d/e (the parser-side taint contract
531 is the foundation those depend on, not a substitute).
532
533 ## Why these choices, in two paragraphs
534
535 The schema work is front-loaded so later sub-sprints don't ripple a
536 migration through every PR. `version` (optimistic locking) and
537 `run_index` (per-repo monotonic) are the two columns I'd flag to a
538 new maintainer immediately — both are nearly free to add up front
539 and painful to retrofit. The split between hot-path log chunks
540 (Postgres) and finalized blob (Spaces) is shaped after Forgejo's
541 log path; we pick the boring well-trodden answer over the clever
542 one because log throughput is the failure mode that bites first.
543
544 The taint contract is the security-load-bearing piece. Every later
545 sub-sprint trusts that the `Tainted` flag is set correctly here, in
546 the parser/evaluator, and never re-derived downstream. The narrow
547 allowlist of namespaces and functions exists exactly so a future PR
548 that adds, say, `fromJSON` has to do it knowingly — by widening the
549 allowlist in a visible diff, with a reviewer-required note, rather
550 than by accident. The `${{ github.* }}` alias is a pragmatic
551 concession to copy-paste users; the rebrand to `${{ shithub.* }}`
552 is the canonical form so future divergence isn't awkward.
553
554 ## See also
555
556 - `internal/actions/workflow/parse.go` — the parser
557 - `internal/actions/expr/eval.go` — the evaluator
558 - `internal/migrationsfs/migrations/0042..0049_*.sql` — the schema
559 - `tests/fixtures/workflows/*.yml` — canonical input shapes
560 - `internal/actions/workflow/parse_test.go` — fixture-driven tests
561 - `internal/actions/expr/eval_test.go` — taint-contract tests
562 - `.refs/forgejo/services/actions/` — reference architecture
563 - Campaign plan in conversation memory (humble-cooking-bunny)