Actions/CI — schema + workflow dialect (S41a)
The Actions/CI subsystem is shipping in eight sub-sprints (S41a through S41h, plus optional S41i Nix engine). This doc covers what S41a lays down: the SQL schema, the workflow YAML dialect, the expression evaluator, and the load-bearing taint contract every later sub-sprint depends on.
S41a is parser + schema only — no triggers, no runner, no UI. The goal is to land a frozen contract that S41b/c/d/e can build against without churning under them.
SQL schema
Actions migrations currently span 0042–0051, 0053, 0057, 0060, and 0064–0067. Migration 0052 belongs to the repo source-remotes feature, 0054 belongs to push event protocol tracking, 0055 belongs to the social feed, 0056 belongs to user profile contribution settings, 0058 belongs to repo name reuse, and 0059 belongs to GitHub org imports.
| # | Table | Purpose |
|---|---|---|
| 0042 | workflow_runs | One row per triggered workflow execution |
| 0043 | workflow_jobs | Jobs within a run (one row per jobs.<key>) |
| 0044 | workflow_steps | Steps within a job (one row per steps[i]) |
| 0045 | workflow_secrets | Per-repo + per-org encrypted secrets |
| 0046 | workflow_runners | Registered runners + runner_tokens |
| 0047 | workflow_step_log_chunks | Hot-path append log buffer (concatenated to blob on finalize) |
| 0048 | workflow_artifacts | Per-run artifact metadata (90-day default expiry) |
| 0049 | actions_variables | Non-secret per-repo/org config (Forgejo parity) |
| 0050 | workflow_steps.step_with | Parsed with: inputs for magic uses: aliases |
| 0051 | workflow_runs.trigger_event_id | Trigger idempotency for retries/admin replays |
| 0053 | runner_jwt_used | Single-use replay gate for runner job JWTs |
| 0057 | workflow_job_secret_masks | Encrypted claim-time log mask snapshots per job |
| 0060 | Actions retention indexes | Narrow cleanup indexes for terminal steps/runs |
| 0066 | actions_*_policies, workflow_run_approvals | Enablement, runner-pool caps, and approval decisions |
| 0067 | workflow_runners ops state | Host/version metadata, drain state, and hard revocation state |
A few load-bearing choices, called out so they're easy to spot in a later schema diff:
- workflow_runs.run_index: per-repo monotonic counter. Each repo gets #1, #2, … so URLs like /{owner}/{repo}/actions/runs/42 are stable and human-friendly. Crib from Forgejo's actions_run.index.
- workflow_runs.version: optimistic-lock counter. Mutators bump-and-check rather than SELECT … FOR UPDATE. Required for S41g's race between a cancel request and a state transition.
- workflow_runs.concurrency_group: the concurrency-slot key, resolved at trigger time from the workflow's concurrency.group: expression. S41g's slot manager keys off this column and runner claim blocks younger runs while an older same-group run still has a queued/running job without cancel_requested=true.
- workflow_runs.parent_run_id: for re-runs. The new run references the original; the UI shows a "re-ran from #N" link.
- workflow_jobs.runner_id: FK added in 0046 (after the runners table exists). Nullable until claimed.
- workflow_steps has a CHECK constraint enforcing (run_command IS NOT NULL) <> (uses_alias IS NOT NULL): exactly one of run: or uses:. The uses_alias column is further CHECK-constrained to the three magic aliases we accept in v1.
- workflow_secrets owns its value as bytea, ChaCha20Poly1305-sealed via internal/auth/secretbox. Key derivation uses cfg.Auth.TOTPKeyB64 (already an operator-managed root) + (owner, kind, name) salt so re-keying is per-row.
- workflow_step_log_chunks.chunk is capped at 512 KB per row. The runner sends bigger payloads in pieces. (step_id, seq) is UNIQUE so duplicate sends are idempotent.
- actions_variables: non-secret, plaintext, scoped exactly like secrets (per-repo or per-org, never both on the same row). Forgejo has the same split; we mirror it for parity.
- runner_jwt_used: primary-keyed by JWT jti. Job endpoints insert into this table during auth; zero inserted rows means replay and the API returns 401. JWTs are HMAC-SHA256 and use an HKDF subkey derived from auth.totp_key_b64 with label actions-runner-jwt-v1.
- workflow_job_secret_masks: one encrypted JSON array of exact secret values per claimed job. It snapshots the log scrub set at claim time, preventing a rotated or deleted secret from disappearing from server-side masking while the old value is still in a runner's job payload.
- actions_site_policy, actions_org_policies, actions_repo_policies: inherited Actions enablement and abuse caps. Runner claim and trigger enqueue both read the effective policy: repo override, then org override, then site default.
- workflow_run_approvals: one approval-decision row for every run whose workflow_runs.need_approval flag is set. Approval records the maintainer and lets runner heartbeats claim the existing queued jobs; rejection completes the run with action_required.
The version and run_index patterns are the two pieces I'd point
out to a future maintainer first. Both are cheap to add now and
miserable to retrofit later.
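The bump-and-check pattern behind workflow_runs.version can be sketched in a few lines. This is an in-memory stand-in, not the project's query code: the run and store types, the status field, and the SQL in the comment are assumptions made for illustration; only the version column comes from the schema described above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// run is a hypothetical stand-in for one workflow_runs row.
type run struct {
	status  string
	version int
}

var errStale = errors.New("stale version: reload and retry")

// store mimics the table; the mutex plays the role of the database's
// per-statement atomicity.
type store struct {
	mu   sync.Mutex
	rows map[int]*run
}

// update behaves like an assumed statement of the form
//   UPDATE workflow_runs SET status = $1, version = version + 1
//   WHERE id = $2 AND version = $3
// and returns errStale when that statement would touch zero rows.
func (s *store) update(id int, status string, seenVersion int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.rows[id]
	if !ok || r.version != seenVersion {
		return errStale
	}
	r.status = status
	r.version++
	return nil
}

func main() {
	s := &store{rows: map[int]*run{1: {status: "queued", version: 1}}}
	// A state transition and a cancel request both read version 1.
	fmt.Println(s.update(1, "running", 1))   // first writer wins
	fmt.Println(s.update(1, "cancelled", 1)) // second writer must reload
}
```

The losing writer gets an explicit signal to re-read the row, which is the property the cancel-versus-transition race needs without holding a row lock.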
Workflow YAML dialect (v1)
We accept a strict subset of GitHub Actions YAML. The parser rejects unknown keys at parse time so workflow authors find their typos immediately instead of shipping a workflow that does nothing.
Top level
name: my-pipeline # optional human name
on: [push, pull_request] # or full-form (see below)
permissions: read-all # default if omitted
env: { GREETING: "hello" } # workflow-level env
concurrency: # optional slot manager
  group: ${{ shithub.ref }}
  cancel-in-progress: true
jobs:
  <key>: # 1+ entries
    runs-on: ubuntu-latest
    needs: [other-key] # optional dep edge
    if: ${{ shithub.actor == 'alice' }} # optional gate
    timeout-minutes: 60 # 1..4320, default 360
    permissions: { contents: read } # narrow workflow perms
    env: { K: v } # job overlay
    steps:
      - name: ...
        id: ...
        if: ...
        run: echo hi # run XOR uses
        uses: actions/checkout@v4 # exactly one of three aliases
        working-directory: ...
        env: { ... }
        continue-on-error: false
Triggers (on:)
v1 supports four triggers — anything else is a parse error.
| Trigger | Surface |
|---|---|
| push | branches:, tags:, paths: (include + !exclude semantics) |
| pull_request | types: (opened/synchronize/reopened/...), branches:, paths: |
| schedule | one or more - cron: <5-field-expr> |
| workflow_dispatch | inputs: map (string/boolean/choice/environment) |
uses: allowlist
Exactly three aliases are reserved at parse time, no exceptions:
| Alias | Parser status | Runner status |
|---|---|---|
| actions/checkout@v4 | accepted | executable with scoped checkout token |
| shithub/upload-artifact@v1 | accepted | rejected until artifact upload lands |
| shithub/download-artifact@v1 | accepted | rejected until artifact download lands |
Any other uses: value (community actions, Docker images, composite
actions) is an Error-severity diagnostic. The marketplace problem is
explicitly out of scope for v1; revisit only if a real demand exists
and we have an answer for supply-chain trust.
The current Docker executor runs actions/checkout@v4 and run: steps.
Checkout happens on the runner host before a containerized step mounts the
workspace. The server issues a short-lived checkout-purpose JWT scoped to
the claimed repository and running job; the smart-HTTP handler accepts it
only for read-only git-upload-pack. Artifact transfer remains explicit
follow-up work, and the artifact aliases fail deliberately until that path
exists.
Checkout v1 accepts only with.fetch-depth. The default is a depth-1 fetch
of the workflow run's head_sha; fetch-depth: 0 requests full history.
Submodules, LFS, path, persisted credentials, and marketplace actions are
rejected because they are not part of this dialect yet.
File-size + parser caps
- 64 KB workflow file size cap (workflow.MaxWorkflowFileBytes). Files larger than this are rejected before YAML decode begins. This defends against pathological inputs and gives operators a predictable upper bound on parser memory.
- 100 anchors per document (workflow.MaxYAMLAliases): the billion-laughs guard. yaml.v3 doesn't expose a direct knob; we count alias nodes during a tree walk and bail.
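A minimal sketch of the two guards, under stated assumptions: the node type is a hypothetical stand-in for a parsed YAML node (the real walk would run over the yaml.v3 tree), 64 KB is taken as 65536 bytes, and both caps are treated as inclusive limits. None of the function names below come from the codebase.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxWorkflowFileBytes = 64 * 1024 // mirrors workflow.MaxWorkflowFileBytes (assumed byte count)
	maxYAMLAliases       = 100       // mirrors workflow.MaxYAMLAliases
)

// node is a hypothetical stand-in for a parsed YAML node.
type node struct {
	alias    bool
	children []*node
}

// checkSize runs before any YAML decoding is attempted.
func checkSize(src []byte) error {
	if len(src) > maxWorkflowFileBytes {
		return errors.New("workflow file exceeds the 64 KB cap")
	}
	return nil
}

// countAliases walks the tree depth-first and bails as soon as the
// running alias count passes the cap, so a hostile document is not
// walked to the end.
func countAliases(n *node, seen *int) error {
	if n == nil {
		return nil
	}
	if n.alias {
		*seen++
		if *seen > maxYAMLAliases {
			return errors.New("too many YAML aliases")
		}
	}
	for _, c := range n.children {
		if err := countAliases(c, seen); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	root := &node{}
	for i := 0; i < 101; i++ {
		root.children = append(root.children, &node{alias: true})
	}
	seen := 0
	fmt.Println(checkSize(make([]byte, 70000)))
	fmt.Println(countAliases(root, &seen))
}
```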
${{ github.* }} alias
The dialect is intentionally rebranded to ${{ shithub.* }}.
Authors who paste GHA workflows in unmodified will see their
${{ github.* }} references continue to work because the evaluator
rewrites path[0] from github to shithub at the top of evalRef
before taint computation, dispatch, and error rendering.
The alias is intentionally scope-narrow: only fields that exist
in our shithub.* namespace (run_id, sha, ref, actor,
event) route through. GHA fields we don't expose in v1 —
event_name, repository, run_number, workspace, etc. — error
with the canonical unknown shithub field "X" message. Slightly
confusing for a GHA-flavored author but keeps the v1 namespace
surface tight.
The alias preserves the load-bearing taint flag: github.event.X
taints exactly like shithub.event.X. TestEval_GithubAliasIsTainted
pins this contract.
Migration to strict-compat (drop the alias entirely) later is a one-PR flip; moving the other direction is much harder.
This is a deliberate decision recorded in the campaign plan.
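The rewrite-then-validate order can be illustrated with a small sketch. The normalize function and the allowed map are hypothetical names; the field list and the error text follow the description above, and everything else about evalRef is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// allowed lists the shithub.* fields named for v1 in this doc.
var allowed = map[string]bool{
	"run_id": true, "sha": true, "ref": true, "actor": true, "event": true,
}

// normalize rewrites a leading "github" to "shithub" first, then
// applies the same field allowlist to both spellings, so the alias
// can never reach a field the canonical namespace lacks.
func normalize(path []string) ([]string, error) {
	if len(path) < 2 {
		return nil, fmt.Errorf("reference needs a namespace and a field")
	}
	out := append([]string(nil), path...) // do not mutate the caller's slice
	if out[0] == "github" {
		out[0] = "shithub"
	}
	if out[0] == "shithub" && !allowed[out[1]] {
		return nil, fmt.Errorf("unknown shithub field %q", out[1])
	}
	return out, nil
}

func main() {
	p, err := normalize([]string{"github", "event", "pull_request", "title"})
	fmt.Println(strings.Join(p, "."), err)
	_, err = normalize([]string{"github", "event_name"})
	fmt.Println(err)
}
```

Because the rewrite happens before any other logic sees the path, taint computation and error rendering only ever deal with the shithub spelling.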
Expression evaluator
${{ … }} expressions are parsed into a tiny AST and evaluated by
internal/actions/expr. The surface is intentionally minimal:
Allowed namespaces
| Namespace | Source | Tainted? |
|---|---|---|
| secrets.X | workflow_secrets | no, but sensitive |
| vars.X | actions_variables | no (operator-controlled) |
| env.X | workflow file | no (workflow author's text) |
| shithub.run_id | dispatch context | no |
| shithub.sha | dispatch context | no |
| shithub.ref | dispatch context | no |
| shithub.actor | dispatch context | no (resolved username) |
| shithub.event.* | trigger payload | yes, always |
runner.*, steps.*, needs.*, matrix.*, inputs.* are all
parse-time errors. They're parked for v2 and the parser's
allowlist-closed posture means a future PR can't widen this
accidentally without a clearly visible diff.
Allowed functions
contains(haystack, needle), startsWith(s, prefix),
endsWith(s, suffix), plus the four job-status predicates
success(), failure(), cancelled(), always(). That's the
whole list. fromJSON, hashFiles, toJSON, format, and
friends are explicitly rejected — they each carry footgun risk
(parser DoS, FS access, side-channel injection) that we don't want
to take on in v1.
Missing-value semantics
| Reference | Missing → ? |
|---|---|
| secrets.NOT_BOUND | error (loud: workflow won't run) |
| vars.MISSING | empty string (GHA parity) |
| env.MISSING | empty string (GHA parity) |
| shithub.event.deeply.missing | null but still tainted |
The "missing event path → null but tainted" case is a defence-in-depth choice: even if the path doesn't resolve, the result still came from the event payload, and we'd rather over-flag than under.
Taint contract — the load-bearing piece
This is the contract every later sub-sprint hangs off. Get it wrong and we have an injection-shaped hole in the runner.
Where the flag lives
The taint flag lives on expr.Value (the evaluator-produced value),
not workflow.Value (the parser-produced value). Two different
structs share the name Value because they live in different
packages, but they have different jobs:
- workflow.Value carries the raw source string the parser read out of the YAML (an env entry, a with: input, a concurrency group expression). At parse time we don't know what the ${{ … }} body will resolve to, so there's nothing to taint yet.
- expr.Value is what the evaluator returns when it resolves a reference at runtime. This struct carries Tainted bool. The runner's exec layer (S41d) consumes that flag.
Pre-L5 the parser-side struct also had a Tainted bool field plus a
Tainted() constructor — both unused, both confusing because they
suggested two sources of truth. Dropped in S41a-L5 cleanup.
Propagation
Every expr.Value carries a Tainted bool. Set true iff the
value transitively depends on shithub.event.*. Operators control
secrets, vars, env, the rest of shithub.*. Authors control the
workflow file. Only the event payload is attacker-controlled: a
PR title, a commit message, a branch name from a fork. Those values
must never be interpolated into a shell string.
Propagation rules:
- Reading shithub.event.X → Tainted: true (always, including missing-path null results).
- Reading secrets.X → Sensitive: true. Secrets are operator-controlled, so they are not tainted, but they must not appear in shell source strings or Docker argv.
- Reading any other namespace → Tainted: false and Sensitive: false, except env.X preserves both flags of the resolved env value. This closes the escape where an event-derived or secret-derived value is first assigned to env and then interpolated through ${{ env.X }}.
- Binary op (==, !=, &&, ||) → tainted or sensitive if either operand is.
- Unary op (!) → tainted/sensitive iff its operand is.
- Function call (contains, startsWith, endsWith) → tainted or sensitive if any argument is.
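The propagation rules above reduce to "flags are the OR of every input". The sketch below shows that shape; the Value struct here is simplified to a string payload, and ref and combine are hypothetical helpers rather than the evaluator's real API. Payload and secret lookups are deliberately left out.

```go
package main

import "fmt"

// Value is a simplified mirror of the two flags described for expr.Value.
type Value struct {
	V         string
	Tainted   bool
	Sensitive bool
}

// ref resolves a namespace read. env holds already-evaluated values so
// that env.X can hand back the flags of whatever was assigned to it.
func ref(ns, name string, env map[string]Value) Value {
	switch ns {
	case "shithub.event":
		return Value{Tainted: true} // always tainted, even when the path is missing
	case "secrets":
		return Value{Sensitive: true}
	case "env":
		return env[name] // zero Value (empty, no flags) when missing
	default:
		return Value{} // vars.* and the rest of shithub.*
	}
}

// combine covers binary ops, unary ops, and function calls: the result
// is tainted or sensitive if any operand or argument is.
func combine(result string, args ...Value) Value {
	out := Value{V: result}
	for _, a := range args {
		out.Tainted = out.Tainted || a.Tainted
		out.Sensitive = out.Sensitive || a.Sensitive
	}
	return out
}

func main() {
	title := ref("shithub.event", "pull_request.title", nil)
	env := map[string]Value{"TITLE": title}
	viaEnv := ref("env", "TITLE", env)
	cmp := combine("false", viaEnv, ref("vars", "PREFIX", nil))
	fmt.Println(title.Tainted, viaEnv.Tainted, cmp.Tainted)
}
```

The env case is the one worth reading twice: returning the stored value unchanged is what stops an event-derived string from being laundered through an env assignment.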
The runner consumes Tainted and Sensitive and refuses to interpolate
either class into shell strings. Instead, those values are bound to
runner-owned SHITHUB_INPUT_xx envvars and the shell source only
references those placeholders. The author writes:
- run: echo "PR title was: ${{ shithub.event.pull_request.title }}"
The runner sees a tainted reference; it compiles the step to:
SHITHUB_INPUT_0="$user_pr_title" exec sh -c 'echo "PR title was: $SHITHUB_INPUT_0"'
…where $user_pr_title is set via Go's cmd.Env, never inserted into
the shell source string or Docker CLI argv. Backticks, $(), ;,
&& — none of those work as command-injection vectors when the value
reaches the shell as environment data instead of syntax.
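A rough sketch of that compile step, with assumptions called out: render and command are hypothetical names, resolve stands in for the evaluator, and for brevity this version binds every expression to a placeholder, whereas the text above only requires it for tainted and sensitive values. The SHITHUB_INPUT_n naming follows the example above.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

// exprRE matches one ${{ ... }} expression and captures its body.
var exprRE = regexp.MustCompile(`\$\{\{\s*(.+?)\s*\}\}`)

// render swaps each expression for a $SHITHUB_INPUT_n reference and
// returns the NAME=value entries that carry the resolved values.
func render(script string, resolve func(expr string) string) (string, []string) {
	var env []string
	out := exprRE.ReplaceAllStringFunc(script, func(m string) string {
		expr := exprRE.FindStringSubmatch(m)[1]
		name := fmt.Sprintf("SHITHUB_INPUT_%d", len(env))
		env = append(env, name+"="+resolve(expr))
		return "$" + name
	})
	return out, env
}

// command builds the process; the values travel in cmd.Env and never
// become part of the shell source or argv.
func command(script string, env []string) *exec.Cmd {
	cmd := exec.Command("sh", "-c", script)
	cmd.Env = env
	return cmd
}

func main() {
	hostile := "`id`; $(echo pwned)"
	script, env := render(
		`echo "PR title was: ${{ shithub.event.pull_request.title }}"`,
		func(string) string { return hostile },
	)
	fmt.Println(script)
	out, err := command(script, env).CombinedOutput()
	fmt.Printf("%s%v\n", out, err)
}
```

On a POSIX shell the hostile title should come back as literal text, because the result of a variable expansion is not re-parsed for command substitution.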
The shared renderer lives in internal/runner/exec, so future engines
consume the same injection boundary instead of reimplementing it. The
runner claim payload includes workflow_runs.event_payload; without
that field, the runner cannot evaluate and taint
${{ shithub.event.* }} references.
Tests for this contract live in internal/actions/expr/eval_test.go,
internal/runner/exec/render_test.go, and
internal/runner/engine/docker_test.go. Do not weaken them in a
later PR without an audit-checkpoint review — they're explicitly
load-bearing for S41e's threat model.
Runner log chunks pass through internal/runner/scrub before they are
posted to the API. It masks exact secret values and preserves enough
tail bytes between chunks to catch a secret split across chunk
boundaries. S41e wires resolved workflow secrets into the runner claim
payload and mask set, snapshots that mask set encrypted on the job, then
applies the same exact-value scrub again in the runner API before
persisting chunks. The server path also carries a possible secret-prefix
tail from the prior persisted chunk, so a runner that bypasses
client-side scrubbing cannot leak a secret by splitting it across
adjacent log POSTs.
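The tail-holdback idea can be sketched as a small streaming masker. This is an illustration, not the code in internal/runner/scrub: the scrubber type, the *** mask string, and the choice to hold back exactly (longest secret length minus one) bytes are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const mask = "***"

// scrubber masks exact secret values across a stream of log chunks.
type scrubber struct {
	secrets []string
	keep    int    // longest secret length minus one
	pending string // held-back tail that may contain the start of a secret
}

func newScrubber(secrets []string) *scrubber {
	s := &scrubber{}
	for _, sec := range secrets {
		if sec == "" {
			continue // an empty pattern would match everywhere
		}
		s.secrets = append(s.secrets, sec)
		if len(sec)-1 > s.keep {
			s.keep = len(sec) - 1
		}
	}
	return s
}

func (s *scrubber) maskAll(text string) string {
	for _, sec := range s.secrets {
		text = strings.ReplaceAll(text, sec, mask)
	}
	return text
}

// write returns only the bytes that can no longer be part of a secret
// that continues in the next chunk.
func (s *scrubber) write(chunk string) string {
	text := s.maskAll(s.pending + chunk)
	cut := len(text) - s.keep
	if cut < 0 {
		cut = 0
	}
	s.pending = text[cut:]
	return text[:cut]
}

// flush emits the held-back tail at end of stream.
func (s *scrubber) flush() string {
	out := s.maskAll(s.pending)
	s.pending = ""
	return out
}

func main() {
	s := newScrubber([]string{"hunter2"})
	a := s.write("token=hun")
	b := s.write("ter2 done")
	fmt.Println(a + b + s.flush())
}
```

Any proper prefix of a secret is at most keep bytes long, so holding back that many bytes is enough to rejoin a secret split across two chunks before anything is emitted.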
shithub.event payload schema (v1)
The event payload is the most user-facing part of the contract: once
authors write workflows that template against shithub.event.X,
schema changes are breaking. The v1 schema is pinned and labelled
v1. Any addition is fine; renames and removals require a major
bump.
The schema is enforced by typed constructors in the
internal/actions/event package — one per trigger. S41b's pipeline
calls these to build payloads; the function signatures pin the
field set so adding a key requires editing the constructor in a
visible diff. This is the same closed-door discipline as the
expression evaluator's namespace allowlist.
| Trigger | Constructor | Top-level keys |
|---|---|---|
| push | event.Push | ref, before, after, head_commit{message,id,author} |
| pull_request | event.PullRequest | action, number, pull_request{title,head{ref,sha},base{ref,sha},user{login}} |
| schedule | event.Schedule | (empty map: cron fired; cron expression is on the workflow_runs row) |
| workflow_dispatch | event.WorkflowDispatch | inputs{<name>: <stringified>} |
Anything not in this table doesn't exist in v1. Accessing it returns null+tainted (the missing-path semantics above).
Adding a field: edit the constructor in internal/actions/event/,
add a row to this doc, and update the corresponding *_FlowsThroughEvaluator
test in event_test.go so the new path is exercised end-to-end.
Reviewer-required note in the commit message — same standard as a
new evaluator function.
Renaming or removing: that's a v1→v2 break. Don't.
Operator surface
shithubd admin actions parse <file> reads a workflow off disk,
runs the parser, and dumps diagnostics + a canonical JSON rendering
of the parsed AST. Useful for:
- debugging "why is my workflow not picking up changes" reports
- validating a workflow file before committing it
- producing a stable AST snapshot for inclusion in bug reports
Exit codes:
| Code | Meaning |
|---|---|
| 0 | clean parse, no Error-severity diagnostics |
| 1 | file unreadable, oversized, or YAML malformed |
| 2 | parse produced Error-severity diagnostics |
Other admin surfaces are scoped to later sub-sprints:
- S41c: shithubd admin runner register --name <foo> issues a registration token + writes a row to workflow_runners.
- S41j: shithubd admin runner drain|undrain|rotate-token|revoke|cleanup-stale gives operators pool controls. Drained runners keep heartbeating and may finish already claimed jobs but receive no new claims. Revoked runners are set offline, all registration tokens are revoked, and job API JWTs from that runner are rejected even if the runner still has an old config file.
- S41g: POST /api/v1/jobs/{id}/cancel and the repository run-detail UI request cancellation. Running jobs flip cancel_requested; queued jobs are made terminal immediately.
- S41g: POST /api/v1/runs/{id}/rerun and the repository run-detail UI re-run completed/cancelled runs. Re-runs read the workflow YAML from the original run's head_sha, create a fresh queued workflow_runs row, and set parent_run_id to the source run.
- S41g: workflow-level concurrency.group is resolved at enqueue time against the trigger context (shithub.ref, shithub.sha, and shithub.event.*). With cancel-in-progress: true, enqueue requests cancellation for older active runs in the same group. Without it, runner claim leaves the younger run queued until the older run no longer has uncancelled queued/running jobs.
- S41g: workflow:cleanup is a daily retention worker enqueued by shithubd-cron.service. Operators can run it manually with shithubd admin run-job workflow:cleanup.
Workflow concurrency (S41g)
concurrency.group is a workflow-level slot key. The parser stores the
raw value, and internal/actions/concurrency evaluates ${{ ... }}
fragments when the run is enqueued. The trigger-time context deliberately
does not include secrets; event-derived values may be tainted but are
safe here because the value is only used as a database key.
When a run enters a non-empty group:
- cancel-in-progress: false leaves the new run queued behind older same-repo, same-group runs while those older runs still have queued/running jobs with cancel_requested=false.
- cancel-in-progress: true requests cancellation on those older jobs. Queued jobs become terminal immediately; running jobs keep running with cancel_requested=true so the runner can kill the active container. Once every active older job is cancel-requested, the group is released for the newer run.
The runner claim query enforces the queueing rule, not the web handler or UI. This keeps heartbeat races honest: multiple runners can poll at the same time, but only jobs whose dependency and concurrency blockers are clear can be claimed.
Runner timeouts (S41g)
jobs.<key>.timeout-minutes is enforced by shithubd-runner as a
whole-job deadline. The parser stores the value in
workflow_jobs.timeout_minutes with the GitHub-compatible default of
360 minutes and a 1..4320 cap.
When the deadline expires, the Docker engine explicitly kills the
active step container, emits a terminal step update with
status=completed and conclusion=timed_out, and the runner reports
the job itself as completed/timed_out. The server rolls the parent
workflow run up to timed_out when all jobs are terminal. A timed-out
step is not masked by continue-on-error; the job deadline always wins.
The runner API increments shithub_actions_step_timeouts_total the
first time a step reaches conclusion=timed_out. Duplicate terminal
step-status retries do not increment the counter again.
Retention cleanup (S41g)
workflow:cleanup applies the durable Actions retention contract in
this order:
- Delete hot workflow_step_log_chunks for steps completed more than 7 days ago. Finalized logs already live in object storage.
- Delete expired workflow_artifacts rows after deleting their actions/runs/... blob objects. The row's expires_at value is authoritative so per-upload retention overrides keep working.
- Delete unpinned terminal workflow_runs older than 365 days. Child jobs, steps, artifacts, and consumed JWT rows cascade through FK ownership.
- Delete consumed runner_jwt_used rows whose JWT expiry is more than 30 days old. This preserves replay/audit evidence for recent jobs without letting the replay table grow forever.
The defaults can be overridden in the worker payload:
{"step_log_chunk_days":7,"run_days":365,"jwt_used_days":30,"artifact_batch":1000}
artifact_batch caps each object-delete page and may not exceed 10000.
Negative values are poison-job errors. The worker exports
shithub_actions_runs_pruned_total{kind} where kind is one of
chunks, blobs, runs, or jwt_used.
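One way to decode that payload so that omitted keys keep their defaults is to pre-fill the struct before unmarshalling. The cleanupPayload type and parseCleanupPayload function are hypothetical; the keys, defaults, and the two rejection rules come from the text above. How the real worker treats an explicit zero is not stated here, so the sketch accepts it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// cleanupPayload mirrors the four documented override keys.
type cleanupPayload struct {
	StepLogChunkDays int `json:"step_log_chunk_days"`
	RunDays          int `json:"run_days"`
	JWTUsedDays      int `json:"jwt_used_days"`
	ArtifactBatch    int `json:"artifact_batch"`
}

// parseCleanupPayload starts from the documented defaults; keys absent
// from raw are left untouched by json.Unmarshal.
func parseCleanupPayload(raw []byte) (cleanupPayload, error) {
	p := cleanupPayload{StepLogChunkDays: 7, RunDays: 365, JWTUsedDays: 30, ArtifactBatch: 1000}
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &p); err != nil {
			return p, err
		}
	}
	if p.StepLogChunkDays < 0 || p.RunDays < 0 || p.JWTUsedDays < 0 || p.ArtifactBatch < 0 {
		return p, errors.New("negative retention value")
	}
	if p.ArtifactBatch > 10000 {
		return p, errors.New("artifact_batch may not exceed 10000")
	}
	return p, nil
}

func main() {
	p, err := parseCleanupPayload([]byte(`{"run_days":30}`))
	fmt.Printf("%+v %v\n", p, err)
	_, err = parseCleanupPayload([]byte(`{"artifact_batch":10001}`))
	fmt.Println(err)
}
```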
Production object storage also needs provider-side lifecycle on the
same prefix: deploy/spaces/actions-lifecycle.json expires
actions/runs/ objects after 90 days and aborts stale multipart
uploads after 2 days. Apply it with
deploy/cutover/apply-actions-lifecycle.sh.
Trigger pipeline (S41b)
Three layers between a triggering event and a queued workflow_run:
caller (push_process / pulls.Create / pr_jobs.PRSynchronize / dispatch HTTP)
  │
  └─► worker.Enqueue(KindWorkflowTrigger, JobPayload)
        │
        └─► trigger.Handler picks up:
              Discover .shithub/workflows/*.yml at HEAD SHA
              Parse each (skip + log on Error diagnostics)
              Match each against trigger.Event
              Enqueue each match
                │
                └─► trigger.Enqueue (one tx):
                      INSERT workflow_runs (ON CONFLICT DO NOTHING)
                      INSERT workflow_jobs per parsed job
                      INSERT workflow_steps per parsed step
                      (commit)
                      checks.Create per job (post-tx, idempotent
                      via ExternalID 'workflow_run:<id>:job:<key>')
Idempotency on the triggering event
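Idempotency keys off the identity of the triggering event rather than
a UNIQUE on (repo_id, head_sha), which would also block legitimate
second runs on the same commit, such as re-runs and manual dispatches.
Each caller constructs a stable trigger_event_id from its triggering
event's identity: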
| Caller | trigger_event_id format |
|---|---|
| push_process | push:<push_event_id> |
| pulls.Create | pr_opened:<pr_id>:<head_sha> |
| pr_jobs.PRSynchronize | pr_synchronize:<pr_id>:<head_sha> |
| dispatch HTTP | dispatch:<file>:<sha>:<8-byte-random-hex> |
| schedule sweep (S41b-2) | schedule:<workflow_id>:<window_start_unix> |
Migration 0051 adds workflow_runs.trigger_event_id (text NOT NULL
DEFAULT '') with a partial UNIQUE on
(repo_id, workflow_file, trigger_event_id) WHERE trigger_event_id <> ''.
The trigger handler does INSERT … ON CONFLICT DO NOTHING so:
- Worker retries (the same push_process replay) → no duplicate runs.
- Admin replays via shithubd admin run-job workflow:trigger ... → no duplicate runs.
- Re-runs explicitly construct a NEW trigger_event_id (rerun:<original_run_id>:<request_uuid>) and chain back via parent_run_id. History is preserved, no collision.
Each caller's collision-free namespace is short-lived and
human-debuggable: a Postgres operator can grep
workflow_runs.trigger_event_id to see exactly which triggering
event produced a given run.
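The effect of the partial UNIQUE index plus ON CONFLICT DO NOTHING can be modelled in memory. Everything below is a stand-in: runKey, runs, and the two id builders are hypothetical, and the integer id types are assumed; only the id formats and the (repo_id, workflow_file, trigger_event_id) key shape come from the text above.

```go
package main

import "fmt"

// runKey mirrors the columns of the partial UNIQUE index.
type runKey struct {
	repoID         int64
	workflowFile   string
	triggerEventID string
}

// runs stands in for workflow_runs and its index.
type runs struct{ seen map[runKey]bool }

// insert mimics INSERT ... ON CONFLICT DO NOTHING and reports whether
// a row was actually written.
func (r *runs) insert(k runKey) bool {
	if k.triggerEventID == "" {
		return true // the partial index ignores rows with an empty id
	}
	if r.seen[k] {
		return false
	}
	r.seen[k] = true
	return true
}

func pushEventID(pushEventID int64) string {
	return fmt.Sprintf("push:%d", pushEventID)
}

func prSynchronizeID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_synchronize:%d:%s", prID, headSHA)
}

func main() {
	r := &runs{seen: map[runKey]bool{}}
	k := runKey{repoID: 1, workflowFile: ".shithub/workflows/ci.yml", triggerEventID: pushEventID(77)}
	fmt.Println(r.insert(k)) // first delivery creates the run
	fmt.Println(r.insert(k)) // worker retry is a no-op
	fmt.Println(prSynchronizeID(12, "abc123"))
}
```

Because the workflow file is part of the key, one push can still fan out to several workflow files without any of them colliding.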
Filter evaluation
trigger.Match(workflow, event) is a pure function (no I/O, no DB).
For each event kind:
- push: branch vs tag classified from the ref; only the matching filter list applies (a branches: filter rejects tag pushes and vice versa). paths: (when set) requires at least one changed path to match. Empty filter = match-all.
- pull_request: types: defaults to [opened, synchronize, reopened] when omitted (GHA parity). branches: applies to the base ref. paths: as for push.
- schedule: requires the workflow to declare the cron expression that fired. The sweep is the source of truth for which cron fires; we just gate on declaration. Avoids interpreting cron semantics in two places.
- workflow_dispatch: matches whenever the workflow declares on.workflow_dispatch.
Glob semantics in branches:/tags:/paths:: minimatch subset
with * (single segment), ** (any), /** end-anchor (optional
trailing path), **/ start-anchor, and !exclude (last-match-wins,
exclusion-only list implies include-all).
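A minimal sketch of those glob rules for a single name, assuming segment-wise matching: ** is treated as zero or more whole path segments and every other segment is handed to path.Match from the standard library. The function names are hypothetical and the real matcher may differ in edge cases this sketch does not cover.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchSegs matches pattern segments against name segments. "**"
// consumes zero or more segments; anything else must match exactly one.
func matchSegs(pat, name []string) bool {
	if len(pat) == 0 {
		return len(name) == 0
	}
	if pat[0] == "**" {
		for i := 0; i <= len(name); i++ {
			if matchSegs(pat[1:], name[i:]) {
				return true
			}
		}
		return false
	}
	if len(name) == 0 {
		return false
	}
	ok, err := path.Match(pat[0], name[0])
	return err == nil && ok && matchSegs(pat[1:], name[1:])
}

func matchGlob(pattern, name string) bool {
	return matchSegs(strings.Split(pattern, "/"), strings.Split(name, "/"))
}

// matchFilter walks the list in order; the last pattern that matches
// decides the outcome. An exclusion-only list starts from include-all.
func matchFilter(filters []string, name string) bool {
	matched := true
	for _, f := range filters {
		if !strings.HasPrefix(f, "!") {
			matched = false // at least one include: start from exclude-all
			break
		}
	}
	for _, f := range filters {
		if strings.HasPrefix(f, "!") {
			if matchGlob(f[1:], name) {
				matched = false
			}
		} else if matchGlob(f, name) {
			matched = true
		}
	}
	return matched
}

func main() {
	fmt.Println(matchGlob("feature/*", "feature/x/y"))
	fmt.Println(matchFilter([]string{"**", "!docs/**", "docs/keep.md"}, "docs/keep.md"))
}
```

For paths: the caller would apply matchFilter to each changed path and accept the event if any one of them matches.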
Collaborator gate
Per the S41b spec's "external-PR support is parked" decision: PR
triggers (both opened and synchronize) only fire when the PR's
author is the repo's owning user. Conservative — drops legitimate
non-owner collaborators in the org-repo case. Expanding the gate
requires plumbing policy.Can into the worker context, which we
defer to S41g where the lifecycle work touches that surface anyway.
Operator surface
- POST /{owner}/{repo}/actions/workflows/{file}/dispatches
  Body: {"ref": "...", "inputs": {"key": "value"}} (both optional; ref defaults to the repo's default branch). Returns 204 No Content on success. Synchronous trigger.Enqueue (no discovery; the file is named in the URL). Auth: requires repo write.
- GET /{owner}/{repo}/actions.atom
  Returns the last 50 workflow runs as an Atom feed. Auth and visibility match the Actions tab (repo:read). Entries link to /{owner}/{repo}/actions/runs/{run_index} and include the workflow name/path, event, branch, short SHA, status, and conclusion.
Webhook events (S41h)
Actions emits webhook-facing domain events through notif.EmitTx on
state transitions:
- workflow_run, with payload.action set to queued, running, or completed (completed may carry conclusion:"cancelled").
- workflow_job, with payload.action set to queued, running, completed, or cancelled.
Payloads are structural snapshots only. They include ids, run index,
workflow path/name, head SHA/ref, event kind, status, conclusion,
timestamps, job key/name/runner id, needs, timeout, and cancellation
state. They deliberately exclude workflow_runs.event_payload, env,
permissions, logs, runner JWTs, and secret values. This keeps the
webhook surface stable without turning arbitrary workflow input into
subscriber-facing data.
What S41b deliberately doesn't do
- Run jobs. S41c adds runner claim/status APIs; S41d adds the actual shithubd-runner execution binary.
- Schedule sweep. Cron-driven triggers split into S41b-2 to keep this PR reviewable; the trigger pipeline accepts schedule events, but no caller produces them yet. S41b-2 adds the sweep + the robfig/cron/v3 dep + shithubd-cron.service wiring.
- External-PR triggers. Conservative collaborator gate above.
Secrets + variables settings surface (S41c)
S41c wires the previously schema-only workflow_secrets and
actions_variables tables into repo/org settings.
Repository routes are gated through
policy.ActionRepoSettingsActions (repo:settings:actions, admin
role minimum):
- GET /{owner}/{repo}/settings/secrets/actions
- POST /{owner}/{repo}/settings/secrets/actions
- POST /{owner}/{repo}/settings/secrets/actions/{name}/delete
- GET /{owner}/{repo}/settings/variables/actions
- POST /{owner}/{repo}/settings/variables/actions
- POST /{owner}/{repo}/settings/variables/actions/{name}/delete
Organization routes follow the existing org-settings prefix and are owner-only:
- GET /organizations/{org}/settings/secrets/actions
- POST /organizations/{org}/settings/secrets/actions
- POST /organizations/{org}/settings/secrets/actions/{name}/delete
- GET /organizations/{org}/settings/variables/actions
- POST /organizations/{org}/settings/variables/actions
- POST /organizations/{org}/settings/variables/actions/{name}/delete
Secrets are sealed through internal/auth/secretbox using the
operator-managed Auth.TOTPKeyB64 root key. Secret list pages render
names/metadata only; the plaintext value is accepted once on create or
rotation and never rendered back. Variables are non-secret plaintext
configuration, so settings pages render their values. Both stores use
the same name grammar as the database constraints:
^[A-Za-z_][A-Za-z0-9_]*$, 1-100 characters. Variables additionally
enforce the 4096-character value cap in Go before hitting the DB
constraint.
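The name grammar and value cap are simple enough to sketch directly. validateName and validateVariableValue are hypothetical helpers; the pattern and both limits come from the paragraph above. Counting the 4096-character cap in runes rather than bytes is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"unicode/utf8"
)

// nameRE is the grammar quoted above for secret and variable names.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validateName enforces the grammar plus the 1-100 length bound. The
// grammar is ASCII-only, so byte length equals character count here.
func validateName(name string) error {
	if len(name) < 1 || len(name) > 100 || !nameRE.MatchString(name) {
		return errors.New("name must match ^[A-Za-z_][A-Za-z0-9_]*$ and be 1-100 characters")
	}
	return nil
}

// validateVariableValue applies the variable value cap before the
// database constraint gets a chance to reject the row.
func validateVariableValue(v string) error {
	if utf8.RuneCountInString(v) > 4096 {
		return errors.New("value exceeds 4096 characters")
	}
	return nil
}

func main() {
	fmt.Println(validateName("DEPLOY_ENV"))
	fmt.Println(validateName("1BAD"))
	fmt.Println(validateVariableValue("staging"))
}
```

Checking in Go first lets the settings page return a field-level message instead of surfacing a raw constraint violation.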
What S41a deliberately doesn't do
- No trigger pipeline. domain_events aren't matched against on: yet; that's S41b.
- No runner. S41c/S41d add runner claim APIs and the execution binary.
- No UI. The Actions tab still renders the placeholder — S41f.
- No secret encryption helpers wired to anything writable — S41c.
- No JWT issuance, no runner registration flow — S41c.
- No log streaming, no SSE — S41d/f.
- No execution sandbox, no scrubbing, no injection guards enforced at the runner — S41d/e (the parser-side taint contract is the foundation those depend on, not a substitute).
Why these choices, in two paragraphs
The schema work is front-loaded so later sub-sprints don't ripple a
migration through every PR. version (optimistic locking) and
run_index (per-repo monotonic) are the two columns I'd flag to a
new maintainer immediately — both are nearly free to add up front
and painful to retrofit. The split between hot-path log chunks
(Postgres) and finalized blob (Spaces) is shaped after Forgejo's
log path; we pick the boring well-trodden answer over the clever
one because log throughput is the failure mode that bites first.
The taint contract is the security-load-bearing piece. Every later
sub-sprint trusts that the Tainted flag is set correctly here, in
the parser/evaluator, and never re-derived downstream. The narrow
allowlist of namespaces and functions exists exactly so a future PR
that adds, say, fromJSON has to do it knowingly — by widening the
allowlist in a visible diff, with a reviewer-required note, rather
than by accident. The ${{ github.* }} alias is a pragmatic
concession to copy-paste users; the rebrand to ${{ shithub.* }}
is the canonical form so future divergence isn't awkward.
See also
- internal/actions/workflow/parse.go: the parser
- internal/actions/expr/eval.go: the evaluator
- internal/migrationsfs/migrations/0042..0049_*.sql: the schema
- tests/fixtures/workflows/*.yml: canonical input shapes
- internal/actions/workflow/parse_test.go: fixture-driven tests
- internal/actions/expr/eval_test.go: taint-contract tests
- .refs/forgejo/services/actions/: reference architecture
- Campaign plan in conversation memory (humble-cooking-bunny)
View source
| 1 | # Actions/CI — schema + workflow dialect (S41a) |
| 2 | |
| 3 | The Actions/CI subsystem is shipping in eight sub-sprints (S41a through |
| 4 | S41h, plus optional S41i Nix engine). This doc covers what S41a lays |
| 5 | down: the SQL schema, the workflow YAML dialect, the expression |
| 6 | evaluator, and the load-bearing taint contract every later sub-sprint |
| 7 | depends on. |
| 8 | |
| 9 | S41a is parser + schema only — no triggers, no runner, no UI. The |
| 10 | goal is to land a frozen contract that S41b/c/d/e can build against |
| 11 | without churning under them. |
| 12 | |
| 13 | ## SQL schema |
| 14 | |
| 15 | Actions migrations currently span 0042–0051, 0053, 0057, 0060, and 0064–0067. |
| 16 | Migration 0052 belongs to the repo source-remotes feature, 0054 |
| 17 | belongs to push event protocol tracking, 0055 belongs to the social |
| 18 | feed, 0056 belongs to user profile contribution settings, 0058 belongs |
| 19 | to repo name reuse, and 0059 belongs to GitHub org imports. |
| 20 | |
| 21 | | # | Table | Purpose | |
| 22 | | ----- | --------------------------- | ------------------------------------------------------------- | |
| 23 | | 0042 | `workflow_runs` | One row per triggered workflow execution | |
| 24 | | 0043 | `workflow_jobs` | Jobs within a run (one row per `jobs.<key>`) | |
| 25 | | 0044 | `workflow_steps` | Steps within a job (one row per `steps[i]`) | |
| 26 | | 0045 | `workflow_secrets` | Per-repo + per-org encrypted secrets | |
| 27 | | 0046 | `workflow_runners` | Registered runners + `runner_tokens` | |
| 28 | | 0047 | `workflow_step_log_chunks` | Hot-path append log buffer (concatenated to blob on finalize) | |
| 29 | | 0048 | `workflow_artifacts` | Per-run artifact metadata (90-day default expiry) | |
| 30 | | 0049 | `actions_variables` | Non-secret per-repo/org config (Forgejo parity) | |
| 31 | | 0050 | `workflow_steps.step_with` | Parsed `with:` inputs for magic `uses:` aliases | |
| 32 | | 0051 | `workflow_runs.trigger_event_id` | Trigger idempotency for retries/admin replays | |
| 33 | | 0053 | `runner_jwt_used` | Single-use replay gate for runner job JWTs | |
| 34 | | 0057 | `workflow_job_secret_masks` | Encrypted claim-time log mask snapshots per job | |
| 35 | | 0060 | Actions retention indexes | Narrow cleanup indexes for terminal steps/runs | |
| 36 | | 0066 | `actions_*_policies`, `workflow_run_approvals` | Enablement, runner-pool caps, and approval decisions | |
| 37 | | 0067 | `workflow_runners` ops state | Host/version metadata, drain state, and hard revocation state | |
| 38 | |
| 39 | A few load-bearing choices, called out so they're easy to spot in a |
| 40 | later schema diff: |
| 41 | |
| 42 | - **`workflow_runs.run_index`** — per-repo monotonic counter. Each |
| 43 | repo gets `#1`, `#2`, … so URLs like |
| 44 | `/{owner}/{repo}/actions/runs/42` are stable and human-friendly. |
| 45 | Cribbed from Forgejo's `actions_run.index`. |
| 46 | - **`workflow_runs.version`** — optimistic-lock counter. Mutators |
| 47 | bump-and-check rather than `SELECT … FOR UPDATE`. Required for |
| 48 | S41g's race between a cancel request and a state transition. |
| 49 | - **`workflow_runs.concurrency_group`** — the concurrency-slot key, |
| 50 | resolved at trigger time from the workflow's `concurrency.group:` |
| 51 | expression. S41g's slot manager keys off this column and runner |
| 52 | claim blocks younger runs while an older same-group run still has a |
| 53 | queued/running job without `cancel_requested=true`. |
| 54 | - **`workflow_runs.parent_run_id`** — for re-runs. The new run |
| 55 | references the original; the UI shows a "re-ran from #N" link. |
| 56 | - **`workflow_jobs.runner_id`** — FK added in 0046 (after the |
| 57 | runners table exists). Nullable until claimed. |
| 58 | - **`workflow_steps`** has a CHECK constraint enforcing |
| 59 | `(run_command IS NOT NULL) <> (uses_alias IS NOT NULL)` — exactly |
| 60 | one of `run:` or `uses:`. The `uses_alias` column is further |
| 61 | CHECK-constrained to the three magic aliases we accept in v1. |
| 62 | - **`workflow_secrets`** stores its value as `bytea`, sealed with |
| 63 | ChaCha20-Poly1305 via `internal/auth/secretbox`. Key derivation uses |
| 64 | `cfg.Auth.TOTPKeyB64` (already an operator-managed root) plus an |
| 65 | `(owner, kind, name)` salt, so re-keying is per-row. |
| 66 | - **`workflow_step_log_chunks.chunk`** is capped at 512 KB per row. |
| 67 | The runner sends bigger payloads in pieces. `(step_id, seq)` is |
| 68 | UNIQUE so duplicate sends are idempotent. |
| 69 | - **`actions_variables`** — non-secret, plaintext, scoped exactly |
| 70 | like secrets (per-repo or per-org, never both on the same row). |
| 71 | Forgejo has the same split; we mirror it for parity. |
| 72 | - **`runner_jwt_used`** — primary-keyed by JWT `jti`. Job endpoints |
| 73 | insert into this table during auth; zero inserted rows means replay |
| 74 | and the API returns 401. JWTs are HMAC-SHA256 and use an HKDF |
| 75 | subkey derived from `auth.totp_key_b64` with label |
| 76 | `actions-runner-jwt-v1`. |
| 77 | - **`workflow_job_secret_masks`** — one encrypted JSON array of exact |
| 78 | secret values per claimed job. It snapshots the log scrub set at |
| 79 | claim time, preventing a rotated or deleted secret from disappearing |
| 80 | from server-side masking while the old value is still in a runner's |
| 81 | job payload. |
| 82 | - **`actions_site_policy`, `actions_org_policies`, |
| 83 | `actions_repo_policies`** — inherited Actions enablement and abuse |
| 84 | caps. Runner claim and trigger enqueue both read the effective policy: |
| 85 | repo override, then org override, then site default. |
| 86 | - **`workflow_run_approvals`** — one approval-decision row for every run |
| 87 | whose `workflow_runs.need_approval` flag is set. Approval records the |
| 88 | maintainer and lets runner heartbeats claim the existing queued jobs; |
| 89 | rejection completes the run with `action_required`. |
| 90 | |
| 91 | The `version` and `run_index` patterns are the two pieces I'd point |
| 92 | out to a future maintainer first. Both are cheap to add now and |
| 93 | miserable to retrofit later. |
| 94 | |
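The bump-and-check pattern behind `workflow_runs.version` can be sketched in memory. This is an illustration only: `Store`, `Run`, `SetStatus`, and `ErrStale` are hypothetical names, and the SQL in the comment is an assumed shape for the real mutator, not a copy of it.

```go
package main

import (
	"errors"
	"sync"
)

// ErrStale reports that the row changed after the caller read it.
var ErrStale = errors.New("stale version")

// Run is a hypothetical in-memory stand-in for a workflow_runs row.
type Run struct {
	Status  string
	Version int64
}

// Store mimics the table. The mutex stands in for the atomicity of a
// single UPDATE statement.
type Store struct {
	mu   sync.Mutex
	runs map[int64]*Run
}

func NewStore() *Store { return &Store{runs: map[int64]*Run{}} }

// Insert creates a row at version 1.
func (s *Store) Insert(id int64, status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.runs[id] = &Run{Status: status, Version: 1}
}

// Get returns a copy of the row so callers cannot mutate it directly.
func (s *Store) Get(id int64) (Run, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.runs[id]
	if !ok {
		return Run{}, false
	}
	return *r, true
}

// SetStatus is the in-memory analogue of (assumed SQL):
//
//	UPDATE workflow_runs SET status = $1, version = version + 1
//	WHERE id = $2 AND version = $3
//
// Zero affected rows is reported as ErrStale.
func (s *Store) SetStatus(id int64, status string, expected int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	r, ok := s.runs[id]
	if !ok || r.Version != expected {
		return ErrStale
	}
	r.Status = status
	r.Version++
	return nil
}
```

A cancel request and a state transition that both read version 1 cannot both win: the second writer gets `ErrStale` and has to re-read before deciding what to do.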
| 95 | ## Workflow YAML dialect (v1) |
| 96 | |
| 97 | We accept a strict subset of GitHub Actions YAML. The parser rejects |
| 98 | unknown keys at parse time so workflow authors find their typos |
| 99 | immediately instead of shipping a workflow that does nothing. |
| 100 | |
| 101 | ### Top level |
| 102 | |
| 103 | ```yaml |
| 104 | name: my-pipeline # optional human name |
| 105 | on: [push, pull_request] # or full-form (see below) |
| 106 | permissions: read-all # default if omitted |
| 107 | env: { GREETING: "hello" } # workflow-level env |
| 108 | concurrency: # optional slot manager |
| 109 |   group: ${{ shithub.ref }} |
| 110 |   cancel-in-progress: true |
| 111 | jobs: |
| 112 |   <key>: # 1+ entries |
| 113 |     runs-on: ubuntu-latest |
| 114 |     needs: [other-key] # optional dep edge |
| 115 |     if: ${{ shithub.actor == 'alice' }} # optional gate |
| 116 |     timeout-minutes: 60 # 1..4320, default 360 |
| 117 |     permissions: { contents: read } # narrow workflow perms |
| 118 |     env: { K: v } # job overlay |
| 119 |     steps: |
| 120 |       - name: ... |
| 121 |         id: ... |
| 122 |         if: ... |
| 123 |         run: echo hi # run XOR uses |
| 124 |         uses: actions/checkout@v4 # exactly one of three aliases |
| 125 |         working-directory: ... |
| 126 |         env: { ... } |
| 127 |         continue-on-error: false |
| 128 | ``` |
| 129 | |
| 130 | ### Triggers (`on:`) |
| 131 | |
| 132 | v1 supports four triggers — anything else is a parse error. |
| 133 | |
| 134 | | Trigger | Surface | |
| 135 | | ------------------- | ---------------------------------------------------------------- | |
| 136 | | `push` | `branches:`, `tags:`, `paths:` (include + `!exclude` semantics) | |
| 137 | | `pull_request` | `types:` (opened/synchronize/reopened/...), `branches:`, `paths:` | |
| 138 | | `schedule` | one or more `- cron: <5-field-expr>` | |
| 139 | | `workflow_dispatch` | `inputs:` map (string/boolean/choice/environment) | |
| 140 | |
| 141 | ### `uses:` allowlist |
| 142 | |
| 143 | Exactly three aliases are reserved at parse time, no exceptions: |
| 144 | |
| 145 | | Alias | Parser status | Runner status | |
| 146 | | -------------------------------- | ------------- | ------------------------------------------ | |
| 147 | | `actions/checkout@v4` | accepted | executable with scoped checkout token | |
| 148 | | `shithub/upload-artifact@v1` | accepted | rejected until artifact upload lands | |
| 149 | | `shithub/download-artifact@v1` | accepted | rejected until artifact download lands | |
| 150 | |
| 151 | Any other `uses:` value (community actions, Docker images, composite |
| 152 | actions) is an Error-severity diagnostic. The marketplace problem is |
| 153 | explicitly out of scope for v1; revisit only if a real demand exists |
| 154 | and we have an answer for supply-chain trust. |
| 155 | |
| 156 | The current Docker executor runs `actions/checkout@v4` and `run:` steps. |
| 157 | Checkout happens on the runner host before a containerized step mounts the |
| 158 | workspace. The server issues a short-lived checkout-purpose JWT scoped to |
| 159 | the claimed repository and running job; the smart-HTTP handler accepts it |
| 160 | only for read-only `git-upload-pack`. Artifact transfer remains explicit |
| 161 | follow-up work, and the artifact aliases fail deliberately until that path |
| 162 | exists. |
| 163 | |
| 164 | Checkout v1 accepts only `with.fetch-depth`. The default is a depth-1 fetch |
| 165 | of the workflow run's `head_sha`; `fetch-depth: 0` requests full history. |
| 166 | Submodules, LFS, `path`, persisted credentials, and marketplace actions are |
| 167 | rejected because they are not part of this dialect yet. |
| 168 | |
| 169 | ### File-size + parser caps |
| 170 | |
| 171 | - **64 KB** workflow file size cap (`workflow.MaxWorkflowFileBytes`). |
| 172 | Files larger than this are rejected before YAML decode begins — |
| 173 | defends against pathological inputs and gives operators a |
| 174 | predictable upper bound on parser memory. |
| 175 | - **100 aliases** per document (`workflow.MaxYAMLAliases`): the |
| 176 | billion-laughs guard. yaml.v3 doesn't expose a direct knob; we |
| 177 | count alias nodes during a tree walk and bail once the cap is exceeded. |
| 178 | |
| 179 | ### `${{ github.* }}` alias |
| 180 | |
| 181 | The dialect is intentionally rebranded to `${{ shithub.* }}`. |
| 182 | Authors who paste GHA workflows in unmodified will see their |
| 183 | `${{ github.* }}` references continue to work because the evaluator |
| 184 | rewrites `path[0]` from `github` to `shithub` at the top of `evalRef` |
| 185 | before taint computation, dispatch, and error rendering. |
| 186 | |
| 187 | The alias is intentionally **scope-narrow**: only fields that exist |
| 188 | in our `shithub.*` namespace (`run_id`, `sha`, `ref`, `actor`, |
| 189 | `event`) route through. GHA fields we don't expose in v1 — |
| 190 | `event_name`, `repository`, `run_number`, `workspace`, etc. — error |
| 191 | with the canonical `unknown shithub field "X"` message. Slightly |
| 192 | confusing for a GHA-flavored author but keeps the v1 namespace |
| 193 | surface tight. |
| 194 | |
| 195 | The alias preserves the load-bearing taint flag: `github.event.X` |
| 196 | taints exactly like `shithub.event.X`. `TestEval_GithubAliasIsTainted` |
| 197 | pins this contract. |
| 198 | |
| 199 | Migration to strict-compat (drop the alias entirely) later is a |
| 200 | one-PR flip; moving the other direction is much harder. |
| 201 | |
| 202 | This is a deliberate decision recorded in the campaign plan. |
| 203 | |
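As a rough illustration of the alias rewrite described above, the sketch below normalizes a dotted reference, rejects fields outside the v1 set, and derives taint only after the rewrite. `normalizeRef` and `isTainted` are hypothetical helpers, not the real `evalRef`; only the field list and the error text come from this doc.

```go
package main

import (
	"fmt"
	"strings"
)

// shithubFields is the v1 field set listed in this section.
var shithubFields = map[string]bool{
	"run_id": true, "sha": true, "ref": true, "actor": true, "event": true,
}

// normalizeRef rewrites a leading "github" segment to "shithub" before any
// other check, then rejects shithub fields outside the v1 set.
func normalizeRef(ref string) ([]string, error) {
	path := strings.Split(ref, ".")
	if path[0] == "github" {
		path[0] = "shithub"
	}
	if path[0] == "shithub" {
		field := ""
		if len(path) > 1 {
			field = path[1]
		}
		if !shithubFields[field] {
			return nil, fmt.Errorf("unknown shithub field %q", field)
		}
	}
	return path, nil
}

// isTainted runs on the normalized path, so github.event.X and
// shithub.event.X get the same answer.
func isTainted(path []string) bool {
	return len(path) >= 2 && path[0] == "shithub" && path[1] == "event"
}
```

Because the rewrite happens first, error messages always name the `shithub` namespace, even when the author typed `github`.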
| 204 | ## Expression evaluator |
| 205 | |
| 206 | `${{ … }}` expressions are parsed into a tiny AST and evaluated by |
| 207 | `internal/actions/expr`. The surface is intentionally minimal: |
| 208 | |
| 209 | ### Allowed namespaces |
| 210 | |
| 211 | | Namespace | Source | Tainted? | |
| 212 | | ---------------- | ----------------- | --------------------------- | |
| 213 | | `secrets.X` | workflow_secrets | no, but sensitive | |
| 214 | | `vars.X` | actions_variables | no (operator-controlled) | |
| 215 | | `env.X` | workflow file | no (workflow author's text) | |
| 216 | | `shithub.run_id` | dispatch context | no | |
| 217 | | `shithub.sha` | dispatch context | no | |
| 218 | | `shithub.ref` | dispatch context | no | |
| 219 | | `shithub.actor` | dispatch context | no (resolved username) | |
| 220 | | `shithub.event.*`| trigger payload | **yes — always** | |
| 221 | |
| 222 | `runner.*`, `steps.*`, `needs.*`, `matrix.*`, `inputs.*` are all |
| 223 | parse-time errors. They're parked for v2 and the parser's |
| 224 | allowlist-closed posture means a future PR can't widen this |
| 225 | accidentally without a clearly visible diff. |
| 226 | |
| 227 | ### Allowed functions |
| 228 | |
| 229 | `contains(haystack, needle)`, `startsWith(s, prefix)`, |
| 230 | `endsWith(s, suffix)`, plus the four job-status predicates |
| 231 | `success()`, `failure()`, `cancelled()`, `always()`. That's the |
| 232 | whole list. `fromJSON`, `hashFiles`, `toJSON`, `format`, and |
| 233 | friends are explicitly rejected — they each carry footgun risk |
| 234 | (parser DoS, FS access, side-channel injection) that we don't want |
| 235 | to take on in v1. |
| 236 | |
| 237 | ### Missing-value semantics |
| 238 | |
| 239 | | Reference | Missing → ? | |
| 240 | | -------------------------------- | ------------------------------------ | |
| 241 | | `secrets.NOT_BOUND` | error (loud — workflow won't run) | |
| 242 | | `vars.MISSING` | empty string (GHA parity) | |
| 243 | | `env.MISSING` | empty string (GHA parity) | |
| 244 | | `shithub.event.deeply.missing` | null **but still tainted** | |
| 245 | |
| 246 | The "missing event path → null but tainted" case is a |
| 247 | defence-in-depth choice: even if the path doesn't resolve, the result |
| 248 | still came from the event payload, and we'd rather over-flag than under. |
| 249 | |
| 250 | ## Taint contract — the load-bearing piece |
| 251 | |
| 252 | This is the contract every later sub-sprint hangs off. Get it wrong |
| 253 | and we have an injection-shaped hole in the runner. |
| 254 | |
| 255 | ### Where the flag lives |
| 256 | |
| 257 | The taint flag lives on `expr.Value` (the evaluator-produced value), |
| 258 | not `workflow.Value` (the parser-produced value). Two different |
| 259 | structs share the name `Value` because they live in different |
| 260 | packages, but they have different jobs: |
| 261 | |
| 262 | - **`workflow.Value`** carries the raw source string the parser read |
| 263 | out of the YAML (an env entry, a `with:` input, a concurrency |
| 264 | group expression). At parse time we don't know what the |
| 265 | `${{ … }}` body will resolve to, so there's nothing to taint yet. |
| 266 | - **`expr.Value`** is what the evaluator returns when it resolves a |
| 267 | reference at runtime. *This* struct carries `Tainted bool`. The |
| 268 | runner's exec layer (S41d) consumes that flag. |
| 269 | |
| 270 | Pre-L5 the parser-side struct also had a `Tainted bool` field plus a |
| 271 | `Tainted()` constructor — both unused, both confusing because they |
| 272 | suggested two sources of truth. Dropped in S41a-L5 cleanup. |
| 273 | |
| 274 | ### Propagation |
| 275 | |
| 276 | **Every `expr.Value` carries a `Tainted bool`.** Set true iff the |
| 277 | value transitively depends on `shithub.event.*`. Operators control |
| 278 | secrets, vars, env, the rest of `shithub.*`. Authors control the |
| 279 | workflow file. Only the event payload is *attacker-controlled*: a |
| 280 | PR title, a commit message, a branch name from a fork. Those values |
| 281 | must never be interpolated into a shell string. |
| 282 | |
| 283 | Propagation rules: |
| 284 | |
| 285 | - Reading `shithub.event.X` → `Tainted: true` (always, including |
| 286 | missing-path null results). |
| 287 | - Reading `secrets.X` → `Sensitive: true`. Secrets are |
| 288 | operator-controlled, so they are not tainted, but they must not appear |
| 289 | in shell source strings or Docker argv. |
| 290 | - Reading any other namespace → `Tainted: false` and |
| 291 | `Sensitive: false`, except `env.X` preserves both flags of the |
| 292 | resolved env value. This closes the escape where an event-derived or |
| 293 | secret-derived value is first assigned to env and then interpolated |
| 294 | through `${{ env.X }}`. |
| 295 | - Binary op (`==`, `!=`, `&&`, `||`) → tainted or sensitive if either |
| 296 | operand is. |
| 297 | - Unary op (`!`) → tainted/sensitive iff its operand is. |
| 298 | - Function call (`contains`, `startsWith`, `endsWith`) → tainted or |
| 299 | sensitive if any argument is. |
| 300 | |
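The propagation rules above, together with the missing-value semantics, can be sketched as a small model. This is not the real `internal/actions/expr` package: `Value` here is a simplified stand-in for `expr.Value`, and `Context` plus every function name are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Value is a simplified stand-in for expr.Value.
type Value struct {
	Data      any
	Tainted   bool
	Sensitive bool
}

// Context holds what an evaluator would read from (hypothetical shape).
type Context struct {
	Secrets map[string]string
	Vars    map[string]string
	Env     map[string]Value // env entries keep the flags of their source
	Event   map[string]any
}

// Secret errors on an unbound name; a bound secret is sensitive, not tainted.
func (c Context) Secret(name string) (Value, error) {
	s, ok := c.Secrets[name]
	if !ok {
		return Value{}, fmt.Errorf("secret %q is not bound", name)
	}
	return Value{Data: s, Sensitive: true}, nil
}

// Var yields the empty string for a missing variable.
func (c Context) Var(name string) Value { return Value{Data: c.Vars[name]} }

// EnvVar preserves both flags of the resolved env value.
func (c Context) EnvVar(name string) Value {
	if v, ok := c.Env[name]; ok {
		return v
	}
	return Value{Data: ""}
}

// EventField is always tainted, including the null result for a missing path.
func (c Context) EventField(path ...string) Value {
	var cur any = c.Event
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return Value{Tainted: true}
		}
		if cur, ok = m[key]; !ok {
			return Value{Tainted: true}
		}
	}
	return Value{Data: cur, Tainted: true}
}

// merge ORs the flags of every operand into the result.
func merge(data any, args ...Value) Value {
	out := Value{Data: data}
	for _, a := range args {
		out.Tainted = out.Tainted || a.Tainted
		out.Sensitive = out.Sensitive || a.Sensitive
	}
	return out
}

func Equal(a, b Value) Value { return merge(reflect.DeepEqual(a.Data, b.Data), a, b) }
func Not(a Value) Value      { return merge(!truthy(a.Data), a) }
func Contains(h, n Value) Value {
	return merge(strings.Contains(text(h.Data), text(n.Data)), h, n)
}

func truthy(d any) bool {
	switch v := d.(type) {
	case nil:
		return false
	case bool:
		return v
	case string:
		return v != ""
	default:
		return true
	}
}

func text(d any) string {
	switch v := d.(type) {
	case nil:
		return ""
	case string:
		return v
	default:
		return fmt.Sprint(v)
	}
}
```

Flags only ever accumulate: a boolean computed from a tainted operand is itself tainted, so a comparison cannot launder an event-derived value.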
| 301 | The runner consumes `Tainted` and `Sensitive` and refuses to interpolate |
| 302 | either class into shell strings. Instead, those values are bound to |
| 303 | runner-owned `SHITHUB_INPUT_xx` envvars and the shell source only |
| 304 | references those placeholders. The author writes: |
| 305 | |
| 306 | ```yaml |
| 307 | - run: echo "PR title was: ${{ shithub.event.pull_request.title }}" |
| 308 | ``` |
| 309 | |
| 310 | The runner sees a tainted reference; it compiles the step to: |
| 311 | |
| 312 | ```bash |
| 313 | SHITHUB_INPUT_0="$user_pr_title" exec sh -c 'echo "PR title was: $SHITHUB_INPUT_0"' |
| 314 | ``` |
| 315 | |
| 316 | …where `$user_pr_title` stands for the raw PR title, which the runner |
| 317 | passes through Go's `cmd.Env` and never inserts into the shell source |
| 318 | string or Docker CLI argv. Backticks, `$()`, `;`, and `&&` cannot act as |
| 319 | command-injection vectors when the value reaches the shell as environment data instead of syntax. |
| 320 | |
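A minimal sketch of that rendering step, assuming a resolver callback that reports whether each expression is tainted or sensitive. `RenderRun` and `Resolved` are hypothetical names, not the real API of `internal/runner/exec`. The sketch also emits `${SHITHUB_INPUT_n}` with braces so that a placeholder followed directly by other text still names the right variable.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Resolved is what a (hypothetical) evaluator callback returns for one
// expression. Unsafe means Tainted || Sensitive.
type Resolved struct {
	Text   string
	Unsafe bool
}

// exprRE matches one ${{ ... }} fragment and captures the trimmed body.
var exprRE = regexp.MustCompile(`\$\{\{\s*(.*?)\s*\}\}`)

// RenderRun rewrites a run: script. Safe values are inlined; unsafe values
// are replaced by a ${SHITHUB_INPUT_n} placeholder and returned as
// "NAME=value" entries suitable for exec.Cmd.Env.
func RenderRun(src string, resolve func(expr string) (Resolved, error)) (script string, env []string, err error) {
	var b strings.Builder
	last := 0
	for _, m := range exprRE.FindAllStringSubmatchIndex(src, -1) {
		b.WriteString(src[last:m[0]])
		r, rerr := resolve(src[m[2]:m[3]])
		if rerr != nil {
			return "", nil, rerr
		}
		if r.Unsafe {
			name := fmt.Sprintf("SHITHUB_INPUT_%d", len(env))
			env = append(env, name+"="+r.Text)
			b.WriteString("${" + name + "}")
		} else {
			b.WriteString(r.Text)
		}
		last = m[1]
	}
	b.WriteString(src[last:])
	return b.String(), env, nil
}
```

The attacker-controlled text only ever appears in the returned `env` slice. The script string contains the placeholder name and nothing else from the event payload.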
| 321 | The shared renderer lives in `internal/runner/exec`, so future engines |
| 322 | consume the same injection boundary instead of reimplementing it. The |
| 323 | runner claim payload includes `workflow_runs.event_payload`; without |
| 324 | that field, the runner cannot evaluate and taint |
| 325 | `${{ shithub.event.* }}` references. |
| 326 | |
| 327 | Tests for this contract live in `internal/actions/expr/eval_test.go`, |
| 328 | `internal/runner/exec/render_test.go`, and |
| 329 | `internal/runner/engine/docker_test.go`. **Do not** weaken them in a |
| 330 | later PR without an audit-checkpoint review — they're explicitly |
| 331 | load-bearing for S41e's threat model. |
| 332 | |
| 333 | Runner log chunks pass through `internal/runner/scrub` before they are |
| 334 | posted to the API. It masks exact secret values and preserves enough |
| 335 | tail bytes between chunks to catch a secret split across chunk |
| 336 | boundaries. S41e wires resolved workflow secrets into the runner claim |
| 337 | payload and mask set, snapshots that mask set encrypted on the job, then |
| 338 | applies the same exact-value scrub again in the runner API before |
| 339 | persisting chunks. The server path also carries a possible secret-prefix |
| 340 | tail from the prior persisted chunk, so a runner that bypasses |
| 341 | client-side scrubbing cannot leak a secret by splitting it across |
| 342 | adjacent log POSTs. |
| 343 | |
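One way to catch a secret split across chunk boundaries is to hold back a tail one byte shorter than the longest secret. The sketch below assumes that approach; `Scrubber` and the `***` mask are illustrative and are not taken from `internal/runner/scrub`.

```go
package main

import (
	"sort"
	"strings"
)

const redacted = "***"

// Scrubber masks exact secret values in a stream of log chunks.
type Scrubber struct {
	secrets []string
	hold    int // longest secret length minus one
	buf     string
}

// NewScrubber ignores empty secrets and masks longer secrets first so a
// secret that contains another one is replaced whole.
func NewScrubber(secrets []string) *Scrubber {
	s := &Scrubber{}
	for _, sec := range secrets {
		if sec == "" {
			continue
		}
		s.secrets = append(s.secrets, sec)
		if len(sec)-1 > s.hold {
			s.hold = len(sec) - 1
		}
	}
	sort.Slice(s.secrets, func(i, j int) bool {
		return len(s.secrets[i]) > len(s.secrets[j])
	})
	return s
}

// Write masks the held tail plus the new chunk, emits everything except
// the last `hold` bytes, and keeps those bytes for the next call.
func (s *Scrubber) Write(chunk string) string {
	s.buf = s.maskAll(s.buf + chunk)
	cut := len(s.buf) - s.hold
	if cut <= 0 {
		return ""
	}
	out := s.buf[:cut]
	s.buf = s.buf[cut:]
	return out
}

// Flush emits whatever is still held once the step has finished.
func (s *Scrubber) Flush() string {
	out := s.buf
	s.buf = ""
	return out
}

func (s *Scrubber) maskAll(text string) string {
	for _, sec := range s.secrets {
		text = strings.ReplaceAll(text, sec, redacted)
	}
	return text
}
```

Any secret that begins before the held tail must end inside the current buffer, so it is already masked. Any secret that begins inside the tail is seen whole once the next chunk arrives.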
| 344 | ## `shithub.event` payload schema (v1) |
| 345 | |
| 346 | The event payload is the most user-facing part of the contract: once |
| 347 | authors write workflows that template against `shithub.event.X`, |
| 348 | schema changes are breaking. The v1 schema is pinned and labelled |
| 349 | `v1`. Any addition is fine; renames and removals require a major |
| 350 | bump. |
| 351 | |
| 352 | The schema is enforced by **typed constructors** in the |
| 353 | `internal/actions/event` package — one per trigger. S41b's pipeline |
| 354 | calls these to build payloads; the function signatures pin the |
| 355 | field set so adding a key requires editing the constructor in a |
| 356 | visible diff. This is the same closed-door discipline as the |
| 357 | expression evaluator's namespace allowlist. |
| 358 | |
| 359 | | Trigger | Constructor | Top-level keys | |
| 360 | | ------------------- | ----------------------- | --------------------------------------------------------------------------------- | |
| 361 | | `push` | `event.Push` | `ref`, `before`, `after`, `head_commit{message,id,author}` | |
| 362 | | `pull_request` | `event.PullRequest` | `action`, `number`, `pull_request{title,head{ref,sha},base{ref,sha},user{login}}` | |
| 363 | | `schedule` | `event.Schedule` | (empty map — cron fired; cron expression is on the `workflow_runs` row) | |
| 364 | | `workflow_dispatch` | `event.WorkflowDispatch`| `inputs{<name>: <stringified>}` | |
| 365 | |
| 366 | Anything not in this table doesn't exist in v1. Accessing it returns |
| 367 | null+tainted (the missing-path semantics above). |
| 368 | |
| 369 | **Adding a field**: edit the constructor in `internal/actions/event/`, |
| 370 | add a row to this doc, and update the corresponding `*_FlowsThroughEvaluator` |
| 371 | test in `event_test.go` so the new path is exercised end-to-end. |
| 372 | Reviewer-required note in the commit message — same standard as a |
| 373 | new evaluator function. |
| 374 | |
| 375 | **Renaming or removing**: that's a v1→v2 break. Don't. |
| 376 | |
| 377 | ## Operator surface |
| 378 | |
| 379 | `shithubd admin actions parse <file>` reads a workflow off disk, |
| 380 | runs the parser, and dumps diagnostics + a canonical JSON rendering |
| 381 | of the parsed AST. Useful for: |
| 382 | |
| 383 | - debugging "why is my workflow not picking up changes" reports |
| 384 | - validating a workflow file before committing it |
| 385 | - producing a stable AST snapshot for inclusion in bug reports |
| 386 | |
| 387 | Exit codes: |
| 388 | |
| 389 | | Code | Meaning | |
| 390 | | ---- | --------------------------------------------- | |
| 391 | | 0 | clean parse, no Error-severity diagnostics | |
| 392 | | 1 | file unreadable, oversized, or YAML malformed | |
| 393 | | 2 | parse produced Error-severity diagnostics | |
| 394 | |
| 395 | Other admin surfaces are scoped to later sub-sprints: |
| 396 | |
| 397 | - S41c: `shithubd admin runner register --name <foo>` issues a |
| 398 | registration token + writes a row to `workflow_runners`. |
| 399 | - S41j: `shithubd admin runner drain|undrain|rotate-token|revoke|cleanup-stale` |
| 400 | gives operators pool controls. Drained runners keep heartbeating and |
| 401 | may finish already claimed jobs but receive no new claims. Revoked |
| 402 | runners are set offline, all registration tokens are revoked, and job |
| 403 | API JWTs from that runner are rejected even if the runner still has an |
| 404 | old config file. |
| 405 | - S41g: `POST /api/v1/jobs/{id}/cancel` and the repository run-detail |
| 406 | UI request cancellation. Running jobs flip `cancel_requested`; queued |
| 407 | jobs are made terminal immediately. |
| 408 | - S41g: `POST /api/v1/runs/{id}/rerun` and the repository run-detail |
| 409 | UI re-run completed/cancelled runs. Re-runs read the workflow YAML |
| 410 | from the original run's `head_sha`, create a fresh queued |
| 411 | `workflow_runs` row, and set `parent_run_id` to the source run. |
| 412 | - S41g: workflow-level `concurrency.group` is resolved at enqueue time |
| 413 | against the trigger context (`shithub.ref`, `shithub.sha`, and |
| 414 | `shithub.event.*`). With `cancel-in-progress: true`, enqueue requests |
| 415 | cancellation for older active runs in the same group. Without it, |
| 416 | runner claim leaves the younger run queued until the older run no |
| 417 | longer has uncancelled queued/running jobs. |
| 418 | - S41g: `workflow:cleanup` is a daily retention worker enqueued by |
| 419 | `shithubd-cron.service`. Operators can run it manually with |
| 420 | `shithubd admin run-job workflow:cleanup`. |
| 421 | |
| 422 | ## Workflow concurrency (S41g) |
| 423 | |
| 424 | `concurrency.group` is a workflow-level slot key. The parser stores the |
| 425 | raw value, and `internal/actions/concurrency` evaluates `${{ ... }}` |
| 426 | fragments when the run is enqueued. The trigger-time context deliberately |
| 427 | does not include secrets; event-derived values may be tainted but are |
| 428 | safe here because the value is only used as a database key. |
| 429 | |
| 430 | When a run enters a non-empty group: |
| 431 | |
| 432 | - `cancel-in-progress: false` leaves the new run queued behind older |
| 433 | same-repo, same-group runs while those older runs still have |
| 434 | queued/running jobs with `cancel_requested=false`. |
| 435 | - `cancel-in-progress: true` requests cancellation on those older jobs. |
| 436 | Queued jobs become terminal immediately; running jobs keep running |
| 437 | with `cancel_requested=true` so the runner can kill the active |
| 438 | container. Once every active older job is cancel-requested, the group |
| 439 | is released for the newer run. |
| 440 | |
| 441 | The runner claim query enforces the queueing rule, not the web handler |
| 442 | or UI. This keeps heartbeat races honest: multiple runners can poll at |
| 443 | the same time, but only jobs whose dependency and concurrency blockers |
| 444 | are clear can be claimed. |
| 445 | |
| 446 | ## Runner timeouts (S41g) |
| 447 | |
| 448 | `jobs.<key>.timeout-minutes` is enforced by `shithubd-runner` as a |
| 449 | whole-job deadline. The parser stores the value in |
| 450 | `workflow_jobs.timeout_minutes` with the GitHub-compatible default of |
| 451 | 360 minutes and a 1..4320 cap. |
| 452 | |
| 453 | When the deadline expires, the Docker engine explicitly kills the |
| 454 | active step container, emits a terminal step update with |
| 455 | `status=completed` and `conclusion=timed_out`, and the runner reports |
| 456 | the job itself as `completed/timed_out`. The server rolls the parent |
| 457 | workflow run up to `timed_out` when all jobs are terminal. A timed-out |
| 458 | step is not masked by `continue-on-error`; the job deadline always wins. |
| 459 | |
| 460 | The runner API increments `shithub_actions_step_timeouts_total` the |
| 461 | first time a step reaches `conclusion=timed_out`. Duplicate terminal |
| 462 | step-status retries do not increment the counter again. |
| 463 | |
| 464 | ## Retention cleanup (S41g) |
| 465 | |
| 466 | `workflow:cleanup` applies the durable Actions retention contract in |
| 467 | this order: |
| 468 | |
| 469 | 1. Delete hot `workflow_step_log_chunks` for steps completed more than |
| 470 | 7 days ago. Finalized logs already live in object storage. |
| 471 | 2. Delete expired `workflow_artifacts` rows after deleting their |
| 472 | `actions/runs/...` blob objects. The row's `expires_at` value is |
| 473 | authoritative so per-upload retention overrides keep working. |
| 474 | 3. Delete unpinned terminal `workflow_runs` older than 365 days. Child |
| 475 | jobs, steps, artifacts, and consumed JWT rows cascade through FK |
| 476 | ownership. |
| 477 | 4. Delete consumed `runner_jwt_used` rows whose JWT expiry is more than |
| 478 | 30 days old. This preserves replay/audit evidence for recent jobs |
| 479 | without letting the replay table grow forever. |
| 480 | |
| 481 | The defaults can be overridden in the worker payload: |
| 482 | |
| 483 | ```json |
| 484 | {"step_log_chunk_days":7,"run_days":365,"jwt_used_days":30,"artifact_batch":1000} |
| 485 | ``` |
| 486 | |
| 487 | `artifact_batch` caps each object-delete page and may not exceed 10000. |
| 488 | Negative values are poison-job errors. The worker exports |
| 489 | `shithub_actions_runs_pruned_total{kind}` where `kind` is one of |
| 490 | `chunks`, `blobs`, `runs`, or `jwt_used`. |
| 491 | |
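A sketch of how a worker might decode and validate that payload. `CleanupPayload` and `ParseCleanupPayload` are hypothetical names; the JSON keys, the defaults, the 10000 cap, and the rejection of negative values are the only parts taken from this section.

```go
package main

import (
	"encoding/json"
	"errors"
)

// CleanupPayload mirrors the JSON keys shown above.
type CleanupPayload struct {
	StepLogChunkDays int `json:"step_log_chunk_days"`
	RunDays          int `json:"run_days"`
	JWTUsedDays      int `json:"jwt_used_days"`
	ArtifactBatch    int `json:"artifact_batch"`
}

// ParseCleanupPayload starts from the documented defaults. Keys present in
// raw overwrite them; keys that are absent keep the default.
func ParseCleanupPayload(raw []byte) (CleanupPayload, error) {
	p := CleanupPayload{
		StepLogChunkDays: 7,
		RunDays:          365,
		JWTUsedDays:      30,
		ArtifactBatch:    1000,
	}
	if len(raw) > 0 {
		if err := json.Unmarshal(raw, &p); err != nil {
			return CleanupPayload{}, err
		}
	}
	if p.StepLogChunkDays < 0 || p.RunDays < 0 || p.JWTUsedDays < 0 || p.ArtifactBatch < 0 {
		return CleanupPayload{}, errors.New("negative retention value")
	}
	if p.ArtifactBatch > 10000 {
		return CleanupPayload{}, errors.New("artifact_batch exceeds 10000")
	}
	return p, nil
}
```

Decoding into a pre-populated struct means a payload like `{"run_days":30}` overrides one knob and leaves the other three at their defaults.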
| 492 | Production object storage also needs provider-side lifecycle on the |
| 493 | same prefix: `deploy/spaces/actions-lifecycle.json` expires |
| 494 | `actions/runs/` objects after 90 days and aborts stale multipart |
| 495 | uploads after 2 days. Apply it with |
| 496 | `deploy/cutover/apply-actions-lifecycle.sh`. |
| 497 | |
| 498 | ## Trigger pipeline (S41b) |
| 499 | |
| 500 | Three layers between a triggering event and a queued `workflow_run`: |
| 501 | |
| 502 | ``` |
| 503 | caller (push_process / pulls.Create / pr_jobs.PRSynchronize / dispatch HTTP) |
| 504 |   │ |
| 505 |   └─► worker.Enqueue(KindWorkflowTrigger, JobPayload) |
| 506 |         │ |
| 507 |         └─► trigger.Handler picks up: |
| 508 |               Discover .shithub/workflows/*.yml at HEAD SHA |
| 509 |               Parse each (skip + log on Error diagnostics) |
| 510 |               Match each against trigger.Event |
| 511 |               Enqueue each match |
| 512 |                 │ |
| 513 |                 └─► trigger.Enqueue (one tx): |
| 514 |                       INSERT workflow_runs (ON CONFLICT DO NOTHING) |
| 515 |                       INSERT workflow_jobs per parsed job |
| 516 |                       INSERT workflow_steps per parsed step |
| 517 |                       (commit) |
| 518 |                       checks.Create per job (post-tx, idempotent |
| 519 |                       via ExternalID 'workflow_run:<id>:job:<key>') |
| 520 | ``` |
| 521 | |
| 522 | ### Idempotency on the triggering event |
| 523 | |
| 524 | Idempotency is keyed on the triggering event, not on a UNIQUE over |
| 525 | `(repo_id, head_sha)`. Each caller constructs a stable |
| 526 | `trigger_event_id` from its triggering event's identity: |
| 527 | |
| 528 | | Caller | trigger_event_id format | |
| 529 | | ------------------- | ------------------------------------------------ | |
| 530 | | push_process | `push:<push_event_id>` | |
| 531 | | pulls.Create | `pr_opened:<pr_id>:<head_sha>` | |
| 532 | | pr_jobs.PRSynchronize | `pr_synchronize:<pr_id>:<head_sha>` | |
| 533 | | dispatch HTTP | `dispatch:<file>:<sha>:<8-byte-random-hex>` | |
| 534 | | schedule sweep (S41b-2) | `schedule:<workflow_id>:<window_start_unix>` | |
| 535 | |
| 536 | Migration 0051 adds `workflow_runs.trigger_event_id` (text NOT NULL |
| 537 | DEFAULT '') with a partial UNIQUE on |
| 538 | `(repo_id, workflow_file, trigger_event_id) WHERE trigger_event_id <> ''`. |
| 539 | The trigger handler does `INSERT … ON CONFLICT DO NOTHING` so: |
| 540 | |
| 541 | - Worker retries (the same push_process replay) → no duplicate runs. |
| 542 | - Admin replays via `shithubd admin run-job workflow:trigger ...` |
| 543 | → no duplicate runs. |
| 544 | - Re-runs explicitly construct a NEW |
| 545 | trigger_event_id (`rerun:<original_run_id>:<request_uuid>`) and |
| 546 | chain back via `parent_run_id`. History is preserved, no |
| 547 | collision. |
| 548 | |
| 549 | Each caller's collision-free namespace is short-lived and |
| 550 | human-debuggable: a Postgres operator can grep |
| 551 | `workflow_runs.trigger_event_id` to see exactly which triggering |
| 552 | event produced a given run. |
| 553 | |
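The ID formats in the table above can be sketched as small constructors. The function names and the `int64` ID types are assumptions; only the string formats come from the table.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// PushEventID is stable across worker retries of the same push event.
func PushEventID(pushEventID int64) string {
	return fmt.Sprintf("push:%d", pushEventID)
}

// PROpenedEventID is keyed on the PR and the head commit it opened at.
func PROpenedEventID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_opened:%d:%s", prID, headSHA)
}

// PRSynchronizeEventID changes only when the PR head moves.
func PRSynchronizeEventID(prID int64, headSHA string) string {
	return fmt.Sprintf("pr_synchronize:%d:%s", prID, headSHA)
}

// DispatchEventID appends 8 random bytes (16 hex characters), so two manual
// dispatches of the same file at the same SHA never collide.
func DispatchEventID(file, sha string) (string, error) {
	var nonce [8]byte
	if _, err := rand.Read(nonce[:]); err != nil {
		return "", err
	}
	return fmt.Sprintf("dispatch:%s:%s:%s", file, sha, hex.EncodeToString(nonce[:])), nil
}

// ScheduleEventID is keyed on the cron window start (Unix seconds), so a
// sweep that fires twice for the same window produces the same ID.
func ScheduleEventID(workflowID, windowStartUnix int64) string {
	return fmt.Sprintf("schedule:%d:%d", workflowID, windowStartUnix)
}
```

The deterministic constructors are what make `ON CONFLICT DO NOTHING` useful. The dispatch nonce is the one place where a repeat of the same inputs is meant to produce a new run.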
| 554 | ### Filter evaluation |
| 555 | |
| 556 | `trigger.Match(workflow, event)` is a pure function (no I/O, no DB). |
| 557 | For each event kind: |
| 558 | |
| 559 | - **push**: branch vs tag classified from the ref; only the matching |
| 560 | filter list applies (a `branches:` filter rejects tag pushes and |
| 561 | vice versa). `paths:` (when set) requires at least one changed |
| 562 | path to match. Empty filter = match-all. |
| 563 | - **pull_request**: `types:` defaults to |
| 564 | `[opened, synchronize, reopened]` when omitted (GHA parity). |
| 565 | `branches:` applies to the **base** ref. `paths:` as for push. |
| 566 | - **schedule**: requires the workflow to declare the cron expression |
| 567 | that fired. The sweep is the source of truth for which cron |
| 568 | fires; we just gate on declaration. Avoids interpreting cron |
| 569 | semantics in two places. |
| 570 | - **workflow_dispatch**: matches whenever the workflow declares |
| 571 | `on.workflow_dispatch`. |
| 572 | |
| 573 | Glob semantics in `branches:`/`tags:`/`paths:`: minimatch subset |
| 574 | with `*` (single segment), `**` (any), `/**` end-anchor (optional |
| 575 | trailing path), `**/` start-anchor, and `!exclude` (last-match-wins, |
| 576 | exclusion-only list implies include-all). |
| 577 | |
| 578 | ### Collaborator gate |
| 579 | |
| 580 | Per the S41b spec's "external-PR support is parked" decision: PR |
| 581 | triggers (both `opened` and `synchronize`) only fire when the PR's |
| 582 | author is the repo's owning user. Conservative — drops legitimate |
| 583 | non-owner collaborators in the org-repo case. Expanding the gate |
| 584 | requires plumbing `policy.Can` into the worker context, which we |
| 585 | defer to S41g where the lifecycle work touches that surface anyway. |
| 586 | |
| 587 | ### Operator surface |
| 588 | |
| 589 | - `POST /{owner}/{repo}/actions/workflows/{file}/dispatches` |
| 590 | Body: `{"ref": "...", "inputs": {"key": "value"}}` (both optional; |
| 591 | ref defaults to the repo's default branch). Returns 204 No Content |
| 592 | on success. Synchronous trigger.Enqueue (no discovery — file is |
| 593 | named in the URL). Auth: requires repo write. |
| 594 | - `GET /{owner}/{repo}/actions.atom` |
| 595 | Returns the last 50 workflow runs as an Atom feed. Auth and visibility |
| 596 | match the Actions tab (`repo:read`). Entries link to |
| 597 | `/{owner}/{repo}/actions/runs/{run_index}` and include the workflow |
| 598 | name/path, event, branch, short SHA, status, and conclusion. |

### Webhook events (S41h)

Actions emits webhook-facing domain events through `notif.EmitTx` on
state transitions:

- `workflow_run`, with `payload.action` set to `queued`, `running`, or
  `completed` (`completed` may carry `conclusion:"cancelled"`).
- `workflow_job`, with `payload.action` set to `queued`, `running`,
  `completed`, or `cancelled`.

Payloads are structural snapshots only. They include ids, run index,
workflow path/name, head SHA/ref, event kind, status, conclusion,
timestamps, job key/name/runner id, needs, timeout, and cancellation
state. They deliberately exclude `workflow_runs.event_payload`, env,
permissions, logs, runner JWTs, and secret values. This keeps the
webhook surface stable without turning arbitrary workflow input into
subscriber-facing data.
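
The two events differ in how cancellation shows up: runs report it as a conclusion on `completed`, while jobs have a separate `cancelled` action. A subscriber-side sketch of that distinction follows. It assumes the decoded payload exposes `action` and `conclusion` as top-level JSON keys, and `isTerminal` and `eventPayload` are hypothetical names; the real envelope may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventPayload holds only the two fields this sketch reads.
// Assumption: they are top-level JSON keys of the payload.
type eventPayload struct {
	Action     string `json:"action"`
	Conclusion string `json:"conclusion"`
}

// isTerminal reports whether an event marks the end of a run or job.
func isTerminal(kind string, raw []byte) (bool, error) {
	var p eventPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return false, err
	}
	switch kind {
	case "workflow_run":
		// A cancelled run arrives as action=completed, conclusion=cancelled.
		return p.Action == "completed", nil
	case "workflow_job":
		// Jobs additionally have a dedicated "cancelled" action.
		return p.Action == "completed" || p.Action == "cancelled", nil
	default:
		return false, fmt.Errorf("unknown event kind %q", kind)
	}
}

func main() {
	done, err := isTerminal("workflow_run", []byte(`{"action":"completed","conclusion":"cancelled"}`))
	fmt.Println(done, err)
	done, err = isTerminal("workflow_job", []byte(`{"action":"running"}`))
	fmt.Println(done, err)
}
```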

### What S41b deliberately doesn't do

- Run jobs. S41c adds runner claim/status APIs; S41d adds the actual
  `shithubd-runner` execution binary.
- Schedule sweep. Cron-driven triggers are split into S41b-2 to keep
  this PR reviewable; the trigger pipeline accepts schedule events,
  but no caller produces them yet. S41b-2 adds the sweep, the
  `robfig/cron/v3` dep, and `shithubd-cron.service` wiring.
- External-PR triggers. See the conservative collaborator gate above.

## Secrets + variables settings surface (S41c)

S41c wires the previously schema-only `workflow_secrets` and
`actions_variables` tables into repo/org settings.

Repository routes are gated through
`policy.ActionRepoSettingsActions` (`repo:settings:actions`, admin
role minimum):

- `GET /{owner}/{repo}/settings/secrets/actions`
- `POST /{owner}/{repo}/settings/secrets/actions`
- `POST /{owner}/{repo}/settings/secrets/actions/{name}/delete`
- `GET /{owner}/{repo}/settings/variables/actions`
- `POST /{owner}/{repo}/settings/variables/actions`
- `POST /{owner}/{repo}/settings/variables/actions/{name}/delete`

Organization routes follow the existing org-settings prefix and are
owner-only:

- `GET /organizations/{org}/settings/secrets/actions`
- `POST /organizations/{org}/settings/secrets/actions`
- `POST /organizations/{org}/settings/secrets/actions/{name}/delete`
- `GET /organizations/{org}/settings/variables/actions`
- `POST /organizations/{org}/settings/variables/actions`
- `POST /organizations/{org}/settings/variables/actions/{name}/delete`

Secrets are sealed through `internal/auth/secretbox` using the
operator-managed `Auth.TOTPKeyB64` root key. Secret list pages render
names/metadata only; the plaintext value is accepted once on create or
rotation and never rendered back. Variables are non-secret plaintext
configuration, so settings pages render their values. Both stores use
the same name grammar as the database constraints:
`^[A-Za-z_][A-Za-z0-9_]*$`, 1-100 characters. Variables additionally
enforce the 4096-character value cap in Go before hitting the DB
constraint.
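
The name grammar and value cap above can be checked with a few lines of Go. This is a sketch rather than the settings handler's code: `validateName` and `validateVariable` are hypothetical helpers, and counting the 4096-character cap in runes (not bytes) is an assumption about how the DB constraint measures length.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"unicode/utf8"
)

// nameRE is the grammar quoted in the text. In Go's regexp, "$"
// without the m flag matches only at the end of the input.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

const (
	maxNameLen          = 100  // shared by secrets and variables
	maxVariableValueLen = 4096 // variables only
)

// validateName enforces the shared secret/variable name rules.
func validateName(name string) error {
	if len(name) < 1 || len(name) > maxNameLen {
		return fmt.Errorf("name must be 1-%d characters", maxNameLen)
	}
	if !nameRE.MatchString(name) {
		return errors.New("name must match ^[A-Za-z_][A-Za-z0-9_]*$")
	}
	return nil
}

// validateVariable adds the value cap on top of the name rules.
// Assumption: the cap counts Unicode characters, not bytes.
func validateVariable(name, value string) error {
	if err := validateName(name); err != nil {
		return err
	}
	if utf8.RuneCountInString(value) > maxVariableValueLen {
		return fmt.Errorf("value exceeds %d characters", maxVariableValueLen)
	}
	return nil
}

func main() {
	fmt.Println(validateName("DEPLOY_KEY"))          // <nil>
	fmt.Println(validateName("1BAD") != nil)         // true
	fmt.Println(validateVariable("REGION", "eu-1"))  // <nil>
}
```

Checking in Go first lets the settings page show a specific message instead of surfacing a raw constraint violation.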

## What S41a deliberately doesn't do

- No trigger pipeline. `domain_events` aren't matched against `on:`
  yet; that's S41b.
- No runner. S41c/S41d add runner claim APIs and the execution binary.
- No UI. The Actions tab still renders the placeholder (S41f).
- No secret encryption helpers wired to anything writable (S41c).
- No JWT issuance, no runner registration flow (S41c).
- No log streaming, no SSE (S41d/f).
- No execution sandbox, no scrubbing, no injection guards
  *enforced at the runner* (S41d/e). The parser-side taint contract
  is the foundation those depend on, not a substitute.

## Why these choices, in two paragraphs

The schema work is front-loaded so later sub-sprints don't ripple a
migration through every PR. `version` (optimistic locking) and
`run_index` (per-repo monotonic) are the two columns I'd flag to a
new maintainer immediately: both are nearly free to add up front
and painful to retrofit. The split between hot-path log chunks
(Postgres) and finalized blob (Spaces) is shaped after Forgejo's
log path; we pick the boring well-trodden answer over the clever
one because log throughput is the failure mode that bites first.

The taint contract is the security-load-bearing piece. Every later
sub-sprint trusts that the `Tainted` flag is set correctly here, in
the parser/evaluator, and never re-derived downstream. The narrow
allowlist of namespaces and functions exists exactly so a future PR
that adds, say, `fromJSON` has to do it knowingly, by widening the
allowlist in a visible diff with a reviewer-required note, rather
than by accident. The `${{ github.* }}` alias is a pragmatic
concession to copy-paste users; the rebrand to `${{ shithub.* }}`
is the canonical form so future divergence isn't awkward.
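
For readers new to the pattern, the optimistic-locking role of `version` mentioned above can be sketched in memory. Nothing here is taken from the actual queries: `runStore`, `updateStatus`, and the `status`/`id` column names in the SQL comment are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errStaleVersion = errors.New("stale version: row changed since it was read")

// runRow is a stand-in for one workflow_runs row.
type runRow struct {
	Status  string
	Version int64
}

// runStore is an in-memory stand-in for the table.
type runStore struct {
	mu   sync.Mutex
	rows map[int64]runRow
}

// updateStatus mimics a guarded update such as (column names assumed):
//
//	UPDATE workflow_runs SET status = $1, version = version + 1
//	WHERE id = $2 AND version = $3
//
// where zero affected rows means another writer got there first.
func (s *runStore) updateStatus(id, expectVersion int64, status string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.rows[id]
	if !ok || row.Version != expectVersion {
		return errStaleVersion
	}
	s.rows[id] = runRow{Status: status, Version: row.Version + 1}
	return nil
}

func main() {
	store := &runStore{rows: map[int64]runRow{1: {Status: "queued", Version: 1}}}

	// Two writers both read version 1; only the first update lands.
	fmt.Println(store.updateStatus(1, 1, "running"))   // <nil>
	fmt.Println(store.updateStatus(1, 1, "cancelled")) // stale version error
	fmt.Println(store.rows[1])
}
```

The losing writer is expected to re-read the row and decide again, which is why the column is cheap to add early and awkward to retrofit once callers assume blind writes.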

## See also

- `internal/actions/workflow/parse.go`: the parser
- `internal/actions/expr/eval.go`: the evaluator
- `internal/migrationsfs/migrations/0042..0049_*.sql`: the schema
- `tests/fixtures/workflows/*.yml`: canonical input shapes
- `internal/actions/workflow/parse_test.go`: fixture-driven tests
- `internal/actions/expr/eval_test.go`: taint-contract tests
- `.refs/forgejo/services/actions/`: reference architecture
- Campaign plan in conversation memory (humble-cooking-bunny)