@@ -46,7 +46,9 @@ later schema diff: |
| 46 | 46 | S41g's race between a cancel request and a state transition. |
| 47 | 47 | - **`workflow_runs.concurrency_group`** — the concurrency-slot key, |
| 48 | 48 | resolved at trigger time from the workflow's `concurrency.group:` |
| 49 | | - expression. S41g's slot manager keys off this column. |
| 49 | + expression. S41g's slot manager keys off this column, and the
| 50 | + runner claim query blocks a younger run while an older same-group run
| 51 | + still has a queued/running job without `cancel_requested=true`.
| 50 | 52 | - **`workflow_runs.parent_run_id`** — for re-runs. The new run |
| 51 | 53 | references the original; the UI shows a "re-ran from #N" link. |
| 52 | 54 | - **`workflow_jobs.runner_id`** — FK added in 0046 (after the |
@@ -378,10 +380,40 @@ Other admin surfaces are scoped to later sub-sprints: |
| 378 | 380 | UI re-run completed/cancelled runs. Re-runs read the workflow YAML |
| 379 | 381 | from the original run's `head_sha`, create a fresh queued |
| 380 | 382 | `workflow_runs` row, and set `parent_run_id` to the source run. |
| 383 | +- S41g: workflow-level `concurrency.group` is resolved at enqueue time |
| 384 | + against the trigger context (`shithub.ref`, `shithub.sha`, and |
| 385 | + `shithub.event.*`). With `cancel-in-progress: true`, enqueueing a run
| 386 | + requests cancellation of older active runs in the same group. Without
| 387 | + it, the runner claim query leaves the younger run queued until the older
| 388 | + run no longer has uncancelled queued/running jobs.
| 381 | 389 | - S41g: `workflow:cleanup` is a daily retention worker enqueued by |
| 382 | 390 | `shithubd-cron.service`. Operators can run it manually with |
| 383 | 391 | `shithubd admin run-job workflow:cleanup`. |
| 384 | 392 | |
| 393 | +## Workflow concurrency (S41g) |
| 394 | + |
| 395 | +`concurrency.group` is a workflow-level slot key. The parser stores the |
| 396 | +raw value, and `internal/actions/concurrency` evaluates `${{ ... }}` |
| 397 | +fragments when the run is enqueued. The trigger-time context deliberately |
| 398 | +does not include secrets; event-derived values may be tainted but are |
| 399 | +safe here because the value is only used as a database key. |
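Here is a minimal Go sketch of the enqueue-time resolution. It assumes a flat trigger context keyed by dotted names. `resolveGroup` is hypothetical and only handles bare lookups such as `${{ shithub.ref }}`. The real evaluator in `internal/actions/concurrency` supports full expressions. Treating an unknown name as an error is another assumption of the sketch. Because secrets are never placed in the context, a `${{ secrets.* }}` reference simply fails the lookup.

```go
package main

import (
	"fmt"
	"regexp"
)

// fragmentRE matches one ${{ ... }} fragment and captures its trimmed body.
var fragmentRE = regexp.MustCompile(`\$\{\{\s*(.*?)\s*\}\}`)

// resolveGroup is a hypothetical stand-in for the real evaluator. It replaces
// each fragment with a value from the trigger-time context, which deliberately
// holds no secrets. Only bare dotted names are understood (no operators or
// function calls), and an unknown name is reported as an error (assumption).
func resolveGroup(raw string, trigger map[string]string) (string, error) {
	var firstErr error
	out := fragmentRE.ReplaceAllStringFunc(raw, func(frag string) string {
		name := fragmentRE.FindStringSubmatch(frag)[1]
		v, ok := trigger[name]
		if !ok && firstErr == nil {
			firstErr = fmt.Errorf("unknown context value %q", name)
		}
		return v
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

func main() {
	trigger := map[string]string{
		"shithub.ref":                       "refs/heads/main",
		"shithub.sha":                       "3f2a9c1",
		"shithub.event.pull_request.number": "17",
	}
	g, err := resolveGroup("ci-${{ shithub.ref }}", trigger)
	fmt.Println(g, err) // ci-refs/heads/main <nil>

	// Secrets are not part of the trigger context, so this lookup fails.
	_, err = resolveGroup("deploy-${{ secrets.TOKEN }}", trigger)
	fmt.Println(err)
}
```

The resolved string is only ever compared as a key, so a tainted value such as a branch name can at worst put a run in an unexpected group. It cannot inject anything.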
| 400 | + |
| 401 | +When a run enters a non-empty group: |
| 402 | + |
| 403 | +- `cancel-in-progress: false` leaves the new run queued behind older |
| 404 | + same-repo, same-group runs while those older runs still have |
| 405 | + queued/running jobs with `cancel_requested=false`. |
| 406 | +- `cancel-in-progress: true` requests cancellation on those older jobs. |
| 407 | + Queued jobs become terminal immediately; running jobs keep running |
| 408 | + with `cancel_requested=true` so the runner can kill the active |
| 409 | + container. Once every active older job is cancel-requested, the group |
| 410 | + is released for the newer run. |
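The two cases above can be sketched in Go as follows. `Job`, `requestCancelInProgress`, and `groupReleased` are hypothetical stand-ins for `workflow_jobs` rows and the enqueue/claim logic. The sketch assumes every job passed in already belongs to the same repo and group, and that a lower run ID means an older run.

```go
package main

import "fmt"

// Job is a simplified, hypothetical view of a workflow_jobs row.
type Job struct {
	RunID           int64  // lower ID == older run (assumption)
	Status          string // "queued", "running", or a terminal status
	CancelRequested bool
}

// requestCancelInProgress sketches enqueue with cancel-in-progress: true.
// Only jobs from runs older than newRunID are touched.
func requestCancelInProgress(groupJobs []*Job, newRunID int64) {
	for _, j := range groupJobs {
		if j.RunID >= newRunID {
			continue
		}
		switch j.Status {
		case "queued":
			// Nothing is executing yet, so the job goes terminal right away.
			j.Status = "cancelled"
		case "running":
			// The job keeps running; the runner sees the flag and kills
			// the active container.
			j.CancelRequested = true
		}
	}
}

// groupReleased reports whether no older job still holds the slot, i.e. no
// older job is queued/running with cancel_requested=false.
func groupReleased(groupJobs []*Job, newRunID int64) bool {
	for _, j := range groupJobs {
		active := j.Status == "queued" || j.Status == "running"
		if j.RunID < newRunID && active && !j.CancelRequested {
			return false
		}
	}
	return true
}

func main() {
	jobs := []*Job{
		{RunID: 1, Status: "running"},
		{RunID: 1, Status: "queued"},
		{RunID: 2, Status: "queued"}, // the newly enqueued run
	}
	fmt.Println(groupReleased(jobs, 2)) // false: run 1 still holds the slot

	requestCancelInProgress(jobs, 2)
	fmt.Println(jobs[0].Status, jobs[0].CancelRequested) // running true
	fmt.Println(jobs[1].Status)                          // cancelled
	fmt.Println(groupReleased(jobs, 2))                  // true
}
```

Without `cancel-in-progress`, only `groupReleased` applies. The new run waits until the older run's jobs finish on their own.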
| 411 | + |
| 412 | +The runner claim query, not the web handler or UI, enforces the
| 413 | +queueing rule. Checking at claim time means heartbeat races cannot
| 414 | +bypass it: multiple runners can poll at the same time, but only jobs
| 415 | +whose dependency and concurrency blockers are clear can be claimed.
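Below is a rough Go model of the claim-time check. It assumes the claim query's row locking can be approximated by a mutex held while a job is picked. `Queue`, `Claim`, and the rule that a dependency must have status `success` are illustrative assumptions, not shithub's schema or SQL. Four simulated runners poll at once, yet only the job with no blockers is handed out.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical, flattened view of a workflow_jobs row plus the
// run-level fields the claim check needs.
type Job struct {
	ID              int
	Repo            int64
	RunID           int64  // lower ID == older run (assumption)
	Group           string // resolved concurrency_group; "" means no group
	Status          string
	CancelRequested bool
	Needs           []int // job IDs that must reach "success" first (assumption)
}

// Queue stands in for the jobs table; mu approximates the row locks a real
// claim transaction would take.
type Queue struct {
	mu   sync.Mutex
	jobs []*Job
}

func (q *Queue) byID(id int) *Job {
	for _, j := range q.jobs {
		if j.ID == id {
			return j
		}
	}
	return nil
}

// blocked reports whether j still has a dependency or concurrency blocker.
// The caller must hold q.mu.
func (q *Queue) blocked(j *Job) bool {
	for _, dep := range j.Needs {
		if d := q.byID(dep); d == nil || d.Status != "success" {
			return true
		}
	}
	if j.Group == "" {
		return false
	}
	for _, o := range q.jobs {
		active := o.Status == "queued" || o.Status == "running"
		if o.Repo == j.Repo && o.Group == j.Group && o.RunID < j.RunID &&
			active && !o.CancelRequested {
			return true
		}
	}
	return false
}

// Claim hands out at most one unblocked queued job, or nil if none is ready.
func (q *Queue) Claim() *Job {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, j := range q.jobs {
		if j.Status == "queued" && !q.blocked(j) {
			j.Status = "running"
			return j
		}
	}
	return nil
}

func main() {
	q := &Queue{jobs: []*Job{
		{ID: 1, Repo: 7, RunID: 10, Group: "ci-main", Status: "running"},
		{ID: 2, Repo: 7, RunID: 11, Group: "ci-main", Status: "queued"},
		{ID: 3, Repo: 7, RunID: 12, Status: "queued"},
		{ID: 4, Repo: 7, RunID: 12, Status: "queued", Needs: []int{3}},
	}}
	var wg sync.WaitGroup
	claimed := make(chan int, 4)
	for r := 0; r < 4; r++ { // four runners polling at the same time
		wg.Add(1)
		go func() {
			defer wg.Done()
			if j := q.Claim(); j != nil {
				claimed <- j.ID
			}
		}()
	}
	wg.Wait()
	close(claimed)
	for id := range claimed {
		fmt.Println("claimed job", id) // claimed job 3
	}
}
```

Job 2 waits because run 10 still has an uncancelled running job in `ci-main`, and job 4 waits on its dependency. Three of the four runners therefore come back empty-handed.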
| 416 | + |
| 385 | 417 | ## Runner timeouts (S41g) |
| 386 | 418 | |
| 387 | 419 | `jobs.<key>.timeout-minutes` is enforced by `shithubd-runner` as a |