Forks (S27)
S27 ships fork creation, fork sync (fast-forward only), ahead/behind
stats, and the schema columns + triggers that maintain
repos.fork_count. Cross-fork PRs and the S16 hard-delete-cascade
amendment for repacking forks are scoped here too; the cross-fork PR
deferral pointer and the S16 amendment each have their own
sub-section.
Schema
repos gained two columns in 0029:
- fork_count bigint NOT NULL DEFAULT 0: maintained by the forks_count_inc / forks_count_dec AFTER triggers on repos insert/delete. Decrement uses GREATEST(... - 1, 0) so a hand-written DB tweak that violates the trigger doesn't underflow into negatives.
- init_status repo_init_status NOT NULL DEFAULT 'initialized': enum ('initialized', 'init_pending', 'init_failed'). Synchronous repo creates (the S11 path) write 'initialized' directly. Forks start at 'init_pending'; the worker job flips to 'initialized' on success or 'init_failed' on permanent failure.
fork_of_repo_id was already present from S11 (the only column the
S11 status block actually shipped — is_fork and fork_count were
the missed ones, same shape as the S11/S26 gap noted in the
stars-watchers doc).
We deliberately did NOT add is_fork — it would duplicate
fork_of_repo_id IS NOT NULL and create the kind of
two-source-of-truth drift that the audit penalises. Use the FK predicate.
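As an illustration of "use the FK predicate", here is a minimal Go sketch. The Repo type and its field names are hypothetical; only the column names and enum values come from the schema above.

```go
package main

import "fmt"

// InitStatus mirrors the repo_init_status enum described above.
type InitStatus string

const (
	InitInitialized InitStatus = "initialized"
	InitPending     InitStatus = "init_pending"
	InitFailed      InitStatus = "init_failed"
)

// Repo is a hypothetical, trimmed-down row type; only the columns
// discussed in this section are shown.
type Repo struct {
	ID           int64
	ForkOfRepoID *int64 // NULL in the database when the repo is not a fork
	ForkCount    int64
	InitStatus   InitStatus
}

// IsFork derives fork-ness from the FK instead of storing a second flag,
// so there is nothing that can drift out of sync.
func (r Repo) IsFork() bool { return r.ForkOfRepoID != nil }

func main() {
	sourceID := int64(1)
	source := Repo{ID: sourceID, InitStatus: InitInitialized}
	fork := Repo{ID: 2, ForkOfRepoID: &sourceID, InitStatus: InitPending}
	fmt.Println(source.IsFork(), fork.IsFork())
}
```

In SQL the same check is simply fork_of_repo_id IS NOT NULL.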
On-disk layout
git clone --bare --shared <source> <fork> creates a fork whose
objects/info/alternates file points back at the source's objects/
directory. Disk usage of the fork is essentially refs + a small
overhead. The same-volume requirement (S04 RepoFS.Root()) is what
makes alternates safe — alternates across volumes is undefined
behaviour for git.
When a fork is created we additionally set
extensions.preciousObjects = true on the source so a future
git gc on the source can't prune objects the fork reaches via
alternates. Idempotent; the fork-clone worker re-asserts on every
new fork so missing config is self-healing.
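The two git invocations described above could be assembled roughly as follows. This is a sketch, not the project's CloneBareShared or SetPreciousObjects: the helper names and the paths are placeholders, and using git -C to target the source repo is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
)

// cloneBareSharedArgs builds the argv for the shared bare clone. --shared
// makes git write objects/info/alternates in the fork instead of copying objects.
func cloneBareSharedArgs(sourcePath, forkPath string) []string {
	return []string{"clone", "--bare", "--shared", sourcePath, forkPath}
}

// preciousObjectsArgs builds the argv that pins the source's objects.
// Setting the same value twice is harmless, which is what makes re-asserting safe.
func preciousObjectsArgs(sourcePath string) []string {
	return []string{"-C", sourcePath, "config", "extensions.preciousObjects", "true"}
}

// runGit executes git and includes its combined output in any error.
func runGit(args ...string) error {
	out, err := exec.Command("git", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("git %v: %w: %s", args, err, out)
	}
	return nil
}

func main() {
	// Printing instead of executing keeps the sketch free of side effects;
	// pass the same slices to runGit to run them for real.
	fmt.Println("git", cloneBareSharedArgs("/repos/src.git", "/repos/fork.git"))
	fmt.Println("git", preciousObjectsArgs("/repos/src.git"))
}
```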
Worker job
KindRepoForkClone (internal/worker/jobs/repo_fork_clone.go) runs
the on-disk clone out of band so fork-create returns fast even for
large source repos. Payload is {source_repo_id, fork_repo_id}.
The job's flow:
1. Reload both repos by id (defends against soft-delete between enqueue and run).
2. CloneBareShared(sourcePath, forkPath): git clone + alternates.
3. hooks.Install(forkPath, shithubdPath): same hook install as the synchronous repo-create path so subsequent user pushes fire push:process.
4. SetPreciousObjects(sourcePath): pin source's objects.
5. SetRepoInitStatus(fork.ID, 'initialized').
On any permanent failure: flip to 'init_failed' and return
worker.PoisonError (no retries). The repo row stays so the user
sees the failure; we don't auto-cleanup because that races concurrent
retries.
Sync (fast-forward fork from upstream)
fork.Sync(ctx, deps, actorUserID, forkRepoID) only fast-forwards.
Anything else (merge, rebase) belongs in the user's client; doing
either server-side without the user's resolution preferences risks
producing commits the user doesn't want.
Algorithm:
1. Resolve both default-branch OIDs (fork, source).
2. If equal → ErrSyncUpToDate.
3. If fork is NOT an ancestor of upstream → ErrSyncDiverged.
4. CAS update via repogit.UpdateRefCAS(fork, branch, upstream, fork): the trailing fork argument is git's old-value guard. If a concurrent push moves the fork's branch first, the CAS fails and surfaces as ErrSyncRefRaced.
5. Update repos.default_branch_oid so the home view reflects the new tip without waiting for push:process (update-ref bypasses post-receive, same shape as the merge handler's fix in the audit-remediation sprint).
Empty fork (no branch yet) is handled via the 40-zero OID literal that git accepts as "ref must not exist yet" semantics — sync to an empty fork creates the branch from upstream's tip.
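The decision logic of steps 1 to 4, plus the empty-fork case, can be sketched as below. The Git struct and the Sync signature are hypothetical simplifications (OIDs are passed in rather than resolved, an empty forkOID means "no branch yet", and step 5 is omitted); only the error names come from this section.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrSyncUpToDate = errors.New("fork is up to date")
	ErrSyncDiverged = errors.New("fork has diverged from upstream")
	ErrSyncRefRaced = errors.New("fork ref changed during sync")
)

// zeroOID is the 40-zero literal; as an old-value guard it means
// "the ref must not exist yet".
var zeroOID = strings.Repeat("0", 40)

// Git holds the two git operations the decision logic needs (hypothetical shape).
type Git struct {
	IsAncestor   func(ancestor, descendant string) (bool, error)
	UpdateRefCAS func(branch, newOID, oldOID string) error
}

// Sync fast-forwards branch from forkOID to upstreamOID, or reports why it cannot.
func Sync(g Git, branch, forkOID, upstreamOID string) error {
	if forkOID == "" {
		return g.UpdateRefCAS(branch, upstreamOID, zeroOID) // create from upstream's tip
	}
	if forkOID == upstreamOID {
		return ErrSyncUpToDate
	}
	ok, err := g.IsAncestor(forkOID, upstreamOID)
	if err != nil {
		return err
	}
	if !ok {
		return ErrSyncDiverged
	}
	return g.UpdateRefCAS(branch, upstreamOID, forkOID)
}

func main() {
	ref := "" // in-memory stand-in for the fork's branch; empty means missing
	g := Git{
		IsAncestor: func(a, d string) (bool, error) { return true, nil },
		UpdateRefCAS: func(branch, newOID, oldOID string) error {
			current := ref
			if current == "" {
				current = zeroOID
			}
			if current != oldOID {
				return ErrSyncRefRaced
			}
			ref = newOID
			return nil
		},
	}
	fmt.Println(Sync(g, "main", "", "bbb"), ref)
	fmt.Println(Sync(g, "main", "bbb", "bbb"))
	fmt.Println(Sync(g, "main", "aaa", "ccc")) // stale fork OID loses the CAS
}
```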
Ahead/behind
fork.AheadBehind(ctx, deps, forkRepoID) returns
{Ahead, Behind, Comparable} where:
- Ahead = commits in fork's default branch not in source's.
- Behind = commits in source's default branch not in fork's.
- Comparable = false when either side's default ref is missing (empty fork, never-initialised source).
Implementation: read both OIDs, then run
git rev-list --left-right --count inside the fork's repo.
Because the fork shares object alternates with the source, the
upstream OID resolves without an explicit fetch.
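A sketch of the rev-list call and its parsing follows. It assumes the two OIDs are passed as a symmetric-difference range (fork...upstream), which is what --left-right needs; with --count, git prints the left and right counts separated by a tab. The function and type names are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// AheadBehind mirrors the {Ahead, Behind, Comparable} result described above.
type AheadBehind struct {
	Ahead, Behind int
	Comparable    bool
}

// revListArgs builds the argv; the left side of the range is the fork,
// so the left count is "ahead" and the right count is "behind".
func revListArgs(forkOID, upstreamOID string) []string {
	return []string{"rev-list", "--left-right", "--count", forkOID + "..." + upstreamOID}
}

// parseLeftRightCount parses output such as "3\t5\n".
func parseLeftRightCount(out string) (AheadBehind, error) {
	fields := strings.Fields(out)
	if len(fields) != 2 {
		return AheadBehind{}, fmt.Errorf("unexpected rev-list output %q", out)
	}
	ahead, err := strconv.Atoi(fields[0])
	if err != nil {
		return AheadBehind{}, fmt.Errorf("ahead count: %w", err)
	}
	behind, err := strconv.Atoi(fields[1])
	if err != nil {
		return AheadBehind{}, fmt.Errorf("behind count: %w", err)
	}
	return AheadBehind{Ahead: ahead, Behind: behind, Comparable: true}, nil
}

func main() {
	fmt.Println(revListArgs("f00d", "beef"))
	fmt.Println(parseLeftRightCount("3\t5\n"))
}
```

When either default ref is missing, the caller would skip the git call entirely and return a zero value with Comparable set to false.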
This is the floor implementation. S36's perf-pass sprint adds an
LRU cache keyed on (fork_repo_id, fork_default_oid, upstream_default_oid) — already documented in S36's "Code-tab
caching" deliverables and on the S00–S25 audit's H4 deferral.
Visibility floor
fork.allowedTargetVisibility(source, target) enforces:
| source | target=public | target=private | target="" |
|---|---|---|---|
| public | ✓ | ✓ | public |
| private | ✗ | ✓ | private |
Forking private → public would expose previously-private content
and is always rejected (ErrVisibilityFloor).
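The table translates directly into a small function. The return shape (resolved visibility plus error) and the string-typed visibilities are assumptions; the source only gives the function name and ErrVisibilityFloor.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrVisibilityFloor = errors.New("fork cannot be more visible than its source")

// allowedTargetVisibility resolves the fork's visibility. An empty target
// inherits the source's visibility; private → public is always rejected.
func allowedTargetVisibility(source, target string) (string, error) {
	switch target {
	case "":
		return source, nil
	case "private":
		return "private", nil
	case "public":
		if source != "public" {
			return "", ErrVisibilityFloor
		}
		return "public", nil
	default:
		return "", fmt.Errorf("unknown visibility %q", target)
	}
}

func main() {
	for _, source := range []string{"public", "private"} {
		for _, target := range []string{"public", "private", ""} {
			got, err := allowedTargetVisibility(source, target)
			fmt.Printf("source=%s target=%q -> %q, err=%v\n", source, target, got, err)
		}
	}
}
```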
Permission lattice
policy.ActionForkCreate was already in the registry from earlier
sprints. Today's gating shape:
- Anonymous on any repo → deny (DenyAnonymous).
- Logged-in on a repo they can read → allow (login-required, no role gate).
- Logged-in on a private repo they CAN'T read → deny (DenyVisibility, surfaces as 404 at the handler layer).
Suspended actors are blocked by step 3 of policy.Can (suspended +
write action → deny). Fork create counts as write (it mutates the
target owner's namespace).
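A sketch of the gating shape, for orientation only. The Actor type, the canRead parameter, the DenySuspended name, and the order of the checks are assumptions; the real policy.Can has more steps than shown here.

```go
package main

import "fmt"

// Decision names follow the section above, except DenySuspended,
// which is a hypothetical label for the suspended-actor case.
type Decision string

const (
	Allow          Decision = "Allow"
	DenyAnonymous  Decision = "DenyAnonymous"
	DenyVisibility Decision = "DenyVisibility"
	DenySuspended  Decision = "DenySuspended"
)

// Actor is a hypothetical, minimal view of the requesting user.
type Actor struct {
	LoggedIn  bool
	Suspended bool
}

// canForkCreate applies the three rules plus the suspended-writer rule.
// canRead reports whether the actor can read the source repo.
func canForkCreate(a Actor, canRead bool) Decision {
	if !a.LoggedIn {
		return DenyAnonymous
	}
	if a.Suspended {
		return DenySuspended // fork create counts as a write action
	}
	if !canRead {
		return DenyVisibility // the handler turns this into a 404
	}
	return Allow
}

func main() {
	fmt.Println(canForkCreate(Actor{}, true))
	fmt.Println(canForkCreate(Actor{LoggedIn: true}, true))
	fmt.Println(canForkCreate(Actor{LoggedIn: true}, false))
	fmt.Println(canForkCreate(Actor{LoggedIn: true, Suspended: true}, true))
}
```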
Cross-fork PRs (deferred to a follow-up)
S27's spec lists cross-fork PR support as in-scope, but the actual
plumbing — fetching the fork's head into the base repo's
refs/shithub-pr/<pr_id>/head namespace and routing the merge from
the internal ref — is large enough that this sprint ships fork
creation, sync, and ahead/behind only. The cross-fork PR work is
tracked here as a follow-up:
- Extend pulls.Create to accept head_repo_id != repo_id.
- Add repogit.FetchIntoNamespace (already shipped in this sprint for the eventual consumer).
- pulls.Synchronize reads head from the internal ref when pull_request.head_repo_id != pull_request.repo_id.
- pulls.Merge worktree-add reads head from the internal ref.
- Re-check fork visibility at merge time (the merger may have lost read access on the head between PR open and merge).
The internal ref is private — we never advertise it via
info/refs. The git-http handler's ref filter already restricts
to refs/heads/* and refs/tags/*, so the namespace is
naturally hidden.
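Why the namespace stays hidden can be shown with a prefix filter. This is a sketch of the idea, not the git-http handler's actual code; both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// prHeadRef builds the internal ref name used for a cross-fork PR head.
func prHeadRef(prID int64) string {
	return fmt.Sprintf("refs/shithub-pr/%d/head", prID)
}

// advertised reports whether a ref passes an allow-list limited to
// branches and tags, as the section above describes.
func advertised(ref string) bool {
	return strings.HasPrefix(ref, "refs/heads/") || strings.HasPrefix(ref, "refs/tags/")
}

func main() {
	for _, ref := range []string{"refs/heads/main", "refs/tags/v1.0.0", prHeadRef(42)} {
		fmt.Printf("%-28s advertised=%v\n", ref, advertised(ref))
	}
}
```

Because the filter is an allow-list, any new internal namespace is hidden by default without further changes.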
S16 hard-delete cascade amendment
When a source repo with active forks is hard-deleted, the forks
become orphans (fork_of_repo_id ON DELETE SET NULL from the
existing FK). Today the orphan forks have only the refs they
added since fork — the objects up to fork point still live in the
source. Hard-deleting the source would prune those objects and
break the orphan forks.
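The fix is to repack each fork before removing the source:
git repack -a -d
Run in the fork's repo without -l/--local, this copies all reachable objects, including those borrowed through alternates, into the fork's own pack. The fork's objects/info/alternates file should then be removed so it no longer points at the source, after which we can safely delete the source.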
This is a KindRepoForkRepackOnSourceDelete job (deferred from
S16; see ListForksOfRepoForRepack query that this sprint shipped
for it). The lifecycle worker's repo_hard_delete step needs to
fan out one repack job per fork, await completion, then proceed
with the FS delete.
The query is in place; the job + the cascade wiring land in a follow-up commit (or in S37 when the deploy plan freezes the hard-delete sequence).
Routes
| Method | Path | Auth | Notes |
|---|---|---|---|
| POST | /{owner}/{repo}/fork | RequireUser | Create a fork |
| POST | /{owner}/{repo}/sync | RequireUser | Fast-forward fork from upstream |
| GET | /{owner}/{repo}/forks | public | Paginated list of forks |
The /fork POST emits a forked domain event (kind=forked,
source_kind=repo) into S26's domain_events log so the future
activity feed picks it up. The /sync POST emits repo_fork_synced
through the audit log only (no public event).
The fork-create handler also auto-watches the new fork at
level=all so the user sees fork-side events without having to
opt in. Matches GitHub's "watching your own forks" default.
Pitfalls noted in code
- Source-repo GC pruning fork-needed objects → preciousObjects.
- Source-repo deletion with active forks → S16 amendment (above).
- Cross-fork PR with deleted fork → mark mergeable_state='blocked' with "head repository deleted" reason at the merge gate (lands with cross-fork PR work).
- Fork rename / transfer → fork_of_repo_id is by-id so the relationship survives.
- Sync race with concurrent push → CAS on update-ref; surfaces as ErrSyncRefRaced.
- Fork-of-fork chains → spec leans "flatten alternates to root". Today the clone uses --shared against whatever path we pass; if the source is itself a fork, the alternates chain is two levels deep. Acceptable for v1; the flattening lands when fork-of-fork becomes a real user complaint.
| 226 | becomes a real user complaint. |