Forks (S27)

S27 ships fork creation, fork sync (fast-forward only), ahead/behind stats, and the schema columns + triggers that maintain repos.fork_count. Cross-fork PRs and the S16 hard-delete-cascade amendment for repacking forks are scoped here too; the cross-fork PR deferral and the S16 amendment are each described in their own sub-section below.

Schema

repos gained two columns in 0029:

  • fork_count bigint NOT NULL DEFAULT 0: maintained by the forks_count_inc / forks_count_dec AFTER triggers on repos insert/delete. Decrement uses GREATEST(... - 1, 0) so a hand-written DB tweak that bypasses the triggers can't underflow the count into negatives.
  • init_status repo_init_status NOT NULL DEFAULT 'initialized' — enum ('initialized', 'init_pending', 'init_failed'). Synchronous repo creates (the S11 path) write 'initialized' directly. Forks start at 'init_pending'; the worker job flips to 'initialized' on success or 'init_failed' on permanent failure.

fork_of_repo_id was already present from S11 (the only column the S11 status block actually shipped — is_fork and fork_count were the missed ones, same shape as the S11/S26 gap noted in the stars-watchers doc).

We deliberately did NOT add is_fork: it would duplicate fork_of_repo_id IS NOT NULL and create the kind of two-source-of-truth drift that the audit penalises. Use the FK predicate.
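A minimal sketch of how calling code might lean on the FK predicate instead of a stored flag. The Repo struct, its field names, and decrementForkCount are hypothetical and only illustrate the columns described above; the real decrement lives in the database trigger.

```go
package main

import "fmt"

// InitStatus mirrors the repo_init_status enum values named above.
type InitStatus string

const (
	InitInitialized InitStatus = "initialized"
	InitPending     InitStatus = "init_pending"
	InitFailed      InitStatus = "init_failed"
)

// Repo is a hypothetical row model; only the fork-related columns are shown.
type Repo struct {
	ID           int64
	ForkOfRepoID *int64 // nil when fork_of_repo_id IS NULL
	ForkCount    int64
	InitStatus   InitStatus
}

// IsFork derives fork-ness from the FK instead of a stored is_fork column.
func (r Repo) IsFork() bool { return r.ForkOfRepoID != nil }

// decrementForkCount mirrors GREATEST(fork_count - 1, 0): it clamps at zero.
func decrementForkCount(n int64) int64 {
	if n <= 0 {
		return 0
	}
	return n - 1
}

func main() {
	src := int64(7)
	fork := Repo{ID: 8, ForkOfRepoID: &src, InitStatus: InitPending}
	fmt.Println(fork.IsFork(), decrementForkCount(0))
}
```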

On-disk layout

git clone --bare --shared <source> <fork> creates a fork whose objects/info/alternates file points back at the source's objects/ directory. Disk usage of the fork is essentially refs + a small overhead. The same-volume requirement (S04 RepoFS.Root()) is what makes alternates safe — alternates across volumes is undefined behaviour for git.

When a fork is created we additionally set extensions.preciousObjects = true on the source so a future git gc on the source can't prune objects the fork reaches via alternates. Idempotent; the fork-clone worker re-asserts on every new fork so missing config is self-healing.
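The two git invocations described above could be built roughly as follows. This is a sketch, not the project's CloneBareShared or SetPreciousObjects: the function names and example paths are made up, and the commands are only constructed and printed here, not run.

```go
package main

import (
	"fmt"
	"os/exec"
)

// cloneBareSharedCmd builds the clone that writes objects/info/alternates in the fork.
func cloneBareSharedCmd(sourcePath, forkPath string) *exec.Cmd {
	return exec.Command("git", "clone", "--bare", "--shared", sourcePath, forkPath)
}

// preciousObjectsCmd builds the config write that pins the source's objects.
// Writing the same value again is harmless, which is what makes the re-assert idempotent.
func preciousObjectsCmd(sourcePath string) *exec.Cmd {
	return exec.Command("git", "-C", sourcePath, "config", "extensions.preciousObjects", "true")
}

func main() {
	// Hypothetical paths; both must live under the same RepoFS root.
	fmt.Println(cloneBareSharedCmd("/repos/a/src.git", "/repos/b/fork.git").Args)
	fmt.Println(preciousObjectsCmd("/repos/a/src.git").Args)
}
```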

Worker job

KindRepoForkClone (internal/worker/jobs/repo_fork_clone.go) runs the on-disk clone out of band so fork-create returns fast even for large source repos. Payload is {source_repo_id, fork_repo_id}.

The job's flow:

  1. Reload both repos by id (defends against soft-delete between enqueue and run).
  2. CloneBareShared(sourcePath, forkPath) — git clone + alternates.
  3. hooks.Install(forkPath, shithubdPath) — same hook install as the synchronous repo-create path so subsequent user pushes fire push:process.
  4. SetPreciousObjects(sourcePath) — pin source's objects.
  5. SetRepoInitStatus(fork.ID, 'initialized').

On any permanent failure: flip to 'init_failed' and return worker.PoisonError (no retries). The repo row stays so the user sees the failure; we don't auto-cleanup because that races concurrent retries.
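The flow above can be sketched with injected steps. Everything here is an assumption made for illustration: forkCloneDeps, runForkClone, and the local PoisonError type stand in for the real job and for worker.PoisonError, the reload in step 1 is omitted, and every step error is treated as permanent to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// PoisonError is a stand-in for worker.PoisonError: a failure that must not be retried.
type PoisonError struct{ Err error }

func (e PoisonError) Error() string { return "poison: " + e.Err.Error() }
func (e PoisonError) Unwrap() error { return e.Err }

var errDiskFull = errors.New("disk full") // example failure used below

// forkCloneDeps is hypothetical; each field stands in for one step of the job.
type forkCloneDeps struct {
	CloneBareShared    func(sourcePath, forkPath string) error
	InstallHooks       func(forkPath string) error
	SetPreciousObjects func(sourcePath string) error
	SetInitStatus      func(forkID int64, status string) error
}

func runForkClone(d forkCloneDeps, forkID int64, sourcePath, forkPath string) error {
	steps := []func() error{
		func() error { return d.CloneBareShared(sourcePath, forkPath) },
		func() error { return d.InstallHooks(forkPath) },
		func() error { return d.SetPreciousObjects(sourcePath) },
	}
	for _, step := range steps {
		if err := step(); err != nil {
			// Keep the repo row, record the failure, and tell the worker not to retry.
			_ = d.SetInitStatus(forkID, "init_failed")
			return PoisonError{Err: err}
		}
	}
	return d.SetInitStatus(forkID, "initialized")
}

func main() {
	status := ""
	d := forkCloneDeps{
		CloneBareShared:    func(string, string) error { return errDiskFull },
		InstallHooks:       func(string) error { return nil },
		SetPreciousObjects: func(string) error { return nil },
		SetInitStatus:      func(_ int64, s string) error { status = s; return nil },
	}
	err := runForkClone(d, 42, "/repos/src.git", "/repos/fork.git")
	fmt.Println(status, err)
}
```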

Sync (fast-forward fork from upstream)

fork.Sync(ctx, deps, actorUserID, forkRepoID) only fast-forwards. Anything else (merge, rebase) belongs in the user's client; doing either server-side without the user's resolution preferences risks producing commits the user doesn't want.

Algorithm:

  1. Resolve both default-branch OIDs (fork, source).
  2. If equal → ErrSyncUpToDate.
  3. If fork is NOT an ancestor of upstream → ErrSyncDiverged.
  4. CAS update via repogit.UpdateRefCAS(fork, branch, upstream, fork): the trailing fork argument, the fork's OID resolved in step 1, is git's old-value guard. If a concurrent push moves the fork's branch first, the CAS fails and surfaces as ErrSyncRefRaced.
  5. Update repos.default_branch_oid so the home view reflects the new tip without waiting for push:process (update-ref bypasses post-receive, same shape as the merge handler's fix in the audit-remediation sprint).

An empty fork (no branch yet) is handled by passing the 40-zero OID as the old value, which git interprets as "ref must not exist yet": syncing to an empty fork creates the branch from upstream's tip.
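The decision part of the algorithm (steps 2 and 3 plus the empty-fork case) could look like the sketch below. planSync and the isAncestor callback are hypothetical names; the callback might wrap git merge-base --is-ancestor. The sketch assumes upstream already has a tip and leaves the CAS update itself out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrSyncUpToDate = errors.New("fork is up to date")
	ErrSyncDiverged = errors.New("fork has diverged from upstream")
)

// zeroOID is the all-zero value that git update-ref reads as "ref must not exist yet".
var zeroOID = strings.Repeat("0", 40)

// planSync returns the old-value guard to pass to the CAS update,
// or an error when there is nothing to fast-forward.
// forkOID is "" for an empty fork.
func planSync(forkOID, upstreamOID string, isAncestor func(ancestor, descendant string) bool) (string, error) {
	if forkOID == "" {
		return zeroOID, nil // create the branch from upstream's tip
	}
	if forkOID == upstreamOID {
		return "", ErrSyncUpToDate
	}
	if !isAncestor(forkOID, upstreamOID) {
		return "", ErrSyncDiverged
	}
	return forkOID, nil
}

func main() {
	always := func(string, string) bool { return true }
	guard, err := planSync("aaa", "bbb", always)
	fmt.Println(guard, err)
}
```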

Ahead/behind

fork.AheadBehind(ctx, deps, forkRepoID) returns {Ahead, Behind, Comparable} where:

  • Ahead = commits in fork's default branch not in source's.
  • Behind = commits in source's default branch not in fork's.
  • Comparable = false when either side's default ref is missing (empty fork, never-initialised source).

Implementation: read both OIDs, then run git rev-list --left-right --count <fork_oid>...<upstream_oid> inside the fork's repo. Because the fork reaches the source's objects through alternates, the upstream OID resolves without an explicit fetch.
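As an illustration, the command arguments and the parsing of its output might look like this. With the fork on the left of the three-dot range, the first count is Ahead and the second is Behind. The helper names are hypothetical, and the AheadBehind struct only mirrors the fields listed above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type AheadBehind struct {
	Ahead, Behind int
	Comparable    bool
}

// revListArgs builds the symmetric-difference count: left is the fork, right is upstream.
func revListArgs(forkOID, upstreamOID string) []string {
	return []string{"rev-list", "--left-right", "--count", forkOID + "..." + upstreamOID}
}

// parseLeftRightCount parses the "<left>\t<right>" line that the command prints.
func parseLeftRightCount(out string) (AheadBehind, error) {
	fields := strings.Fields(out)
	if len(fields) != 2 {
		return AheadBehind{}, fmt.Errorf("unexpected rev-list output %q", out)
	}
	ahead, err := strconv.Atoi(fields[0])
	if err != nil {
		return AheadBehind{}, err
	}
	behind, err := strconv.Atoi(fields[1])
	if err != nil {
		return AheadBehind{}, err
	}
	return AheadBehind{Ahead: ahead, Behind: behind, Comparable: true}, nil
}

func main() {
	fmt.Println(revListArgs("aaa", "bbb"))
	ab, err := parseLeftRightCount("2\t5\n")
	fmt.Println(ab, err)
}
```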

This is the floor implementation. S36's perf-pass sprint adds an LRU cache keyed on (fork_repo_id, fork_default_oid, upstream_default_oid) — already documented in S36's "Code-tab caching" deliverables and on the S00–S25 audit's H4 deferral.

Visibility floor

fork.allowedTargetVisibility(source, target) enforces:

| source  | target=public | target=private | target="" |
|---------|---------------|----------------|-----------|
| public  | ✓             | ✓              | public    |
| private | ✗             | ✓              | private   |

Forking private → public would expose previously-private content and is always rejected (ErrVisibilityFloor).
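One way the rule could be expressed, as a sketch rather than the actual fork.allowedTargetVisibility: visibilities are plain strings here, an empty target inherits the source's visibility, and unknown values are not validated.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrVisibilityFloor = errors.New("fork cannot be more visible than its source")

// allowedTargetVisibility resolves the fork's visibility.
// An empty target means "inherit from the source".
func allowedTargetVisibility(source, target string) (string, error) {
	if target == "" {
		return source, nil
	}
	if source == "private" && target == "public" {
		return "", ErrVisibilityFloor
	}
	return target, nil
}

func main() {
	v, err := allowedTargetVisibility("private", "public")
	fmt.Println(v, err)
}
```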

Permission lattice

policy.ActionForkCreate was already in the registry from earlier sprints. Today's gating shape:

  • Anonymous on any repo → deny (DenyAnonymous).
  • Logged-in on a repo they can read → allow (login-required, no role gate).
  • Logged-in on a private repo they CAN'T read → deny (DenyVisibility, surfaced as 404 at the handler layer).

Suspended actors are blocked by step 3 of policy.Can (suspended + write action → deny). Fork create counts as write (it mutates the target owner's namespace).
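The gating shape can be pictured as a small decision function. This is a sketch only: the Actor struct, the DenySuspended name, and the order of the checks are assumptions, and the real policy.Can may evaluate its steps differently.

```go
package main

import "fmt"

type Decision string

const (
	Allow          Decision = "allow"
	DenyAnonymous  Decision = "deny_anonymous"
	DenySuspended  Decision = "deny_suspended" // hypothetical name
	DenyVisibility Decision = "deny_visibility"
)

// Actor is a hypothetical view of the requesting user.
type Actor struct {
	LoggedIn  bool
	Suspended bool
}

// canForkCreate applies the rules listed above; canRead is whether the
// actor can read the source repo (always true for public repos).
func canForkCreate(a Actor, canRead bool) Decision {
	switch {
	case !a.LoggedIn:
		return DenyAnonymous
	case a.Suspended:
		return DenySuspended // fork create counts as a write action
	case !canRead:
		return DenyVisibility // the handler turns this into a 404
	default:
		return Allow
	}
}

func main() {
	fmt.Println(canForkCreate(Actor{LoggedIn: true}, true))
}
```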

Cross-fork PRs (deferred to a follow-up)

S27's spec lists cross-fork PR support as in-scope, but the actual plumbing — fetching the fork's head into the base repo's refs/shithub-pr/<pr_id>/head namespace and routing the merge from the internal ref — is large enough that this sprint ships fork creation, sync, and ahead/behind only. The cross-fork PR work is tracked here as a follow-up:

  • Extend pulls.Create to accept head_repo_id != repo_id.
  • Wire up repogit.FetchIntoNamespace (the helper itself already shipped in this sprint for the eventual consumer).
  • pulls.Synchronize reads head from the internal ref when pull_request.head_repo_id != pull_request.repo_id.
  • pulls.Merge worktree-add reads head from the internal ref.
  • Re-check fork visibility at merge time (the merger may have lost read access on the head between PR open and merge).

The internal ref is private — we never advertise it via info/refs. The git-http handler's ref filter already restricts to refs/heads/* and refs/tags/*, so the namespace is naturally hidden.
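To make the hiding argument concrete, here is a sketch of the ref naming and a prefix filter of the kind described. prHeadRef and isAdvertised are hypothetical helpers, not the git-http handler's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// prHeadRef names the internal ref that would hold a cross-fork PR's head.
func prHeadRef(prID int64) string {
	return fmt.Sprintf("refs/shithub-pr/%d/head", prID)
}

// isAdvertised keeps only branches and tags, so internal namespaces never reach clients.
func isAdvertised(ref string) bool {
	return strings.HasPrefix(ref, "refs/heads/") || strings.HasPrefix(ref, "refs/tags/")
}

func main() {
	ref := prHeadRef(12)
	fmt.Println(ref, isAdvertised(ref), isAdvertised("refs/heads/main"))
}
```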

S16 hard-delete cascade amendment

When a source repo with active forks is hard-deleted, the forks become orphans (fork_of_repo_id ON DELETE SET NULL from the existing FK). Today the orphan forks have only the refs they added since fork — the objects up to fork point still live in the source. Hard-deleting the source would prune those objects and break the orphan forks.

The fix is to repack each fork before removing the source:

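git repack -a -d

…runs in the fork's repo and copies all reachable objects, including those borrowed through alternates, into the fork's own pack (git repack has no --no-shared flag; omitting -l is what pulls the borrowed objects in). Once the fork's objects/info/alternates file is removed, we can safely delete the source.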

This is a KindRepoForkRepackOnSourceDelete job (deferred from S16; see ListForksOfRepoForRepack query that this sprint shipped for it). The lifecycle worker's repo_hard_delete step needs to fan out one repack job per fork, await completion, then proceed with the FS delete.

The query is in place; the job + the cascade wiring land in a follow-up commit (or in S37 when the deploy plan freezes the hard-delete sequence).
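The ordering constraint (repack every fork, wait, then delete) can be sketched as below. This is an in-process simplification with goroutines rather than queued worker jobs, and hardDeleteSource and both callbacks are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errRepackFailed = errors.New("repack failed") // example failure used below

// hardDeleteSource repacks every fork concurrently and deletes the
// source only if all repacks succeeded.
func hardDeleteSource(forkPaths []string, repack func(forkPath string) error, deleteSource func() error) error {
	errs := make([]error, len(forkPaths))
	var wg sync.WaitGroup
	for i, p := range forkPaths {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			errs[i] = repack(p) // each goroutine writes its own slot
		}(i, p)
	}
	wg.Wait()
	for i, err := range errs {
		if err != nil {
			return fmt.Errorf("fork %s: %w; source kept", forkPaths[i], err)
		}
	}
	return deleteSource()
}

func main() {
	deleted := false
	err := hardDeleteSource(
		[]string{"/repos/f1.git", "/repos/f2.git"},
		func(p string) error {
			if p == "/repos/f2.git" {
				return errRepackFailed
			}
			return nil
		},
		func() error { deleted = true; return nil },
	)
	fmt.Println(err, deleted)
}
```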

Routes

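| Method | Path                  | Auth        | Notes                           |
|--------|-----------------------|-------------|---------------------------------|
| POST   | /{owner}/{repo}/fork  | RequireUser | Create a fork                   |
| POST   | /{owner}/{repo}/sync  | RequireUser | Fast-forward fork from upstream |
| GET    | /{owner}/{repo}/forks | public      | Paginated list of forks         |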

The /fork POST emits a forked domain event (kind=forked, source_kind=repo) into S26's domain_events log so the future activity feed picks it up. The /sync POST emits repo_fork_synced through the audit log only (no public event).

The fork-create handler also auto-watches the new fork at level=all so the user sees fork-side events without having to opt in. Matches GitHub's "watching your own forks" default.

Pitfalls noted in code

  • Source-repo GC pruning fork-needed objects → preciousObjects.
  • Source-repo deletion with active forks → S16 amendment (above).
  • Cross-fork PR with deleted fork → mark mergeable_state='blocked' with "head repository deleted" reason at the merge gate (lands with cross-fork PR work).
  • Fork rename / transfer → fork_of_repo_id is by-id so the relationship survives.
  • Sync race with concurrent push → CAS on update-ref; surfaces as ErrSyncRefRaced.
  • Fork-of-fork chains → spec leans "flatten alternates to root". Today the clone uses --shared against whatever path we pass; if the source is itself a fork, the alternates chain is two levels deep. Acceptable for v1; the flattening lands when fork-of-fork becomes a real user complaint.
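If the flattening does land, resolving the root of a fork chain might look like the sketch below. The forkOf map is a hypothetical stand-in for following fork_of_repo_id row by row, and nothing here reflects shipped code.

```go
package main

import "fmt"

// rootSource follows fork_of links until it reaches a repo that is not a fork.
// forkOf maps a fork's repo id to its source's repo id.
func rootSource(id int64, forkOf map[int64]int64) int64 {
	seen := map[int64]bool{}
	for {
		parent, isFork := forkOf[id]
		if !isFork || seen[id] {
			return id // root reached, or a cycle guard tripped
		}
		seen[id] = true
		id = parent
	}
}

func main() {
	// 3 is a fork of 2, which is a fork of 1.
	forkOf := map[int64]int64{3: 2, 2: 1}
	fmt.Println(rootSource(3, forkOf))
}
```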