@@ -0,0 +1,226 @@ |
| 1 | +# Forks (S27) |
| 2 | + |
| 3 | +S27 ships fork creation, fork sync (fast-forward only), ahead/behind |
| 4 | +stats, and the schema columns and triggers that maintain |
| 5 | +`repos.fork_count`. Cross-fork PRs and the S16 hard-delete-cascade |
| 6 | +amendment for repacking forks are in S27's scope but deferred; each |
| 7 | +has its own sub-section below describing what shipped and what |
| 8 | +remains. |
| 9 | + |
| 10 | +## Schema |
| 11 | + |
| 12 | +`repos` gained two columns in 0029: |
| 13 | + |
| 14 | +* `fork_count bigint NOT NULL DEFAULT 0` — maintained by the |
| 15 | + `forks_count_inc` / `forks_count_dec` AFTER triggers on `repos` |
| 16 | +  insert/delete. Decrement uses `GREATEST(... - 1, 0)` so a |
| 17 | +  manual DB edit that bypasses the triggers can't drive the count |
| 18 | +  negative. |
| 19 | +* `init_status repo_init_status NOT NULL DEFAULT 'initialized'` — |
| 20 | + enum `('initialized', 'init_pending', 'init_failed')`. Synchronous |
| 21 | + repo creates (the S11 path) write `'initialized'` directly. Forks |
| 22 | + start at `'init_pending'`; the worker job flips to `'initialized'` |
| 23 | + on success or `'init_failed'` on permanent failure. |
| 24 | + |
| 25 | +`fork_of_repo_id` was already present from S11 (the only column the |
| 26 | +S11 status block actually shipped — `is_fork` and `fork_count` were |
| 27 | +the missed ones, same shape as the S11/S26 gap noted in the |
| 28 | +stars-watchers doc). |
| 29 | + |
| 30 | +We deliberately did NOT add `is_fork` — it would duplicate |
| 31 | +`fork_of_repo_id IS NOT NULL` and create the kind of two-source-of- |
| 32 | +truth drift that the audit penalises. Use the FK predicate. |
| 33 | + |
| 34 | +## On-disk layout |
| 35 | + |
| 36 | +`git clone --bare --shared <source> <fork>` creates a fork whose |
| 37 | +`objects/info/alternates` file points back at the source's `objects/` |
| 38 | +directory. Disk usage of the fork is essentially refs + a small |
| 39 | +overhead. The same-volume requirement (S04 `RepoFS.Root()`) is what |
| 40 | +makes alternates safe — alternates across volumes is undefined |
| 41 | +behaviour for git. |
| 42 | + |
| 43 | +When a fork is created we additionally set |
| 44 | +`extensions.preciousObjects = true` on the **source** so a future |
| 45 | +`git gc` on the source can't prune objects the fork reaches via |
| 46 | +alternates. Setting it is idempotent; the fork-clone worker re-asserts |
| 47 | +it on every new fork, so a missing config entry self-heals. |
| 48 | + |
| 49 | +## Worker job |
| 50 | + |
| 51 | +`KindRepoForkClone` (`internal/worker/jobs/repo_fork_clone.go`) runs |
| 52 | +the on-disk clone out of band so fork-create returns fast even for |
| 53 | +large source repos. Payload is `{source_repo_id, fork_repo_id}`. |
| 54 | + |
| 55 | +The job's flow: |
| 56 | + |
| 57 | +1. Reload both repos by id (defends against soft-delete between |
| 58 | + enqueue and run). |
| 59 | +2. `CloneBareShared(sourcePath, forkPath)` — git clone + alternates. |
| 60 | +3. `hooks.Install(forkPath, shithubdPath)` — same hook install as |
| 61 | + the synchronous repo-create path so subsequent user pushes fire |
| 62 | + `push:process`. |
| 63 | +4. `SetPreciousObjects(sourcePath)` — pin source's objects. |
| 64 | +5. `SetRepoInitStatus(fork.ID, 'initialized')`. |
| 65 | + |
| 66 | +On any permanent failure: flip to `'init_failed'` and return |
| 67 | +`worker.PoisonError` (no retries). The repo row stays so the user |
| 68 | +sees the failure; we don't clean up automatically because cleanup |
| 69 | +would race with concurrent retries. |
| 70 | + |
| 71 | +## Sync (fast-forward fork from upstream) |
| 72 | + |
| 73 | +`fork.Sync(ctx, deps, actorUserID, forkRepoID)` only fast-forwards. |
| 74 | +Anything else (merge, rebase) belongs in the user's client; doing |
| 75 | +either server-side without the user's resolution preferences risks |
| 76 | +producing commits the user doesn't want. |
| 77 | + |
| 78 | +Algorithm: |
| 79 | + |
| 80 | +1. Resolve both default-branch OIDs (`fork`, `source`). |
| 81 | +2. If equal → `ErrSyncUpToDate`. |
| 82 | +3. If fork is NOT an ancestor of upstream → `ErrSyncDiverged`. |
| 83 | +4. CAS update via `repogit.UpdateRefCAS(fork, branch, upstream, fork)`. |
| 84 | +   The trailing `fork` argument is the fork's pre-sync OID, passed as |
| 85 | +   git's old-value guard. If a concurrent push moves the ref first, |
| 86 | +   the CAS fails and sync returns `ErrSyncRefRaced`. |
| 87 | +5. Update `repos.default_branch_oid` so the home view reflects the |
| 88 | + new tip without waiting for `push:process` (update-ref bypasses |
| 89 | + `post-receive`, same shape as the merge handler's fix in the |
| 90 | + audit-remediation sprint). |
| 91 | + |
| 92 | +An empty fork (no branch yet) is handled by passing the 40-zero OID |
| 93 | +as the old value, which git reads as "ref must not exist yet"; sync |
| 94 | +to an empty fork creates the branch from upstream's tip. |
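
The decision logic in the algorithm above can be sketched as a pure function. This is a minimal illustration, not the code behind `fork.Sync`: `planSync`, its signature, and the error definitions are hypothetical, and it assumes the upstream OID has already been resolved. It returns the `(new, old)` pair that a compare-and-swap ref update would receive.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// The error names come from the text; their definitions here are assumed.
var (
	ErrSyncUpToDate = errors.New("fork is up to date")
	ErrSyncDiverged = errors.New("fork has diverged from upstream")
)

// zeroOID is the 40-zero literal git reads as "ref must not exist yet".
var zeroOID = strings.Repeat("0", 40)

// planSync (hypothetical helper) decides which new/old OID pair to hand to
// a compare-and-swap ref update. forkOID is "" for an empty fork.
// isAncestor(a, b) must report whether commit a is an ancestor of commit b.
func planSync(forkOID, upstreamOID string, isAncestor func(a, b string) bool) (newOID, oldOID string, err error) {
	if forkOID == "" {
		// Empty fork: create the branch, guarded by "must not exist yet".
		return upstreamOID, zeroOID, nil
	}
	if forkOID == upstreamOID {
		return "", "", ErrSyncUpToDate
	}
	if !isAncestor(forkOID, upstreamOID) {
		// The fork has commits upstream lacks; a fast-forward is impossible.
		return "", "", ErrSyncDiverged
	}
	// Fast-forward: move the ref to upstream, guarded by the fork's old tip.
	return upstreamOID, forkOID, nil
}

func main() {
	// Toy history in which "aaa" is the only ancestor of "bbb".
	isAncestor := func(a, b string) bool { return a == "aaa" && b == "bbb" }
	newOID, oldOID, err := planSync("aaa", "bbb", isAncestor)
	fmt.Println(newOID, oldOID, err)
}
```

Keeping the decision separate from the ref update makes the three outcomes (up to date, diverged, fast-forward) testable without a repository on disk.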
| 95 | + |
| 96 | +## Ahead/behind |
| 97 | + |
| 98 | +`fork.AheadBehind(ctx, deps, forkRepoID)` returns |
| 99 | +`{Ahead, Behind, Comparable}` where: |
| 100 | + |
| 101 | +* `Ahead` = commits in fork's default branch not in source's. |
| 102 | +* `Behind` = commits in source's default branch not in fork's. |
| 103 | +* `Comparable` = false when either side's default ref is missing |
| 104 | + (empty fork, never-initialised source). |
| 105 | + |
| 106 | +Implementation: read both OIDs, then run |
| 107 | +`git rev-list --left-right --count <fork>...<upstream>` *inside the |
| 108 | +fork's repo*. Because the fork borrows the source's objects via |
| 109 | +alternates, the upstream OID resolves without an explicit fetch. |
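
As a rough illustration of the last step, `git rev-list --left-right --count` prints two whitespace-separated integers for a symmetric-difference range. The sketch below assumes the range is written with the fork's OID on the left and the upstream OID on the right, so the left count maps to `Ahead` and the right to `Behind`. `parseLeftRightCount` is a hypothetical helper, not a function from the codebase.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLeftRightCount (hypothetical) parses the "<left>\t<right>\n" output
// of `git rev-list --left-right --count A...B`. With the fork's OID as A,
// left is the ahead count and right is the behind count.
func parseLeftRightCount(out string) (ahead, behind int, err error) {
	fields := strings.Fields(out)
	if len(fields) != 2 {
		return 0, 0, fmt.Errorf("unexpected rev-list output %q", out)
	}
	if ahead, err = strconv.Atoi(fields[0]); err != nil {
		return 0, 0, err
	}
	if behind, err = strconv.Atoi(fields[1]); err != nil {
		return 0, 0, err
	}
	return ahead, behind, nil
}

func main() {
	ahead, behind, err := parseLeftRightCount("3\t5\n")
	fmt.Println(ahead, behind, err)
}
```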
| 110 | + |
| 111 | +This is the floor implementation. S36's perf-pass sprint adds an |
| 112 | +LRU cache keyed on `(fork_repo_id, fork_default_oid, |
| 113 | +upstream_default_oid)` — already documented in S36's "Code-tab |
| 114 | +caching" deliverables and on the S00–S25 audit's H4 deferral. |
| 115 | + |
| 116 | +## Visibility floor |
| 117 | + |
| 118 | +`fork.allowedTargetVisibility(source, target)` enforces: |
| 119 | + |
| 120 | +| source | target=public | target=private | target="" | |
| 121 | +|---------|---------------|----------------|-----------| |
| 122 | +| public | ✓ | ✓ | public | |
| 123 | +| private | ✗ | ✓ | private | |
| 124 | + |
| 125 | +Forking private → public would expose previously-private content |
| 126 | +and is always rejected (`ErrVisibilityFloor`). |
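
The table can be read as a small function. This is a sketch under assumptions: the `Visibility` type, its constants, and the `(Visibility, error)` return shape are illustrative and may differ from the real `fork.allowedTargetVisibility`. An empty target inherits the source's visibility, and the only rejected combination is private to public.

```go
package main

import (
	"errors"
	"fmt"
)

// Visibility and its constants are assumed names for illustration.
type Visibility string

const (
	Public  Visibility = "public"
	Private Visibility = "private"
)

// ErrVisibilityFloor is named in the text; this definition is assumed.
var ErrVisibilityFloor = errors.New("fork cannot be more visible than its source")

// allowedTargetVisibility resolves the fork's visibility per the table:
// an empty target defaults to the source's visibility, and a private
// source can never produce a public fork. It assumes the inputs were
// already validated as "public", "private", or "".
func allowedTargetVisibility(source, target Visibility) (Visibility, error) {
	if target == "" {
		return source, nil
	}
	if source == Private && target == Public {
		return "", ErrVisibilityFloor
	}
	return target, nil
}

func main() {
	for _, src := range []Visibility{Public, Private} {
		for _, tgt := range []Visibility{Public, Private, ""} {
			v, err := allowedTargetVisibility(src, tgt)
			fmt.Printf("source=%q target=%q -> %q err=%v\n", src, tgt, v, err)
		}
	}
}
```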
| 127 | + |
| 128 | +## Permission lattice |
| 129 | + |
| 130 | +`policy.ActionForkCreate` was already in the registry from earlier |
| 131 | +sprints. Today's gating shape: |
| 132 | + |
| 133 | +* Anonymous on any repo → deny (`DenyAnonymous`). |
| 134 | +* Logged-in on a repo they can read → allow (login-required, no |
| 135 | + role gate). |
| 136 | +* Logged-in on a private repo they CAN'T read → deny |
| 137 | +  (`DenyVisibility`, surfaced as a 404 at the handler layer). |
| 138 | + |
| 139 | +Suspended actors are blocked by step 3 of `policy.Can` (suspended + |
| 140 | +write action → deny). Fork create counts as write (it mutates the |
| 141 | +target owner's namespace). |
| 142 | + |
| 143 | +## Cross-fork PRs (deferred to a follow-up) |
| 144 | + |
| 145 | +S27's spec lists cross-fork PR support as in-scope, but the actual |
| 146 | +plumbing — fetching the fork's head into the base repo's |
| 147 | +`refs/shithub-pr/<pr_id>/head` namespace and routing the merge from |
| 148 | +the internal ref — is large enough that this sprint ships fork |
| 149 | +creation, sync, and ahead/behind only. The cross-fork PR work is |
| 150 | +tracked here as a follow-up: |
| 151 | + |
| 152 | +* Extend `pulls.Create` to accept `head_repo_id != repo_id`. |
| 153 | +* Wire in `repogit.FetchIntoNamespace` (already shipped in this |
| 154 | +  sprint for the eventual consumer). |
| 155 | +* `pulls.Synchronize` reads head from the internal ref when |
| 156 | + `pull_request.head_repo_id != pull_request.repo_id`. |
| 157 | +* `pulls.Merge` worktree-add reads head from the internal ref. |
| 158 | +* Re-check fork visibility at merge time (the merger may have lost |
| 159 | + read access on the head between PR open and merge). |
| 160 | + |
| 161 | +The internal ref is private — we never advertise it via |
| 162 | +`info/refs`. The git-http handler's ref filter already restricts |
| 163 | +to `refs/heads/*` and `refs/tags/*`, so the namespace is |
| 164 | +naturally hidden. |
| 165 | + |
| 166 | +## S16 hard-delete cascade amendment |
| 167 | + |
| 168 | +When a source repo with active forks is hard-deleted, the forks |
| 169 | +become orphans (`fork_of_repo_id ON DELETE SET NULL` from the |
| 170 | +existing FK). Today a fork's own object store holds only the |
| 171 | +*objects* pushed since the fork; everything up to the fork point still |
| 172 | +lives in the source. Hard-deleting the source would remove those |
| 173 | +objects and break the orphan forks. |
| 174 | + |
| 175 | +The fix is to repack each fork before removing the source: |
| 176 | + |
| 177 | +``` |
| 178 | +git repack -a -d |
| 179 | +``` |
| 180 | + |
| 181 | +Run in the fork's repo (without `-l`), this copies all reachable objects into |
| 182 | +the fork's own pack; then remove `objects/info/alternates` and delete the source. |
| 183 | + |
| 184 | +This is a `KindRepoForkRepackOnSourceDelete` job (deferred from |
| 185 | +S16; see `ListForksOfRepoForRepack` query that this sprint shipped |
| 186 | +for it). The lifecycle worker's `repo_hard_delete` step needs to |
| 187 | +fan out one repack job per fork, await completion, then proceed |
| 188 | +with the FS delete. |
| 189 | + |
| 190 | +The query is in place; the job + the cascade wiring land in a |
| 191 | +follow-up commit (or in S37 when the deploy plan freezes the |
| 192 | +hard-delete sequence). |
| 193 | + |
| 194 | +## Routes |
| 195 | + |
| 196 | +| Method | Path | Auth | Notes | |
| 197 | +|--------|-------------------------------------|---------------|------------------------------------| |
| 198 | +| POST | `/{owner}/{repo}/fork` | RequireUser | Create a fork | |
| 199 | +| POST | `/{owner}/{repo}/sync` | RequireUser | Fast-forward fork from upstream | |
| 200 | +| GET | `/{owner}/{repo}/forks` | public | Paginated list of forks | |
| 201 | + |
| 202 | +The `/fork` POST emits a `forked` domain event (kind=`forked`, |
| 203 | +source_kind=`repo`) into S26's `domain_events` log so the future |
| 204 | +activity feed picks it up. The `/sync` POST emits `repo_fork_synced` |
| 205 | +through the audit log only (no public event). |
| 206 | + |
| 207 | +The fork-create handler also auto-watches the new fork at |
| 208 | +`level=all` so the user sees fork-side events without having to |
| 209 | +opt in. Matches GitHub's "watching your own forks" default. |
| 210 | + |
| 211 | +## Pitfalls noted in code |
| 212 | + |
| 213 | +* Source-repo GC pruning fork-needed objects → `preciousObjects`. |
| 214 | +* Source-repo deletion with active forks → S16 amendment (above). |
| 215 | +* Cross-fork PR with deleted fork → mark |
| 216 | + `mergeable_state='blocked'` with "head repository deleted" |
| 217 | + reason at the merge gate (lands with cross-fork PR work). |
| 218 | +* Fork rename / transfer → `fork_of_repo_id` is by-id so the |
| 219 | + relationship survives. |
| 220 | +* Sync race with concurrent push → CAS on update-ref; surfaces as |
| 221 | + `ErrSyncRefRaced`. |
| 222 | +* Fork-of-fork chains → spec leans "flatten alternates to root". |
| 223 | + Today the clone uses `--shared` against whatever path we pass; if |
| 224 | + the source is itself a fork, the alternates chain is two levels |
| 225 | + deep. Acceptable for v1; the flattening lands when fork-of-fork |
| 226 | + becomes a real user complaint. |