tenseleyflow/shithub / b6e3156

S27: docs/internal/forks.md

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: b6e3156e6d50dc88668c26b5e2039c007aa95d6c
Parents: cf04697
Tree: ded829f

1 changed file

Status  File                     +    -
A       docs/internal/forks.md   226  0

docs/internal/forks.md (added)
@@ -0,0 +1,226 @@
1
+# Forks (S27)
2
+
3
+S27 ships fork creation, fork sync (fast-forward only), ahead/behind
4
+stats, and the schema columns + triggers that maintain
5
+`repos.fork_count`. Cross-fork PRs and the S16 hard-delete-cascade
6
+amendment for repacking forks are scoped here too; the cross-fork PR
7
+deferral pointer and the S16 amendment each have their own section
8
+below.
9
+
10
+## Schema
11
+
12
+`repos` gained two columns in 0029:
13
+
14
+* `fork_count bigint NOT NULL DEFAULT 0` — maintained by the
15
+  `forks_count_inc` / `forks_count_dec` AFTER triggers on `repos`
16
+  insert/delete. Decrement uses `GREATEST(... - 1, 0)` so a
17
+  hand-written DB tweak that bypasses the trigger can't drive
18
+  the count negative.
19
+* `init_status repo_init_status NOT NULL DEFAULT 'initialized'` —
20
+  enum `('initialized', 'init_pending', 'init_failed')`. Synchronous
21
+  repo creates (the S11 path) write `'initialized'` directly. Forks
22
+  start at `'init_pending'`; the worker job flips to `'initialized'`
23
+  on success or `'init_failed'` on permanent failure.
24
+
25
+`fork_of_repo_id` was already present from S11 (the only column the
26
+S11 status block actually shipped — `is_fork` and `fork_count` were
27
+the missed ones, same shape as the S11/S26 gap noted in the
28
+stars-watchers doc).
29
+
30
+We deliberately did NOT add `is_fork` — it would duplicate
31
+`fork_of_repo_id IS NOT NULL` and create the kind of two-source-of-
32
+truth drift that the audit penalises. Use the FK predicate.
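
A minimal Go sketch of the two schema rules above, deriving fork-ness from the FK and clamping the decrement at zero. The `Repo` struct and `decrementForkCount` helper are hypothetical illustrations, not the project's model types; the real clamp lives in the SQL trigger.

```go
package main

import "fmt"

// Repo is a hypothetical, trimmed-down view of a `repos` row.
type Repo struct {
	ID           int64
	ForkOfRepoID *int64 // nil mirrors SQL NULL: the repo is not a fork
	ForkCount    int64
}

// IsFork derives fork-ness from the foreign key instead of a stored flag,
// so there is only one source of truth.
func (r Repo) IsFork() bool { return r.ForkOfRepoID != nil }

// decrementForkCount mirrors the trigger's GREATEST(fork_count - 1, 0) clamp.
func decrementForkCount(n int64) int64 {
	if n <= 0 {
		return 0
	}
	return n - 1
}

func main() {
	parent := int64(7)
	fmt.Println(Repo{ID: 1}.IsFork())
	fmt.Println(Repo{ID: 2, ForkOfRepoID: &parent}.IsFork())
	fmt.Println(decrementForkCount(0), decrementForkCount(3))
}
```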
33
+
34
+## On-disk layout
35
+
36
+`git clone --bare --shared <source> <fork>` creates a fork whose
37
+`objects/info/alternates` file points back at the source's `objects/`
38
+directory. Disk usage of the fork is essentially refs + a small
39
+overhead. The same-volume requirement (S04 `RepoFS.Root()`) is what
40
+makes alternates safe — alternates across volumes is undefined
41
+behaviour for git.
42
+
43
+When a fork is created we additionally set
44
+`extensions.preciousObjects = true` on the **source** so a future
45
+`git gc` on the source can't prune objects the fork reaches via
46
+alternates. Idempotent; the fork-clone worker re-asserts on every
47
+new fork so missing config is self-healing.
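
The layout above comes down to two git invocations and one file path. The sketch below only builds the argument lists and does not run git; the helper names and example paths are assumptions for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cloneBareSharedArgs returns the git arguments for a bare clone that borrows
// the source's objects through alternates instead of copying them.
func cloneBareSharedArgs(sourcePath, forkPath string) []string {
	return []string{"clone", "--bare", "--shared", sourcePath, forkPath}
}

// alternatesFile is where a bare repo records the borrowed object directory.
func alternatesFile(forkPath string) string {
	return filepath.Join(forkPath, "objects", "info", "alternates")
}

// preciousObjectsArgs returns the git arguments that pin the source's objects.
// Setting the same value again is harmless, which is what makes it idempotent.
func preciousObjectsArgs(sourcePath string) []string {
	return []string{"-C", sourcePath, "config", "extensions.preciousObjects", "true"}
}

func main() {
	src, fork := "/repos/alice/app.git", "/repos/bob/app.git" // example paths
	fmt.Println("git", strings.Join(cloneBareSharedArgs(src, fork), " "))
	fmt.Println("alternates:", alternatesFile(fork))
	fmt.Println("git", strings.Join(preciousObjectsArgs(src), " "))
}
```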
48
+
49
+## Worker job
50
+
51
+`KindRepoForkClone` (`internal/worker/jobs/repo_fork_clone.go`) runs
52
+the on-disk clone out of band so fork-create returns fast even for
53
+large source repos. Payload is `{source_repo_id, fork_repo_id}`.
54
+
55
+The job's flow:
56
+
57
+1. Reload both repos by id (defends against soft-delete between
58
+   enqueue and run).
59
+2. `CloneBareShared(sourcePath, forkPath)` — git clone + alternates.
60
+3. `hooks.Install(forkPath, shithubdPath)` — same hook install as
61
+   the synchronous repo-create path so subsequent user pushes fire
62
+   `push:process`.
63
+4. `SetPreciousObjects(sourcePath)` — pin source's objects.
64
+5. `SetRepoInitStatus(fork.ID, 'initialized')`.
65
+
66
+On any permanent failure: flip to `'init_failed'` and return
67
+`worker.PoisonError` (no retries). The repo row stays so the user
68
+sees the failure; we don't auto-cleanup because that races concurrent
69
+retries.
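
The flow above can be sketched with the five steps injected as functions. `ForkCloneDeps`, `RunForkClone`, and the local `PoisonError` are hypothetical stand-ins for the real types, and the sketch treats every failure as permanent, which the real job may not.

```go
package main

import (
	"errors"
	"fmt"
)

type InitStatus string

const (
	StatusInitialized InitStatus = "initialized"
	StatusInitFailed  InitStatus = "init_failed"
)

// PoisonError is a hypothetical stand-in for worker.PoisonError: it tells the
// queue not to retry the job.
type PoisonError struct{ Err error }

func (e *PoisonError) Error() string { return "poison: " + e.Err.Error() }
func (e *PoisonError) Unwrap() error { return e.Err }

// ForkCloneDeps bundles the job's steps as injectable functions (assumed shape).
type ForkCloneDeps struct {
	RepoPath           func(repoID int64) (string, error) // reload by id
	CloneBareShared    func(sourcePath, forkPath string) error
	InstallHooks       func(forkPath string) error
	SetPreciousObjects func(sourcePath string) error
	SetInitStatus      func(repoID int64, s InitStatus) error
}

var errCloneFailed = errors.New("clone failed")

// RunForkClone runs the steps in order. On failure it flips the fork row to
// init_failed and returns a poison error; the row itself is left in place.
func RunForkClone(d ForkCloneDeps, sourceID, forkID int64) error {
	fail := func(err error) error {
		_ = d.SetInitStatus(forkID, StatusInitFailed) // best effort
		return &PoisonError{Err: err}
	}
	sourcePath, err := d.RepoPath(sourceID)
	if err != nil {
		return fail(err)
	}
	forkPath, err := d.RepoPath(forkID)
	if err != nil {
		return fail(err)
	}
	if err := d.CloneBareShared(sourcePath, forkPath); err != nil {
		return fail(err)
	}
	if err := d.InstallHooks(forkPath); err != nil {
		return fail(err)
	}
	if err := d.SetPreciousObjects(sourcePath); err != nil {
		return fail(err)
	}
	if err := d.SetInitStatus(forkID, StatusInitialized); err != nil {
		return fail(err)
	}
	return nil
}

func main() {
	status := map[int64]InitStatus{}
	deps := ForkCloneDeps{
		RepoPath:           func(id int64) (string, error) { return fmt.Sprintf("/repos/%d.git", id), nil },
		CloneBareShared:    func(src, dst string) error { return errCloneFailed },
		InstallHooks:       func(string) error { return nil },
		SetPreciousObjects: func(string) error { return nil },
		SetInitStatus:      func(id int64, s InitStatus) error { status[id] = s; return nil },
	}
	err := RunForkClone(deps, 1, 2)
	var poison *PoisonError
	fmt.Println(errors.As(err, &poison), status[2])
}
```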
70
+
71
+## Sync (fast-forward fork from upstream)
72
+
73
+`fork.Sync(ctx, deps, actorUserID, forkRepoID)` only fast-forwards.
74
+Anything else (merge, rebase) belongs in the user's client; doing
75
+either server-side without the user's resolution preferences risks
76
+producing commits the user doesn't want.
77
+
78
+Algorithm:
79
+
80
+1. Resolve both default-branch OIDs (`fork`, `source`).
81
+2. If equal → `ErrSyncUpToDate`.
82
+3. If fork is NOT an ancestor of upstream → `ErrSyncDiverged`.
83
+4. CAS update via `repogit.UpdateRefCAS(fork, branch, upstream, fork)`
84
+   — the trailing `fork` argument is git's old-value guard. A
85
+   concurrent push to the fork loses the CAS and surfaces as
86
+   `ErrSyncRefRaced`.
87
+5. Update `repos.default_branch_oid` so the home view reflects the
88
+   new tip without waiting for `push:process` (update-ref bypasses
89
+   `post-receive`, same shape as the merge handler's fix in the
90
+   audit-remediation sprint).
91
+
92
+Empty fork (no branch yet) is handled via the 40-zero OID literal
93
+that git accepts as "ref must not exist yet" semantics — sync to
94
+an empty fork creates the branch from upstream's tip.
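
A sketch of the decision logic in steps 1 to 4, with git access injected so it runs without a repository. `SyncGit` and `FastForward` are hypothetical names; only the three error names come from this doc. The caller would persist the returned OID to `repos.default_branch_oid` (step 5). An empty string stands for "fork has no branch yet", and `upstreamOID` is assumed to be non-empty.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrSyncUpToDate = errors.New("fork sync: already up to date")
	ErrSyncDiverged = errors.New("fork sync: fork has diverged from upstream")
	ErrSyncRefRaced = errors.New("fork sync: ref moved during update")
)

// zeroOID is the all-zero old value that update-ref reads as
// "the ref must not exist yet".
var zeroOID = strings.Repeat("0", 40)

// SyncGit is an assumed seam over the two git operations the sync needs.
type SyncGit struct {
	IsAncestor   func(ancestor, descendant string) (bool, error)
	UpdateRefCAS func(newOID, oldOID string) (swapped bool, err error)
}

// FastForward returns the new tip, or one of the ErrSync* sentinels.
func FastForward(g SyncGit, forkOID, upstreamOID string) (string, error) {
	if forkOID == upstreamOID {
		return forkOID, ErrSyncUpToDate
	}
	expectedOld := forkOID
	if forkOID == "" {
		expectedOld = zeroOID // empty fork: create the branch
	} else {
		ok, err := g.IsAncestor(forkOID, upstreamOID)
		if err != nil {
			return "", err
		}
		if !ok {
			return "", ErrSyncDiverged
		}
	}
	swapped, err := g.UpdateRefCAS(upstreamOID, expectedOld)
	if err != nil {
		return "", err
	}
	if !swapped {
		return "", ErrSyncRefRaced // a concurrent push won the race
	}
	return upstreamOID, nil
}

func main() {
	g := SyncGit{
		IsAncestor:   func(a, d string) (bool, error) { return a == "aaa" && d == "bbb", nil },
		UpdateRefCAS: func(newOID, oldOID string) (bool, error) { return true, nil },
	}
	fmt.Println(FastForward(g, "aaa", "bbb"))
	fmt.Println(FastForward(g, "ccc", "bbb"))
}
```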
95
+
96
+## Ahead/behind
97
+
98
+`fork.AheadBehind(ctx, deps, forkRepoID)` returns
99
+`{Ahead, Behind, Comparable}` where:
100
+
101
+* `Ahead` = commits in fork's default branch not in source's.
102
+* `Behind` = commits in source's default branch not in fork's.
103
+* `Comparable` = false when either side's default ref is missing
104
+  (empty fork, never-initialised source).
105
+
106
+Implementation: read both OIDs, then run
107
+`git rev-list --left-right --count` *inside the fork's repo*.
108
+Because the fork shares object alternates with the source, the
109
+upstream OID resolves without an explicit fetch.
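
For reference, `git rev-list --left-right --count A...B` prints two whitespace-separated integers: commits only reachable from `A`, then commits only reachable from `B`. The sketch below builds the arguments and parses that output. The function names are assumptions, and passing the fork OID on the left is what maps the first number to `Ahead`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// AheadBehind mirrors the {Ahead, Behind, Comparable} shape described above.
type AheadBehind struct {
	Ahead, Behind int
	Comparable    bool
}

// revListArgs puts the fork on the left side of the symmetric difference.
func revListArgs(forkOID, upstreamOID string) []string {
	return []string{"rev-list", "--left-right", "--count", forkOID + "..." + upstreamOID}
}

// parseLeftRightCount parses "<left>\t<right>\n" as printed by git.
func parseLeftRightCount(out string) (AheadBehind, error) {
	fields := strings.Fields(out)
	if len(fields) != 2 {
		return AheadBehind{}, fmt.Errorf("unexpected rev-list output %q", out)
	}
	ahead, err := strconv.Atoi(fields[0])
	if err != nil {
		return AheadBehind{}, err
	}
	behind, err := strconv.Atoi(fields[1])
	if err != nil {
		return AheadBehind{}, err
	}
	return AheadBehind{Ahead: ahead, Behind: behind, Comparable: true}, nil
}

func main() {
	fmt.Println(revListArgs("f00d", "cafe"))
	fmt.Println(parseLeftRightCount("3\t5\n"))
}
```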
110
+
111
+This is the floor implementation. S36's perf-pass sprint adds an
112
+LRU cache keyed on `(fork_repo_id, fork_default_oid,
113
+upstream_default_oid)` — already documented in S36's "Code-tab
114
+caching" deliverables and on the S00–S25 audit's H4 deferral.
115
+
116
+## Visibility floor
117
+
118
+`fork.allowedTargetVisibility(source, target)` enforces:
119
+
120
+| source  | target=public | target=private | target="" |
121
+|---------|---------------|----------------|-----------|
122
+| public  | ✓             | ✓              | public    |
123
+| private | ✗             | ✓              | private   |
124
+
125
+Forking private → public would expose previously-private content
126
+and is always rejected (`ErrVisibilityFloor`).
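
The table translates to a few lines of Go. This is a sketch only: the `Visibility` type and the `(Visibility, error)` return shape are assumptions, since this doc does not show the real signature.

```go
package main

import (
	"errors"
	"fmt"
)

type Visibility string

const (
	Public  Visibility = "public"
	Private Visibility = "private"
)

var ErrVisibilityFloor = errors.New("fork: target visibility below source floor")

// allowedTargetVisibility resolves the fork's visibility. An empty target
// inherits the source's visibility; private to public is always rejected.
func allowedTargetVisibility(source, target Visibility) (Visibility, error) {
	if target == "" {
		return source, nil
	}
	if source == Private && target == Public {
		return "", ErrVisibilityFloor
	}
	return target, nil
}

func main() {
	fmt.Println(allowedTargetVisibility(Public, ""))
	fmt.Println(allowedTargetVisibility(Private, Public))
}
```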
127
+
128
+## Permission lattice
129
+
130
+`policy.ActionForkCreate` was already in the registry from earlier
131
+sprints. Today's gating shape:
132
+
133
+* Anonymous on any repo → deny (`DenyAnonymous`).
134
+* Logged-in on a repo they can read → allow (login-required, no
135
+  role gate).
136
+* Logged-in on a private repo they CAN'T read → deny
137
+  (`DenyVisibility`, leaks as 404 at the handler layer).
138
+
139
+Suspended actors are blocked by step 3 of `policy.Can` (suspended +
140
+write action → deny). Fork create counts as write (it mutates the
141
+target owner's namespace).
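
The gating shape, as a small decision function. This is an illustration rather than the real `policy.Can`: the `Actor` struct, the `DenySuspended` name, and the order of the checks are assumptions.

```go
package main

import "fmt"

type Decision string

const (
	Allow          Decision = "allow"
	DenyAnonymous  Decision = "deny_anonymous"
	DenyVisibility Decision = "deny_visibility" // surfaces as 404 at the handler
	DenySuspended  Decision = "deny_suspended"  // hypothetical name
)

// Actor is a hypothetical, minimal view of the caller.
type Actor struct {
	LoggedIn  bool
	Suspended bool
}

// canForkCreate applies the lattice: login required, no role gate,
// suspended actors blocked because fork-create is a write action.
func canForkCreate(a Actor, canReadSource bool) Decision {
	if !a.LoggedIn {
		return DenyAnonymous
	}
	if a.Suspended {
		return DenySuspended
	}
	if !canReadSource {
		return DenyVisibility
	}
	return Allow
}

func main() {
	fmt.Println(canForkCreate(Actor{}, true))
	fmt.Println(canForkCreate(Actor{LoggedIn: true}, true))
	fmt.Println(canForkCreate(Actor{LoggedIn: true}, false))
}
```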
142
+
143
+## Cross-fork PRs (deferred to a follow-up)
144
+
145
+S27's spec lists cross-fork PR support as in-scope, but the actual
146
+plumbing — fetching the fork's head into the base repo's
147
+`refs/shithub-pr/<pr_id>/head` namespace and routing the merge from
148
+the internal ref — is large enough that this sprint ships fork
149
+creation, sync, and ahead/behind only. The cross-fork PR work is
150
+tracked here as a follow-up:
151
+
152
+* Extend `pulls.Create` to accept `head_repo_id != repo_id`.
153
+* Add `repogit.FetchIntoNamespace` (already shipped in this sprint
154
+  for the eventual consumer).
155
+* `pulls.Synchronize` reads head from the internal ref when
156
+  `pull_request.head_repo_id != pull_request.repo_id`.
157
+* `pulls.Merge` worktree-add reads head from the internal ref.
158
+* Re-check fork visibility at merge time (the merger may have lost
159
+  read access on the head between PR open and merge).
160
+
161
+The internal ref is private — we never advertise it via
162
+`info/refs`. The git-http handler's ref filter already restricts
163
+to `refs/heads/*` and `refs/tags/*`, so the namespace is
164
+naturally hidden.
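
A sketch of why the namespace stays hidden: a prefix filter limited to heads and tags never matches the internal ref. `prHeadRef` and `advertised` are hypothetical helper names; the real filter lives in the git-http handler.

```go
package main

import (
	"fmt"
	"strings"
)

// prHeadRef builds the internal ref that would hold a cross-fork PR's head.
func prHeadRef(prID int64) string {
	return fmt.Sprintf("refs/shithub-pr/%d/head", prID)
}

// advertised mirrors a filter that only exposes branches and tags.
func advertised(ref string) bool {
	return strings.HasPrefix(ref, "refs/heads/") || strings.HasPrefix(ref, "refs/tags/")
}

func main() {
	for _, ref := range []string{"refs/heads/main", "refs/tags/v1.0", prHeadRef(42)} {
		fmt.Println(ref, advertised(ref))
	}
}
```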
165
+
166
+## S16 hard-delete cascade amendment
167
+
168
+When a source repo with active forks is hard-deleted, the forks
169
+become orphans (`fork_of_repo_id ON DELETE SET NULL` from the
170
+existing FK). Today an orphaned fork holds only its own refs and the
171
+objects pushed since the fork; everything up to the fork point still
172
+lives in the source. Hard-deleting the source would remove those
173
+objects and break the orphaned forks.
174
+
175
+The fix is to repack each fork before removing the source:
176
+
177
+```
178
+git repack -a -d
179
+```
180
+
181
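+…runs in the fork's repo and copies every reachable object, including
182
+those borrowed via alternates, into the fork's own pack; once the fork's `objects/info/alternates` file is removed, we can safely delete the source.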
183
+
184
+This is a `KindRepoForkRepackOnSourceDelete` job (deferred from
185
+S16; see `ListForksOfRepoForRepack` query that this sprint shipped
186
+for it). The lifecycle worker's `repo_hard_delete` step needs to
187
+fan out one repack job per fork, await completion, then proceed
188
+with the FS delete.
189
+
190
+The query is in place; the job + the cascade wiring land in a
191
+follow-up commit (or in S37 when the deploy plan freezes the
192
+hard-delete sequence).
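
The ordering constraint (repack every fork, then delete the source) can be sketched as below. `HardDeleteSource` and `repackArgs` are hypothetical, and the loop is sequential for brevity where the real cascade would fan out jobs and await them. The sketch assumes dissociation uses a plain `git repack -a -d` in each fork.

```go
package main

import (
	"errors"
	"fmt"
)

var errRepack = errors.New("repack failed")

// repackArgs returns the git arguments that copy all reachable objects,
// borrowed ones included, into the fork's own pack.
func repackArgs(forkPath string) []string {
	return []string{"-C", forkPath, "repack", "-a", "-d"}
}

// HardDeleteSource repacks every fork first and only then deletes the source.
// Any repack failure aborts before the source is touched.
func HardDeleteSource(forkPaths []string, repack func(forkPath string) error, deleteSource func() error) error {
	for _, p := range forkPaths {
		if err := repack(p); err != nil {
			return fmt.Errorf("repack %s: %w", p, err)
		}
	}
	return deleteSource()
}

func main() {
	forks := []string{"/repos/bob/app.git", "/repos/eve/app.git"} // example paths
	err := HardDeleteSource(forks,
		func(p string) error { fmt.Println("git", repackArgs(p)); return nil },
		func() error { fmt.Println("delete source"); return nil },
	)
	fmt.Println("err:", err)
}
```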
193
+
194
+## Routes
195
+
196
+| Method | Path                                | Auth          | Notes                              |
197
+|--------|-------------------------------------|---------------|------------------------------------|
198
+| POST   | `/{owner}/{repo}/fork`              | RequireUser   | Create a fork                      |
199
+| POST   | `/{owner}/{repo}/sync`              | RequireUser   | Fast-forward fork from upstream    |
200
+| GET    | `/{owner}/{repo}/forks`             | public        | Paginated list of forks            |
201
+
202
+The `/fork` POST emits a `forked` domain event (kind=`forked`,
203
+source_kind=`repo`) into S26's `domain_events` log so the future
204
+activity feed picks it up. The `/sync` POST emits `repo_fork_synced`
205
+through the audit log only (no public event).
206
+
207
+The fork-create handler also auto-watches the new fork at
208
+`level=all` so the user sees fork-side events without having to
209
+opt in. Matches GitHub's "watching your own forks" default.
210
+
211
+## Pitfalls noted in code
212
+
213
+* Source-repo GC pruning fork-needed objects → `preciousObjects`.
214
+* Source-repo deletion with active forks → S16 amendment (above).
215
+* Cross-fork PR with deleted fork → mark
216
+  `mergeable_state='blocked'` with "head repository deleted"
217
+  reason at the merge gate (lands with cross-fork PR work).
218
+* Fork rename / transfer → `fork_of_repo_id` is by-id so the
219
+  relationship survives.
220
+* Sync race with concurrent push → CAS on update-ref; surfaces as
221
+  `ErrSyncRefRaced`.
222
+* Fork-of-fork chains → spec leans "flatten alternates to root".
223
+  Today the clone uses `--shared` against whatever path we pass; if
224
+  the source is itself a fork, the alternates chain is two levels
225
+  deep. Acceptable for v1; the flattening lands when fork-of-fork
226
+  becomes a real user complaint.