tenseleyflow/shithub / 271ba33

S11: docs/internal/repo-create.md — sprint reference

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA
271ba334ab09ac004dbd2e631a56eeef435a71de
Parents
ce07b86
Tree
b785b90

1 changed file

Status  File  +  -
A  docs/internal/repo-create.md  122  0
docs/internal/repo-create.md (added)
@@ -0,0 +1,122 @@
1
+# Repository creation
2
+
3
+S11 ships the create-a-repo flow end-to-end: a logged-in user clicks **New**, fills out the form, and lands on the empty-repo home with a quick-setup snippet showing how to push code. The on-disk bare repo is created via S04's `RepoFS.InitBare`; if the user ticked "initialize," a single initial commit is built using git plumbing only — no working tree.
4
+
5
+## What's wired
6
+
7
+- **Migration:** `0017_repos.sql` adds the `repos` table (with `repo_visibility` enum, owner XOR check, per-owner unique-by-name partial indexes, soft-delete column).
8
+- **sqlc package:** `internal/repos/sqlc` (`reposdb`) — Create, Get-by-owner-and-name, Exists, List-by-owner, Count, SoftDelete, UpdateDiskUsed.
9
+- `internal/repos/validate.go` — name shape (≤100 chars, `[a-z0-9._-]`, non-separator edges, no dot-dot, no leading dot) + reserved-name list.
10
+- `internal/repos/templates/` — embeds 10 SPDX licenses + 10 .gitignore templates + a minimal README generator. Sourced from gitea's `options/license` and `options/gitignore` (originally github.com/github/gitignore, MIT/CC0).
11
+- `internal/repos/git/plumbing.go` — `InitialCommit{}.Build(ctx)` runs `git hash-object → update-index → write-tree → commit-tree → update-ref` against a temp index file. No working tree spawned.
12
+- `internal/repos/create.go` — orchestrator: validate → rate-limit → resolve author → tx-insert → InitBare → optional initial commit → audit. Cleans up the on-disk dir on any post-DB failure.
13
+- `internal/web/handlers/repo/repo.go` — GET/POST `/new`, GET `/{owner}/{repo}` (empty home placeholder for S11; S17 will replace).
14
+- `internal/web/templates/repo/{new,empty}.html` + GitHub-aligned CSS.
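The name-shape rules listed for `internal/repos/validate.go` could look roughly like the sketch below. It is not the shipped validator: `ErrInvalidName` is a hypothetical catch-all error, the reserved list is copied from the "Reserved repo names" section, and two choices are assumptions (the reserved list is checked before the shape rules, and the input is assumed to be lowercased already).

```go
package main

import (
	"errors"
	"strings"
)

// ErrReservedName is named in the testing section; ErrInvalidName is a
// hypothetical catch-all for shape violations.
var (
	ErrInvalidName  = errors.New("repos: invalid repository name")
	ErrReservedName = errors.New("repos: reserved repository name")
)

// reservedRepoNames mirrors the member list under "Reserved repo names".
var reservedRepoNames = map[string]bool{
	".git": true, ".gitignore": true, ".gitmodules": true, ".gitattributes": true,
	".well-known": true, ".github": true, "head": true, "refs": true,
	"objects": true, "info": true, "hooks": true, "branches": true,
}

func isSeparator(c byte) bool { return c == '.' || c == '_' || c == '-' }

// ValidateName checks the reserved list first (an assumed ordering), then
// length, charset, edge characters, and the dot-dot rule.
func ValidateName(name string) error {
	if reservedRepoNames[name] {
		return ErrReservedName
	}
	if len(name) == 0 || len(name) > 100 {
		return ErrInvalidName
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		if !(c >= 'a' && c <= 'z') && !(c >= '0' && c <= '9') && !isSeparator(c) {
			return ErrInvalidName
		}
	}
	// Non-separator edges; this also covers the "no leading dot" rule.
	if isSeparator(name[0]) || isSeparator(name[len(name)-1]) {
		return ErrInvalidName
	}
	if strings.Contains(name, "..") {
		return ErrInvalidName
	}
	return nil
}
```

Checking the reserved list first means a name such as `.git` reports as reserved rather than as a shape error; the real validator may order these differently.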
15
+
16
+## Routes
17
+
18
+| Route | Method | Handler | Notes |
19
+|---|---|---|---|
20
+| `/new` | GET | `repo.newRepoForm` | Auth-required; renders the form. |
21
+| `/new` | POST | `repo.newRepoSubmit` | Auth-required; calls `repos.Create`, redirects on success. |
22
+| `/{owner}/{repo}` | GET | `repo.repoHome` | Two-segment match — does NOT collide with the `/{username}` catch-all. Visibility-aware. |
23
+
24
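+`/new` is on the reserved-name list so the catch-all profile route can't shadow it. Two-segment `/{owner}/{repo}` doesn't collide with the one-segment `/{username}` route: a chi `{param}` placeholder matches a single path segment (it stops at the next `/`), so each pattern only matches paths with its own segment count.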
25
+
26
+## Creation flow
27
+
28
+```
29
+POST /new
30
+  │
31
+  ├─ ValidateName / ValidateDescription (friendly error if bad shape)
32
+  ├─ Visibility ∈ {"public", "private"}
33
+  ├─ License/Gitignore keys ∈ curated list (when set)
34
+  ├─ Limiter.Hit(scope=repo_create, ident=user:<id>, max=10/hour)
35
+  ├─ Resolve author = display name + verified primary email
36
+  │     (refuse with ErrNoVerifiedEmail when init is requested AND no verified email exists)
37
+  ├─ RepoFS.RepoPath(owner, name) → defense-in-depth path validation
38
+  ├─ tx.Begin()
39
+  │   └─ reposdb.CreateRepo(...)        ← unique-violation surfaces as ErrTaken
40
+  ├─ RepoFS.InitBare(diskPath)          ← `git init --bare --initial-branch=trunk`
41
+  ├─ if init flag set:
42
+  │     buildInitialCommit(ic) → commit OID
43
+  │       (hash-object → update-index → write-tree → commit-tree → update-ref)
44
+  ├─ tx.Commit()
45
+  ├─ audit.Record(action=repo_created, target=repo, target_id=<repo.id>)
46
+  └─ return Result{Repo, InitialCommitOID, DiskPath}
47
+```
48
+
49
+Failure handling at each step:
50
+
51
+- DB insert error: tx already rolled back via the deferred Rollback closure; nothing on disk to clean.
52
+- FS InitBare error: tx still uncommitted (we Rollback via defer); best-effort `os.RemoveAll(diskPath)` clears any partially-mkdir'd directory.
53
+- Initial-commit error: same as above — Rollback + RemoveAll.
54
+- tx.Commit error: post-FS-success but DB couldn't commit. We RemoveAll the bare repo dir to keep DB and disk consistent.
55
+- Audit error: logged at WARN, not propagated — we don't fail the create just because audit logging blipped.
56
+
57
+## Plumbing-only initial commit
58
+
59
+Why no working tree:
60
+
61
+- A working tree means a temp dir, a checkout, an `add`, a `commit`, and cleanup: five steps and far more I/O than we actually need.
62
+- Plumbing-only is deterministic: same `(name, body)` inputs → same blob OIDs → same tree OID, every time. The commit OID also depends on the author/committer identity and timestamps, so the test pins `When` and asserts on the resulting commit.
63
+- It's atomic at the ref level: until we run `update-ref`, the bare repo's `HEAD` is a symbolic ref pointing at an unborn branch. Halfway-through state is invisible to clients.
64
+
65
+The plumbing helpers shell out to `git` rather than vendoring a Go-native git library. Reasons: (a) any divergence between go-git and real git is a foot-gun; (b) the host requires git ≥ 2.28 anyway for `--initial-branch=trunk`; (c) the call surface is small (4–5 commands) and easy to audit. Future sprints will keep this discipline.
66
+
67
+## Templates
68
+
69
+License substitution handles the canonical placeholders we encounter in the SPDX texts: `<year>`, `[year]`, `[yyyy]`, `{{ year }}`, `{year}`, plus author flavors `<copyright holders>`, `<owner>`, `<name of author>`, `[fullname]`, `[name of copyright owner]`. Anything we miss survives in the output and is harmless (just less personalized).
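A minimal sketch of that substitution, using the placeholder spellings listed above with `strings.NewReplacer`. The function name `FillLicense` and its signature are assumptions for illustration; the real helper may differ, for example in case handling.

```go
package main

import "strings"

// Placeholder spellings as listed in the Templates section.
var yearPlaceholders = []string{"<year>", "[year]", "[yyyy]", "{{ year }}", "{year}"}
var holderPlaceholders = []string{
	"<copyright holders>", "<owner>", "<name of author>",
	"[fullname]", "[name of copyright owner]",
}

// FillLicense replaces every known placeholder in a single pass.
// Matching is case-sensitive; unknown placeholders are left untouched.
func FillLicense(text, year, holder string) string {
	pairs := make([]string, 0, 2*(len(yearPlaceholders)+len(holderPlaceholders)))
	for _, p := range yearPlaceholders {
		pairs = append(pairs, p, year)
	}
	for _, p := range holderPlaceholders {
		pairs = append(pairs, p, holder)
	}
	return strings.NewReplacer(pairs...).Replace(text)
}
```

A single-pass replacer avoids re-scanning substituted text, so an author name that happens to contain something like `[year]` is not rewritten a second time.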
70
+
71
+The README template is intentionally boring (`# {name}\n\n{description}\n` — nothing more). Per the spec: "always exactly this — no fancy boilerplate."
72
+
73
+## Visibility
74
+
75
+`/{owner}/{repo}` looks up the row via `GetRepoByOwnerUserAndName` (filters `deleted_at IS NULL`). If the row is `private` and the viewer isn't the owner (or is anonymous), the lookup helper returns `pgx.ErrNoRows`, which the handler renders as 404. This matches GitHub: a private repo is indistinguishable from "doesn't exist."
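The private-means-404 rule can be sketched as a small check layered on the lookup. Everything here is illustrative: `ErrNotFound` stands in for `pgx.ErrNoRows`, `Repo` is reduced to two fields, and a viewer ID of 0 is assumed to mean an anonymous request.

```go
package main

import (
	"errors"
	"net/http"
)

// ErrNotFound stands in for pgx.ErrNoRows in this sketch.
var ErrNotFound = errors.New("repo not found")

// Repo holds only the fields the visibility check needs.
type Repo struct {
	OwnerUserID int64
	Visibility  string // "public" or "private"
}

// checkVisible reports ErrNotFound when a private repo is viewed by anyone
// other than its owner. viewerID == 0 means anonymous (an assumption).
func checkVisible(r Repo, viewerID int64) error {
	if r.Visibility == "private" && (viewerID == 0 || viewerID != r.OwnerUserID) {
		return ErrNotFound
	}
	return nil
}

// statusFor maps the lookup outcome to the response status, so "hidden"
// and "missing" are indistinguishable to the client.
func statusFor(err error) int {
	if errors.Is(err, ErrNotFound) {
		return http.StatusNotFound
	}
	return http.StatusOK
}
```

Returning the same error for "hidden" and "missing" keeps the handler from accidentally leaking existence through a 403.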
76
+
77
+## Reserved repo names
78
+
79
+`internal/repos/validate.go::reservedRepoNames` is the small set of names that would either confuse git itself or break our routing inside the repo URL space. Members: `.git`, `.gitignore`, `.gitmodules`, `.gitattributes`, `.well-known`, `.github`, `head`, `refs`, `objects`, `info`, `hooks`, `branches`. Note: top-level reservations like `new` / `settings` live in `internal/auth/reserved.go` and are checked by the profile route, not here.
80
+
81
+## Rate limit
82
+
83
+10 creates per rolling hour per user. The throttle key is `repo_create | user:<id>`, namespaced so it never collides with login or signup throttles. The limits are compile-time constants in `internal/repos/create.go` for now; per-instance overrides land in S15 with the policy package.
84
+
85
+## Author identity
86
+
87
+We refuse to fabricate a commit author. The user's verified primary email + display name (or username when display name is empty) are baked into the initial commit. Pre-MVP feature: noreply emails for users who want to avoid leaking their address. Today the user must verify their primary email before they can run repo init.
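The fallback and refusal rules above fit in a few lines. The sketch below uses hypothetical `User` and `Author` shapes and a hypothetical `ResolveAuthor` function; only `ErrNoVerifiedEmail` is a name taken from this document.

```go
package main

import "errors"

// ErrNoVerifiedEmail is the error named in the creation flow.
var ErrNoVerifiedEmail = errors.New("repos: no verified primary email")

// User holds only the fields needed to resolve a commit author.
type User struct {
	Username      string
	DisplayName   string
	PrimaryEmail  string
	EmailVerified bool
}

// Author is the identity baked into the initial commit.
type Author struct {
	Name, Email string
}

// ResolveAuthor prefers the display name and falls back to the username.
// It refuses only when an initial commit is requested and no verified
// primary email exists; an empty repo needs no author email at all.
func ResolveAuthor(u User, initRequested bool) (Author, error) {
	name := u.DisplayName
	if name == "" {
		name = u.Username
	}
	if !u.EmailVerified || u.PrimaryEmail == "" {
		if initRequested {
			return Author{}, ErrNoVerifiedEmail
		}
		return Author{Name: name}, nil
	}
	return Author{Name: name, Email: u.PrimaryEmail}, nil
}
```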
88
+
89
+## Testing
90
+
91
+`internal/repos/create_test.go` is the integration spine:
92
+
93
+- `TestCreate_EmptyRepo` — no init flags. Verifies HEAD is a symbolic ref to `refs/heads/trunk` and the repo has zero commits.
94
+- `TestCreate_WithReadmeLicenseGitignore` — three init flags, `InitialCommitWhen` pinned. Asserts on `rev-list --count = 1`, the `ls-tree` payload, the author identity, and the year substitution in the LICENSE file.
95
+- `TestCreate_RejectsDuplicate` — second create with the same `(owner, name)` returns `ErrTaken`.
96
+- `TestCreate_RejectsReservedName` — name `"head"` returns `ErrReservedName`.
97
+- `TestCreate_RefusesWithoutVerifiedEmail` — user with no verified primary email is rejected with `ErrNoVerifiedEmail` when init is requested.
98
+- `TestCreate_PrivateVisibilityPersists` — visibility round-trips and the disk path lands under the right shard prefix.
99
+
100
+`internal/repos/git/plumbing_test.go` — single-commit roundtrip, author env, ref shape.
101
+
102
+## Pitfalls / what to remember
103
+
104
+- **Tx held across FS operations.** The Postgres connection sits idle in an open transaction for a few seconds during InitBare + plumbing. At our scale this is fine; if write throughput grows, swap to a "create row → schedule FS init via a job" pattern.
105
+- **Repo names are lowercased before path construction.** The DB column is `citext` so case-insensitive uniqueness comes for free, but the disk path is always lowercase.
106
+- **Bare repo dirs aren't cleaned on tx.Commit failure** unless we ALSO RemoveAll. The orchestrator does it; future paths that bypass `Create` must remember.
107
+- **Audit row creation is best-effort.** Don't move it inside the tx — an audit failure must not roll back the create.
108
+- **Two-segment route ordering.** `/{owner}/{repo}` is registered before the `/{username}` catch-all, but they don't actually conflict (different segment counts). The ordering is preserved for the future, when deeper routes under the repo (like `/{owner}/{repo}/issues`) ship.
109
+- **License placeholder substitution is best-effort.** We aim for the most common placeholders SPDX uses; anything missed survives in the output.
110
+
111
+## Open follow-ups
112
+
113
+- **Fork count + `fork_of_repo_id`** are columns now but unused; S27 lights them up.
114
+- **Org-owned repos.** `owner_org_id` exists with the XOR check; S31 wires the org side.
115
+- **Disk size recalc.** `disk_used_bytes` defaults to 0 and stays there; S14 will enqueue a `repo:size_recalc` job after init.
116
+- **Code listing.** `/{owner}/{repo}` renders the empty placeholder unconditionally; S17 will switch on whether the repo has commits.
117
+
118
+## Related docs
119
+
120
+- `docs/internal/storage.md` — RepoFS layout + InitBare semantics.
121
+- `docs/internal/auth.md` — login + sessions; restore-on-login affects whether a user can hit `/new`.
122
+- `docs/internal/profile.md` — `/{username}` catch-all that lives next to `/{owner}/{repo}`.