tenseleyflow/shithub / b6160a3

Document source remote imports

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: b6160a3e02337fa1eb05ffe536ff0317742d729d
Parents: 93270f2
Tree: 3ea3821

2 changed files

Status  File                          +   -
M       docs/internal/code-tab.md     21  12
M       docs/internal/repo-create.md  37  0
docs/internal/code-tab.md (modified)
@@ -72,18 +72,27 @@ directories first, then files alphabetically.
 on the rendered ref, the Code tab parses it once, matches entries by
 submodule path, and links GitHub or configured shithub clone remotes to
 the local `/{owner}/{repo}/tree/{gitlink-oid}` route when the target
-repo has that commit. If the target repo exists locally but does not
-have the pinned commit object, and `.gitmodules` points at GitHub or a
-relative sibling repo, the handler performs a bounded, non-forced fetch
-of heads/tags from the corresponding GitHub remote, re-checks the
-object, and then links to the exact detached-commit tree when it
-arrived. Successful backfills update the target repo's default-branch
-OID when that ref moved, then enqueue the same code-index and
-size-recalc maintenance used after pushes. Diverged local refs are
-never force-updated; on fetch failure or still-missing objects, the row
-links to the target repo's default Code tab so independently-created
-mirrors don't produce dead links. Unknown, external, absent, or
-malformed remotes stay as plain `name @ shortsha` rows.
+repo has that commit.
+
+If the target repo exists locally but does not have the pinned commit
+object, the handler first checks `repo_source_remotes` for that target
+repo. A stored source remote is the durable source of truth for imports:
+the handler validates it with the shared SSRF defense, performs a
+bounded, non-forced fetch of heads/tags, re-checks the object, and then
+links to the exact detached-commit tree when it arrived. Successful
+backfills update the target repo's default-branch OID when that ref
+moved, mark the source remote fetched, and enqueue the same code-index
+and size-recalc maintenance used after pushes.
+
+GitHub URL/name inference remains as a compatibility fallback for
+legacy repos that were created before source remotes existed: when the
+`.gitmodules` URL is GitHub-hosted or a relative sibling that maps
+cleanly to a GitHub owner/repo, shithub may fetch from the inferred
+GitHub URL. Diverged local refs are never force-updated; on fetch
+failure or still-missing objects, the row links to the target repo's
+default Code tab so independently-created mirrors don't produce dead
+links. Unknown, external, absent, or malformed remotes stay as plain
+`name @ shortsha` rows.
 
 The S17 ship excludes the htmx-driven "last commit per entry" column
 that the spec describes — an extra round-trip we can add later without
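
Reviewer sketch (not part of the commit): the link resolution order described in the hunk above can be read as a small decision function. This is a minimal illustration in Go under stated assumptions. `Target`, `LinkKind`, `fetchFunc`, and `resolveSubmoduleLink` are hypothetical names, the real handler is not shown in this diff, and the sketch assumes the GitHub inference is only consulted when no stored source remote exists.

```go
package main

import "fmt"

// LinkKind is a hypothetical enum for how a submodule row is rendered.
type LinkKind int

const (
	PlainRow       LinkKind = iota // plain `name @ shortsha` row, no link
	ExactTree                      // /{owner}/{repo}/tree/{gitlink-oid}
	DefaultCodeTab                 // the target repo's default Code tab
)

func (k LinkKind) String() string {
	switch k {
	case ExactTree:
		return "exact-tree"
	case DefaultCodeTab:
		return "default-code-tab"
	default:
		return "plain-row"
	}
}

// Target is a hypothetical summary of what the handler knows about the
// repo a gitlink points at.
type Target struct {
	ExistsLocally     bool
	HasCommit         bool   // pinned commit object already present
	SourceRemote      string // stored source remote, "" if none
	InferredGitHubURL string // legacy fallback, "" if nothing maps cleanly
}

// fetchFunc stands in for the bounded, non-forced fetch. It reports
// whether the pinned commit is present after the fetch.
type fetchFunc func(remoteURL string) (commitArrived bool)

// resolveSubmoduleLink applies the documented order: existing commit,
// then stored source remote, then (assumed) GitHub inference.
func resolveSubmoduleLink(t Target, fetch fetchFunc) LinkKind {
	if !t.ExistsLocally {
		return PlainRow
	}
	if t.HasCommit {
		return ExactTree
	}
	remote := t.SourceRemote
	if remote == "" {
		remote = t.InferredGitHubURL
	}
	if remote != "" && fetch(remote) {
		return ExactTree
	}
	// Fetch failed, object still missing, or no remote to try.
	return DefaultCodeTab
}

func main() {
	target := Target{ExistsLocally: true, SourceRemote: "https://git.example.org/lib.git"}
	arrived := func(remoteURL string) bool { return true }
	fmt.Println(resolveSubmoduleLink(target, arrived))
}
```

Keeping the fetch behind a function value makes the ordering testable without touching git or the network.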
docs/internal/repo-create.md (modified)
@@ -5,6 +5,7 @@ S11 ships the create-a-repo flow end-to-end: a logged-in user clicks **New**, fi
 ## What's wired
 
 - **Migration:** `0017_repos.sql` adds the `repos` table (with `repo_visibility` enum, owner XOR check, per-owner unique-by-name partial indexes, soft-delete column).
+- **Source remotes:** `0051_repo_source_remotes.sql` adds one optional public fetch URL per repo. Creation and settings can save this URL, fetch heads/tags, and use it later for submodule gitlink backfill.
 - **sqlc package:** `internal/repos/sqlc` (`reposdb`) — Create, Get-by-owner-and-name, Exists, List-by-owner, Count, SoftDelete, UpdateDiskUsed.
 - `internal/repos/validate.go` — name shape (≤100 chars, `[a-z0-9._-]`, non-separator edges, no dot-dot, no leading dot) + reserved-name list.
 - `internal/repos/templates/` — embeds 10 SPDX licenses + 10 .gitignore templates + a minimal README generator. Sourced from gitea's `options/license` and `options/gitignore` (originally github.com/github/gitignore, MIT/CC0).
@@ -43,6 +44,9 @@ POST /new
   ├─ ValidateName / ValidateDescription (friendly error if bad shape)
   ├─ Visibility ∈ {"public", "private"}
   ├─ License/Gitignore keys ∈ curated list (when set)
+  ├─ Optional source_remote_url:
+  │     normalize + SSRF-validate a public http(s) Git remote
+  │     refuse credentials/query/fragment and any init-template combo
   ├─ Limiter.Hit(scope=repo_create, ident=user:<id>, max=10/hour)
   ├─ Resolve author = display name + verified primary email
   │     (refuse with ErrNoVerifiedEmail when init is requested AND missing)
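
Reviewer sketch (not part of the commit): the `source_remote_url` step added in this hunk could look roughly like the Go below. `normalizeSourceRemoteURL` and `blockedAddr` are hypothetical names. The documented check goes through `internal/security/ssrf` with DNS resolution, which this diff does not show. The sketch only inspects literal IP hosts, so it is weaker than what the docs describe, and it leaves out the init-template refusal.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
	"net/url"
	"strings"
)

// cgnat is the RFC 6598 shared address space; netip's IsPrivate does
// not cover it, so it is checked separately.
var cgnat = netip.MustParsePrefix("100.64.0.0/10")

// blockedAddr reports whether addr is loopback, private, CGNAT,
// link-local, or unspecified.
func blockedAddr(addr netip.Addr) bool {
	addr = addr.Unmap()
	return addr.IsLoopback() || addr.IsPrivate() ||
		addr.IsLinkLocalUnicast() || addr.IsLinkLocalMulticast() ||
		addr.IsUnspecified() || cgnat.Contains(addr)
}

// normalizeSourceRemoteURL is a hypothetical helper that enforces the
// accepted shape: http(s), a host, a non-empty path, and no userinfo,
// query string, or fragment.
func normalizeSourceRemoteURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		// Do not wrap err: it can echo the raw URL, secrets included.
		return "", errors.New("source remote is not a valid URL")
	}
	switch {
	case u.Scheme != "http" && u.Scheme != "https":
		return "", errors.New("source remote must use http or https")
	case u.User != nil:
		return "", errors.New("source remote must not contain credentials")
	case u.RawQuery != "" || u.ForceQuery || u.Fragment != "":
		return "", errors.New("source remote must not contain a query or fragment")
	case u.Hostname() == "":
		return "", errors.New("source remote must name a host")
	case strings.Trim(u.Path, "/") == "":
		return "", errors.New("source remote must include a repository path")
	}
	// Literal-IP check only; hostnames would still need DNS resolution.
	if addr, perr := netip.ParseAddr(u.Hostname()); perr == nil && blockedAddr(addr) {
		return "", errors.New("source remote host is not publicly routable")
	}
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/owner/repo.git",
		"https://user:secret@example.com/owner/repo.git",
		"http://127.0.0.1/owner/repo.git",
	} {
		normalized, err := normalizeSourceRemoteURL(raw)
		fmt.Printf("accepted=%t normalized=%q\n", err == nil, normalized)
	}
}
```

Rejecting userinfo before anything is stored or logged is what keeps credentials out of the database. The error for an unparsable URL is deliberately generic for the same reason.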
@@ -55,6 +59,11 @@ POST /new
   │       (hash-object → update-index → write-tree → commit-tree → update-ref)
   ├─ tx.Commit()
   ├─ audit.Record(action=repo_created, target=repo, target_id=<repo.id>)
+  ├─ if source_remote_url set:
+  │     repo_source_remotes UPSERT
+  │     git fetch --no-recurse-submodules heads/tags from that remote
+  │     update default_branch/default_branch_oid from fetched refs
+  │     enqueue index + size recalculation
   └─ return Result{Repo, InitialCommitOID, DiskPath}
 ```
 
@@ -65,6 +74,34 @@ Failure handling at each step:
 - Initial-commit error: same as above — Rollback + RemoveAll.
 - tx.Commit error: post-FS-success but DB couldn't commit. We RemoveAll the bare repo dir to keep DB and disk consistent.
 - Audit error: logged at WARN, not propagated — we don't fail the create just because audit logging blipped.
+- Source remote fetch error: the repo remains created, the URL is retained with `last_error`, and the user lands on General settings where they can fix or retry the remote.
+
+## Source remotes and imports
+
+Source remotes are for public Git import/mirror metadata, not private
+credentials. The accepted shape is `http://` or `https://`, a host, and
+a non-empty repository path; userinfo, query strings, and fragments are
+rejected so secrets do not enter the database or logs. Before storing or
+fetching, the URL runs through `internal/security/ssrf` with DNS
+resolution so loopback/private/CGNAT/link-local hosts are rejected.
+
+Fetches use `internal/repos/git.FetchRemoteHeadsAndTags`, which shells
+out to canonical git with `--no-recurse-submodules` and non-forcing
+head/tag refspecs. If the local branch diverged, git rejects the update;
+shithub records the fetch error instead of overwriting local history.
+After a successful fetch, shithub keeps the current default branch if it
+exists, otherwise prefers `trunk`, then `main`, then `master`, then the
+first fetched branch. The chosen branch OID becomes
+`repos.default_branch_oid`, making the Code tab and history views work
+without a later push.
+
+The same stored remote is used by submodule rendering. If a parent repo
+pins a submodule commit that the local target repo lacks, shithub tries
+the target repo's source remote before any GitHub-name fallback. This is
+the durable path for self-hosted or non-GitHub upstreams: create/import
+each submodule repo with its source remote, then create/import the parent
+repo, and the pinned submodule links can hydrate exact detached tree
+views on demand.
 
 ## Plumbing-only initial commit
 
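
Reviewer sketch (not part of the commit): two details from the "Source remotes and imports" text above lend themselves to a short Go illustration, namely the non-forcing refspecs and the default-branch preference order. `fetchArgs` and `pickDefaultBranch` are hypothetical names. The actual `FetchRemoteHeadsAndTags` implementation is not shown in this diff, so the exact arguments it passes to git are an assumption.

```go
package main

import "fmt"

// preferredBranches is the documented fallback order when the current
// default branch does not exist after the fetch.
var preferredBranches = []string{"trunk", "main", "master"}

// fetchArgs builds a plausible argument list for `git fetch`. The
// refspecs carry no leading "+", so git refuses non-fast-forward
// updates instead of overwriting diverged local history.
func fetchArgs(remoteURL string) []string {
	return []string{
		"fetch", "--no-recurse-submodules", remoteURL,
		"refs/heads/*:refs/heads/*",
		"refs/tags/*:refs/tags/*",
	}
}

// contains reports whether want is one of names.
func contains(names []string, want string) bool {
	for _, name := range names {
		if name == want {
			return true
		}
	}
	return false
}

// pickDefaultBranch keeps the current default when it exists, then
// tries trunk, main, master, and finally the first branch listed.
// It returns "" when there are no branches at all.
func pickDefaultBranch(current string, branches []string) string {
	if current != "" && contains(branches, current) {
		return current
	}
	for _, name := range preferredBranches {
		if contains(branches, name) {
			return name
		}
	}
	if len(branches) > 0 {
		return branches[0]
	}
	return ""
}

func main() {
	fmt.Println(fetchArgs("https://example.com/owner/repo.git"))
	fmt.Println(pickDefaultBranch("", []string{"feature", "master", "main"}))
}
```

The OID of the branch returned by a function like `pickDefaultBranch` is what would then be written to `repos.default_branch_oid`.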