
# Code tab

The code tab is the GitHub-style repo browser: tree listing, blob view with syntax highlighting, raw view, "Go to file" finder, and the branch/tag switcher. For populated repos, /{owner}/{repo} renders the default branch Code tab directly, matching GitHub's canonical repo URL.

## Routes

| Route | Handler |
| --- | --- |
| `GET /{owner}/{repo}` | default-branch Code tab |
| `GET /{owner}/{repo}/tree/{ref}/{path...}` | `codeTree` |
| `GET /{owner}/{repo}/blob/{ref}/{path...}` | `codeBlob` |
| `GET /{owner}/{repo}/raw/{ref}/{path...}` | `codeRaw` |
| `GET /{owner}/{repo}/find/{ref}?q=...` | `codeFinder` |
| `GET /{owner}/{repo}/edit/{ref}/{path...}` | in-browser file editor |
| `POST /{owner}/{repo}/edit/{ref}/{path...}` | commit file edit / rename |
| `GET /{owner}/{repo}/new/{ref}/{path...}` | in-browser new-file editor |
| `POST /{owner}/{repo}/new/{ref}/{path...}` | commit new file |
| `GET /{owner}/{repo}/delete/{ref}/{path...}` | delete-file confirmation |
| `POST /{owner}/{repo}/delete/{ref}/{path...}` | commit file deletion |
| `GET /{owner}/{repo}/upload/{ref}/{path...}` | upload-files form |
| `POST /{owner}/{repo}/upload/{ref}/{path...}` | commit uploaded files |
| `POST /{owner}/{repo}/markdown-preview` | editor Markdown preview fragment |
| `GET /{owner}/{repo}/actions` | parked product-tab shell |
| `GET /{owner}/{repo}/projects` | parked product-tab shell |
| `GET /{owner}/{repo}/wiki` | parked product-tab shell |
| `GET /{owner}/{repo}/security` | parked product-tab shell |
| `GET /{owner}/{repo}/pulse` | parked product-tab shell |
| `GET /{owner}/{repo}/packages` | parked product-tab shell |
| `GET /{owner}/{repo}/releases` | parked product-tab shell |
| `GET /static/css/chroma.css` | runtime-generated Chroma theme |

Every code-tab handler runs through policy.Can(... ActionRepoRead), so private repos are hidden from anonymous viewers and unrelated users by the existence-leak 404 guard from S15.

The in-browser mutation routes run through policy.Can(... ActionRepoWrite). They inherit the same archived-repo, suspended-user, collaborator-role, site-admin, and private-repo existence behavior as git push surfaces.

## Repository product tabs

The repo header intentionally exposes GitHub's major product-map tabs: Code, Issues, Pull requests, Actions, Projects, Wiki, Security and quality, Insights, and, when visible to the viewer, Settings. Forks remain available from the repo action button and About sidebar but, as on GitHub, are not a top-level tab.

Actions, Projects, Wiki, Security and quality, Insights, Packages, and Releases currently render honest parked shells via repo/deferred_tab. They are public read surfaces gated by ActionRepoRead, so private repo existence behavior matches Code/Issues/Pull requests while the deeper systems remain assigned to their later sprints.

## Ref + path disambiguation

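The {ref}/{path...} portion of the route is captured by the chi * wildcard, so for the URL /tree/feature/x/sub/file.go the handler receives feature/x/sub/file.go as a single string and has to split the ref from the path itself. Resolution: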

  1. If the first segment is exactly 40 hex chars, treat it as a SHA; the rest is the path. A ref whose name is that same string takes precedence (see Pitfalls + protections).
  2. Otherwise, longest-prefix match against the cached ref list (branches first, then tags, sorted longest-first). The remainder after the matched ref is the in-tree path.

This handles release/v1.0/beta/CHANGELOG.md correctly without ambiguity. Resolution lives in internal/repos/git/treeops.go::ResolveRef.
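
The two rules can be sketched as below. This is an illustration only, not the code in treeops.go: the `resolved` type, the `resolveRef` signature, and passing branches and tags as plain slices are assumptions, and the sketch tries the ref list before the SHA shortcut so that a ref named like a SHA wins, as the Pitfalls section describes.

```go
package main

import (
	"sort"
	"strings"
)

// resolved is a hypothetical result type: the matched ref (or SHA) plus the in-tree path.
type resolved struct {
	Ref   string
	Path  string
	IsSHA bool
}

// isFullSHA reports whether s is exactly 40 hexadecimal characters.
func isFullSHA(s string) bool {
	if len(s) != 40 {
		return false
	}
	for _, c := range s {
		switch {
		case c >= '0' && c <= '9', c >= 'a' && c <= 'f', c >= 'A' && c <= 'F':
		default:
			return false
		}
	}
	return true
}

// resolveRef splits "ref/path" using the known ref names.
func resolveRef(rest string, branches, tags []string) (resolved, bool) {
	rest = strings.TrimPrefix(rest, "/")
	// Branches first, then tags; the stable sort keeps that order among equal lengths.
	refs := append(append([]string{}, branches...), tags...)
	sort.SliceStable(refs, func(i, j int) bool { return len(refs[i]) > len(refs[j]) })
	for _, ref := range refs {
		if rest == ref {
			return resolved{Ref: ref}, true
		}
		if strings.HasPrefix(rest, ref+"/") {
			return resolved{Ref: ref, Path: rest[len(ref)+1:]}, true
		}
	}
	// No ref matched: fall back to the 40-hex SHA shortcut on the first segment.
	first, path, _ := strings.Cut(rest, "/")
	if isFullSHA(first) {
		return resolved{Ref: first, Path: path, IsSHA: true}, true
	}
	return resolved{}, false
}
```

With branches `release` and `release/v1.0` and a tag `release/v1.0/beta`, calling `resolveRef("release/v1.0/beta/CHANGELOG.md", ...)` picks the longest candidate and leaves `CHANGELOG.md` as the path.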

When the matched ref is a raw 40-character commit SHA, the tree page resolves the top commit summary against that SHA and displays the short SHA in the ref switcher, matching GitHub's detached-commit tree view.

Path validation rejects .., control chars, leading slashes, and backslashes — defense in depth on top of git's own validation.
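
A minimal sketch of such a read-path guard follows. The name `checkSubpath` and the choice to treat an empty path as the repo root are assumptions for illustration; the real check is `validateSubpath` in code.go and may differ in detail.

```go
package main

import (
	"errors"
	"strings"
)

var errBadPath = errors.New("invalid path")

// checkSubpath rejects "..", control characters, leading slashes, and backslashes.
func checkSubpath(p string) error {
	if p == "" {
		return nil // assumption: empty means the repo root
	}
	if strings.HasPrefix(p, "/") || strings.Contains(p, "\\") {
		return errBadPath
	}
	for _, r := range p {
		if r < 0x20 || r == 0x7f { // ASCII control characters
			return errBadPath
		}
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return errBadPath
		}
	}
	return nil
}
```

Only a whole segment equal to `..` is rejected, so a file such as `notes..txt` still passes.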

## Tree listing

git ls-tree --long --full-tree <ref>:<path> is parsed into typed TreeEntry values (tree | blob | commit | symlink). Sort is directories first, then files alphabetically.
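
As a rough sketch of that parsing step: each `git ls-tree --long` line is `<mode> <type> <oid> <size>`, then a tab, then the path. The `TreeEntry` fields below, the mapping of mode 120000 to `symlink`, and byte-order name sorting are assumptions rather than the handler's actual definitions, and the sketch ignores the path quoting git applies to unusual filenames when `-z` is not used.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// TreeEntry here is an illustrative shape; the real struct may carry other fields.
type TreeEntry struct {
	Kind string // "tree", "blob", "commit" (submodule), or "symlink"
	Mode string
	OID  string
	Size int64 // -1 when git prints "-" (trees and submodules)
	Name string
}

// parseLsTreeLine parses one line of `git ls-tree --long` output.
func parseLsTreeLine(line string) (TreeEntry, error) {
	meta, name, ok := strings.Cut(line, "\t")
	if !ok {
		return TreeEntry{}, fmt.Errorf("no tab separator in %q", line)
	}
	f := strings.Fields(meta) // the size column is space-padded, so split on runs of spaces
	if len(f) != 4 {
		return TreeEntry{}, fmt.Errorf("want 4 fields, got %d in %q", len(f), line)
	}
	e := TreeEntry{Kind: f[1], Mode: f[0], OID: f[2], Size: -1, Name: name}
	if f[0] == "120000" { // git reports symlinks as blobs with this mode
		e.Kind = "symlink"
	}
	if f[3] != "-" {
		n, err := strconv.ParseInt(f[3], 10, 64)
		if err != nil {
			return TreeEntry{}, err
		}
		e.Size = n
	}
	return e, nil
}

// sortEntries puts directories first, then orders by name within each group.
func sortEntries(es []TreeEntry) {
	sort.SliceStable(es, func(i, j int) bool {
		di, dj := es[i].Kind == "tree", es[j].Kind == "tree"
		if di != dj {
			return di
		}
		return es[i].Name < es[j].Name
	})
}
```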

commit entries are git submodule pointers. When .gitmodules exists on the rendered ref, the Code tab parses it once, matches entries by submodule path, and links GitHub or configured shithub clone remotes to the local /{owner}/{repo}/tree/{gitlink-oid} route when the target repo has that commit.
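
To illustrate the "parse once, match by path" step, here is a deliberately small `.gitmodules` reader that builds a path-to-URL map. `parseGitmodules` is a hypothetical helper; it ignores the quoting, escapes, and continuation lines that the full git config syntax allows, so treat it as a sketch rather than the parser the Code tab uses.

```go
package main

import (
	"bufio"
	"strings"
)

// parseGitmodules returns submodule path -> clone URL for each complete section.
func parseGitmodules(content string) map[string]string {
	out := map[string]string{}
	var path, url string
	flush := func() {
		if path != "" && url != "" {
			out[path] = url
		}
		path, url = "", ""
	}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";"):
			continue // blank line or comment
		case strings.HasPrefix(line, "["):
			flush() // a new section starts; store the previous one
		default:
			k, v, ok := strings.Cut(line, "=")
			if !ok {
				continue
			}
			switch strings.ToLower(strings.TrimSpace(k)) {
			case "path":
				path = strings.TrimSpace(v)
			case "url":
				url = strings.TrimSpace(v)
			}
		}
	}
	flush()
	return out
}
```

A tree row whose `commit` entry name matches a key in the returned map can then be linked according to the URL stored for it.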

If the target repo exists locally but does not have the pinned commit object, the handler first checks repo_source_remotes for that target repo. A stored source remote is the durable source of truth for imports: the handler validates it with the shared SSRF defense, performs a bounded, non-forced fetch of heads/tags, re-checks the object, and links to the exact detached-commit tree if the object arrived. Successful backfills update the target repo's default-branch OID when that ref moved, mark the source remote as fetched, and enqueue the same code-index and size-recalc maintenance used after pushes.

GitHub URL/name inference remains as a compatibility fallback for legacy repos that were created before source remotes existed: when the .gitmodules URL is GitHub-hosted or a relative sibling that maps cleanly to a GitHub owner/repo, shithub may fetch from the inferred GitHub URL. Diverged local refs are never force-updated; on fetch failure or still-missing objects, the row links to the target repo's default Code tab so independently-created mirrors don't produce dead links. Unknown, external, absent, or malformed remotes stay as plain name @ shortsha rows.

The S17 ship excludes the htmx-driven "last commit per entry" column that the spec describes. It costs an extra round-trip and can be added later without a schema change, so the current page renders the listing immediately. Deferred to S18 (commits-per-entry): the spec calls out this deferral path, and the tree template already has the column slot ready.

## File view

codeBlob walks four cases:

  • Large (>1 MiB): placeholder + raw download link, no body read.
  • Binary (NUL byte in first 8 KiB): placeholder. Image extensions (png/jpg/jpeg/gif/webp) ≤5 MiB get an <img> preview pointing at /raw/....
  • Markdown (.md/.markdown): Goldmark + bluemonday rendered HTML, plus a <details> source-toggle with the highlighted source.
  • Default text: Chroma highlight by filename extension, content sniffing fallback.
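
The case selection can be sketched as a pure function over the filename, size, and leading bytes. Everything named here (`classifyBlob`, `blobKind`, the constants) is hypothetical, and one ordering detail is an assumption: the text does not say how an image between 1 MiB and 5 MiB is handled, so the sketch lets image extensions bypass the large-file placeholder up to the 5 MiB cap.

```go
package main

import (
	"bytes"
	"path"
	"strings"
)

const (
	maxInlineSize = 1 << 20 // 1 MiB: larger blobs are not read
	sniffLen      = 8 << 10 // 8 KiB: window searched for a NUL byte
	maxImageSize  = 5 << 20 // 5 MiB: cap for <img> previews
)

type blobKind int

const (
	kindLarge blobKind = iota
	kindBinary
	kindImage
	kindMarkdown
	kindText
)

var imageExts = map[string]bool{".png": true, ".jpg": true, ".jpeg": true, ".gif": true, ".webp": true}

// classifyBlob picks the rendering case; head holds the first bytes of the blob
// and may be nil when the blob is too large to read.
func classifyBlob(name string, size int64, head []byte) blobKind {
	ext := strings.ToLower(path.Ext(name))
	if size > maxInlineSize {
		if imageExts[ext] && size <= maxImageSize {
			return kindImage // assumption: previews are decided by extension alone here
		}
		return kindLarge
	}
	if len(head) > sniffLen {
		head = head[:sniffLen]
	}
	if bytes.IndexByte(head, 0) >= 0 {
		if imageExts[ext] {
			return kindImage
		}
		return kindBinary
	}
	if ext == ".md" || ext == ".markdown" {
		return kindMarkdown
	}
	return kindText
}
```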

Chroma uses the github style baked at process start; the CSS is served from /static/css/chroma.css via a tiny in-process generator.

## In-browser file edits

The Code tab surfaces GitHub-style write affordances for users with repo:write on a named branch:

  • The tree header has an Add file dropdown with create and upload actions.
  • Text blob headers show edit and delete icon buttons.
  • The rendered README header shows an edit icon when the README was found in the current directory. SECURITY and CONTRIBUTING documents use the same blob-header controls when opened from the document tabs.

Direct web commits are intentionally limited to refs/heads/<branch>. Tags and detached 40-hex commit views render read-only controls, and direct edit URLs for them return 400.

internal/repos/webedit owns the mutation path. For each edit it:

  1. Resolves the branch to its current commit and compares the submitted hidden base_oid; a mismatch returns a stale-edit conflict.
  2. Builds a temporary index from the old commit with git read-tree.
  3. Stages file changes via canonical git plumbing (hash-object, update-index, write-tree, commit-tree).
  4. Runs protection.Enforce before moving the branch, so protected branches deny direct web commits just like pushes.
  5. Advances the branch with git update-ref <ref> <new> <old> CAS.
  6. Inserts a push_events row with protocol = 'web', enqueues push:process, and sends the worker NOTIFY. If enqueueing fails after the ref has moved, the commit still succeeds and the failure is logged; the same post-push reconciliation gap exists for hook failures.
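
Steps 1 to 5 map onto git plumbing roughly as in the sketch below, which shells out to the `git` binary. It is not the webedit implementation: `commitFile`, the `git` helper, the fixed 100644 mode, and the placeholder author identity are all assumptions, and the protection check and push-event bookkeeping are only marked by a comment.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// git runs one git command in repoDir and returns its trimmed stdout.
func git(repoDir string, env []string, stdin []byte, args ...string) (string, error) {
	cmd := exec.Command("git", args...)
	cmd.Dir = repoDir
	cmd.Env = append(os.Environ(), env...)
	cmd.Stdin = bytes.NewReader(stdin)
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("git %s: %w", strings.Join(args, " "), err)
	}
	return strings.TrimSpace(string(out)), nil
}

// commitFile writes content at path on branch and returns the new commit OID.
func commitFile(repoDir, branch, baseOID, path, msg string, content []byte) (string, error) {
	ref := "refs/heads/" + branch
	head, err := git(repoDir, nil, nil, "rev-parse", "--verify", ref)
	if err != nil {
		return "", err
	}
	if head != baseOID { // step 1: the form was rendered against an older commit
		return "", fmt.Errorf("stale edit: %s is at %s, not %s", ref, head, baseOID)
	}
	tmp, err := os.MkdirTemp("", "webedit-index-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmp)
	env := []string{
		"GIT_INDEX_FILE=" + filepath.Join(tmp, "index"), // temporary index, step 2
		"GIT_AUTHOR_NAME=web", "GIT_AUTHOR_EMAIL=web@example.invalid", // placeholder identity
		"GIT_COMMITTER_NAME=web", "GIT_COMMITTER_EMAIL=web@example.invalid",
	}
	if _, err := git(repoDir, env, nil, "read-tree", head); err != nil {
		return "", err
	}
	blob, err := git(repoDir, env, content, "hash-object", "-w", "--stdin") // step 3
	if err != nil {
		return "", err
	}
	if _, err := git(repoDir, env, nil, "update-index", "--add", "--cacheinfo", "100644,"+blob+","+path); err != nil {
		return "", err
	}
	tree, err := git(repoDir, env, nil, "write-tree")
	if err != nil {
		return "", err
	}
	commit, err := git(repoDir, env, nil, "commit-tree", tree, "-p", head, "-m", msg)
	if err != nil {
		return "", err
	}
	// Step 4 (branch protection) would run here, before the ref moves.
	// Step 5: compare-and-swap; git refuses if the ref no longer equals head.
	if _, err := git(repoDir, env, nil, "update-ref", ref, commit, head); err != nil {
		return "", err
	}
	return commit, nil
}
```

Passing the old value as the third argument to `update-ref` is what makes the final step a compare-and-swap: a concurrent push that moved the branch causes the command to fail instead of silently overwriting it.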

Validation rules:

  • Text editor actions are capped at 1 MiB and reject NUL-byte binary content. Existing edit sources must be regular blobs, not symlinks, submodules, trees, or oversized blobs.
  • Uploads are capped at 25 MiB per request and 10 MiB per file. Uploads may contain binary data.
  • Repository paths reject empty names, leading/trailing slash, duplicate slash, backslash, ./.. segments, control bytes, exact overwrites, duplicate uploads, and parent-path conflicts.
  • Default commit messages are generated server-side (Update, Create, Rename, Delete, or Upload) when the form leaves the message blank.
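
The path rules in the third bullet can be sketched as a per-path check plus a batch check for uploads. `checkFilePath` and `checkBatch` are hypothetical names, not the `webedit` API, and the "exact overwrites" rule is left out because it needs the tree of the commit being modified.

```go
package main

import (
	"fmt"
	"strings"
)

// checkFilePath applies the per-path rules for a mutation target.
func checkFilePath(p string) error {
	if p == "" {
		return fmt.Errorf("empty path")
	}
	if strings.Contains(p, "\\") {
		return fmt.Errorf("backslash in %q", p)
	}
	for _, r := range p {
		if r < 0x20 || r == 0x7f {
			return fmt.Errorf("control byte in %q", p)
		}
	}
	for _, seg := range strings.Split(p, "/") {
		// An empty segment covers leading, trailing, and duplicate slashes.
		if seg == "" || seg == "." || seg == ".." {
			return fmt.Errorf("bad segment %q in %q", seg, p)
		}
	}
	return nil
}

// checkBatch validates every path and rejects duplicates and parent-path
// conflicts, where one upload is a directory prefix of another.
func checkBatch(paths []string) error {
	seen := map[string]bool{}
	for _, p := range paths {
		if err := checkFilePath(p); err != nil {
			return err
		}
		if seen[p] {
			return fmt.Errorf("duplicate upload %q", p)
		}
		seen[p] = true
	}
	for _, p := range paths {
		segs := strings.Split(p, "/")
		for i := 1; i < len(segs); i++ {
			parent := strings.Join(segs[:i], "/")
			if seen[parent] {
				return fmt.Errorf("%q conflicts with uploaded file %q", p, parent)
			}
		}
	}
	return nil
}
```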

The editor component is still server-rendered Go templates plus a small page-local script. No frontend build pipeline or React/Vite layer is required for this slice.

## Raw view

  • Content-Type derived from the extension whitelist (code.go::rawContentType).
  • X-Content-Type-Options: nosniff always.
  • Content-Security-Policy: default-src 'none'; sandbox at the handler level. The global SecureHeaders middleware may add a broader CSP; when both headers are sent, user agents enforce every policy, so a load must be allowed by both and the stricter handler-level policy still applies.
  • Content-Disposition: attachment is forced for HTML, SVG, JS, WASM, and anything that could execute on shithub's domain. We don't have a separate raw.shithub.tld host yet (post-MVP); attachment is the safety belt.
  • Streamed via git cat-file -p; never buffered. Large blobs don't blow up the worker's memory.
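
The header rules above could be assembled as in this sketch. The `rawTypes` table is a made-up stand-in for the real extension whitelist in `code.go::rawContentType`, the `application/octet-stream` fallback is an assumption, and `forceAttachment` lists only the extensions named in the text.

```go
package main

import (
	"net/http"
	"path"
	"strings"
)

// rawTypes is a hypothetical excerpt; the real whitelist is not shown in this document.
var rawTypes = map[string]string{
	".txt":  "text/plain; charset=utf-8",
	".json": "application/json",
	".png":  "image/png",
}

// forceAttachment covers types that could execute on the site's own origin.
var forceAttachment = map[string]bool{".html": true, ".htm": true, ".svg": true, ".js": true, ".wasm": true}

// rawHeaders builds the response headers for a raw blob with the given filename.
func rawHeaders(name string) http.Header {
	h := http.Header{}
	ext := strings.ToLower(path.Ext(name))
	ct, ok := rawTypes[ext]
	if !ok {
		ct = "application/octet-stream" // assumption: unknown extensions get a generic type
	}
	h.Set("Content-Type", ct)
	h.Set("X-Content-Type-Options", "nosniff")
	h.Set("Content-Security-Policy", "default-src 'none'; sandbox")
	if forceAttachment[ext] {
		h.Set("Content-Disposition", "attachment")
	}
	return h
}
```

A handler would copy these onto `w.Header()` before streaming the blob body.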

## Finder ("Go to file")

/find/{ref} lists every blob path on the ref via git ls-tree -r --name-only, then filters with internal/repos/finder/finder.go::Filter. The matcher is a subsequence-with-bonus scorer (boundary, consecutive run, basename hit) — not as fancy as VS Code's quickopen but good enough for tens of thousands of paths.
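
For readers unfamiliar with this kind of matcher, here is a small subsequence-with-bonus scorer. It is not `finder.go::Filter`: the function names, the greedy left-to-right matching, the bonus weights, and the byte-wise (ASCII-oriented) comparison are all assumptions chosen to keep the sketch short.

```go
package main

import (
	"sort"
	"strings"
)

// score reports whether query is a case-insensitive subsequence of p and,
// if so, how many points the match earns.
func score(query, p string) (int, bool) {
	q := strings.ToLower(query)
	s := strings.ToLower(p)
	base := strings.LastIndexByte(s, '/') + 1 // index where the basename starts
	total, qi, prev := 0, 0, -2
	for i := 0; i < len(s) && qi < len(q); i++ {
		if s[i] != q[qi] {
			continue
		}
		total++ // one point per matched character
		if i == 0 || strings.IndexByte("/_-.", s[i-1]) >= 0 {
			total += 3 // boundary bonus: start of a path segment or word
		}
		if i == prev+1 {
			total += 2 // consecutive-run bonus
		}
		if i >= base {
			total++ // basename bonus
		}
		prev = i
		qi++
	}
	return total, qi == len(q)
}

// filter keeps the matching paths, best score first.
func filter(query string, paths []string) []string {
	type hit struct {
		path string
		pts  int
	}
	var hits []hit
	for _, p := range paths {
		if pts, ok := score(query, p); ok {
			hits = append(hits, hit{p, pts})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].pts > hits[j].pts })
	out := make([]string, len(hits))
	for i, h := range hits {
		out[i] = h.path
	}
	return out
}
```

Under these weights the query `blob` ranks `code/blob.go` above `lib/big_log_object.txt`, because the first is a consecutive run starting at a segment boundary inside the basename.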

Key shortcut and live-filter via htmx are spec deliverables that we defer for now — the form-submission flow works without JS and that's the floor S17 commits to.

## Caching

Currently no caching layer. Every request runs git for-each-ref, git ls-tree, etc. That's fine for small-to-medium repos; the cost shows up on big repos with deep trees. The S17 spec proposes a cache keyed on (repo_id, ref_oid, dir_path) invalidated on push (S14's push:process job is the right invalidation hook).

Deferred: the cache is purely performance polish. When we hit a real-world repo where it matters, wire it in with a package under internal/cache/ plus an invalidation callback in worker/jobs/push_process.go. The handlers already take a per-request policy.Cache, so adding a per-process git cache is mechanically straightforward.
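
One possible shape for that cache, keyed on `(repo_id, ref_oid, dir_path)` with per-repo invalidation, is sketched below. The type and method names are invented for illustration, and the cached value is simplified to a slice of entry names. Because the key includes the ref OID, a stale entry is never served for a moved branch; invalidation on push mainly reclaims memory.

```go
package main

import "sync"

// treeKey mirrors the proposed cache key.
type treeKey struct {
	RepoID  int64
	RefOID  string
	DirPath string
}

// treeCache is a hypothetical per-process cache of directory listings.
type treeCache struct {
	mu sync.RWMutex
	m  map[treeKey][]string
}

func newTreeCache() *treeCache {
	return &treeCache{m: map[treeKey][]string{}}
}

// Get returns the cached listing for k, if any.
func (c *treeCache) Get(k treeKey) ([]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[k]
	return v, ok
}

// Put stores a listing for k.
func (c *treeCache) Put(k treeKey, names []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = names
}

// InvalidateRepo drops every entry for one repo; a push-processing job could call it.
func (c *treeCache) InvalidateRepo(repoID int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.m {
		if k.RepoID == repoID {
			delete(c.m, k)
		}
	}
}
```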

## Pitfalls + protections

  • XSS via raw HTML/SVG: blocked by Content-Disposition: attachment for those extensions.
  • XSS via markdown: Goldmark configured without HTML passthrough + bluemonday's UGC policy on top. Tests in internal/repos/markdown/ (TODO — minimal coverage today).
  • Path traversal: validateSubpath in code.go rejects .., controls, leading slashes.
  • Web edit path traversal / overwrite: webedit.ValidateFilePath applies the stricter mutation path guard and the service re-checks path existence against the commit being modified.
  • Hex collision with SHA: ref-list lookup wins over SHA shortcut when the same string is both.
  • Encoding (GBK / Shift-JIS): TODO. Text files outside UTF-8 may render garbled because the body is rendered as-is; a future commit can add golang.org/x/text/encoding autodetection.

## Dependencies

  • github.com/alecthomas/chroma/v2 — syntax highlighting
  • github.com/yuin/goldmark — CommonMark + GFM
  • github.com/microcosm-cc/bluemonday — HTML sanitizer

## Deferred polish (tracked, not blocking)

These items are spec deliverables we ship in a later pass:

  • Last-commit-per-entry column with htmx lazy load and pre-walked git log --name-status cache → wire into S18 (commit history) where the same walk powers the per-file history page.
  • Tree caching keyed on (repo_id, ref_oid, dir_path), push-event invalidation → wire into S36 (performance pass) once we have a real workload to measure.
  • Pagination at 1000 entries per directory → cosmetic for huge trees; add when someone hits node_modules-grade inflation.
  • Encoding detection for non-UTF-8 source files → file reads are defensive (io.LimitReader + size cap); render quality is the only loss until this lands.