Code tab
The code tab is the GitHub-style repo browser: tree listing, blob view
with syntax highlighting, raw view, "Go to file" finder, and the
branch/tag switcher. After a successful push, hitting /{owner}/{repo}
sends the user to /tree/{default_branch}.
Routes
| Route | Handler |
|---|---|
| `GET /{owner}/{repo}` | redirects to `/tree/{default}` |
| `GET /{owner}/{repo}/tree/{ref}/{path...}` | `codeTree` |
| `GET /{owner}/{repo}/blob/{ref}/{path...}` | `codeBlob` |
| `GET /{owner}/{repo}/raw/{ref}/{path...}` | `codeRaw` |
| `GET /{owner}/{repo}/find/{ref}?q=...` | `codeFinder` |
| `GET /static/css/chroma.css` | runtime-generated Chroma theme |
Every code-tab handler runs through policy.Can(... ActionRepoRead) —
private repos hide from anonymous viewers and unrelated users via the
existence-leak 404 guard from S15.
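A minimal sketch of the 404 guard idea, assuming a middleware shape. `canReadFunc`, `requireRead`, and the helper handlers are hypothetical names for illustration; the real check is the `policy.Can(... ActionRepoRead)` call, whose signature is not shown here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// canReadFunc is a hypothetical stand-in for the policy.Can read check.
type canReadFunc func(r *http.Request) bool

// requireRead answers 404 (not 403) when the viewer may not read the repo,
// so a private repo is indistinguishable from one that does not exist.
func requireRead(can canReadFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !can(r) {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// Illustrative policies and a placeholder handler.
func allowAll(*http.Request) bool { return true }
func denyAll(*http.Request) bool  { return false }

var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "tree listing")
})

// probe sends one GET through h and returns the response status code.
func probe(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(requireRead(allowAll, okHandler), "/alice/demo/tree/main"))
	fmt.Println(probe(requireRead(denyAll, okHandler), "/alice/demo/tree/main"))
}
```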
Ref + path disambiguation
{ref} is the chi * wildcard, so the URL /tree/feature/x/sub/file.go
arrives as a single string. Resolution:
- If the first segment is exactly 40 hex chars → treat as a SHA, the rest is the path.
- Otherwise, longest-prefix match against the cached ref list (branches first, then tags, sorted longest-first). The remainder after the matched ref is the in-tree path.
This handles release/v1.0/beta/CHANGELOG.md correctly without
ambiguity. Resolution lives in internal/repos/git/treeops.go::ResolveRef.
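The two resolution steps can be sketched roughly as follows. This is not the code in `ResolveRef`; `splitRefPath` and `isFullSHA` are hypothetical names, and the ref list is assumed to arrive as a plain slice with branches ahead of tags. The sketch applies the steps in the order listed and does not model any tie-breaking between a ref name and a SHA.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isFullSHA reports whether s is exactly 40 hexadecimal characters.
func isFullSHA(s string) bool {
	if len(s) != 40 {
		return false
	}
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdefABCDEF", c) {
			return false
		}
	}
	return true
}

// splitRefPath splits the wildcard remainder into (ref, in-tree path).
// refs is assumed to list branches first, then tags.
func splitRefPath(rest string, refs []string) (ref, path string, ok bool) {
	rest = strings.TrimPrefix(rest, "/")
	first, tail, _ := strings.Cut(rest, "/")
	if isFullSHA(first) {
		return first, tail, true
	}
	// Longest names first; the stable sort keeps branches ahead of tags
	// when two names have the same length.
	sorted := append([]string(nil), refs...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return len(sorted[i]) > len(sorted[j])
	})
	for _, r := range sorted {
		if rest == r {
			return r, "", true
		}
		if strings.HasPrefix(rest, r+"/") {
			return r, rest[len(r)+1:], true
		}
	}
	return "", "", false
}

func main() {
	refs := []string{"main", "release/v1.0", "release/v1.0/beta", "feature/x"}
	ref, path, ok := splitRefPath("release/v1.0/beta/CHANGELOG.md", refs)
	fmt.Println(ref, path, ok)
}
```

Matching on `r + "/"` rather than on the bare prefix keeps a branch named `main` from swallowing a URL that starts with `maintenance/...`.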
Path validation rejects .., control chars, leading slashes, and
backslashes — defense in depth on top of git's own validation.
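As an illustration of those rejection rules, here is a small sketch. It is not the project's `validateSubpath`; the function name, the error value, and the choice to reject `..` only when it is a whole path segment are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// errBadPath is a hypothetical error value for this sketch.
var errBadPath = errors.New("invalid path")

// validateSubpathSketch rejects leading slashes, backslashes, control
// characters, and ".." segments. The empty path (repo root) is allowed.
func validateSubpathSketch(p string) error {
	if strings.HasPrefix(p, "/") || strings.Contains(p, "\\") {
		return errBadPath
	}
	for _, r := range p {
		if unicode.IsControl(r) {
			return errBadPath
		}
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return errBadPath
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"src/main.go", "../etc/passwd", "/abs", "a\\b", ""} {
		fmt.Printf("%q -> %v\n", p, validateSubpathSketch(p))
	}
}
```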
Tree listing
git ls-tree --long --full-tree <ref>:<path> is parsed into typed
TreeEntry values (tree | blob | commit | symlink). Sort is
directories first, then files alphabetically.
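A rough sketch of the parse-and-sort step. The `TreeEntry` fields below are assumed for illustration and need not match the real type. `git ls-tree --long` prints `<mode> <type> <object> <size>`, a tab, then the path, with `-` as the size for non-blobs; symlinks appear as blobs with mode `120000`. The sketch ignores git's quoting of unusual path names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// TreeEntry here is a hypothetical shape, not the project's definition.
type TreeEntry struct {
	Kind string // "tree", "blob", "commit", or "symlink"
	OID  string
	Size int64 // -1 when git prints "-" (trees and submodules)
	Name string
}

// parseLsTreeLine parses one line of `git ls-tree --long` output.
func parseLsTreeLine(line string) (TreeEntry, error) {
	meta, name, ok := strings.Cut(line, "\t")
	if !ok {
		return TreeEntry{}, fmt.Errorf("no tab in %q", line)
	}
	f := strings.Fields(meta) // the size column is space-padded
	if len(f) != 4 {
		return TreeEntry{}, fmt.Errorf("want 4 fields, got %d", len(f))
	}
	e := TreeEntry{Kind: f[1], OID: f[2], Size: -1, Name: name}
	if f[0] == "120000" {
		e.Kind = "symlink"
	}
	if f[3] != "-" {
		n, err := strconv.ParseInt(f[3], 10, 64)
		if err != nil {
			return TreeEntry{}, err
		}
		e.Size = n
	}
	return e, nil
}

// sortEntries puts directories first, then orders by name within each group.
func sortEntries(es []TreeEntry) {
	sort.SliceStable(es, func(i, j int) bool {
		di, dj := es[i].Kind == "tree", es[j].Kind == "tree"
		if di != dj {
			return di
		}
		return es[i].Name < es[j].Name
	})
}

func main() {
	lines := []string{
		"100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391       0\tREADME.md",
		"040000 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904       -\tsrc",
		"120000 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391       0\tlatest",
	}
	var entries []TreeEntry
	for _, l := range lines {
		e, err := parseLsTreeLine(l)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		entries = append(entries, e)
	}
	sortEntries(entries)
	for _, e := range entries {
		fmt.Println(e.Kind, e.Size, e.Name)
	}
}
```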
S17 ships without the htmx-driven "last commit per entry" column that the spec describes, so the page renders the listing immediately. The column is deferred to S18 (commits-per-entry), a deferral path the spec itself calls out. It adds an extra round-trip but needs no schema change, and the tree template already has the column slot ready.
File view
codeBlob walks four cases:
- Large (>1 MiB): placeholder + raw download link, no body read.
- Binary (NUL byte in first 8 KiB): placeholder. Image extensions (png/jpg/jpeg/gif/webp) ≤5 MiB get an `<img>` preview pointing at `/raw/...`.
- Markdown (`.md`/`.markdown`): Goldmark + bluemonday rendered HTML plus a `<details>` source toggle with the highlighted source.
- Default text: Chroma highlight by filename extension, with content sniffing as the fallback.
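The four cases amount to a small decision function. The sketch below is illustrative only: `classify`, `blobKind`, and the constants are hypothetical names, and the order of the checks is assumed from the order of the list above.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

const (
	maxInlineSize   = 1 << 20 // 1 MiB: larger blobs get a placeholder
	sniffLen        = 8 << 10 // 8 KiB: window scanned for a NUL byte
	maxImagePreview = 5 << 20 // 5 MiB: cap for <img> previews
)

type blobKind int

const (
	kindLarge blobKind = iota
	kindImage
	kindBinary
	kindMarkdown
	kindText
)

var imageExts = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true, ".gif": true, ".webp": true,
}

// classify picks a rendering case from the file name, the blob size, and
// the leading bytes of the blob (head may be nil for large blobs).
func classify(name string, size int64, head []byte) blobKind {
	if size > maxInlineSize {
		return kindLarge // the body is never read in this case
	}
	if len(head) > sniffLen {
		head = head[:sniffLen]
	}
	ext := strings.ToLower(path.Ext(name))
	if bytes.IndexByte(head, 0) >= 0 {
		if imageExts[ext] && size <= maxImagePreview {
			return kindImage
		}
		return kindBinary
	}
	if ext == ".md" || ext == ".markdown" {
		return kindMarkdown
	}
	return kindText
}

func main() {
	fmt.Println(classify("main.go", 120, []byte("package main\n")) == kindText)
	fmt.Println(classify("logo.png", 900, []byte{0x89, 'P', 'N', 'G', 0}) == kindImage)
}
```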
Chroma uses the github style baked at process start; the CSS is
served from /static/css/chroma.css via a tiny in-process generator.
Raw view
- Content-Type derived from the extension whitelist (`code.go::rawContentType`).
- `X-Content-Type-Options: nosniff` always.
- `Content-Security-Policy: default-src 'none'; sandbox` at the handler level. The global SecureHeaders middleware may overlay a broader CSP; both are restrictive, and if both headers reach the client, user agents enforce every policy they receive, so each load must satisfy both.
- `Content-Disposition: attachment` is forced for HTML, SVG, JS, WASM, and anything else that could execute on shithub's domain. We don't have a separate `raw.shithub.tld` host yet (post-MVP); attachment is the safety belt.
- Streamed via `git cat-file -p`; never buffered, so large blobs don't blow up the worker's memory.
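The header rules above could look roughly like this. It is a sketch, not the contents of `rawContentType`: the whitelist entries, the `application/octet-stream` fallback, and the names `rawHeaders`, `rawTypes`, and `forceAttachment` are all assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"path"
	"strings"
)

// rawTypes is a hypothetical extension whitelist; the real one is not shown.
var rawTypes = map[string]string{
	".txt": "text/plain; charset=utf-8",
	".md":  "text/plain; charset=utf-8",
	".go":  "text/plain; charset=utf-8",
	".png": "image/png",
	".jpg": "image/jpeg",
}

// forceAttachment lists extensions that must never render inline.
var forceAttachment = map[string]bool{
	".html": true, ".htm": true, ".svg": true, ".js": true, ".wasm": true,
}

// rawHeaders returns the response headers for a raw blob with this name.
func rawHeaders(name string) map[string]string {
	ext := strings.ToLower(path.Ext(name))
	ct, ok := rawTypes[ext]
	if !ok {
		ct = "application/octet-stream" // assumed fallback for unlisted types
	}
	h := map[string]string{
		"Content-Type":            ct,
		"X-Content-Type-Options":  "nosniff",
		"Content-Security-Policy": "default-src 'none'; sandbox",
	}
	if forceAttachment[ext] {
		h["Content-Disposition"] = "attachment"
	}
	return h
}

func main() {
	hdr := http.Header{}
	for k, v := range rawHeaders("index.html") {
		hdr.Set(k, v)
	}
	fmt.Println(hdr.Get("Content-Type"))
	fmt.Println(hdr.Get("Content-Disposition"))
}
```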
Finder ("Go to file")
/find/{ref} lists every blob path on the ref via
git ls-tree -r --name-only, then filters with
internal/repos/finder/finder.go::Filter. The matcher is a
subsequence-with-bonus scorer (boundary, consecutive run, basename
hit) — not as fancy as VS Code's quickopen but good enough for tens of
thousands of paths.
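To make the scoring idea concrete, here is a small subsequence scorer with the three bonuses named above. It is a sketch only: the weights, the greedy leftmost matching, the byte-wise ASCII comparison, and the names `score` and `filterPaths` are assumptions, not the behaviour of `finder.go::Filter`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// score reports whether query is a subsequence of p and, if so, a score
// with bonuses for boundary hits, consecutive runs, and basename hits.
func score(query, p string) (int, bool) {
	q, s := strings.ToLower(query), strings.ToLower(p)
	base := strings.LastIndex(s, "/") + 1 // index where the basename starts
	total, qi, prev := 0, 0, -2
	for i := 0; i < len(s) && qi < len(q); i++ {
		if s[i] != q[qi] {
			continue
		}
		total++ // one point per matched character
		if i == 0 || strings.IndexByte("/_-.", s[i-1]) >= 0 {
			total += 3 // match right after a separator
		}
		if i == prev+1 {
			total += 2 // extends a consecutive run
		}
		if i >= base {
			total++ // match falls inside the basename
		}
		prev = i
		qi++
	}
	return total, qi == len(q)
}

// filterPaths keeps matching paths, best score first, ties broken by path.
func filterPaths(query string, paths []string) []string {
	type hit struct {
		path  string
		score int
	}
	var hits []hit
	for _, p := range paths {
		if sc, ok := score(query, p); ok {
			hits = append(hits, hit{p, sc})
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].score != hits[j].score {
			return hits[i].score > hits[j].score
		}
		return hits[i].path < hits[j].path
	})
	out := make([]string, 0, len(hits))
	for _, h := range hits {
		out = append(out, h.path)
	}
	return out
}

func main() {
	paths := []string{"cmd/main.go", "docs/logo.md", "README.md", "domain/x.go"}
	fmt.Println(filterPaths("main", paths))
}
```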
Key shortcut and live-filter via htmx are spec deliverables that we defer for now — the form-submission flow works without JS and that's the floor S17 commits to.
Caching
Currently no caching layer. Every request runs git for-each-ref,
git ls-tree, etc. That's fine for small-to-medium repos; the cost
shows up on big repos with deep trees. The S17 spec proposes a cache
keyed on (repo_id, ref_oid, dir_path) invalidated on push (S14's
push:process job is the right invalidation hook).
Deferred: the cache is purely performance polish. When we hit a
real-world repo where it matters, wire it in under `internal/cache/`
with an invalidation callback in `worker/jobs/push_process.go`. The
handlers already take a per-request `policy.Cache`, so adding a
per-process git cache is mechanically straightforward.
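One possible shape for the proposed cache, sketched under assumptions: an in-process map keyed on `(repo_id, ref_oid, dir_path)` with a per-repo invalidation entry point that a push job could call. The type and method names are hypothetical, and the cached value is simplified to a list of entry names.

```go
package main

import (
	"fmt"
	"sync"
)

// treeKey mirrors the (repo_id, ref_oid, dir_path) key from the spec.
type treeKey struct {
	RepoID  int64
	RefOID  string
	DirPath string
}

// treeCache is a hypothetical per-process cache of directory listings.
type treeCache struct {
	mu      sync.RWMutex
	entries map[treeKey][]string
}

func newTreeCache() *treeCache {
	return &treeCache{entries: make(map[treeKey][]string)}
}

// Get returns the cached listing for k, if any.
func (c *treeCache) Get(k treeKey) ([]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.entries[k]
	return v, ok
}

// Put stores a listing under k.
func (c *treeCache) Put(k treeKey, names []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[k] = names
}

// InvalidateRepo drops every entry for one repo; a push job could call it.
func (c *treeCache) InvalidateRepo(repoID int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.entries {
		if k.RepoID == repoID {
			delete(c.entries, k)
		}
	}
}

func main() {
	c := newTreeCache()
	k := treeKey{RepoID: 7, RefOID: "4b825dc642cb6eb9a060e54bf8d69288fbee4904", DirPath: "src"}
	c.Put(k, []string{"main.go", "util.go"})
	_, hit := c.Get(k)
	fmt.Println("hit before push:", hit)
	c.InvalidateRepo(7)
	_, hit = c.Get(k)
	fmt.Println("hit after push:", hit)
}
```

Because the key includes the ref's object ID, an entry never goes stale on its own; invalidating on push mainly evicts listings for commits that are no longer a branch tip.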
Pitfalls + protections
- XSS via raw HTML/SVG: blocked by `Content-Disposition: attachment` for those extensions.
- XSS via markdown: Goldmark configured without HTML passthrough, with bluemonday's UGC policy on top. Tests in `internal/repos/markdown/` (TODO: minimal coverage today).
- Path traversal: `validateSubpath` in `code.go` rejects `..`, control chars, and leading slashes.
- Hex collision with SHA: ref-list lookup wins over SHA shortcut when the same string is both.
- Encoding (GBK / Shift-JIS): TODO. Text files outside UTF-8 may render garbled. The body is rendered as-is; a future commit can add `golang.org/x/text/encoding` autodetection.
Dependencies
- `github.com/alecthomas/chroma/v2`: syntax highlighting
- `github.com/yuin/goldmark`: CommonMark + GFM
- `github.com/microcosm-cc/bluemonday`: HTML sanitizer
Deferred polish (tracked, not blocking)
These items are spec deliverables we ship in a later pass:
- Last-commit-per-entry column with htmx lazy load and pre-walked `git log --name-status` cache → wire into S18 (commit history), where the same walk powers the per-file history page.
- Tree caching keyed on (repo_id, ref_oid, dir_path), push-event invalidation → wire into S36 (performance pass) once we have a real workload to measure.
- Pagination at 1000 entries per directory → cosmetic for huge trees; add when someone hits `node_modules`-grade inflation.
- Encoding detection for non-UTF-8 source files → file reads are defensive (`io.LimitReader` + size cap); render quality is the only loss until this lands.