Code tab

The code tab is the GitHub-style repo browser: tree listing, blob view with syntax highlighting, raw view, "Go to file" finder, and the branch/tag switcher. After a successful push, hitting /{owner}/{repo} sends the user to /tree/{default_branch}.

Routes

| Route | Handler |
| --- | --- |
| `GET /{owner}/{repo}` | redirects to `/tree/{default}` |
| `GET /{owner}/{repo}/tree/{ref}/{path...}` | `codeTree` |
| `GET /{owner}/{repo}/blob/{ref}/{path...}` | `codeBlob` |
| `GET /{owner}/{repo}/raw/{ref}/{path...}` | `codeRaw` |
| `GET /{owner}/{repo}/find/{ref}?q=...` | `codeFinder` |
| `GET /static/css/chroma.css` | runtime-generated Chroma theme |

Every code-tab handler runs through policy.Can(... ActionRepoRead). Private repos are hidden from anonymous viewers and unrelated users by the existence-leak 404 guard from S15: a viewer without read access gets a 404, so the response does not reveal that the repo exists.

Ref + path disambiguation

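The {ref}/{path...} part of the route is the chi * wildcard, so for the URL /tree/feature/x/sub/file.go the handler receives the single string feature/x/sub/file.go and has to split it into ref and path itself. Resolution: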

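  1. If the first segment is exactly 40 hex chars → treat it as a SHA; the rest is the path. The one exception is a ref whose name is that same string: the ref-list lookup wins (see Pitfalls + protections).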
  2. Otherwise, longest-prefix match against the cached ref list (branches first, then tags, sorted longest-first). The remainder after the matched ref is the in-tree path.

This handles release/v1.0/beta/CHANGELOG.md correctly without ambiguity. Resolution lives in internal/repos/git/treeops.go::ResolveRef.
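
The two steps above can be sketched roughly as follows. This is not the code in treeops.go; splitRefPath, contains, and the sample ref list are hypothetical names used only for illustration, and the sketch assumes the caller passes branches before tags.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// fullSHA matches a full 40-character hexadecimal object ID.
var fullSHA = regexp.MustCompile(`^[0-9a-fA-F]{40}$`)

// splitRefPath is a hypothetical stand-in for ResolveRef: it splits the
// wildcard tail of a URL into a ref and an in-tree path. refs is expected
// to list branches first, then tags.
func splitRefPath(tail string, refs []string) (ref, path string, ok bool) {
	tail = strings.Trim(tail, "/")
	first, rest, _ := strings.Cut(tail, "/")

	// Step 1: a 40-hex first segment is a SHA, unless a ref has that exact name.
	if fullSHA.MatchString(first) && !contains(refs, first) {
		return first, rest, true
	}

	// Step 2: longest-prefix match. The stable sort keeps branches ahead of
	// tags when two names have the same length.
	sorted := append([]string(nil), refs...)
	sort.SliceStable(sorted, func(i, j int) bool { return len(sorted[i]) > len(sorted[j]) })
	for _, r := range sorted {
		if tail == r {
			return r, "", true
		}
		if strings.HasPrefix(tail, r+"/") {
			return r, strings.TrimPrefix(tail, r+"/"), true
		}
	}
	return "", "", false
}

// contains reports whether s is an element of list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	refs := []string{"main", "release", "release/v1.0"}
	ref, path, ok := splitRefPath("release/v1.0/beta/CHANGELOG.md", refs)
	fmt.Println(ref, path, ok) // release/v1.0 beta/CHANGELOG.md true
}
```

Matching on r+"/" rather than on r alone keeps a branch named release from swallowing a URL that starts with release-notes.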

Path validation rejects .., control chars, leading slashes, and backslashes — defense in depth on top of git's own validation.
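
A minimal sketch of those four checks, assuming ".." is rejected as a whole path segment rather than as a substring (so a file named a..b stays valid). checkSubpath and errBadSubpath are hypothetical names; the real check is validateSubpath in code.go and may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

// errBadSubpath is a hypothetical sentinel error for rejected paths.
var errBadSubpath = errors.New("invalid subpath")

// checkSubpath rejects leading slashes, backslashes, control characters,
// and ".." segments. The empty path (the tree root) is allowed.
func checkSubpath(p string) error {
	if p == "" {
		return nil
	}
	if strings.HasPrefix(p, "/") || strings.Contains(p, `\`) {
		return errBadSubpath
	}
	for _, r := range p {
		if unicode.IsControl(r) {
			return errBadSubpath
		}
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return errBadSubpath
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"docs/readme.md", "../etc/passwd", "/abs", `a\b`, "a..b/c"} {
		fmt.Printf("%q -> %v\n", p, checkSubpath(p))
	}
}
```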

Tree listing

git ls-tree --long --full-tree <ref>:<path> is parsed into typed TreeEntry values (tree | blob | commit | symlink). Sort is directories first, then files alphabetically.
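
For orientation, here is a sketch of parsing one line of that output and applying the sort. The TreeEntry fields, parseLsTreeLine, and sortEntries are assumptions made for this example, not the real types. The line layout (mode, type, object ID, size, then a tab and the name) is what git ls-tree --long prints; mapping mode 120000 to symlink is how this sketch derives the fourth kind.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// TreeEntry here is a hypothetical shape; the real type may carry more fields.
type TreeEntry struct {
	Mode string
	Kind string // tree | blob | commit | symlink
	OID  string
	Size int64 // -1 when git prints "-" (trees and submodule commits)
	Name string
}

// parseLsTreeLine parses "<mode> <type> <oid> <size>\t<name>".
func parseLsTreeLine(line string) (TreeEntry, error) {
	meta, name, found := strings.Cut(line, "\t")
	if !found {
		return TreeEntry{}, fmt.Errorf("no tab separator in %q", line)
	}
	f := strings.Fields(meta) // the size column is space-padded
	if len(f) != 4 {
		return TreeEntry{}, fmt.Errorf("want 4 fields, got %d in %q", len(f), line)
	}
	e := TreeEntry{Mode: f[0], Kind: f[1], OID: f[2], Size: -1, Name: name}
	if e.Mode == "120000" {
		e.Kind = "symlink" // git reports symlinks as blobs with this mode
	}
	if f[3] != "-" {
		n, err := strconv.ParseInt(f[3], 10, 64)
		if err != nil {
			return TreeEntry{}, fmt.Errorf("bad size in %q: %w", line, err)
		}
		e.Size = n
	}
	return e, nil
}

// sortEntries puts directories first, then orders by name within each group.
func sortEntries(es []TreeEntry) {
	sort.SliceStable(es, func(i, j int) bool {
		di, dj := es[i].Kind == "tree", es[j].Kind == "tree"
		if di != dj {
			return di
		}
		return es[i].Name < es[j].Name
	})
}

func main() {
	out := "100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad      12\tzeta.txt\n" +
		"040000 tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904       -\tsrc"
	var entries []TreeEntry
	for _, line := range strings.Split(out, "\n") {
		e, err := parseLsTreeLine(line)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		entries = append(entries, e)
	}
	sortEntries(entries)
	for _, e := range entries {
		fmt.Println(e.Kind, e.Size, e.Name)
	}
}
```

Without -z, git quotes names that contain unusual characters, so a production parser would likely pass -z and split on NUL instead of newlines.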

S17 ships without the htmx-driven "last commit per entry" column that the spec describes. The column costs an extra round-trip and can be added later without a schema change, so for now the page renders the listing immediately. Deferred to S18 (commits-per-entry): the spec calls out this deferral path, and the tree template already has the column slot.

File view

codeBlob walks four cases:

  • Large (>1 MiB): placeholder + raw download link, no body read.
  • Binary (NUL byte in first 8 KiB): placeholder. Image extensions (png/jpg/jpeg/gif/webp) ≤5 MiB get an <img> preview pointing at /raw/....
  • Markdown (.md/.markdown): Goldmark + bluemonday rendered HTML PLUS a <details> source-toggle with the highlighted source.
  • Default text: Chroma highlight by filename extension, content sniffing fallback.
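
The four-way decision can be sketched as below, assuming the cases are tested in the order listed and that the caller supplies the blob size plus the first bytes of its content. classify and the string labels are hypothetical; the image-preview sub-case and the highlighting itself are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

const (
	maxInlineSize = 1 << 20 // 1 MiB: larger blobs get a placeholder
	sniffLen      = 8 << 10 // 8 KiB: window scanned for a NUL byte
)

// classify returns "large", "binary", "markdown", or "text".
// head holds the leading bytes of the blob; it may be longer than sniffLen.
func classify(name string, size int64, head []byte) string {
	if size > maxInlineSize {
		return "large" // decided from the size alone, no body read
	}
	if len(head) > sniffLen {
		head = head[:sniffLen]
	}
	if bytes.IndexByte(head, 0) >= 0 {
		return "binary"
	}
	switch strings.ToLower(path.Ext(name)) {
	case ".md", ".markdown":
		return "markdown"
	}
	return "text"
}

func main() {
	fmt.Println(classify("README.md", 120, []byte("# Title\n")))
	fmt.Println(classify("logo.png", 2048, []byte{0x89, 'P', 'N', 'G', 0x00}))
	fmt.Println(classify("dump.sql", 5<<20, nil))
	fmt.Println(classify("main.go", 64, []byte("package main\n")))
}
```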

Chroma uses the github style baked at process start; the CSS is served from /static/css/chroma.css via a tiny in-process generator.

Raw view

  • Content-Type derived from the extension whitelist (code.go::rawContentType).
  • X-Content-Type-Options: nosniff always.
  • Content-Security-Policy: default-src 'none'; sandbox at the handler level. If the global SecureHeaders middleware also sends its own, broader CSP, user agents enforce both policies: a load has to be allowed by each of them, so the effective result is the stricter combination, not the looser of the two.
  • Content-Disposition: attachment is forced for HTML, SVG, JS, WASM, and anything that could execute on shithub's domain. We don't have a separate raw.shithub.tld host yet (post-MVP); attachment is the safety belt.
  • Streamed via git cat-file -p; never buffered. Large blobs don't blow up the worker's memory.

Finder ("Go to file")

/find/{ref} lists every blob path on the ref via git ls-tree -r --name-only, then filters with internal/repos/finder/finder.go::Filter. The matcher is a subsequence-with-bonus scorer (boundary, consecutive run, basename hit) — not as fancy as VS Code's quickopen but good enough for tens of thousands of paths.
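
To make "subsequence with bonus" concrete, here is a small scorer in that spirit. It is not the code in finder.go: the bonus weights, the greedy left-to-right match, and the names score, filterPaths, and isBoundary are all assumptions of this sketch, and it compares bytes, so it is only meaningful for ASCII queries.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// isBoundary reports whether b separates words in a path.
func isBoundary(b byte) bool {
	return b == '/' || b == '_' || b == '-' || b == '.'
}

// score returns a rank for path and whether query is a subsequence of it.
func score(query, path string) (int, bool) {
	q, p := strings.ToLower(query), strings.ToLower(path)
	base := strings.LastIndexByte(p, '/') + 1 // start of the basename
	total, qi, prev := 0, 0, -2
	for i := 0; i < len(p) && qi < len(q); i++ {
		if p[i] != q[qi] {
			continue
		}
		total++ // every matched character counts once
		if i == 0 || isBoundary(p[i-1]) {
			total += 3 // match at the start of a word
		}
		if i == prev+1 {
			total += 2 // extends a consecutive run
		}
		if i >= base {
			total++ // match falls inside the basename
		}
		prev = i
		qi++
	}
	return total, qi == len(q)
}

// filterPaths keeps matching paths, best score first, ties by name.
func filterPaths(query string, paths []string) []string {
	type hit struct {
		path  string
		score int
	}
	var hits []hit
	for _, p := range paths {
		if s, ok := score(query, p); ok {
			hits = append(hits, hit{p, s})
		}
	}
	sort.SliceStable(hits, func(i, j int) bool {
		if hits[i].score != hits[j].score {
			return hits[i].score > hits[j].score
		}
		return hits[i].path < hits[j].path
	})
	out := make([]string, len(hits))
	for i, h := range hits {
		out[i] = h.path
	}
	return out
}

func main() {
	paths := []string{"img/logo.png", "README.md", "src/main.go"}
	fmt.Println(filterPaths("mg", paths))
}
```

Here the query mg ranks src/main.go above img/logo.png because both of its matched letters start a word inside the basename, while README.md drops out for lacking a g.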

The keyboard shortcut and the htmx live-filter are spec deliverables that we defer for now. The form-submission flow works without JS, and that is the floor S17 commits to.

Caching

Currently no caching layer. Every request runs git for-each-ref, git ls-tree, etc. That's fine for small-to-medium repos; the cost shows up on big repos with deep trees. The S17 spec proposes a cache keyed on (repo_id, ref_oid, dir_path) invalidated on push (S14's push:process job is the right invalidation hook).

Deferred: the cache is purely performance polish. When we hit a real-world repo where it matters, wire it in with a new package under internal/cache/ plus an invalidation callback in worker/jobs/push_process.go. The handlers already take a per-request policy.Cache, so adding a per-process git cache is mechanically straightforward.
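
If and when the cache lands, its core could be as small as the sketch below: an in-process map keyed on (repo_id, ref_oid, dir_path) with a per-repo invalidation call for the push job. Nothing here exists in the codebase yet; treeKey, TreeCache, and its methods are hypothetical, and the cached value is simplified to a slice of entry names.

```go
package main

import (
	"fmt"
	"sync"
)

// treeKey mirrors the (repo_id, ref_oid, dir_path) key the spec proposes.
type treeKey struct {
	RepoID  int64
	RefOID  string
	DirPath string
}

// TreeCache is a hypothetical per-process cache of directory listings.
type TreeCache struct {
	mu sync.RWMutex
	m  map[treeKey][]string
}

func NewTreeCache() *TreeCache {
	return &TreeCache{m: make(map[treeKey][]string)}
}

// Get returns the cached listing for k, if any.
func (c *TreeCache) Get(k treeKey) ([]string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[k]
	return v, ok
}

// Put stores a listing under k.
func (c *TreeCache) Put(k treeKey, names []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = names
}

// InvalidateRepo drops every entry for one repo and returns how many were
// removed. A push-processing job would call this after a successful push.
func (c *TreeCache) InvalidateRepo(repoID int64) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := 0
	for k := range c.m {
		if k.RepoID == repoID {
			delete(c.m, k)
			n++
		}
	}
	return n
}

func main() {
	c := NewTreeCache()
	k := treeKey{RepoID: 1, RefOID: "abc123", DirPath: "src"}
	c.Put(k, []string{"main.go", "util.go"})
	fmt.Println(c.Get(k))
	fmt.Println("dropped:", c.InvalidateRepo(1))
	_, ok := c.Get(k)
	fmt.Println("still cached:", ok)
}
```

This sketch has no size bound or eviction policy, which a real cache would need before it holds listings for large repos.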

Pitfalls + protections

  • XSS via raw HTML/SVG: blocked by Content-Disposition: attachment for those extensions.
  • XSS via markdown: Goldmark configured without HTML passthrough + bluemonday's UGC policy on top. Tests in internal/repos/markdown/ (TODO — minimal coverage today).
  • Path traversal: validateSubpath in code.go rejects .., control chars, leading slashes, and backslashes.
  • Hex collision with SHA: ref-list lookup wins over SHA shortcut when the same string is both.
  • Encoding (GBK / Shift-JIS): TODO. Text files that are not UTF-8 may render garbled, because the body is rendered as-is; a future commit can add charset autodetection and decode through golang.org/x/text/encoding.

Dependencies

  • github.com/alecthomas/chroma/v2 — syntax highlighting
  • github.com/yuin/goldmark — CommonMark + GFM
  • github.com/microcosm-cc/bluemonday — HTML sanitizer

Deferred polish (tracked, not blocking)

These items are spec deliverables we ship in a later pass:

  • Last-commit-per-entry column with htmx lazy load and pre-walked git log --name-status cache → wire into S18 (commit history) where the same walk powers the per-file history page.
  • Tree caching keyed on (repo_id, ref_oid, dir_path), push-event invalidation → wire into S36 (performance pass) once we have a real workload to measure.
  • Pagination at 1000 entries per directory → cosmetic for huge trees; add when someone hits node_modules-grade inflation.
  • Encoding detection for non-UTF-8 source files → file reads are defensive (io.LimitReader + size cap); render quality is the only loss until this lands.