# Commits & blame

S18 ships the commits list, single-commit page, blame view, and the Atom feed. Every page resolves commit author emails to shithub user identities (display name + avatar + profile link) when there's a verified `user_emails` row, and falls back to the raw author + a deterministic identicon seed when there isn't.

## Routes

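| Route                                        | Handler       |
| -------------------------------------------- | ------------- |
| `GET /{owner}/{repo}/commits/{ref}/*`        | `commitsList` |
| `GET /{owner}/{repo}/commits/{ref}.atom`     | `commitsAtom` |
| `GET /{owner}/{repo}/commit/{sha}`           | `commitView`  |
| `GET /{owner}/{repo}/blame/{ref}/{path...}`  | `blameView`   |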

The `commits/*` and `blame/*` patterns use chi's `*` wildcard so branches with `/` in their name (`feature/x`, `release/v1.0/beta`) resolve correctly via `repogit.ResolveRef` (longest-prefix match against the cached ref list, with a hex-SHA shortcut for 40-char first segments). Same logic the code-tab uses.

`{sha}` accepts 7..40 hex chars; git itself disambiguates short SHAs.

## git plumbing

* `repogit.Log(ctx, gitDir, opts)`: a single `git log --format=...` call with packed-record output (ASCII unit-separator + record-end markers, so newlines in commit bodies don't confuse the parser). Supports `--max-count`, `--skip`, `--author`, `--since`, `--until`, and an optional `--follow -- <path>`.
* `repogit.GetCommit(ctx, gitDir, sha)`: `git log -1 --format=...` for metadata + parents + tree, then `repogit.DiffStat` for the per-file change rows.
* `repogit.DiffStat(ctx, gitDir, sha)`: combines `git diff-tree -r --root --name-status -M -C` (status + rename pairs) and `git diff-tree -r --root --numstat` (insert/delete counts, with `-` marking binary files). The `--root` flag is what makes the initial parentless commit emit anything.
* `repogit.Blame(ctx, gitDir, opts)`: `git blame --line-porcelain` parsed into `BlameLine`s, then collapsed via `groupBlame` into `BlameChunk`s for the rendered "consecutive lines from the same commit collapse the gutter" UX. Caps at 5 MiB / 50k lines (returns `ErrBlameTooLarge`); refuses on non-blobs (`ErrBlameOnBinary`).
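The packed-record approach in `repogit.Log` can be sketched as follows. The exact format string is an assumption. It uses git's `%x1f` / `%x1e` byte escapes (ASCII unit and record separators) and the standard `%H`, `%P`, `%an`, `%ae`, `%at`, `%s`, `%b` placeholders. `Commit` and `parseLog` are hypothetical names, not the real types.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// logFormat (assumed) separates fields with US (0x1f) and ends each record
// with RS (0x1e), so newlines inside %b stay inside one field. This assumes
// commit messages never contain 0x1e or 0x1f themselves.
const logFormat = "%H%x1f%P%x1f%an%x1f%ae%x1f%at%x1f%s%x1f%b%x1e"

type Commit struct {
	SHA         string
	Parents     []string
	AuthorName  string
	AuthorEmail string
	AuthorTime  time.Time
	Subject     string
	Body        string
}

// parseLog splits `git log --format=<logFormat>` output into commits.
func parseLog(out string) ([]Commit, error) {
	var commits []Commit
	for _, rec := range strings.Split(out, "\x1e") {
		// git's --format (tformat) appends "\n" after each record; drop it.
		rec = strings.TrimPrefix(rec, "\n")
		if rec == "" {
			continue
		}
		f := strings.Split(rec, "\x1f")
		if len(f) != 7 {
			return nil, fmt.Errorf("malformed record: %d fields", len(f))
		}
		secs, err := strconv.ParseInt(f[4], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad author time %q: %w", f[4], err)
		}
		commits = append(commits, Commit{
			SHA:         f[0],
			Parents:     strings.Fields(f[1]), // empty for a root commit
			AuthorName:  f[2],
			AuthorEmail: f[3],
			AuthorTime:  time.Unix(secs, 0).UTC(),
			Subject:     f[5],
			Body:        f[6],
		})
	}
	return commits, nil
}
```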

## Identity resolution

`internal/repos/identity/Resolver` is a per-request memo that maps author emails to `Resolved` records:

* `User=true` → matched a verified `user_emails` row whose user is not suspended/deleted. Fields populated: `UserID`, `Username`, `DisplayName`, `AvatarURL`.
* `User=false` → unknown email; render the raw author name with a deterministic identicon seed (md5 hex of the lowercased email, matching the Gravatar/identicon convention).
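The identicon seed described above can be sketched in a few lines. `identiconSeed` is a hypothetical name. Gravatar's own convention also trims surrounding whitespace, and whether the real helper does so is not stated here.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"strings"
)

// identiconSeed (hypothetical) returns the md5 hex digest of the
// lowercased email, so case variants of one address share an identicon.
func identiconSeed(email string) string {
	sum := md5.Sum([]byte(strings.ToLower(email)))
	return hex.EncodeToString(sum[:])
}
```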

Construct one resolver per request and pass it through the page render. The cache is in-process and request-scoped — across-request caching is S36 territory.

A user with multiple verified emails (work + personal) resolves on either: the lookup checks `user_emails` against any verified row, not just the primary. Document this in the user-facing help when `/settings` docs land.

## File-changed table on the commit view

S18 emits the rows + per-file `+X / -Y` stats; the per-file diff body is left as an empty structural slot so S19's diff renderer can drop in without re-rendering. The status column uses git's letters: `A M D R C T`. Renames/copies show "old → new" in the path column.
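A sketch of turning one `--name-status -M -C` line into a table row. It relies on git's documented non-`-z` output: tab-separated fields, where `R`/`C` statuses carry a similarity score (e.g. `R087`) followed by the old and new paths. `FileRow` and `parseNameStatus` are hypothetical. Other lines are assumed to be filtered out beforehand; this includes the commit-id line that `diff-tree` prints first unless `--no-commit-id` is given. Real code might prefer `-z`, since git otherwise quotes unusual filenames.

```go
package main

import (
	"fmt"
	"strings"
)

type FileRow struct {
	Status  byte   // one of A M D R C T
	OldPath string // set for renames/copies only
	Path    string
}

// parseNameStatus parses one non -z line of `git diff-tree --name-status -M -C`.
func parseNameStatus(line string) (FileRow, error) {
	f := strings.Split(line, "\t")
	if len(f) < 2 || f[0] == "" {
		return FileRow{}, fmt.Errorf("malformed line %q", line)
	}
	row := FileRow{Status: f[0][0]} // drop any similarity score: "R087" -> 'R'
	switch row.Status {
	case 'R', 'C':
		if len(f) != 3 {
			return FileRow{}, fmt.Errorf("rename/copy without two paths: %q", line)
		}
		row.OldPath, row.Path = f[1], f[2]
	default:
		row.Path = f[1]
	}
	return row, nil
}

// DisplayPath renders the path column, "old → new" for renames and copies.
func (r FileRow) DisplayPath() string {
	if r.OldPath != "" {
		return r.OldPath + " → " + r.Path
	}
	return r.Path
}
```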

## Atom feed

Lightweight: feed-level `title`, `id`, `updated`, `link`, then per-entry `id` (full SHA), `title` (subject), `updated` (author time), `author` name + email, and `summary` (commit body). The ID URN format `urn:shithub:commit:<sha>` is stable across repo renames and visibility flips. The feed is capped at 50 commits.

## Issue-ref linkification (S21 hook)

Commit-body URLs are linkified inline; `#NNN` and `owner/repo#NNN` issue refs are emitted as stable tokens (`<span data-ref="#123">#123</span>`) so the S21 issue layer can post-render-link them without re-rendering. The token shape is documented here so the S21 enhancer doesn't have to re-derive it.
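A sketch of the escape-then-tokenize order for issue refs follows; URL linkification is omitted. The regular expression is an assumption, not the shipped pattern. It only accepts a ref at the start of the text or after whitespace, `(`, or `[`. Among other things, this keeps the `&#39;` entity that `HTMLEscapeString` emits for `'` from being read as issue 39. `renderBody` and `issueRef` are hypothetical names.

```go
package main

import (
	"html/template"
	"regexp"
)

// issueRef matches "#123" or "owner/repo#123" at start of text or after
// whitespace, '(' or '['. Group 1 is the preceding char, group 2 the ref.
var issueRef = regexp.MustCompile(`(^|[\s(\[])((?:[\w.-]+/[\w.-]+)?#\d+)\b`)

// renderBody escapes first, then wraps refs in stable S21 tokens. Because
// escaping runs first, the ref text copied into data-ref is already safe.
func renderBody(raw string) string {
	escaped := template.HTMLEscapeString(raw)
	return issueRef.ReplaceAllString(escaped, `${1}<span data-ref="${2}">${2}</span>`)
}
```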

## Caching

Currently **no caching layer**: every request runs the relevant git plumbing fresh. The S18 spec calls out cache keys (`(repo_id, ref_oid, page, filters)` for commit lists, `(repo_id, ref_oid, path)` for blame, `(repo_id, sha)` for single commits) and push-event invalidation. Wiring lives with the rest of the code-tab caching deferral in **S36**.

## Pitfalls handled

* **Encoding in commit messages**: bodies are HTML-escaped before any link/ref substitution. `template.HTMLEscapeString` only rewrites `<`, `>`, `&`, `'`, `"` and NUL. Every other byte, including invalid UTF-8 (a Go string can hold arbitrary bytes), passes through unchanged. The browser then shows such sequences as U+FFFD replacement characters when it decodes the page as UTF-8.
* **Path traversal in blame paths**: same `validateSubpath` guard the code-tab uses.
* **Memory on `--line-porcelain`**: parsed with a streaming `bufio.Scanner` with a 1 MiB max line length; we never `io.ReadAll` the porcelain output.
* **Initial-commit DiffStat**: requires the `--root` flag to diff against the empty tree. Without it, root commits show no files.
* **SHA collision with `feature/abcdef…`**: ref-list lookup wins over the hex-SHA shortcut when the same string appears in both; same rule as the code-tab `ResolveRef`.
* **Blame on binary**: rejected early via `StatPath` (which already knows the kind from `cat-file -t`). No expensive `git blame` runs.

## Deferred polish

* **Tree last-commit-per-entry column** (`POST /tree-commits` htmx fragment) is the S17 deferral earmarked to land in S18. The shared `git log --name-status` walk that powers per-file history can also power this column. It is not wired in this commit set: the column slot exists in `tree.html` and the data shape is straight `Log` + `DiffStat`. Wire it when we want the polish.
* **Caching layer** with push-event invalidation → S36.
* **Older / Newer blame navigation** (walk the chunk's commit's parent for that path): UI links are not yet emitted; the data path is a `git log -1 <chunk_sha>^ -- <path>` away. Wire when needed.
* **Signed-commit verification badge**: placeholder slot only; actual verification post-MVP.
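For the deferred Older link, here is a sketch of the `git log -1 <chunk_sha>^ -- <path>` lookup via `os/exec`. `olderCommitArgs` and `olderCommit` are hypothetical names. Passing `--git-dir` and `--format=%H` (to get a bare SHA) are assumptions about how it would be wired. If the chunk's commit renamed the file, the parent path differs. The `previous <sha> <filename>` header that `--line-porcelain` already emits may be a better source for both values.

```go
package main

import (
	"context"
	"os/exec"
	"strings"
)

// olderCommitArgs builds argv for the "Older" lookup: the last commit that
// touched path among the history reachable from the chunk commit's parent.
func olderCommitArgs(gitDir, chunkSHA, path string) []string {
	return []string{"--git-dir=" + gitDir, "log", "-1", "--format=%H", chunkSHA + "^", "--", path}
}

// olderCommit returns "" when no earlier commit touched path. It returns an
// error when the chunk commit has no parent, since git rejects "<sha>^".
func olderCommit(ctx context.Context, gitDir, chunkSHA, path string) (string, error) {
	out, err := exec.CommandContext(ctx, "git", olderCommitArgs(gitDir, chunkSHA, path)...).Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}
```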
64 either: the lookup checks `user_emails` against any verified row, not
65 just the primary. Document this in the user-facing help when /settings
66 docs land.
67
68 ## File-changed table on the commit view
69
70 S18 emits the rows + per-file `+X / -Y` stats; the per-file diff body
71 slot is left structural so S19's diff renderer drops in without
72 re-rendering. The status column uses git's letters: `A M D R C T`.
73 Renames/copies show "old → new" in the path column.
74
75 ## Atom feed
76
77 Lightweight: title, id, updated, link, then per-entry id (full SHA),
78 title (subject), updated (author time), author name+email, summary
79 (commit body). The ID URN format `urn:shithub:commit:<sha>` is stable
80 across renames and visibility flips. The feed is capped at 50 commits.
81
82 ## Issue-ref linkification (S21 hook)
83
84 Commit-body URLs are linkified inline; `#NNN` and `owner/repo#NNN`
85 issue refs are emitted as stable tokens
86 `<span data-ref="#123">#123</span>` so the S21 issue layer can
87 post-render-link them without re-rendering. The token shape is
88 documented here so the S21 enhancer doesn't have to re-derive it.
89
90 ## Caching
91
92 Currently **no caching layer** — every request runs the relevant git
93 plumbing fresh. The S18 spec calls out cache keys
94 (`(repo_id, ref_oid, page, filters)` for commit lists,
95 `(repo_id, ref_oid, path)` for blame, `(repo_id, sha)` for single
96 commits) and push-event invalidation. Wiring lives with the rest of
97 the code-tab caching deferral in **S36**.
98
99 ## Pitfalls handled
100
101 * **Encoding in commit messages**: bodies are HTML-escaped before any
102 link/ref substitution; non-UTF-8 characters render as the closest
103 fallback `template.HTMLEscapeString` produces (Go's escaper accepts
104 arbitrary byte slices).
105 * **Path traversal in blame paths**: same `validateSubpath` guard the
106 code-tab uses.
107 * **Memory on `--line-porcelain`**: parsed in a streaming `bufio.Scanner`
108 with a 1 MiB max line length; we never `io.ReadAll` the porcelain
109 output.
110 * **Initial-commit DiffStat**: requires `--root` flag to diff against
111 the empty tree. Without it, root commits show no files.
112 * **SHA collision with `feature/abcdef…`**: ref-list lookup wins over
113 hex-SHA shortcut when the same string appears in both — same rule
114 as the code-tab `ResolveRef`.
115 * **Blame on binary**: rejected early via `StatPath` (which already
116 knows the kind from `cat-file -t`). No expensive `git blame` runs.
117
118 ## Deferred polish
119
120 * **Tree last-commit-per-entry column** (`POST /tree-commits` htmx
121 fragment) is the S17 deferral that lands here. The shared
122 `git log --name-status` walk that powers per-file history can also
123 power this column. Not wired in this commit set — the column slot
124 exists in `tree.html` and the data shape is straight Log + DiffStat.
125 Wire when we want the polish.
126 * **Caching layer** with push-event invalidation → S36.
127 * **Older / Newer blame navigation** (walk the chunk's commit's parent
128 for that path) — UI links not yet emitted; the data path is a
129 `git log -1 <chunk_sha>^ -- <path>` away. Wire when needed.
130 * **Signed-commit verification badge** — placeholder slot only;
131 actual verification post-MVP.