@@ -0,0 +1,131 @@ |
| 1 | +# Commits & blame |
| 2 | + |
| 3 | +S18 ships the commits list, single-commit page, blame view, and the |
| 4 | +Atom feed. Every page resolves commit author emails to shithub user |
| 5 | +identities (display name + avatar + profile link) when there's a |
| 6 | +verified `user_emails` row, and falls back to the raw author + a |
| 7 | +deterministic identicon seed when there isn't. |
| 8 | + |
| 9 | +## Routes |
| 10 | + |
| 11 | +| Route | Handler | |
| 12 | +| ---------------------------------------------------- | ------------------------ | |
| 13 | +| `GET /{owner}/{repo}/commits/{ref}/*` | `commitsList` | |
| 14 | +| `GET /{owner}/{repo}/commits/{ref}.atom` | `commitsAtom` | |
| 15 | +| `GET /{owner}/{repo}/commit/{sha}` | `commitView` | |
| 16 | +| `GET /{owner}/{repo}/blame/{ref}/{path...}` | `blameView` | |
| 17 | + |
| 18 | +The `commits/*` and `blame/*` patterns use chi's `*` wildcard so |
| 19 | +branches with `/` in their name (`feature/x`, `release/v1.0/beta`) |
| 20 | +resolve correctly via `repogit.ResolveRef` (longest-prefix match |
| 21 | +against the cached ref list, hex-SHA shortcut for 40-char first |
| 22 | +segments). Same logic the code-tab uses. |
| 23 | + |
| 24 | +`{sha}` accepts 7..40 hex chars; git itself disambiguates short SHAs. |
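
The longest-prefix rule described above can be sketched roughly as follows. This is an illustration only: `splitRefPath` is a hypothetical stand-in, not the real `repogit.ResolveRef` signature, and it assumes the cached ref list is available as a slice of short ref names.

```go
package main

import (
	"regexp"
	"strings"
)

// fullSHA matches a 40-character hex object ID.
var fullSHA = regexp.MustCompile(`^[0-9a-fA-F]{40}$`)

// splitRefPath is a hypothetical helper (not the real repogit.ResolveRef).
// It splits the wildcard remainder of a URL, e.g. "feature/x/docs/a.md",
// into a ref and a path by picking the longest known ref that is a
// whole-segment prefix of rest. Known refs win over the hex-SHA shortcut.
func splitRefPath(refs []string, rest string) (ref, path string, ok bool) {
	best, found := "", false
	for _, r := range refs {
		// Match only on segment boundaries so "main" never claims "mainline/x".
		if rest == r || strings.HasPrefix(rest, r+"/") {
			if !found || len(r) > len(best) {
				best, found = r, true
			}
		}
	}
	if found {
		return best, strings.TrimPrefix(strings.TrimPrefix(rest, best), "/"), true
	}
	// Fallback: treat a 40-char hex first segment as a commit SHA.
	first, tail, _ := strings.Cut(rest, "/")
	if fullSHA.MatchString(first) {
		return first, tail, true
	}
	return "", "", false
}
```

With refs `main`, `release/v1.0`, and `release/v1.0/beta`, the remainder `release/v1.0/beta/docs/a.md` splits into ref `release/v1.0/beta` and path `docs/a.md`, because the longer ref is preferred.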
| 25 | + |
| 26 | +## git plumbing |
| 27 | + |
| 28 | +* `repogit.Log(ctx, gitDir, opts)` — single `git log --format=...` call |
| 29 | + with packed-record output (ASCII unit-separator + record-end markers |
| 30 | + so newlines in commit bodies don't confuse the parser). Supports |
| 31 | + `--max-count`, `--skip`, `--author`, `--since`, `--until`, optional |
| 32 | + `--follow -- <path>`. |
| 33 | +* `repogit.GetCommit(ctx, gitDir, sha)` — `git log -1 --format=...` |
| 34 | + for metadata + parents + tree, then `repogit.DiffStat` for the |
| 35 | + per-file change rows. |
| 36 | +* `repogit.DiffStat(ctx, gitDir, sha)` — combines |
| 37 | + `git diff-tree -r --root --name-status -M -C` (status + rename |
| 38 | + pairs) and `git diff-tree -r --root --numstat` (insert/delete |
| 39 | +  counts, with `-` in place of both counts for binary files). The
| 40 | +  `--root` flag is what makes the initial parentless commit emit anything.
| 41 | +* `repogit.Blame(ctx, gitDir, opts)` — `git blame --line-porcelain` |
| 42 | + parsed into `BlameLine`s, then collapsed via `groupBlame` into |
| 43 | + `BlameChunk`s for the rendered "consecutive lines from the same |
| 44 | + commit collapse the gutter" UX. Caps at 5 MiB / 50k lines (returns |
| 45 | + `ErrBlameTooLarge`); refuses on non-blobs (`ErrBlameOnBinary`). |
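
To make the packed-record idea behind `repogit.Log` concrete, here is a minimal parsing sketch. The format string, the field order, the choice of `0x1e` as the record-end marker, and the `logEntry` / `parseLog` names are all assumptions for illustration; the real `--format=...` string is not shown in this document.

```go
package main

import "strings"

// logFormat is an assumed format string, not the one repogit.Log uses.
// %x1f emits the ASCII unit separator between fields and %x1e marks the
// end of a record, so newlines inside %b (the body) are harmless.
const logFormat = "%H%x1f%an%x1f%ae%x1f%at%x1f%s%x1f%b%x1e"

// logEntry is a hypothetical, trimmed-down record type.
type logEntry struct {
	SHA, AuthorName, AuthorEmail, AuthorTime, Subject, Body string
}

// parseLog splits raw `git log --format=<logFormat>` output into entries.
func parseLog(out string) []logEntry {
	var entries []logEntry
	for _, rec := range strings.Split(out, "\x1e") {
		// git writes a newline after each record; drop it before splitting.
		rec = strings.TrimLeft(rec, "\n")
		if rec == "" {
			continue
		}
		f := strings.SplitN(rec, "\x1f", 6)
		if len(f) != 6 {
			continue // malformed record; a real parser would likely report it
		}
		entries = append(entries, logEntry{
			SHA:         f[0],
			AuthorName:  f[1],
			AuthorEmail: f[2],
			AuthorTime:  f[3],
			Subject:     f[4],
			Body:        strings.TrimRight(f[5], "\n"),
		})
	}
	return entries
}
```

Because the body is the last field and the split is capped at six parts, a multi-line commit body stays intact inside a single record.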
| 46 | + |
| 47 | +## Identity resolution |
| 48 | + |
| 49 | +`internal/repos/identity/Resolver` is a per-request memo that maps |
| 50 | +author emails to `Resolved` records: |
| 51 | + |
| 52 | +* `User=true` → matched a verified `user_emails` row whose user is |
| 53 | + not suspended/deleted. Fields populated: UserID, Username, |
| 54 | + DisplayName, AvatarURL. |
| 55 | +* `User=false` → unknown email; render the raw author name with a |
| 56 | + deterministic identicon seed (md5 hex of the lowercased email, |
| 57 | + matching the gravatar/identicon convention). |
| 58 | + |
| 59 | +Construct one resolver per request and pass it through the page render. |
| 60 | +The cache is in-process and request-scoped; cross-request caching is
| 61 | +deferred to S36.
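
A rough sketch of the per-request memo and the identicon fallback follows. The types are simplified and hypothetical: `resolved` carries only a subset of the real `Resolved` fields, `lookupFunc` stands in for the verified `user_emails` query, and lowercasing the memo key is an assumption.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"strings"
)

// resolved is a trimmed, hypothetical version of the Resolved record.
type resolved struct {
	User          bool
	Username      string
	IdenticonSeed string
}

// lookupFunc stands in for the verified user_emails query (assumption).
type lookupFunc func(email string) (username string, ok bool)

// resolver memoizes lookups for the lifetime of one request.
type resolver struct {
	lookup lookupFunc
	memo   map[string]resolved
}

func newResolver(lookup lookupFunc) *resolver {
	return &resolver{lookup: lookup, memo: make(map[string]resolved)}
}

// identiconSeed returns the md5 hex digest of the lowercased email.
func identiconSeed(email string) string {
	sum := md5.Sum([]byte(strings.ToLower(email)))
	return hex.EncodeToString(sum[:])
}

// Resolve returns the memoized result, hitting lookup at most once per email.
func (r *resolver) Resolve(email string) resolved {
	key := strings.ToLower(email)
	if res, ok := r.memo[key]; ok {
		return res
	}
	var res resolved
	if name, ok := r.lookup(key); ok {
		res = resolved{User: true, Username: name}
	} else {
		res = resolved{IdenticonSeed: identiconSeed(key)}
	}
	r.memo[key] = res
	return res
}
```

On a commits list where the same author appears on many rows, this shape means each distinct email costs one lookup per request.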
| 62 | + |
| 63 | +A user with multiple verified emails (work + personal) resolves on |
| 64 | +either: the lookup checks `user_emails` against any verified row, not |
| 65 | +just the primary. Document this in the user-facing help when /settings |
| 66 | +docs land. |
| 67 | + |
| 68 | +## File-changed table on the commit view |
| 69 | + |
| 70 | +S18 emits the rows + per-file `+X / -Y` stats; the per-file diff body |
| 71 | +slot is left structural so S19's diff renderer drops in without |
| 72 | +re-rendering. The status column uses git's letters: `A M D R C T`. |
| 73 | +Renames/copies show "old → new" in the path column. |
| 74 | + |
| 75 | +## Atom feed |
| 76 | + |
| 77 | +Lightweight: title, id, updated, link, then per-entry id (full SHA), |
| 78 | +title (subject), updated (author time), author name+email, summary |
| 79 | +(commit body). The ID URN format `urn:shithub:commit:<sha>` is stable |
| 80 | +across renames and visibility flips. The feed is capped at 50 commits. |
| 81 | + |
| 82 | +## Issue-ref linkification (S21 hook) |
| 83 | + |
| 84 | +Commit-body URLs are linkified inline; `#NNN` and `owner/repo#NNN` |
| 85 | +issue refs are emitted as stable tokens |
| 86 | +`<span data-ref="#123">#123</span>` so the S21 issue layer can |
| 87 | +post-render-link them without re-rendering. The token shape is |
| 88 | +documented here so the S21 enhancer doesn't have to re-derive it. |
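
One way the token emission could look, as a sketch under stated assumptions: `tokenizeRefs` and the regular expression are illustrative rather than the shipped code, URL linkification is left out, and using the full `owner/repo#NNN` text as the `data-ref` value for cross-repo refs is a guess. Escaping happens first, so the pattern has to skip numeric entities such as `&#39;` that the escaper itself produces.

```go
package main

import (
	"html"
	"regexp"
)

// issueRef matches "#123" or "owner/repo#123" when preceded by start of
// text or by a character that is neither '&' nor a word character. The
// '&' exclusion keeps escaped entities like "&#39;" from being tokenized.
var issueRef = regexp.MustCompile(`(^|[^&\w])((?:[\w.-]+/[\w.-]+)?#\d+)\b`)

// tokenizeRefs HTML-escapes a commit body, then wraps issue refs in the
// stable span token. Hypothetical helper; URL linkification is omitted.
func tokenizeRefs(body string) string {
	escaped := html.EscapeString(body)
	return issueRef.ReplaceAllString(escaped, `${1}<span data-ref="${2}">${2}</span>`)
}
```

For example, the body `fix #12 & it's <b>` becomes `fix <span data-ref="#12">#12</span> &amp; it&#39;s &lt;b&gt;`: the ref is wrapped, while the apostrophe entity is left alone.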
| 89 | + |
| 90 | +## Caching |
| 91 | + |
| 92 | +Currently **no caching layer** — every request runs the relevant git |
| 93 | +plumbing fresh. The S18 spec calls out cache keys |
| 94 | +(`(repo_id, ref_oid, page, filters)` for commit lists, |
| 95 | +`(repo_id, ref_oid, path)` for blame, `(repo_id, sha)` for single |
| 96 | +commits) and push-event invalidation. Wiring lives with the rest of |
| 97 | +the code-tab caching deferral in **S36**. |
| 98 | + |
| 99 | +## Pitfalls handled |
| 100 | + |
| 101 | +* **Encoding in commit messages**: bodies are HTML-escaped before any
| 102 | +  link/ref substitution; bytes that are not valid UTF-8 pass through
| 103 | +  `template.HTMLEscapeString` unchanged (Go strings can hold arbitrary
| 104 | +  bytes), so how they display is left to the browser.
| 105 | +* **Path traversal in blame paths**: same `validateSubpath` guard the |
| 106 | + code-tab uses. |
| 107 | +* **Memory on `--line-porcelain`**: parsed in a streaming `bufio.Scanner` |
| 108 | + with a 1 MiB max line length; we never `io.ReadAll` the porcelain |
| 109 | + output. |
| 110 | +* **Initial-commit DiffStat**: requires `--root` flag to diff against |
| 111 | + the empty tree. Without it, root commits show no files. |
| 112 | +* **SHA collision with `feature/abcdef…`**: ref-list lookup wins over |
| 113 | + hex-SHA shortcut when the same string appears in both — same rule |
| 114 | + as the code-tab `ResolveRef`. |
| 115 | +* **Blame on binary**: rejected early via `StatPath` (which already |
| 116 | + knows the kind from `cat-file -t`). No expensive `git blame` runs. |
| 117 | + |
| 118 | +## Deferred polish |
| 119 | + |
| 120 | +* **Tree last-commit-per-entry column** (`POST /tree-commits` htmx |
| 121 | + fragment) is the S17 deferral that lands here. The shared |
| 122 | + `git log --name-status` walk that powers per-file history can also |
| 123 | + power this column. Not wired in this commit set — the column slot |
| 124 | + exists in `tree.html` and the data shape is straight Log + DiffStat. |
| 125 | + Wire when we want the polish. |
| 126 | +* **Caching layer** with push-event invalidation → S36. |
| 127 | +* **Older / Newer blame navigation** (walk the chunk's commit's parent |
| 128 | + for that path) — UI links not yet emitted; the data path is a |
| 129 | + `git log -1 <chunk_sha>^ -- <path>` away. Wire when needed. |
| 130 | +* **Signed-commit verification badge** — placeholder slot only; |
| 131 | + actual verification post-MVP. |