tenseleyflow/shithub / 6a9105d


S18: docs/internal/commits-blame.md

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: 6a9105db46a62e57e2d4f7cbd0a5c12b8739489e
Parents: 3b9caa6
Tree: 09fd812

1 changed file

| Status | File                             | +   | -   |
| ------ | -------------------------------- | --- | --- |
| A      | `docs/internal/commits-blame.md` | 131 | 0   |

docs/internal/commits-blame.md (added)
@@ -0,0 +1,131 @@
# Commits & blame

S18 ships the commits list, the single-commit page, the blame view, and
the Atom feed. Every page resolves commit author emails to shithub user
identities (display name, avatar, profile link) when a verified
`user_emails` row exists. When none exists, it falls back to the raw
author name plus a deterministic identicon seed.

## Routes

| Route                                       | Handler       |
| ------------------------------------------- | ------------- |
| `GET /{owner}/{repo}/commits/{ref}/*`       | `commitsList` |
| `GET /{owner}/{repo}/commits/{ref}.atom`    | `commitsAtom` |
| `GET /{owner}/{repo}/commit/{sha}`          | `commitView`  |
| `GET /{owner}/{repo}/blame/{ref}/{path...}` | `blameView`   |

The `commits/*` and `blame/*` patterns use chi's `*` wildcard so that
branches with `/` in their name (`feature/x`, `release/v1.0/beta`)
resolve correctly via `repogit.ResolveRef`. It does a longest-prefix
match against the cached ref list and takes a hex-SHA shortcut when the
first segment is 40 hex characters. This is the same logic the code tab
uses.
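
The resolution order can be sketched in Go. This is a minimal illustration that assumes the cached ref list is available as a set of short branch and tag names. `resolveRef` and `RefMatch` are hypothetical names, not the real `repogit.ResolveRef` signature.

```go
package main

import (
	"regexp"
	"strings"
)

// fullSHA matches a 40-character lowercase hex object name.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// RefMatch is a hypothetical result shape, not the real repogit type.
type RefMatch struct {
	Name  string // ref name, or the SHA when IsSHA is set
	IsSHA bool   // true when the hex-SHA shortcut was used
	Path  string // remainder after the ref; "" means the repo root
}

// resolveRef splits a wildcard capture such as "release/v1.0/beta/docs/a.md"
// into a ref and a path. refs is the cached set of short ref names.
func resolveRef(refs map[string]bool, rest string) (RefMatch, bool) {
	segs := strings.Split(rest, "/")
	// Try the longest prefix first so "release/v1.0/beta" beats "release".
	for i := len(segs); i >= 1; i-- {
		name := strings.Join(segs[:i], "/")
		if refs[name] {
			return RefMatch{Name: name, Path: strings.Join(segs[i:], "/")}, true
		}
	}
	// Only when no ref matched, treat a 40-hex first segment as a commit SHA.
	if fullSHA.MatchString(segs[0]) {
		return RefMatch{Name: segs[0], IsSHA: true, Path: strings.Join(segs[1:], "/")}, true
	}
	return RefMatch{}, false
}
```

Trying the longest candidate first lets `release/v1.0/beta` win over a shorter `release` branch. The loop costs one map lookup per path segment. Checking the ref list before the SHA shortcut gives the collision rule listed under "Pitfalls handled".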

`{sha}` accepts 7 to 40 hex characters; git itself disambiguates
abbreviated SHAs.
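
A route-level shape check for the `{sha}` parameter might look like the following. The name `validSHAParam` is hypothetical, and accepting uppercase hex is an assumption. The check only bounds the shape and leaves existence and ambiguity to git.

```go
package main

import "regexp"

// shaParam mirrors the documented {sha} rule: 7 to 40 hex characters.
// Allowing uppercase is an assumption; the doc does not say either way.
var shaParam = regexp.MustCompile(`^[0-9a-fA-F]{7,40}$`)

// validSHAParam reports whether s is shaped like an object name. It does
// not check that the object exists; git errors on unknown or ambiguous
// abbreviations when the name is actually resolved.
func validSHAParam(s string) bool {
	return shaParam.MatchString(s)
}
```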

## git plumbing

* `repogit.Log(ctx, gitDir, opts)`: a single `git log --format=...` call
  with packed-record output. ASCII unit-separator and record-end markers
  keep newlines in commit bodies from confusing the parser. Supports
  `--max-count`, `--skip`, `--author`, `--since`, `--until`, and an
  optional `--follow -- <path>`.
* `repogit.GetCommit(ctx, gitDir, sha)`: `git log -1 --format=...` for
  metadata, parents, and tree, then `repogit.DiffStat` for the per-file
  change rows.
* `repogit.DiffStat(ctx, gitDir, sha)`: combines
  `git diff-tree -r --root --name-status -M -C` (status and rename
  pairs) with `git diff-tree -r --root --numstat` (insert/delete counts,
  with `-` marking binary files). The `--root` flag is what makes the
  initial, parentless commit emit anything.
* `repogit.Blame(ctx, gitDir, opts)`: `git blame --line-porcelain`
  output is parsed into `BlameLine`s, then collapsed by `groupBlame`
  into `BlameChunk`s. This supports the rendered UX in which consecutive
  lines from the same commit share one gutter entry. Input is capped at
  5 MiB / 50k lines (returning `ErrBlameTooLarge`), and non-blobs are
  refused (`ErrBlameOnBinary`).
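
The packed-record approach behind `repogit.Log` can be sketched as follows. The chosen fields, `LogOptions`, `Commit`, `logArgs`, and `parseLog` are all assumptions for illustration. Only the git flags and the `--format` placeholders (`%H`, `%an`, `%ae`, `%at`, `%s`, `%b`, `%x1f`, `%x1e`) come from git itself.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// LogOptions and Commit are hypothetical shapes, not the real repogit types.
type LogOptions struct {
	Ref          string
	MaxCount     int
	Skip         int
	Author       string
	Since, Until string
	Path         string // when set, adds --follow and a single pathspec
}

type Commit struct {
	SHA, AuthorName, AuthorEmail, Subject, Body string
	AuthorTime                                  time.Time
}

// %x1f (unit separator) splits fields, %x1e terminates each record.
const logFormat = "%H%x1f%an%x1f%ae%x1f%at%x1f%s%x1f%b%x1e"

// logArgs builds the argument list for one `git log` invocation.
func logArgs(o LogOptions) []string {
	args := []string{"log", "--format=" + logFormat}
	if o.MaxCount > 0 {
		args = append(args, "--max-count="+strconv.Itoa(o.MaxCount))
	}
	if o.Skip > 0 {
		args = append(args, "--skip="+strconv.Itoa(o.Skip))
	}
	if o.Author != "" {
		args = append(args, "--author="+o.Author)
	}
	if o.Since != "" {
		args = append(args, "--since="+o.Since)
	}
	if o.Until != "" {
		args = append(args, "--until="+o.Until)
	}
	if o.Path != "" {
		args = append(args, "--follow") // git allows exactly one pathspec here
	}
	// "--" keeps a ref name from being mistaken for a file path.
	args = append(args, o.Ref, "--")
	if o.Path != "" {
		args = append(args, o.Path)
	}
	return args
}

// parseLog splits packed output into commits. Bodies may contain
// newlines; only \x1e ends a record and only \x1f separates fields.
func parseLog(out string) ([]Commit, error) {
	var commits []Commit
	for _, rec := range strings.Split(out, "\x1e") {
		rec = strings.TrimLeft(rec, "\n") // newline git adds after each record
		if rec == "" {
			continue
		}
		f := strings.Split(rec, "\x1f")
		if len(f) != 6 {
			return nil, fmt.Errorf("log record has %d fields, want 6", len(f))
		}
		secs, err := strconv.ParseInt(f[3], 10, 64)
		if err != nil {
			return nil, err
		}
		commits = append(commits, Commit{
			SHA: f[0], AuthorName: f[1], AuthorEmail: f[2],
			AuthorTime: time.Unix(secs, 0).UTC(),
			Subject:    f[4], Body: strings.TrimRight(f[5], "\n"),
		})
	}
	return commits, nil
}
```

Neither separator byte appears in ordinary commit text. Splitting on `\x1e` first and `\x1f` second therefore never has to guess where a multi-line body ends.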

## Identity resolution

`internal/repos/identity/Resolver` is a per-request memo that maps
author emails to `Resolved` records:

* `User=true`: matched a verified `user_emails` row whose user is not
  suspended or deleted. Populated fields: UserID, Username, DisplayName,
  AvatarURL.
* `User=false`: unknown email. Render the raw author name with a
  deterministic identicon seed (the MD5 hex digest of the lowercased
  email, matching the Gravatar/identicon convention).
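
The fallback seed takes only a few lines of standard-library Go. The function name is hypothetical. Trimming surrounding whitespace follows Gravatar's published convention, but whether shithub trims is an assumption.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"strings"
)

// identiconSeed returns the MD5 hex digest of the lowercased email.
// The TrimSpace call is an assumption borrowed from Gravatar's rules.
func identiconSeed(email string) string {
	sum := md5.Sum([]byte(strings.ToLower(strings.TrimSpace(email))))
	return hex.EncodeToString(sum[:])
}
```

Because the seed depends only on the normalised email, the same unknown author gets the same identicon on every page.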

Construct one resolver per request and pass it through the page render.
The cache is in-process and request-scoped; caching across requests is
S36 territory.

A user with multiple verified emails (work and personal) resolves on
either: the lookup matches any verified `user_emails` row, not just the
primary one. Document this in the user-facing help once the /settings
docs land.

## File-changed table on the commit view

S18 emits the rows and the per-file `+X / -Y` stats. The per-file diff
body slot is left structural so that S19's diff renderer can drop in
without re-rendering. The status column uses git's letters:
`A M D R C T`. Renames and copies show "old → new" in the path column.
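
The `--name-status -M -C` half of `DiffStat` could be turned into table rows roughly as follows. `FileChange`, `parseNameStatus`, and `Display` are hypothetical names. The line shapes come from git: a status letter, a similarity score on `R`/`C`, and tab-separated paths.

```go
package main

import (
	"fmt"
	"strings"
)

// FileChange is a hypothetical row shape for the file-changed table.
type FileChange struct {
	Status  byte   // one of A M D R C T
	OldPath string // set only for renames and copies
	Path    string
}

// Display renders the path column, using "old → new" for R and C rows.
func (f FileChange) Display() string {
	if f.OldPath != "" {
		return f.OldPath + " → " + f.Path
	}
	return f.Path
}

// parseNameStatus reads `git diff-tree -r --root --name-status -M -C`
// output. R and C lines carry a similarity score (R100, C075) and two
// tab-separated paths; the other letters carry one path. Lines without a
// tab, such as the commit id diff-tree prints first, are skipped.
// Without -z, git C-quotes unusual path names; a hardened parser would
// use -z instead of splitting on newlines.
func parseNameStatus(out string) ([]FileChange, error) {
	var rows []FileChange
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Split(line, "\t")
		if len(fields) < 2 || fields[0] == "" {
			continue
		}
		st := fields[0][0]
		switch {
		case (st == 'R' || st == 'C') && len(fields) == 3:
			rows = append(rows, FileChange{Status: st, OldPath: fields[1], Path: fields[2]})
		case strings.IndexByte("AMDT", st) >= 0 && len(fields) == 2:
			rows = append(rows, FileChange{Status: st, Path: fields[1]})
		default:
			return nil, fmt.Errorf("unexpected name-status line %q", line)
		}
	}
	return rows, nil
}
```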

## Atom feed

Lightweight. The feed carries title, id, updated, and link. Each entry
carries id (full SHA), title (subject), updated (author time), author
name and email, and summary (commit body). The ID URN format
`urn:shithub:commit:<sha>` is stable across renames and visibility flips.
The feed is capped at 50 commits.

## Issue-ref linkification (S21 hook)

Commit-body URLs are linkified inline. `#NNN` and `owner/repo#NNN` issue
refs are emitted as stable tokens of the form
`<span data-ref="#123">#123</span>`, so the S21 issue layer can link them
after rendering without re-rendering the page. The token shape is
documented here so the S21 enhancer doesn't have to re-derive it.
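
A sketch of the escape-then-tokenize order follows. The regex, the allowed owner/repo characters, and the function name are assumptions, and URL linkification is left out. One design point matters here. `template.HTMLEscapeString` turns `'` into `&#39;`, so a naive `#[0-9]+` pattern would corrupt that entity. The sketch therefore requires start-of-text, whitespace, or an opening bracket before each ref.

```go
package main

import (
	"html/template"
	"regexp"
)

// issueRef matches "#123" or "owner/repo#123" preceded by start of text,
// whitespace, or "(" / "[". That left boundary keeps it out of numeric
// character references like "&#39;". Owner/repo charset is assumed.
var issueRef = regexp.MustCompile(`(^|[\s(\[])((?:[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+)?#[0-9]+)\b`)

// tokenizeRefs HTML-escapes the body first, then wraps refs in the
// documented data-ref span so S21 can link them later.
func tokenizeRefs(body string) string {
	escaped := template.HTMLEscapeString(body)
	return issueRef.ReplaceAllString(escaped, `${1}<span data-ref="${2}">${2}</span>`)
}
```

Escaping before substitution means the only raw HTML in the output is the span markup the tokenizer itself inserts.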

## Caching

There is currently **no caching layer**: every request runs the relevant
git plumbing fresh. The S18 spec calls for these cache keys:
`(repo_id, ref_oid, page, filters)` for commit lists,
`(repo_id, ref_oid, path)` for blame, and `(repo_id, sha)` for single
commits. It also calls for push-event invalidation. That wiring lives
with the rest of the code-tab caching deferral in **S36**.
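
Purely as an illustration of the spec'd tuples, each one can become a comparable Go struct usable directly as a map key. These types are hypothetical, and the `Filters` encoding is assumed. None of this exists yet; S36 owns the real design.

```go
package main

// Hypothetical key types for the deferred cache. Structs whose fields
// are all comparable can be used directly as Go map keys.
type commitListKey struct {
	RepoID  int64
	RefOID  string // resolved commit OID, not the ref name
	Page    int
	Filters string // canonicalised filter string (assumed encoding)
}

type blameKey struct {
	RepoID int64
	RefOID string
	Path   string
}

type commitKey struct {
	RepoID int64
	SHA    string
}
```

Keying lists and blame on the resolved `ref_oid` means a push that moves a branch produces new keys, so lookups stop hitting stale entries. The old entries still need eviction, which is one job of the push-event invalidation.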

## Pitfalls handled

* **Encoding in commit messages**: bodies are HTML-escaped before any
  link/ref substitution. Non-UTF-8 bytes are not rejected.
  `template.HTMLEscapeString` only rewrites `<`, `>`, `&`, `'`, `"`, and
  NUL, and it passes every other byte through (Go's escaper accepts
  arbitrary byte slices). The browser decides how to display them.
* **Path traversal in blame paths**: blame uses the same
  `validateSubpath` guard as the code tab.
* **Memory on `--line-porcelain`**: output is parsed with a streaming
  `bufio.Scanner` whose maximum line length is 1 MiB; we never
  `io.ReadAll` the porcelain output.
* **Initial-commit DiffStat**: diffing against the empty tree requires
  the `--root` flag. Without it, root commits show no files.
* **SHA collision with `feature/abcdef…`**: when the same string appears
  in both places, the ref-list lookup wins over the hex-SHA shortcut.
  This is the same rule the code-tab `ResolveRef` follows.
* **Blame on binary**: rejected early via `StatPath`, which already
  knows the kind from `cat-file -t`, so no expensive `git blame` runs.

## Deferred polish

* **Tree last-commit-per-entry column** (`POST /tree-commits` htmx
  fragment): this S17 deferral was slated to land here. The same
  `git log --name-status` walk that powers per-file history can also
  power this column. It is not wired in this commit set. The column slot
  already exists in `tree.html`, and the data shape is plain Log +
  DiffStat. Wire it when we want the polish.
* **Caching layer** with push-event invalidation → S36.
* **Older / Newer blame navigation** (walk to the parent of the chunk's
  commit for that path): UI links are not yet emitted, but the data path
  is one `git log -1 <chunk_sha>^ -- <path>` away. Wire when needed.
* **Signed-commit verification badge**: placeholder slot only; actual
  verification is post-MVP.