tenseleyflow/shithub / fc23ff8

Browse files

S19: docs/internal/diffs.md

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA
fc23ff8e4aef5de43677f361a79b04068df7c1a3
Parents
6cb4002
Tree
af43663

1 changed file

StatusFile+-
A docs/internal/diffs.md 149 0
docs/internal/diffs.mdadded
@@ -0,0 +1,149 @@
1
+# Diffs
2
+
3
+S19 ships the diff renderer wired into the S18 single-commit page.
4
+The same renderer will host S20's compare view and S22's PR
5
+files-changed once those sprints land.
6
+
7
+## Pipeline
8
+
9
+```
10
+git diff (--patch)        →  []byte
11
+internal/repos/diff/source     produces the raw patch bytes
12
+                                  ▼
13
+internal/repos/diff/parse       wraps go-gitdiff into our types
14
+                                  ▼
15
+internal/repos/diff/render      emits HTML for the templates
16
+```
17
+
18
+The shape is library-on-the-inside, package-on-the-outside: the
19
+public surface exposes our `Diff/File/Hunk/Line` structs so we can
20
+swap the parser later without breaking callers.
21
+
22
+## Source helpers (`source/`)
23
+
24
+| Function          | Git invocation                               | Use                |
25
+| ----------------- | -------------------------------------------- | ------------------ |
26
+| `FromCommit`      | `git diff-tree -p -r --root <sha>`           | single-commit page |
27
+| `FromRange`       | `git diff <base>..<head>`                    | "show all changes" |
28
+| `FromMergeBase`   | `git diff <base>...<head>` (three-dot)        | PR / compare       |
29
+
30
+`Options` carries `IgnoreWhitespace` (`-w`) and `FindRenames` (`-M -C`).
31
+Whitespace toggles re-source rather than parser-side filter — git's
32
+`-w` respects language quirks (Python indentation, etc.) more
33
+correctly than post-processing hunks.
34
+
35
+The `--root` flag on `FromCommit` is what lets the initial parentless
36
+commit emit its files (against the empty tree); without it the
37
+parser sees nothing.
38
+
39
+## Parser (`parse/`)
40
+
41
+Wraps `github.com/bluekeyes/go-gitdiff/gitdiff`. Translates each
42
+`*gitdiff.File` into our `File`, expanding `TextFragments` into
43
+`Hunks` whose `Lines` carry one of four `LineKind`s:
44
+
45
+| Kind            | Meaning                                  |
46
+| --------------- | ---------------------------------------- |
47
+| `LineContext`   | unchanged context line                   |
48
+| `LineAdd`       | `+` — added line                         |
49
+| `LineDelete`    | `-` — deleted line                       |
50
+| `LineNoNewline` | `\ No newline at end of file` (rare)     |
51
+
52
+Old/new line numbers are populated based on kind:
53
+- Context: both populated
54
+- Add: only `NewLineNo`
55
+- Delete: only `OldLineNo`
56
+
57
+The renderer reads `SizeBytes` (sum of line content) for the
58
+"per-file too large" decision — cheap to compute during parse.
59
+
60
+## Renderer (`render/`)
61
+
62
+Two modes share the same `RenderFile` skeleton (header → hunks →
63
+lines):
64
+
65
+- **Unified** — old/new line numbers in two adjacent `<td>` cells,
66
+  content in the third. The `+`/`-`/space marker is prepended to the
67
+  content for visual consistency with `git diff`.
68
+- **Split** — left side (old) and right side (new) in separate
69
+  columns. Adds and deletes pair row-by-row; trailing additions on
70
+  either side leave the other column padded.
71
+
72
+### Per-file collapse + whole-diff truncate
73
+
74
+| Threshold         | Default | Trigger                                                     |
75
+| ----------------- | ------- | ----------------------------------------------------------- |
76
+| `PerFileLineCap`  | 1000    | files with > 1000 changed lines collapse under `<details>`  |
77
+| `PerFileBytesCap` | 500 KiB | per-file size cap                                           |
78
+| `WholeDiffFileCap`| 100     | whole-diff truncation: every file collapses, file list eager |
79
+
80
+The collapsed `<details>` currently renders the hunks inside the
81
+toggle (lazy at the user's click but still inline). The spec calls
82
+for a separate `/diff-fragment` endpoint that fetches per-file hunks
83
+on demand; that's a polish for **S36** when we have a real big-PR
84
+workload to bench against.
85
+
86
+### Binary + image
87
+
88
+Binary files emit a `shithub-diff-binary` placeholder. Image files
89
+(by extension) render an `shithub-diff-image` placeholder note —
90
+the side-by-side preview wires once the renderer accepts ref+path
91
+context for `/raw/...` URLs (deferred polish; the commit page already
92
+links the changed file in the file table).
93
+
94
+### Highlighting
95
+
96
+The renderer currently HTML-escapes line content; the file-level
97
+`<a>` link to the blob view (which uses the S17 Chroma highlighter)
98
+is the path users follow when they want syntax highlighting on a
99
+specific change. Per-line Chroma highlighting in the diff itself is
100
+deferred — escape is the floor; richer rendering is polish.
101
+
102
+## Routes
103
+
104
+The S19 changes are entirely additive to the S18 commit page. New
105
+query parameters:
106
+
107
+| Param         | Default | Effect                                  |
108
+| ------------- | ------- | --------------------------------------- |
109
+| `?diff=unified` | yes     | unified mode                            |
110
+| `?diff=split`   |         | split mode                              |
111
+| `?w=1`          |         | re-source with `-w` (ignore whitespace) |
112
+
113
+The toggles emit links that preserve the other parameter so users
114
+don't lose a mode when toggling whitespace and vice versa.
115
+
116
+## Tests
117
+
118
+- `parse/parse_test.go` — happy path, rename, binary, empty, multi-file.
119
+- `render/render_test.go` — unified + split classes present, binary
120
+  placeholder, rename header, too-large collapse, empty-diff label.
121
+
122
+## Pitfalls handled
123
+
124
+- **Initial-commit diff emits nothing without `--root`** — the
125
+  source helper passes the flag.
126
+- **`go-gitdiff` returns `(nil, _, io.EOF)` for empty input** —
127
+  treated as "no files," no error.
128
+- **HTML escape on every line** — content goes through
129
+  `html.EscapeString`; only the renderer's wrapper markup is raw.
130
+- **Whitespace toggle cache key** — currently no cache; when one
131
+  lands in S36, the cache key must include `IgnoreWhitespace`.
132
+- **Trailing-newline preservation** — `parse.trimNewline` strips one
133
+  `\n`; multi-newline trailing data is preserved verbatim.
134
+
135
+## Deferred polish (tracked)
136
+
137
+* **`POST /{owner}/{repo}/diff-fragment?...` endpoint** — per-file
138
+  lazy fetch for whole-diff-truncated cases. Track in S36 with the
139
+  rest of the diff/blame caching work; current `<details>` inline
140
+  expand is the floor.
141
+* **Per-line Chroma highlighting in diff hunks** — every line
142
+  currently HTML-escapes; richer rendering is post-MVP. The Chroma
143
+  helper (`internal/repos/highlight`) is the same one the blob view
144
+  uses, so the wiring is mechanical when desired.
145
+* **Image diff side-by-side preview** — needs `/raw/<sha>/<path>`
146
+  threading into the renderer. Current placeholder is honest; the
147
+  file table already links the new path's blob.
148
+* **`Diff` cache keyed on `(base_sha, head_sha)`** — S36, with the
149
+  rest of the immutable-content cache layer.