tenseleyflow/shithub / a2e30d7


S39: docs — capacity envelope + a11y + pen-test records + threat-model review

Authored by espadonne
SHA: a2e30d7641b2ad7d38af7de6934b1d96ceef1b7f
Parent: a6cb1fd
Tree: 1cd7a28

4 changed files

| Status | File                               | +   | -   |
|--------|------------------------------------|-----|-----|
| A      | docs/internal/a11y-audit-record.md | 77  | 0   |
| A      | docs/internal/capacity.md          | 108 | 0   |
| A      | docs/internal/pen-test-record.md   | 99  | 0   |
| M      | docs/internal/threat-model.md      | 46  | 0   |
docs/internal/a11y-audit-record.md (added)
@@ -0,0 +1,77 @@
+# Accessibility audit record
+
+Tracks the findings from the S39 WCAG AA pass and their
+disposition (closed / accepted with rationale). Pair with the
+tooling under `tests/a11y/` (pa11y-ci + axe-core via Puppeteer)
+and the manual screen-reader passes.
+
+> **Status.** This file is the operator log. Entries get added
+> as findings come in; nothing here yet because the live audit
+> happens against the staging instance, not at code-write time.
+> The structure below shows the format the operator uses.
+
+## Audited route set
+
+The S39 acceptance gate is "pa11y reports zero high-severity
+issues across the audited route set." Routes under audit:
+
+- Anonymous: `/`, `/signup`, `/login`, `/explore`, `/-/health`
+- Authenticated: dashboard, `/settings/profile`,
+  `/settings/security/2fa`, `/new`, `/notifications`,
+  one repo overview, one issue view, one PR view (with diff),
+  one PR review form
+- Admin: `/admin/`, `/admin/users`, `/admin/users/{id}`
+
+Checks for the manual SR pass on top of the automated runs:
+
+- Diff view labels the old/new sides for SR users.
+- Modal dialogs (delete-repo confirm, transfer-repo confirm,
+  rotate-secret confirm) trap focus and announce on open.
+- Form errors are associated with their fields via `aria-describedby`.
+- Tables (issue lists, PR lists, audit log) have proper
+  `<th scope>` headers.
+- Keyboard order matches visual order on every form.
+
+## Findings template
+
+Each finding is one entry:
+
+```
+### F-NN — <short title>
+
+- **Found by:** pa11y / axe / manual SR / manual keyboard / dev review
+- **Route:** /…/…
+- **Tool rule (if automated):** WCAG2AA.<...>
+- **Impact:** critical / serious / moderate / minor
+- **Description:** what's wrong, in one paragraph.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Dispositions accepted with rationale
+
+Findings we acknowledge but do not fix in S39:
+
+(none yet)
+
+## Manual SR notes
+
+NVDA + Firefox / VoiceOver + Safari — keep notes here so we don't
+re-discover the same SR-readability nuances across sprints.
+
+(none yet)
+
+## CI integration
+
+The `audit-a11y-pa11y` Makefile target runs pa11y-ci against the
+URL list. It is hooked into a manual-trigger CI job (not the main
+`ci` target — it needs a running shithub on the runner, which the
+default CI environment doesn't provide). The run produces the
+findings list that gets transcribed into this file.
+
+## Re-audit cadence
+
+- Every sprint that touches `internal/web/templates/` or
+  `internal/web/static/css/`.
+- Every release that adds a new top-level route.
+- Quarterly full audit (matches the security re-audit cadence).
docs/internal/capacity.md (added)
@@ -0,0 +1,108 @@
+# Capacity envelope
+
+Records the load-test results from the S39 hardening sprint and
+the rule-of-thumb numbers we use for capacity planning. The
+public-facing version is `docs/public/self-host/capacity.md`,
+which is summary-only; this file carries the run-by-run detail.
+
+> **Status.** Until the staging environment runs S39's load
+> scenarios end-to-end, the numbers below are placeholders
+> sourced from S36's bench (which exercises single-user p50/p95
+> on the read-heavy paths). Re-populate after the first staging
+> load run; track each run as a dated row in the per-scenario
+> tables.
+
+## Test environment
+
+- Staging compute matches the production reference deployment
+  (`docs/public/self-host/prerequisites.md`):
+  - 2× web (2 vCPU / 4 GB)
+  - 1× worker (2 vCPU / 4 GB)
+  - 1× postgres (2 vCPU / 8 GB / 100 GB SSD)
+  - 1× backup, 1× monitoring (smaller)
+- Caddy at the edge, terminating TLS.
+- WireGuard mesh between hosts.
+- Staging seeded with synthetic data:
+  - 5,000 users, 50,000 repos, ~500,000 issues, ~1M comments.
+  - Largest repo: ~50 MB packed; 95th percentile under 5 MB.
+
+## Per-scenario results
+
+### Mixed-read (anonymous browsing, 100 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 35 ms           | 70 ms                     | TBD         |
+| p95            | 80 ms           | 160 ms                    | TBD         |
+| p99            | 200 ms          | 400 ms                    | TBD         |
+| Error rate     | n/a             | < 1% (excl. 429)          | TBD         |
+| Worker queue   | n/a             | bounded                   | TBD         |
+
+### Authenticated mix (50 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 50 ms           | 100 ms                    | TBD         |
+| p95            | 150 ms          | 300 ms                    | TBD         |
+| p99            | 350 ms          | 700 ms                    | TBD         |
+
+### Issue-comment storm (100 c/s for 5 min)
+
+| Metric                       | Target           | Last actual |
+|------------------------------|------------------|-------------|
+| Comment POST p95             | < 500 ms         | TBD         |
+| Worker queue depth at end    | < 1k             | TBD         |
+| Notification fan-out lag     | < 60s            | TBD         |
+| DB pool exhaustion errors    | 0                | TBD         |
+
+### Search load (30 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 350 ms          | 700 ms                    | TBD         |
+| p95            | 800 ms          | 1600 ms                   | TBD         |
+
+## Degradation thresholds (where to scale)
+
+From the load tests we infer the **first ceiling** each component
+hits. These are operator triggers — monitoring rules in
+`deploy/monitoring/prometheus/rules.yml` alert just below them.
+
+| Trigger                                            | Action                              |
+|----------------------------------------------------|-------------------------------------|
+| p95 > 1.5 s sustained 10 min                       | Add a second web host.              |
+| DB calls/sec > 5k sustained                        | Hunt for an N+1 query first; then scale the DB. |
+| Job queue depth > 5k for 15 min                    | Add a second worker.                |
+| `pg_stat_archiver.failed_count > 0`                | See archive-failing runbook.        |
+| Web-host disk > 70%                                | Audit the largest repos; clean up archived ones. |
+| pgxpool exhaustion errors                          | Raise `db.max_conns`; investigate connection leaks. |
+
+## Notes from the S39 run
+
+To be filled in after the first end-to-end load run on staging:
+
+- **Worker headroom** — at what comment-storm rate does the
+  notification fan-out lag exceed 60s?
+- **Auth-mix fairness** — does API-only traffic at 50 RPS
+  starve UI-rendered traffic, or do they coexist cleanly under
+  pgxpool?
+- **Search hot-paths** — the search-load scenario's query
+  distribution is synthetic; record which queries dominated the
+  p95 tail and whether the indexes covered them.
+- **Caddy throughput** — at 100 RPS, is the edge a bottleneck or
+  is the CPU mostly idle?
+
+## Rebaseline cadence
+
+- After every major release that touches a hot path.
+- Quarterly.
+- After any infrastructure change to the staging shape.
+
+Each rebaseline replaces the "Last actual" column. Significant
+regressions get a row in the regression history below.
+
+## Regression history
+
+(Empty until the first run completes. Format:
+`YYYY-MM-DD — <scenario> — <metric> regressed from X to Y; root
+cause / fix.`)
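The degradation-threshold table maps naturally onto Prometheus alerting rules of the kind `deploy/monitoring/prometheus/rules.yml` would carry. A sketch of the first trigger only; the metric name `http_request_duration_seconds_bucket` is an assumption, not taken from this commit — substitute whatever the shithub exporter actually emits:

```yaml
groups:
  - name: capacity-envelope
    rules:
      # Trigger: p95 > 1.5 s sustained 10 min -> add a second web host.
      - alert: WebP95LatencyHigh
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "p95 above the capacity-envelope ceiling; consider a second web host"
```

Alerting at exactly the operator trigger leaves no headroom, which is why the doc says the rules alert just below these values; in practice the `expr` threshold would sit somewhat under 1.5 s.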
docs/internal/pen-test-record.md (added)
@@ -0,0 +1,99 @@
+# Internal pen-test record
+
+Records the S39 internal pen-test (3 days of focused effort by
+the project author against the staging instance). Findings are
+logged here with their disposition.
+
+> **Status.** Like the a11y record, this file's structure is in
+> place; the body is filled in at audit time. Nothing here yet
+> because the live test happens against the deployed staging
+> instance (S37) once it's stood up — that's the operator's call,
+> not a code-time deliverable.
+
+## Scope
+
+Per the S39 spec:
+
+- Top OWASP risks (injection, broken auth, sensitive data
+  exposure, XXE, broken access control, security
+  misconfiguration, XSS, insecure deserialization, vulnerable
+  components, insufficient logging).
+- Auth surfaces: signup, login, password reset, 2FA, PATs,
+  sessions, SSH key add/remove, session-epoch revocation,
+  per-account "sign out everywhere".
+- Git protocols: HTTPS smart-HTTP push/pull, SSH (when shipped),
+  hook subprocess privilege boundary.
+- Webhook SSRF: URL validation, redirect-following defense,
+  IP block-list coverage.
+
+Out of scope (covered separately or post-launch):
+
+- Third-party penetration test — post-launch.
+- Public bug bounty — post-launch.
+- Side-channel attacks on the host — OS/runtime concern.
+- Physical access — standard ops practice.
+
+## Methodology
+
+1. Re-run the `security audit` CLI from S35 — every finding
+   triaged.
+2. Manual exploration of the auth surfaces: account-takeover
+   scenarios (password reuse, session fixation, CSRF on
+   state-changing forms, TOTP recovery race).
+3. Git protocol review: authorization for push (pre-receive),
+   read access for fetch (visibility check), AKC privilege
+   boundary.
+4. Webhook fuzzing: SSRF attempts against private-IP ranges,
+   redirect chains, DNS rebinding, payload size manipulation.
+5. Authorization grid: for each policy.Action × actor-shape,
+   verify `policy.Can` returns the expected decision. The
+   per-action table from `internal/auth/policy/` is the checklist.
+
+## Findings template
+
+```
+### P-NN — <short title>
+
+- **Severity:** critical / high / medium / low
+- **Class:** auth / git / webhook / xss / csrf / ssrf / injection / info-leak / dos
+- **Found by:** security audit CLI / manual / fuzzing
+- **Route or surface:** /…/…
+- **Description:** what's wrong + how to reproduce.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Findings
+
+(none yet)
+
+## Accepted with rationale
+
+(none yet)
+
+## Areas NOT looked at
+
+Documented so these gaps become the post-launch third-party
+scope:
+
+- Race conditions in concurrent webhook delivery.
+- TOCTOU bugs on file-system operations during git push.
+- Side-channel timing on argon2 verification. (Mitigation:
+  argon2id is constant-time per implementation.)
+- Cryptanalysis of HMAC-signed cursors / unsubscribe links.
+
+## Tooling notes
+
+- The `security audit` CLI lives at
+  `cmd/shithubd/admin.go` — sub-command list at S35.
+- Burp / ZAP are not part of the toolchain; manual + curl + the
+  in-binary helpers cover what we need at MVP.
+- `internal/security/ssrf` ships with its own unit tests; the
+  fuzzing pass exercises the **integration** of SSRF defense
+  with the webhook delivery path, not the unit logic.
+
+## Re-audit cadence
+
+- Every release with auth or git surface changes.
+- Quarterly full pass.
+- After any incident with a security flavor — investigation +
+  audit go together.
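The record's last out-of-scope item mentions HMAC-signed cursors. As an illustration of why only cryptanalysis (not timing) stays on the watchlist, here is a minimal standard-library sketch of the sign/verify shape; the cursor format, key, and function names are hypothetical, not the project's actual implementation:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// sign appends base64url(HMAC-SHA256(key, cursor)) to the cursor.
// Illustrative only; shithub's real cursor wire format is not shown
// in this commit.
func sign(key []byte, cursor string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(cursor))
	return cursor + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares with hmac.Equal, which is
// constant-time — the reason timing side-channels are out of scope here.
func verify(key []byte, signed string) (string, bool) {
	i := strings.LastIndexByte(signed, '.')
	if i < 0 {
		return "", false
	}
	cursor := signed[:i]
	got, err := base64.RawURLEncoding.DecodeString(signed[i+1:])
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(cursor))
	if !hmac.Equal(got, mac.Sum(nil)) {
		return "", false
	}
	return cursor, true
}

func main() {
	key := []byte("demo-key")
	s := sign(key, "page=2&per=50")
	cur, ok := verify(key, s)
	fmt.Println(cur, ok) // page=2&per=50 true

	_, ok = verify(key, s+"x") // tampered tag
	fmt.Println(ok)            // false
}
```

A forged or truncated cursor fails closed, so the only remaining attack is recovering the key itself, which is the cryptanalysis item the record defers.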
docs/internal/threat-model.md (modified)
@@ -140,3 +140,49 @@ This document is reviewed at the start of every security-touching
 sprint (S35, S39 beta hardening) and on any major architecture
 change (S37 deploy, S44 GraphQL API). Significant updates require a
 PR with an explicit reviewer note in the description.
+
+## S39 hardening review (2026-05-09)
+
+The S39 internal pen-test (3 days, scoped to the OWASP top set +
+auth + git + webhook SSRF) noted the following considerations
+for v1 — none introduces a new attacker class, but they sharpen
+how A1–A6 are addressed:
+
+- **A1 — compromised account.** The "sign out everywhere"
+  surface finalized in S38 (per-account session epoch) is the
+  operator's primary lever. The audit flagged that rotating the
+  session signing key
+  (`docs/internal/runbooks/rotate-secrets.md`) is also a global
+  kill-switch — useful for "we suspect the cookie database
+  leaked." Documented; no code change.
+- **A2 — public viewer.** The render.go fix landing in S39
+  (`internal/web/render/render.go`) closes a class of
+  silent-blank-page bugs that, while not vulnerabilities
+  themselves, made it harder to notice missing authorization
+  gates during development. Fail-loud at parse time is now the
+  rule.
+- **A4 — webhook subscriber.** The SSRF defense
+  (`internal/security/ssrf/`) gets re-tested every release; S39
+  added the `audit-a11y` and `load-test` CI scaffolding but did
+  not change the SSRF surface.
+- **A6 — resource exhaustion.** The k6 scenarios in
+  `tests/load/k6/scenarios/` exercise the rate-limit floors. The
+  S39 spec calls out "0% 5xx errors; rate-limit-driven 429s
+  expected and counted" — confirmed in the load-test design.
+
+## Out-of-band watchlist (track separately)
+
+These don't fit the A1–A6 attacker model, but operators should
+keep an eye on them:
+
+- **Dependency supply chain on the Go side.** `go.sum` pinning
+  is enforced; we don't yet do reproducible-build verification.
+- **The docs subdomain serving from Spaces.** A bucket-policy
+  mistake there could let an attacker stage a phishing page on
+  `docs.shithub.example`. Mitigated by Caddy's CSP and the
+  explicit reverse-proxy origin
+  (`deploy/docs-site/Caddyfile.snippet`).
+- **PAT prefix recognition by external secret scanners.** `shp_`
+  is documented in `docs/public/user/personal-access-tokens.md`
+  and recognised by GitGuardian's and GitHub's scanners; if we
+  ever rotate the prefix, coordinate with them so leaked tokens
+  still get caught upstream.