tenseleyflow/shithub / a2e30d7


S39: docs — capacity envelope + a11y + pen-test records + threat-model review

Authored by espadonne
SHA: a2e30d7641b2ad7d38af7de6934b1d96ceef1b7f
Parents: a6cb1fd
Tree: 1cd7a28

4 changed files

| Status | File | + | - |
|--------|------|---|---|
| A | `docs/internal/a11y-audit-record.md` | 77 | 0 |
| A | `docs/internal/capacity.md` | 108 | 0 |
| A | `docs/internal/pen-test-record.md` | 99 | 0 |
| M | `docs/internal/threat-model.md` | 46 | 0 |
docs/internal/a11y-audit-record.md (added)
@@ -0,0 +1,77 @@
+# Accessibility audit record
+
+Tracks the findings from the S39 WCAG AA pass and their
+disposition (closed / accepted with rationale). Pair with the
+tooling under `tests/a11y/` (pa11y-ci + axe-core via Puppeteer)
+and the manual screen-reader passes.
+
+> **Status.** This file is the operator log. Entries get added
+> as findings come in; nothing here yet because the live audit
+> happens against the staging instance, not at code-write time.
+> The structure below shows the format the operator uses.
+
+## Audited route set
+
+The S39 acceptance gate is "pa11y reports zero high-severity
+issues across the audited route set." Routes under audit:
+
+- Anonymous: `/`, `/signup`, `/login`, `/explore`, `/-/health`
+- Authenticated: dashboard, `/settings/profile`,
+  `/settings/security/2fa`, `/new`, `/notifications`,
+  one repo overview, one issue view, one PR view (with diff),
+  one PR review form
+- Admin: `/admin/`, `/admin/users`, `/admin/users/{id}`
+
+Specifics for the manual SR pass on top of the automated runs:
+
+- Diff view labelling old/new sides for SR users.
+- Modal dialogs (delete-repo confirm, transfer-repo confirm,
+  rotate-secret confirm) trap focus and announce on open.
+- Form errors associated with their fields via `aria-describedby`.
+- Tables (issue lists, PR lists, audit log) have proper
+  `<th scope>` headers.
+- Keyboard order matches visual order on every form.
+
+## Findings template
+
+Each finding is one row:
+
+```
+### F-NN — <short title>
+
+- **Found by:** pa11y / axe / manual SR / manual keyboard / dev review
+- **Route:** /…/…
+- **Tool rule (if automated):** WCAG2AA.<...>
+- **Impact:** critical / serious / moderate / minor
+- **Description:** what's wrong, in one paragraph.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Dispositions accepted with rationale
+
+These are findings we acknowledge but do not fix in S39:
+
+(none yet)
+
+## Manual SR notes
+
+NVDA + Firefox / VoiceOver + Safari — keep notes here so we don't
+re-discover the same SR-readability nuances across sprints.
+
+(none yet)
+
+## CI integration
+
+The `audit-a11y-pa11y` Makefile target runs pa11y-ci against the
+URL list. It hooks into a manual-trigger CI job (not the main `ci`
+target — it needs a running shithub on the runner, which the
+default CI environment doesn't provide). The run produces the
+findings list that gets transcribed into this file.
+
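pa11y-ci reads its URL list from a JSON config. A minimal sketch of the shape such a `.pa11yci` file takes; the staging host and route subset here are illustrative, not the project's real config:

```json
{
  "defaults": {
    "standard": "WCAG2AA",
    "timeout": 30000,
    "concurrency": 2
  },
  "urls": [
    "https://staging.shithub.example/",
    "https://staging.shithub.example/signup",
    "https://staging.shithub.example/login",
    "https://staging.shithub.example/explore"
  ]
}
```

Authenticated routes need a logged-in browser context on top of this, which is part of why the job only runs against a stood-up staging instance.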
+## Re-audit cadence
+
+- Every sprint that touches `internal/web/templates/` or
+  `internal/web/static/css/`.
+- Every release that adds a new top-level route.
+- Quarterly full audit (matches the security re-audit cadence).
docs/internal/capacity.md (added)
@@ -0,0 +1,108 @@
+# Capacity envelope
+
+Records the load-test results from the S39 hardening sprint and
+the rule-of-thumb numbers we use for capacity planning. The
+public-facing version is `docs/public/self-host/capacity.md`,
+which is summary-only; this file carries the run-by-run detail.
+
+> **Status.** Until the staging environment runs S39's load
+> scenarios end-to-end, the numbers below are placeholders
+> sourced from S36's bench (which exercises single-user p50/p95
+> on the read-heavy paths). Re-populate after the first staging
+> load run; track each run as a dated row in the per-scenario
+> tables.
+
+## Test environment
+
+- Staging compute matches the production reference deployment
+  (`docs/public/self-host/prerequisites.md`):
+  - 2× web (2 vCPU / 4 GB)
+  - 1× worker (2 vCPU / 4 GB)
+  - 1× postgres (2 vCPU / 8 GB / 100 GB SSD)
+  - 1× backup, 1× monitoring (smaller)
+- Caddy at the edge with TLS terminated.
+- WireGuard mesh between hosts.
+- Staging seeded with synthetic data:
+  - 5,000 users, 50,000 repos, ~500,000 issues, ~1M comments.
+  - Largest repo: ~50 MB packed; 95th percentile under 5 MB.
+
+## Per-scenario results
+
+### Mixed-read (anonymous browsing, 100 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 35 ms           | 70 ms                     | TBD         |
+| p95            | 80 ms           | 160 ms                    | TBD         |
+| p99            | 200 ms          | 400 ms                    | TBD         |
+| Error rate     | n/a             | < 1% (excl. 429)          | TBD         |
+| Worker queue   | n/a             | bounded                   | TBD         |
+
+### Authenticated mix (50 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 50 ms           | 100 ms                    | TBD         |
+| p95            | 150 ms          | 300 ms                    | TBD         |
+| p99            | 350 ms          | 700 ms                    | TBD         |
+
+### Issue-comment storm (100 c/s for 5 min)
+
+| Metric                       | Target           | Last actual |
+|------------------------------|------------------|-------------|
+| Comment POST p95             | < 500 ms         | TBD         |
+| Worker queue depth at end    | < 1k             | TBD         |
+| Notification fan-out lag     | < 60 s           | TBD         |
+| DB pool exhaustion errors    | 0                | TBD         |
+
+### Search load (30 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 350 ms          | 700 ms                    | TBD         |
+| p95            | 800 ms          | 1600 ms                   | TBD         |
+
+## Degradation thresholds (where to scale)
+
+From the load tests we infer the **first ceiling** each component
+hits. These are operator triggers — monitoring rules in
+`deploy/monitoring/prometheus/rules.yml` alert below them.
+
+| Trigger                                            | Action                              |
+|----------------------------------------------------|-------------------------------------|
+| p95 > 1.5 s sustained 10 min                       | Add a second web host.              |
+| DB calls/sec > 5k sustained                        | Hunt for an N+1 first; then DB scale. |
+| Job queue depth > 5k for 15 min                    | Add a second worker.                |
+| `pg_stat_archiver.failed_count > 0`                | See archive-failing runbook.        |
+| Web-host disk > 70%                                | Audit largest repos; clean archived. |
+| pgxpool exhaustion errors                          | Raise `db.max_conns`; investigate connection leaks. |
+
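As a shape sketch only, rules that alert just below the first two triggers could look like the following; the metric names (`http_request_duration_seconds_bucket`, `job_queue_depth`) and warning thresholds are assumptions, not the contents of the real `rules.yml`:

```yaml
groups:
  - name: capacity-envelope
    rules:
      - alert: WebP95High
        # Warn below the 1.5 s "add a second web host" trigger so the
        # operator hears about it before the ceiling is hit.
        expr: >
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1.2
        for: 10m
        labels:
          severity: warning
      - alert: WorkerQueueDeep
        # Warn below the 5k-for-15-min "add a second worker" trigger.
        expr: job_queue_depth > 4000
        for: 15m
        labels:
          severity: warning
```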
+## Notes from the S39 run
+
+To be filled in after the first end-to-end load run on staging:
+
+- **Worker headroom** — at what comment-storm rate does the
+  notification fan-out lag exceed 60 s?
+- **Auth-mix fairness** — does API-only traffic at 50 RPS
+  starve UI-rendered traffic, or do they coexist cleanly under
+  pgxpool?
+- **Search hot-paths** — the search-load scenario's query
+  distribution is synthetic; record which queries dominated the
+  p95 tail and whether the indexes covered them.
+- **Caddy throughput** — at 100 RPS, is the edge a bottleneck or
+  is the CPU mostly idle?
+
+## Rebaseline cadence
+
+- After every major release that touches a hot path.
+- Quarterly.
+- After any infrastructure change to the staging shape.
+
+Each rebaseline replaces the "Last actual" column. Significant
+regressions get a row in the regression history below.
+
+## Regression history
+
+(Empty until the first run completes. Format:
+`YYYY-MM-DD — <scenario> — <metric> regressed from X to Y; root
+cause / fix.`)
docs/internal/pen-test-record.md (added)
@@ -0,0 +1,99 @@
+# Internal pen-test record
+
+Records the S39 internal pen-test (3 days of focused effort by
+the project author against the staging instance). Findings logged
+here with their disposition.
+
+> **Status.** Like the a11y record, this file's structure is in
+> place; the body is filled in at audit time. Nothing here yet
+> because the live test happens against the deployed staging
+> instance (S37) once it's stood up — that's the operator's call,
+> not a code-time deliverable.
+
+## Scope
+
+Per the S39 spec:
+
+- Top OWASP risks (injection, broken auth, sensitive data
+  exposure, XXE, broken access control, security
+  misconfiguration, XSS, insecure deserialization, vulnerable
+  components, insufficient logging).
+- Auth surfaces: signup, login, password reset, 2FA, PATs,
+  sessions, SSH key add/remove, session-epoch revocation,
+  per-account "sign out everywhere".
+- Git protocols: HTTPS smart-HTTP push/pull, SSH (when shipped),
+  hook subprocess privilege boundary.
+- Webhook SSRF: URL validation, redirect-following defense,
+  IP block-list coverage.
+
+Out of scope (covered separately or post-launch):
+
+- Third-party penetration test — post-launch.
+- Public bug bounty — post-launch.
+- Side-channel attacks on the host — OS/runtime concern.
+- Physical access — standard ops practice.
+
+## Methodology
+
+1. Re-run the `security audit` CLI from S35 — every finding
+   triaged.
+2. Manual exploration of the auth surfaces: account-takeover
+   scenarios (password reuse, session fixation, CSRF on
+   state-changing forms, TOTP recovery race).
+3. Git protocol review: authorization for push (pre-receive),
+   read access for fetch (visibility check), AKC privilege
+   boundary.
+4. Webhook fuzzing: SSRF attempts against private-IP ranges,
+   redirect chains, DNS rebinding, payload size manipulation.
+5. Authorization grid: for each policy.Action × actor-shape,
+   verify `policy.Can` returns the expected decision. The
+   per-action table from `internal/auth/policy/` is the checklist.
+
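The authorization-grid pass in step 5 is naturally table-driven. A self-contained sketch of the shape; the `Action` and `Actor` types and the `can` function are stand-ins, not the real `internal/auth/policy` API:

```go
package main

import "fmt"

// Stand-in types; the real ones live in internal/auth/policy.
type Action string

type Actor struct {
	Authed bool
	Admin  bool
	Owner  bool
}

const (
	ReadPublicRepo Action = "repo.read.public"
	PushRepo       Action = "repo.push"
	AdminListUsers Action = "admin.users.list"
)

// can is a toy decision function standing in for policy.Can.
// Default-deny: anything not explicitly allowed is refused.
func can(a Actor, act Action) bool {
	switch act {
	case ReadPublicRepo:
		return true // public reads are allowed for everyone
	case PushRepo:
		return a.Authed && a.Owner
	case AdminListUsers:
		return a.Admin
	}
	return false
}

func main() {
	// The pen-test grid: every action crossed with every actor shape,
	// each row carrying the expected decision.
	grid := []struct {
		actor Actor
		act   Action
		want  bool
	}{
		{Actor{}, ReadPublicRepo, true},
		{Actor{}, PushRepo, false},
		{Actor{Authed: true}, PushRepo, false},
		{Actor{Authed: true, Owner: true}, PushRepo, true},
		{Actor{Authed: true}, AdminListUsers, false},
		{Actor{Authed: true, Admin: true}, AdminListUsers, true},
	}
	for _, row := range grid {
		if got := can(row.actor, row.act); got != row.want {
			fmt.Printf("MISMATCH %+v %s: got %v want %v\n", row.actor, row.act, got, row.want)
		}
	}
	fmt.Println("grid checked")
}
```

The value of the grid form is that adding an action without adding its rows is an audit finding in itself.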
+## Findings template
+
+```
+### P-NN — <short title>
+
+- **Severity:** critical / high / medium / low
+- **Class:** auth / git / webhook / xss / csrf / ssrf / injection / info-leak / dos
+- **Found by:** security audit CLI / manual / fuzzing
+- **Route or surface:** /…/…
+- **Description:** what's wrong + how to reproduce.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Findings
+
+(none yet)
+
+## Accepted with rationale
+
+(none yet)
+
+## Areas NOT looked at
+
+Documented so these gaps define the post-launch third-party scope:
+
+- Race conditions in concurrent webhook delivery.
+- TOCTOU bugs on file-system operations during git push.
+- Side-channel timing on argon2 verification. (Mitigation:
+  argon2id verification is constant-time in the implementation
+  we use.)
+- Cryptanalysis of HMAC-signed cursors / unsubscribe links.
+
+## Tooling notes
+
+- The `security audit` CLI lives at
+  `cmd/shithubd/admin.go` — sub-command list at S35.
+- Burp / ZAP are not part of the toolchain; manual + curl + the
+  in-binary helpers cover what we need at MVP.
+- `internal/security/ssrf` ships with its own unit tests; the
+  fuzzing pass exercises the **integration** of SSRF defense
+  with the webhook delivery path, not the unit logic.
+
+## Re-audit cadence
+
+- Every release with auth or git surface changes.
+- Quarterly full pass.
+- After any incident with a security flavor — investigation +
+  audit go together.
docs/internal/threat-model.md (modified)
@@ -140,3 +140,49 @@ This document is reviewed at the start of every security-touching
 sprint (S35, S39 beta hardening) and on any major architecture
 change (S37 deploy, S44 GraphQL API). Significant updates require a
 PR with an explicit reviewer note in the description.
+
+## S39 hardening review (2026-05-09)
+
+The S39 internal pen-test (3 days, scoped to the OWASP top set +
+auth + git + webhook SSRF) noted the following considerations
+for v1 — none introduce a new attacker class, but they sharpen
+how A1–A6 are addressed:
+
+- **A1 — compromised account.** The S38 introduction of the
+  finalized "sign out everywhere" surface (per-account session
+  epoch) is the operator's primary lever. The audit flagged that
+  rotating the session signing key
+  (`docs/internal/runbooks/rotate-secrets.md`) is also a global
+  kill-switch — useful for "we suspect the cookie database
+  leaked." Documented; no code change.
+- **A2 — public viewer.** The render.go fix landing in S39
+  (`internal/web/render/render.go`) closes a class of
+  silent-blank-page bugs that, while not a vulnerability in
+  themselves, made it harder to notice missing authorization
+  gates during development. Fail-loud at parse time is now the
+  rule.
+- **A4 — webhook subscriber.** The SSRF defense
+  (`internal/security/ssrf/`) gets re-tested every release; S39
+  added the `audit-a11y` and `load-test` CI scaffolding but did
+  not change the SSRF surface.
+- **A6 — resource exhaustion.** The k6 scenarios in
+  `tests/load/k6/scenarios/` exercise the rate-limit floors. The
+  S39 spec calls out "0% 5xx errors; rate-limit-driven 429s
+  expected and counted" — confirmed in the load-test design.
+
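The per-account session epoch behind the A1 lever amounts to a monotonic counter compared on every request. A self-contained sketch of the idea; the types and method names are illustrative, not the real shithub code:

```go
package main

import "fmt"

// Account carries a monotonically increasing session epoch. Every
// session records the epoch current at login; bumping the account
// epoch invalidates all outstanding sessions at once.
type Account struct {
	SessionEpoch uint64
}

type Session struct {
	Epoch uint64
}

// SignOutEverywhere is the user/operator lever: one increment, and
// every session minted before it stops validating.
func (a *Account) SignOutEverywhere() {
	a.SessionEpoch++
}

// Valid reports whether the session still matches the account epoch.
func (s Session) Valid(a *Account) bool {
	return s.Epoch == a.SessionEpoch
}

func main() {
	acct := &Account{SessionEpoch: 1}
	laptop := Session{Epoch: acct.SessionEpoch}
	phone := Session{Epoch: acct.SessionEpoch}

	acct.SignOutEverywhere() // e.g. after a suspected cookie leak

	fmt.Println(laptop.Valid(acct), phone.Valid(acct)) // prints false false
}
```

Rotating the session signing key is the coarser variant of the same move: it invalidates every session on the instance rather than one account's.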
+## Out-of-band watchlist (track separately)
+
+These don't fit the A1–A6 attacker model, but operators should
+keep an eye on them:
+
+- **Dependency supply chain on the Go side.** `go.sum` pinning
+  is enforced; we don't yet do reproducible-build verification.
+- **The docs subdomain serving from Spaces.** A bucket
+  policy mistake there could let an attacker stage a phishing
+  page on `docs.shithub.example`. Mitigated by Caddy's CSP
+  and the explicit reverse-proxy origin
+  (`deploy/docs-site/Caddyfile.snippet`).
+- **PAT prefix recognition by external secret scanners.**
+  `shp_` is documented in
+  `docs/public/user/personal-access-tokens.md` and recognised by
+  GitGuardian/GitHub's scanners; if we ever rotate the prefix,
+  coordinate with them so leaked tokens still get caught
+  upstream.