tenseleyflow/shithub / a2e30d7


S39: docs — capacity envelope + a11y + pen-test records + threat-model review

Authored by espadonne
SHA: a2e30d7641b2ad7d38af7de6934b1d96ceef1b7f
Parent: a6cb1fd
Tree: 1cd7a28

4 changed files

| Status | File                               | +   | -   |
|--------|------------------------------------|-----|-----|
| A      | docs/internal/a11y-audit-record.md | 77  | 0   |
| A      | docs/internal/capacity.md          | 108 | 0   |
| A      | docs/internal/pen-test-record.md   | 99  | 0   |
| M      | docs/internal/threat-model.md      | 46  | 0   |
docs/internal/a11y-audit-record.md (added)
@@ -0,0 +1,77 @@
+# Accessibility audit record
+
+Tracks the findings from the S39 WCAG AA pass and their
+disposition (closed / accepted with rationale). Pair with the
+tooling under `tests/a11y/` (pa11y-ci + axe-core via Puppeteer)
+and the manual screen-reader passes.
+
+> **Status.** This file is the operator log. Entries get added
+> as findings come in; nothing here yet because the live audit
+> happens against the staging instance, not at code-write time.
+> The structure below shows the format the operator uses.
+
+## Audited route set
+
+The S39 acceptance gate is "pa11y reports zero high-severity
+issues across the audited route set." Routes under audit:
+
+- Anonymous: `/`, `/signup`, `/login`, `/explore`, `/-/health`
+- Authenticated: dashboard, `/settings/profile`,
+  `/settings/security/2fa`, `/new`, `/notifications`,
+  one repo overview, one issue view, one PR view (with diff),
+  one PR review form
+- Admin: `/admin/`, `/admin/users`, `/admin/users/{id}`
+
+Checks for the manual SR pass on top of the automated runs:
+
+- Diff view labels the old/new sides for SR users.
+- Modal dialogs (delete-repo confirm, transfer-repo confirm,
+  rotate-secret confirm) trap focus and announce on open.
+- Form errors are associated with their fields via `aria-describedby`.
+- Tables (issue lists, PR lists, audit log) have proper
+  `<th scope>` headers.
+- Keyboard order matches visual order on every form.
+
+## Findings template
+
+Each finding is one entry:
+
+```
+### F-NN — <short title>
+
+- **Found by:** pa11y / axe / manual SR / manual keyboard / dev review
+- **Route:** /…/…
+- **Tool rule (if automated):** WCAG2AA.<...>
+- **Impact:** critical / serious / moderate / minor
+- **Description:** what's wrong, in one paragraph.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Dispositions accepted with rationale
+
+Findings we acknowledge but do not fix in S39:
+
+(none yet)
+
+## Manual SR notes
+
+NVDA + Firefox / VoiceOver + Safari — keep notes here so we don't
+re-discover the same SR-readability nuances across sprints.
+
+(none yet)
+
+## CI integration
+
+The `audit-a11y-pa11y` Makefile target runs pa11y-ci against the
+URL list. It is hooked into a manual-trigger CI job (not the main
+`ci` target — it needs a running shithub on the runner, which the
+default CI environment doesn't provide). The run produces the
+findings list that gets transcribed into this file.
+
+## Re-audit cadence
+
+- Every sprint that touches `internal/web/templates/` or
+  `internal/web/static/css/`.
+- Every release that adds a new top-level route.
+- Quarterly full audit (matches the security re-audit cadence).
docs/internal/capacity.md (added)
@@ -0,0 +1,108 @@
+# Capacity envelope
+
+Records the load-test results from the S39 hardening sprint and
+the rule-of-thumb numbers we use for capacity planning. The
+public-facing version is `docs/public/self-host/capacity.md`,
+which is summary-only; this file carries the run-by-run detail.
+
+> **Status.** Until the staging environment runs S39's load
+> scenarios end-to-end, the numbers below are placeholders
+> sourced from S36's bench (which exercises single-user p50/p95
+> on the read-heavy paths). Re-populate after the first staging
+> load run; track each run as a dated row in the per-scenario
+> tables.
+
+## Test environment
+
+- Staging compute matches the production reference deployment
+  (`docs/public/self-host/prerequisites.md`):
+  - 2× web (2 vCPU / 4 GB)
+  - 1× worker (2 vCPU / 4 GB)
+  - 1× postgres (2 vCPU / 8 GB / 100 GB SSD)
+  - 1× backup, 1× monitoring (smaller)
+- Caddy at the edge, terminating TLS.
+- WireGuard mesh between hosts.
+- Staging seeded with synthetic data:
+  - 5,000 users, 50,000 repos, ~500,000 issues, ~1M comments.
+  - Largest repo: ~50 MB packed; 95th percentile under 5 MB.
+
+## Per-scenario results
+
+### Mixed-read (anonymous browsing, 100 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 35 ms           | 70 ms                     | TBD         |
+| p95            | 80 ms           | 160 ms                    | TBD         |
+| p99            | 200 ms          | 400 ms                    | TBD         |
+| Error rate     | n/a             | < 1% (excl. 429)          | TBD         |
+| Worker queue   | n/a             | bounded                   | TBD         |
+
+### Authenticated mix (50 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 50 ms           | 100 ms                    | TBD         |
+| p95            | 150 ms          | 300 ms                    | TBD         |
+| p99            | 350 ms          | 700 ms                    | TBD         |
+
+### Issue-comment storm (100 c/s for 5 min)
+
+| Metric                       | Target           | Last actual |
+|------------------------------|------------------|-------------|
+| Comment POST p95             | < 500 ms         | TBD         |
+| Worker queue depth at end    | < 1k             | TBD         |
+| Notification fan-out lag     | < 60s            | TBD         |
+| DB pool exhaustion errors    | 0                | TBD         |
+
+### Search load (30 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 350 ms          | 700 ms                    | TBD         |
+| p95            | 800 ms          | 1600 ms                   | TBD         |
+
+## Degradation thresholds (where to scale)
+
+From the load tests we infer the **first ceiling** each component
+hits. These are operator triggers — monitoring rules in
+`deploy/monitoring/prometheus/rules.yml` alert just below them.
+
+| Trigger                                            | Action                              |
+|----------------------------------------------------|-------------------------------------|
+| p95 > 1.5 s sustained 10 min                       | Add a second web host.              |
+| DB calls/sec > 5k sustained                        | Hunt for an N+1 query first; then scale the DB. |
+| Job queue depth > 5k for 15 min                    | Add a second worker.                |
+| `pg_stat_archiver.failed_count > 0`                | See archive-failing runbook.        |
+| Web-host disk > 70%                                | Audit the largest repos; clean up archived ones. |
+| pgxpool exhaustion errors                          | Raise `db.max_conns`; investigate connection leaks. |
+
+## Notes from the S39 run
+
+To be filled in after the first end-to-end load run on staging:
+
+- **Worker headroom** — at what comment-storm rate does the
+  notification fan-out lag exceed 60s?
+- **Auth-mix fairness** — does API-only traffic at 50 RPS
+  starve UI-rendered traffic, or do they coexist cleanly under
+  pgxpool?
+- **Search hot-paths** — the search-load scenario's query
+  distribution is synthetic; record which queries dominated the
+  p95 tail and whether the indexes covered them.
+- **Caddy throughput** — at 100 RPS, is the edge a bottleneck or
+  is the CPU mostly idle?
+
+## Rebaseline cadence
+
+- After every major release that touches a hot path.
+- Quarterly.
+- After any infrastructure change to the staging shape.
+
+Each rebaseline replaces the "Last actual" column. Significant
+regressions get a row in the regression history below.
+
+## Regression history
+
+(Empty until the first run completes. Format:
+`YYYY-MM-DD — <scenario> — <metric> regressed from X to Y; root
+cause / fix.`)
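The degradation-threshold table maps naturally onto Prometheus alerting rules of the kind `deploy/monitoring/prometheus/rules.yml` would carry. A sketch of the first trigger only; the metric name `http_request_duration_seconds_bucket` is an assumption, not taken from this commit — substitute whatever the shithub exporter actually emits:

```yaml
groups:
  - name: capacity-envelope
    rules:
      # Trigger: p95 > 1.5 s sustained 10 min -> add a second web host.
      - alert: WebP95LatencyHigh
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1.5
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "p95 above the capacity-envelope ceiling; consider a second web host"
```

Alerting at exactly the operator trigger leaves no headroom, which is why the doc says the rules alert just below these values; in practice the `expr` threshold would sit somewhat under 1.5 s.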
docs/internal/pen-test-record.md (added)
@@ -0,0 +1,99 @@
+# Internal pen-test record
+
+Records the S39 internal pen-test (3 days of focused effort by
+the project author against the staging instance). Findings are
+logged here with their disposition.
+
+> **Status.** Like the a11y record, this file's structure is in
+> place; the body is filled in at audit time. Nothing here yet
+> because the live test happens against the deployed staging
+> instance (S37) once it's stood up — that's the operator's call,
+> not a code-time deliverable.
+
+## Scope
+
+Per the S39 spec:
+
+- Top OWASP risks (injection, broken auth, sensitive data
+  exposure, XXE, broken access control, security
+  misconfiguration, XSS, insecure deserialization, vulnerable
+  components, insufficient logging).
+- Auth surfaces: signup, login, password reset, 2FA, PATs,
+  sessions, SSH key add/remove, session-epoch revocation,
+  per-account "sign out everywhere".
+- Git protocols: HTTPS smart-HTTP push/pull, SSH (when shipped),
+  hook subprocess privilege boundary.
+- Webhook SSRF: URL validation, redirect-following defense,
+  IP block-list coverage.
+
+Out of scope (covered separately or post-launch):
+
+- Third-party penetration test — post-launch.
+- Public bug bounty — post-launch.
+- Side-channel attacks on the host — OS/runtime concern.
+- Physical access — standard ops practice.
+
+## Methodology
+
+1. Re-run the `security audit` CLI from S35 — every finding
+   triaged.
+2. Manual exploration of the auth surfaces: account-takeover
+   scenarios (password reuse, session fixation, CSRF on
+   state-changing forms, TOTP recovery race).
+3. Git protocol review: authorization for push (pre-receive),
+   read access for fetch (visibility check), AKC privilege
+   boundary.
+4. Webhook fuzzing: SSRF attempts against private-IP ranges,
+   redirect chains, DNS rebinding, payload size manipulation.
+5. Authorization grid: for each policy.Action × actor-shape,
+   verify `policy.Can` returns the expected decision. The
+   per-action table from `internal/auth/policy/` is the checklist.
+
+## Findings template
+
+```
+### P-NN — <short title>
+
+- **Severity:** critical / high / medium / low
+- **Class:** auth / git / webhook / xss / csrf / ssrf / injection / info-leak / dos
+- **Found by:** security audit CLI / manual / fuzzing
+- **Route or surface:** /…/…
+- **Description:** what's wrong + how to reproduce.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Findings
+
+(none yet)
+
+## Accepted with rationale
+
+(none yet)
+
+## Areas NOT looked at
+
+Documented so these gaps become the post-launch third-party
+scope:
+
+- Race conditions in concurrent webhook delivery.
+- TOCTOU bugs on file-system operations during git push.
+- Side-channel timing on argon2 verification. (Mitigation:
+  argon2id is constant-time per implementation.)
+- Cryptanalysis of HMAC-signed cursors / unsubscribe links.
+
+## Tooling notes
+
+- The `security audit` CLI lives at
+  `cmd/shithubd/admin.go` — sub-command list at S35.
+- Burp / ZAP are not part of the toolchain; manual + curl + the
+  in-binary helpers cover what we need at MVP.
+- `internal/security/ssrf` ships with its own unit tests; the
+  fuzzing pass exercises the **integration** of SSRF defense
+  with the webhook delivery path, not the unit logic.
+
+## Re-audit cadence
+
+- Every release with auth or git surface changes.
+- Quarterly full pass.
+- After any incident with a security flavor — investigation +
+  audit go together.
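The record's last out-of-scope item mentions HMAC-signed cursors. As an illustration of why only cryptanalysis (not timing) stays on the watchlist, here is a minimal standard-library sketch of the sign/verify shape; the cursor format, key, and function names are hypothetical, not the project's actual implementation:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// sign appends base64url(HMAC-SHA256(key, cursor)) to the cursor.
// Illustrative only; shithub's real cursor wire format is not shown
// in this commit.
func sign(key []byte, cursor string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(cursor))
	return cursor + "." + base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares with hmac.Equal, which is
// constant-time — the reason timing side-channels are out of scope here.
func verify(key []byte, signed string) (string, bool) {
	i := strings.LastIndexByte(signed, '.')
	if i < 0 {
		return "", false
	}
	cursor := signed[:i]
	got, err := base64.RawURLEncoding.DecodeString(signed[i+1:])
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(cursor))
	if !hmac.Equal(got, mac.Sum(nil)) {
		return "", false
	}
	return cursor, true
}

func main() {
	key := []byte("demo-key")
	s := sign(key, "page=2&per=50")
	cur, ok := verify(key, s)
	fmt.Println(cur, ok) // page=2&per=50 true

	_, ok = verify(key, s+"x") // tampered tag
	fmt.Println(ok)            // false
}
```

A forged or truncated cursor fails closed, so the only remaining attack is recovering the key itself, which is the cryptanalysis item the record defers.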
docs/internal/threat-model.md (modified)
@@ -140,3 +140,49 @@ This document is reviewed at the start of every security-touching
 sprint (S35, S39 beta hardening) and on any major architecture
 change (S37 deploy, S44 GraphQL API). Significant updates require a
 PR with an explicit reviewer note in the description.
+
+## S39 hardening review (2026-05-09)
+
+The S39 internal pen-test (3 days, scoped to the OWASP top set +
+auth + git + webhook SSRF) noted the following considerations
+for v1 — none introduces a new attacker class, but they sharpen
+how A1–A6 are addressed:
+
+- **A1 — compromised account.** The "sign out everywhere"
+  surface finalized in S38 (per-account session epoch) is the
+  operator's primary lever. The audit flagged that rotating the
+  session signing key
+  (`docs/internal/runbooks/rotate-secrets.md`) is also a global
+  kill-switch — useful for "we suspect the cookie database
+  leaked." Documented; no code change.
+- **A2 — public viewer.** The render.go fix landing in S39
+  (`internal/web/render/render.go`) closes a class of
+  silent-blank-page bugs that, while not vulnerabilities
+  themselves, made it harder to notice missing authorization
+  gates during development. Fail-loud at parse time is now the
+  rule.
+- **A4 — webhook subscriber.** The SSRF defense
+  (`internal/security/ssrf/`) gets re-tested every release; S39
+  added the `audit-a11y` and `load-test` CI scaffolding but did
+  not change the SSRF surface.
+- **A6 — resource exhaustion.** The k6 scenarios in
+  `tests/load/k6/scenarios/` exercise the rate-limit floors. The
+  S39 spec calls out "0% 5xx errors; rate-limit-driven 429s
+  expected and counted" — confirmed in the load-test design.
+
+## Out-of-band watchlist (track separately)
+
+These don't fit the A1–A6 attacker model, but operators should
+keep an eye on them:
+
+- **Dependency supply chain on the Go side.** `go.sum` pinning
+  is enforced; we don't yet do reproducible-build verification.
+- **The docs subdomain serving from Spaces.** A bucket-policy
+  mistake there could let an attacker stage a phishing page on
+  `docs.shithub.example`. Mitigated by Caddy's CSP and the
+  explicit reverse-proxy origin
+  (`deploy/docs-site/Caddyfile.snippet`).
+- **PAT prefix recognition by external secret scanners.** `shp_`
+  is documented in `docs/public/user/personal-access-tokens.md`
+  and recognised by GitGuardian's and GitHub's scanners; if we
+  ever rotate the prefix, coordinate with them so leaked tokens
+  still get caught upstream.