tenseleyflow/shithub / a2e30d7


S39: docs — capacity envelope + a11y + pen-test records + threat-model review

Authored by espadonne
SHA: a2e30d7641b2ad7d38af7de6934b1d96ceef1b7f
Parents: a6cb1fd
Tree: 1cd7a28

4 changed files

| Status | File | + | - |
|--------|------|---|---|
| A | `docs/internal/a11y-audit-record.md` | 77 | 0 |
| A | `docs/internal/capacity.md` | 108 | 0 |
| A | `docs/internal/pen-test-record.md` | 99 | 0 |
| M | `docs/internal/threat-model.md` | 46 | 0 |
docs/internal/a11y-audit-record.md (added)
@@ -0,0 +1,77 @@
+# Accessibility audit record
+
+Tracks the findings from the S39 WCAG AA pass and their
+disposition (closed / accepted with rationale). Pair with the
+tooling under `tests/a11y/` (pa11y-ci + axe-core via Puppeteer)
+and the manual screen-reader passes.
+
+> **Status.** This file is the operator log. Entries get added
+> as findings come in; nothing here yet because the live audit
+> happens against the staging instance, not at code-write time.
+> The structure below shows the format the operator uses.
+
+## Audited route set
+
+The S39 acceptance gate is "pa11y reports zero high-severity
+issues across the audited route set." Routes under audit:
+
+- Anonymous: `/`, `/signup`, `/login`, `/explore`, `/-/health`
+- Authenticated: dashboard, `/settings/profile`,
+  `/settings/security/2fa`, `/new`, `/notifications`,
+  one repo overview, one issue view, one PR view (with diff),
+  one PR review form
+- Admin: `/admin/`, `/admin/users`, `/admin/users/{id}`
+
+Specifics for the manual SR pass on top of the automated runs:
+
+- Diff view labelling old/new sides for SR users.
+- Modal dialogs (delete-repo confirm, transfer-repo confirm,
+  rotate-secret confirm) trap focus and announce on open.
+- Form errors associated with their fields via `aria-describedby`.
+- Tables (issue lists, PR lists, audit log) have proper
+  `<th scope>` headers.
+- Keyboard order matches visual order on every form.
+
+## Findings template
+
+Each finding is one row:
+
+```
+### F-NN — <short title>
+
+- **Found by:** pa11y / axe / manual SR / manual keyboard / dev review
+- **Route:** /…/…
+- **Tool rule (if automated):** WCAG2AA.<...>
+- **Impact:** critical / serious / moderate / minor
+- **Description:** what's wrong, in one paragraph.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Dispositions accepted with rationale
+
+These are findings we acknowledge but do not fix in S39:
+
+(none yet)
+
+## Manual SR notes
+
+NVDA + Firefox / VoiceOver + Safari — keep notes here so we don't
+re-discover the same SR-readability nuances across sprints.
+
+(none yet)
+
+## CI integration
+
+The `audit-a11y-pa11y` Makefile target runs pa11y-ci against the
+URL list. It hooks into a manual-trigger CI job (not the main `ci`
+target — it needs a running shithub on the runner, which the
+default CI environment doesn't provide). The run produces the
+findings list that gets transcribed into this file.
+
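pa11y-ci reads its URL list from a JSON config. A minimal sketch of the shape such a `.pa11yci` file takes; the staging host and route subset here are illustrative, not the project's real config:

```json
{
  "defaults": {
    "standard": "WCAG2AA",
    "timeout": 30000,
    "concurrency": 2
  },
  "urls": [
    "https://staging.shithub.example/",
    "https://staging.shithub.example/signup",
    "https://staging.shithub.example/login",
    "https://staging.shithub.example/explore"
  ]
}
```

Authenticated routes need a logged-in browser context on top of this, which is part of why the job only runs against a stood-up staging instance.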
+## Re-audit cadence
+
+- Every sprint that touches `internal/web/templates/` or
+  `internal/web/static/css/`.
+- Every release that adds a new top-level route.
+- Quarterly full audit (matches the security re-audit cadence).
docs/internal/capacity.md (added)
@@ -0,0 +1,108 @@
+# Capacity envelope
+
+Records the load-test results from the S39 hardening sprint and
+the rule-of-thumb numbers we use for capacity planning. The
+public-facing version is `docs/public/self-host/capacity.md`,
+which is summary-only; this file carries the run-by-run detail.
+
+> **Status.** Until the staging environment runs S39's load
+> scenarios end-to-end, the numbers below are placeholders
+> sourced from S36's bench (which exercises single-user p50/p95
+> on the read-heavy paths). Re-populate after the first staging
+> load run; track each run as a dated row in the per-scenario
+> tables.
+
+## Test environment
+
+- Staging compute matches the production reference deployment
+  (`docs/public/self-host/prerequisites.md`):
+  - 2× web (2 vCPU / 4 GB)
+  - 1× worker (2 vCPU / 4 GB)
+  - 1× postgres (2 vCPU / 8 GB / 100 GB SSD)
+  - 1× backup, 1× monitoring (smaller)
+- Caddy at the edge with TLS terminated.
+- WireGuard mesh between hosts.
+- Staging seeded with synthetic data:
+  - 5,000 users, 50,000 repos, ~500,000 issues, ~1M comments.
+  - Largest repo: ~50 MB packed; 95th percentile under 5 MB.
+
+## Per-scenario results
+
+### Mixed-read (anonymous browsing, 100 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 35 ms           | 70 ms                     | TBD         |
+| p95            | 80 ms           | 160 ms                    | TBD         |
+| p99            | 200 ms          | 400 ms                    | TBD         |
+| Error rate     | n/a             | < 1% (excl. 429)          | TBD         |
+| Worker queue   | n/a             | bounded                   | TBD         |
+
+### Authenticated mix (50 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 50 ms           | 100 ms                    | TBD         |
+| p95            | 150 ms          | 300 ms                    | TBD         |
+| p99            | 350 ms          | 700 ms                    | TBD         |
+
+### Issue-comment storm (100 c/s for 5 min)
+
+| Metric                       | Target           | Last actual |
+|------------------------------|------------------|-------------|
+| Comment POST p95             | < 500 ms         | TBD         |
+| Worker queue depth at end    | < 1k             | TBD         |
+| Notification fan-out lag     | < 60 s           | TBD         |
+| DB pool exhaustion errors    | 0                | TBD         |
+
+### Search load (30 RPS for 10 min)
+
+| Metric         | S36 single-user | S39 baseline (target ≤2x) | Last actual |
+|----------------|-----------------|---------------------------|-------------|
+| p50            | 350 ms          | 700 ms                    | TBD         |
+| p95            | 800 ms          | 1600 ms                   | TBD         |
+
+## Degradation thresholds (where to scale)
+
+From the load tests we infer the **first ceiling** each component
+hits. These are operator triggers — monitoring rules in
+`deploy/monitoring/prometheus/rules.yml` alert below them.
+
+| Trigger                                            | Action                              |
+|----------------------------------------------------|-------------------------------------|
+| p95 > 1.5 s sustained 10 min                       | Add a second web host.              |
+| DB calls/sec > 5k sustained                        | Hunt for an N+1 first; then DB scale. |
+| Job queue depth > 5k for 15 min                    | Add a second worker.                |
+| `pg_stat_archiver.failed_count > 0`                | See archive-failing runbook.        |
+| Web-host disk > 70%                                | Audit largest repos; clean archived. |
+| pgxpool exhaustion errors                          | Raise `db.max_conns`; investigate connection leaks. |
+
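As a shape sketch only, rules that alert just below the first two triggers could look like the following; the metric names (`http_request_duration_seconds_bucket`, `job_queue_depth`) and warning thresholds are assumptions, not the contents of the real `rules.yml`:

```yaml
groups:
  - name: capacity-envelope
    rules:
      - alert: WebP95High
        # Warn below the 1.5 s "add a second web host" trigger so the
        # operator hears about it before the ceiling is hit.
        expr: >
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 1.2
        for: 10m
        labels:
          severity: warning
      - alert: WorkerQueueDeep
        # Warn below the 5k-for-15-min "add a second worker" trigger.
        expr: job_queue_depth > 4000
        for: 15m
        labels:
          severity: warning
```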
+## Notes from the S39 run
+
+To be filled in after the first end-to-end load run on staging:
+
+- **Worker headroom** — at what comment-storm rate does the
+  notification fan-out lag exceed 60 s?
+- **Auth-mix fairness** — does API-only traffic at 50 RPS
+  starve UI-rendered traffic, or do they coexist cleanly under
+  pgxpool?
+- **Search hot-paths** — the search-load scenario's query
+  distribution is synthetic; record which queries dominated the
+  p95 tail and whether the indexes covered them.
+- **Caddy throughput** — at 100 RPS, is the edge a bottleneck or
+  is the CPU mostly idle?
+
+## Rebaseline cadence
+
+- After every major release that touches a hot path.
+- Quarterly.
+- After any infrastructure change to the staging shape.
+
+Each rebaseline replaces the "Last actual" column. Significant
+regressions get a row in the regression history below.
+
+## Regression history
+
+(Empty until the first run completes. Format:
+`YYYY-MM-DD — <scenario> — <metric> regressed from X to Y; root
+cause / fix.`)
docs/internal/pen-test-record.md (added)
@@ -0,0 +1,99 @@
+# Internal pen-test record
+
+Records the S39 internal pen-test (3 days of focused effort by
+the project author against the staging instance). Findings logged
+here with their disposition.
+
+> **Status.** Like the a11y record, this file's structure is in
+> place; the body is filled in at audit time. Nothing here yet
+> because the live test happens against the deployed staging
+> instance (S37) once it's stood up — that's the operator's call,
+> not a code-time deliverable.
+
+## Scope
+
+Per the S39 spec:
+
+- Top OWASP risks (injection, broken auth, sensitive data
+  exposure, XXE, broken access control, security
+  misconfiguration, XSS, insecure deserialization, vulnerable
+  components, insufficient logging).
+- Auth surfaces: signup, login, password reset, 2FA, PATs,
+  sessions, SSH key add/remove, session-epoch revocation,
+  per-account "sign out everywhere".
+- Git protocols: HTTPS smart-HTTP push/pull, SSH (when shipped),
+  hook subprocess privilege boundary.
+- Webhook SSRF: URL validation, redirect-following defense,
+  IP block-list coverage.
+
+Out of scope (covered separately or post-launch):
+
+- Third-party penetration test — post-launch.
+- Public bug bounty — post-launch.
+- Side-channel attacks on the host — OS/runtime concern.
+- Physical access — standard ops practice.
+
+## Methodology
+
+1. Re-run the `security audit` CLI from S35 — every finding
+   triaged.
+2. Manual exploration of the auth surfaces: account-takeover
+   scenarios (password reuse, session fixation, CSRF on
+   state-changing forms, TOTP recovery race).
+3. Git protocol review: authorization for push (pre-receive),
+   read access for fetch (visibility check), AKC privilege
+   boundary.
+4. Webhook fuzzing: SSRF attempts against private-IP ranges,
+   redirect chains, DNS rebinding, payload size manipulation.
+5. Authorization grid: for each policy.Action × actor-shape,
+   verify `policy.Can` returns the expected decision. The
+   per-action table from `internal/auth/policy/` is the checklist.
+
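The authorization-grid pass in step 5 is naturally table-driven. A self-contained sketch of the shape; the `Action` and `Actor` types and the `can` function are stand-ins, not the real `internal/auth/policy` API:

```go
package main

import "fmt"

// Stand-in types; the real ones live in internal/auth/policy.
type Action string

type Actor struct {
	Authed bool
	Admin  bool
	Owner  bool
}

const (
	ReadPublicRepo Action = "repo.read.public"
	PushRepo       Action = "repo.push"
	AdminListUsers Action = "admin.users.list"
)

// can is a toy decision function standing in for policy.Can.
// Default-deny: anything not explicitly allowed is refused.
func can(a Actor, act Action) bool {
	switch act {
	case ReadPublicRepo:
		return true // public reads are allowed for everyone
	case PushRepo:
		return a.Authed && a.Owner
	case AdminListUsers:
		return a.Admin
	}
	return false
}

func main() {
	// The pen-test grid: every action crossed with every actor shape,
	// each row carrying the expected decision.
	grid := []struct {
		actor Actor
		act   Action
		want  bool
	}{
		{Actor{}, ReadPublicRepo, true},
		{Actor{}, PushRepo, false},
		{Actor{Authed: true}, PushRepo, false},
		{Actor{Authed: true, Owner: true}, PushRepo, true},
		{Actor{Authed: true}, AdminListUsers, false},
		{Actor{Authed: true, Admin: true}, AdminListUsers, true},
	}
	for _, row := range grid {
		if got := can(row.actor, row.act); got != row.want {
			fmt.Printf("MISMATCH %+v %s: got %v want %v\n", row.actor, row.act, got, row.want)
		}
	}
	fmt.Println("grid checked")
}
```

The value of the grid form is that adding an action without adding its rows is an audit finding in itself.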
+## Findings template
+
+```
+### P-NN — <short title>
+
+- **Severity:** critical / high / medium / low
+- **Class:** auth / git / webhook / xss / csrf / ssrf / injection / info-leak / dos
+- **Found by:** security audit CLI / manual / fuzzing
+- **Route or surface:** /…/…
+- **Description:** what's wrong + how to reproduce.
+- **Disposition:** fixed in <commit-sha> / accepted: <rationale> / deferred to <sprint>
+- **Re-tested on:** <date>
+```
+
+## Findings
+
+(none yet)
+
+## Accepted with rationale
+
+(none yet)
+
+## Areas NOT looked at
+
+Documented so these gaps define the post-launch third-party scope:
+
+- Race conditions in concurrent webhook delivery.
+- TOCTOU bugs on file-system operations during git push.
+- Side-channel timing on argon2 verification. (Mitigation:
+  argon2id verification is constant-time in the implementation
+  we use.)
+- Cryptanalysis of HMAC-signed cursors / unsubscribe links.
+
+## Tooling notes
+
+- The `security audit` CLI lives at
+  `cmd/shithubd/admin.go` — sub-command list at S35.
+- Burp / ZAP are not part of the toolchain; manual + curl + the
+  in-binary helpers cover what we need at MVP.
+- `internal/security/ssrf` ships with its own unit tests; the
+  fuzzing pass exercises the **integration** of SSRF defense
+  with the webhook delivery path, not the unit logic.
+
+## Re-audit cadence
+
+- Every release with auth or git surface changes.
+- Quarterly full pass.
+- After any incident with a security flavor — investigation +
+  audit go together.
docs/internal/threat-model.md (modified)
@@ -140,3 +140,49 @@ This document is reviewed at the start of every security-touching
 sprint (S35, S39 beta hardening) and on any major architecture
 change (S37 deploy, S44 GraphQL API). Significant updates require a
 PR with an explicit reviewer note in the description.
+
+## S39 hardening review (2026-05-09)
+
+The S39 internal pen-test (3 days, scoped to the OWASP top set +
+auth + git + webhook SSRF) noted the following considerations
+for v1 — none introduce a new attacker class, but they sharpen
+how A1–A6 are addressed:
+
+- **A1 — compromised account.** The S38 introduction of the
+  finalized "sign out everywhere" surface (per-account session
+  epoch) is the operator's primary lever. The audit flagged that
+  rotating the session signing key
+  (`docs/internal/runbooks/rotate-secrets.md`) is also a global
+  kill-switch — useful for "we suspect the cookie database
+  leaked." Documented; no code change.
+- **A2 — public viewer.** The render.go fix landing in S39
+  (`internal/web/render/render.go`) closes a class of
+  silent-blank-page bugs that, while not a vulnerability in
+  themselves, made it harder to notice missing authorization
+  gates during development. Fail-loud at parse time is now the
+  rule.
+- **A4 — webhook subscriber.** The SSRF defense
+  (`internal/security/ssrf/`) gets re-tested every release; S39
+  added the `audit-a11y` and `load-test` CI scaffolding but did
+  not change the SSRF surface.
+- **A6 — resource exhaustion.** The k6 scenarios in
+  `tests/load/k6/scenarios/` exercise the rate-limit floors. The
+  S39 spec calls out "0% 5xx errors; rate-limit-driven 429s
+  expected and counted" — confirmed in the load-test design.
+
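The per-account session epoch behind the A1 lever amounts to a monotonic counter compared on every request. A self-contained sketch of the idea; the types and method names are illustrative, not the real shithub code:

```go
package main

import "fmt"

// Account carries a monotonically increasing session epoch. Every
// session records the epoch current at login; bumping the account
// epoch invalidates all outstanding sessions at once.
type Account struct {
	SessionEpoch uint64
}

type Session struct {
	Epoch uint64
}

// SignOutEverywhere is the user/operator lever: one increment, and
// every session minted before it stops validating.
func (a *Account) SignOutEverywhere() {
	a.SessionEpoch++
}

// Valid reports whether the session still matches the account epoch.
func (s Session) Valid(a *Account) bool {
	return s.Epoch == a.SessionEpoch
}

func main() {
	acct := &Account{SessionEpoch: 1}
	laptop := Session{Epoch: acct.SessionEpoch}
	phone := Session{Epoch: acct.SessionEpoch}

	acct.SignOutEverywhere() // e.g. after a suspected cookie leak

	fmt.Println(laptop.Valid(acct), phone.Valid(acct)) // prints false false
}
```

Rotating the session signing key is the coarser variant of the same move: it invalidates every session on the instance rather than one account's.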
+## Out-of-band watchlist (track separately)
+
+These don't fit the A1–A6 attacker model, but operators should
+keep an eye on them:
+
+- **Dependency supply chain on the Go side.** `go.sum` pinning
+  is enforced; we don't yet do reproducible-build verification.
+- **The docs subdomain serving from Spaces.** A bucket
+  policy mistake there could let an attacker stage a phishing
+  page on `docs.shithub.example`. Mitigated by Caddy's CSP
+  and the explicit reverse-proxy origin
+  (`deploy/docs-site/Caddyfile.snippet`).
+- **PAT prefix recognition by external secret scanners.**
+  `shp_` is documented in
+  `docs/public/user/personal-access-tokens.md` and recognised by
+  GitGuardian/GitHub's scanners; if we ever rotate the prefix,
+  coordinate with them so leaked tokens still get caught
+  upstream.