# Capacity envelope

Records the load-test results from the S39 hardening sprint and
the rule-of-thumb numbers we use for capacity planning. The
public-facing version is `docs/public/self-host/capacity.md`,
which is summary-only; this file carries the run-by-run detail.

> **Status.** Until the staging environment runs S39's load
> scenarios end-to-end, the numbers below are placeholders
> sourced from S36's bench (which exercises single-user p50/p95
> on the read-heavy paths). Re-populate after the first staging
> load run; track each run as a dated row in the per-scenario
> tables.

## Test environment

- Staging compute matches the production reference deployment
  (`docs/public/self-host/prerequisites.md`):
  - 2× web (2 vCPU / 4 GB)
  - 1× worker (2 vCPU / 4 GB)
  - 1× postgres (2 vCPU / 8 GB / 100 GB SSD)
  - 1× backup, 1× monitoring (smaller)
- Caddy at the edge with TLS terminated.
- WireGuard mesh between hosts.
- Staging seeded with synthetic data:
  - 5,000 users, 50,000 repos, ~500,000 issues, ~1M comments.
  - Largest repo: ~50 MB packed; 95th percentile under 5 MB.

## Per-scenario results

### Mixed-read (anonymous browsing, 100 RPS for 10 min)

| Metric | S36 single-user | S39 baseline (target ≤2x) | Last actual |
|---|---|---|---|
| p50 | 35 ms | 70 ms | TBD |
| p95 | 80 ms | 160 ms | TBD |
| p99 | 200 ms | 400 ms | TBD |
| Error rate | n/a | < 1% (excluding 429) | TBD |
| Worker queue | n/a | bounded | TBD |
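
These scenarios assume an open-loop driver: requests fire at a fixed
rate regardless of how many are still in flight, so saturation shows
up as tail latency rather than as silently reduced throughput. A
minimal Go sketch of that shape is below; the target URL, the
single-endpoint mix, and the 5xx-only error counting are
simplifications for illustration, not the actual S39 harness.

```go
// loaddriver.go — minimal open-loop load-driver sketch.
// The URL, rate, and duration are placeholders; the real S39
// scenarios script a weighted request mix.
package main

import (
	"fmt"
	"net/http"
	"sort"
	"sync"
	"time"
)

func main() {
	const (
		rps      = 100
		duration = 10 * time.Minute
		target   = "https://staging.example.internal/" // placeholder
	)

	var (
		mu        sync.Mutex
		latencies []time.Duration
		errs      int
		wg        sync.WaitGroup
	)

	ticker := time.NewTicker(time.Second / rps)
	defer ticker.Stop()
	deadline := time.Now().Add(duration)

	for now := range ticker.C {
		if now.After(deadline) {
			break
		}
		wg.Add(1)
		go func() { // open-loop: fire regardless of in-flight requests
			defer wg.Done()
			start := time.Now()
			resp, err := http.Get(target)
			elapsed := time.Since(start)
			mu.Lock()
			defer mu.Unlock()
			if err != nil || resp.StatusCode >= 500 {
				errs++
			}
			if resp != nil {
				resp.Body.Close()
			}
			latencies = append(latencies, elapsed)
		}()
	}
	wg.Wait()

	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	pct := func(p float64) time.Duration {
		return latencies[int(p*float64(len(latencies)-1))]
	}
	fmt.Printf("p50=%v p95=%v p99=%v errors=%d/%d\n",
		pct(0.50), pct(0.95), pct(0.99), errs, len(latencies))
}
```

The open-loop choice matters most for the p99 column: a closed-loop
driver slows down along with the server and flatters the tail.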

### Authenticated mix (50 RPS for 10 min)

| Metric | S36 single-user | S39 baseline (target ≤2x) | Last actual |
|---|---|---|---|
| p50 | 50 ms | 100 ms | TBD |
| p95 | 150 ms | 300 ms | TBD |
| p99 | 350 ms | 700 ms | TBD |

### Issue-comment storm (100 c/s for 5 min)

| Metric | Target | Last actual |
|---|---|---|
| Comment POST p95 | < 500 ms | TBD |
| Worker queue depth at end | < 1k | TBD |
| Notification fan-out lag | < 60s | TBD |
| DB pool exhaustion errors | 0 | TBD |
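
Fan-out lag here is end-to-end: from the comment POST returning to
the notification being visible to a recipient. One way to probe it
is to post a comment carrying a unique marker and poll a recipient's
feed until the marker appears. The sketch below assumes hypothetical
`/api/...` endpoints and omits authentication; it is illustrative,
not the instance's real API.

```go
// fanoutlag.go — sketch of one way to measure notification
// fan-out lag. Endpoints and JSON shapes are assumptions, not
// the instance's real API; auth is omitted for brevity.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// fanoutLag posts a comment containing a unique marker, then polls
// the notification feed until the marker shows up, returning the
// elapsed time.
func fanoutLag(base, issueID, marker string) (time.Duration, error) {
	body := bytes.NewBufferString(fmt.Sprintf(`{"body":%q}`, marker))
	start := time.Now()
	resp, err := http.Post(base+"/api/issues/"+issueID+"/comments",
		"application/json", body)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()

	for time.Since(start) < 2*time.Minute { // give up well past the 60s target
		feed, err := http.Get(base + "/api/notifications") // assumed endpoint
		if err != nil {
			return 0, err
		}
		b, _ := io.ReadAll(feed.Body)
		feed.Body.Close()
		if strings.Contains(string(b), marker) {
			return time.Since(start), nil
		}
		time.Sleep(250 * time.Millisecond)
	}
	return 0, fmt.Errorf("marker %s not seen within 2m", marker)
}

func main() {
	lag, err := fanoutLag("https://staging.example.internal",
		"12345", fmt.Sprintf("probe-%d", time.Now().UnixNano()))
	if err != nil {
		fmt.Println("measure failed:", err)
		return
	}
	fmt.Println("fan-out lag:", lag) // compare against the < 60s target
}
```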

### Search load (30 RPS for 10 min)

| Metric | S36 single-user | S39 baseline (target ≤2x) | Last actual |
|---|---|---|---|
| p50 | 350 ms | 700 ms | TBD |
| p95 | 800 ms | 1600 ms | TBD |
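
The query mix in this scenario is synthetic by design (see the notes
section below): a skewed draw over a fixed term list, so a few hot
terms dominate the way real search traffic does. A minimal sketch of
drawing such a mix, with placeholder terms and skew:

```go
// searchmix.go — sketch of the synthetic query mix for the
// search-load scenario. The term list and Zipf skew are placeholders.
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	terms := []string{"panic", "deploy", "timeout", "TODO", "nil pointer"} // placeholder corpus
	// Zipf skew so a few hot terms dominate, mirroring real search traffic.
	zipf := rand.NewZipf(rand.New(rand.NewSource(39)), 1.2, 1, uint64(len(terms)-1))
	for i := 0; i < 10; i++ {
		fmt.Println("GET /search?q=" + terms[zipf.Uint64()])
	}
}
```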

## Degradation thresholds (where to scale)

From the load tests we infer the **first ceiling** each component
hits. These are operator triggers — monitoring rules in
`deploy/monitoring/prometheus/rules.yml` alert below them.

| Trigger | Action |
|---|---|
| p95 > 1.5 s sustained 10 min | Add a second web host. |
| DB calls/sec > 5k sustained | Hunt for an N+1 first; then DB scale. |
| Job queue depth > 5k for 15 min | Add a second worker. |
| `pg_stat_archiver.failed_count > 0` | See archive-failing runbook. |
| Web-host disk > 70% | Audit largest repos; clean archived. |
| pgxpool exhaustion errors | Raise `db.max_conns`; investigate connection leaks. |
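
On the last row: pgxpool makes both the ceiling and the symptom
observable in-process. A minimal sketch, assuming the config key
`db.max_conns` maps onto `pgxpool.Config.MaxConns` (an assumption
about this codebase; the DSN and numbers are placeholders):

```go
// poolcfg.go — sketch of pgxpool sizing and exhaustion visibility.
// DSN and limits are placeholders; the mapping from db.max_conns
// to MaxConns is an assumption about this codebase.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	cfg, err := pgxpool.ParseConfig("postgres://app@db.internal/app") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	cfg.MaxConns = 20                     // what raising db.max_conns would change
	cfg.MaxConnIdleTime = 5 * time.Minute // recycle idle conns

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// Exhaustion pressure shows up here: EmptyAcquireCount is how
	// often an Acquire had to wait because every conn was in use.
	s := pool.Stat()
	fmt.Printf("total=%d idle=%d waited=%d\n",
		s.TotalConns(), s.IdleConns(), s.EmptyAcquireCount())
}
```

`EmptyAcquireCount` climbing while `TotalConns` sits at `MaxConns` is
the signature to alert on before hard exhaustion errors appear.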

## Notes from the S39 run

To be filled in after the first end-to-end load run on staging:

- **Worker headroom** — at what comment-storm rate does the
  notification fan-out lag exceed 60s?
- **Auth-mix fairness** — does API-only traffic at 50 RPS starve
  UI-rendered traffic, or do they coexist cleanly under pgxpool?
- **Search hot-paths** — the search-load scenario's query
  distribution is synthetic; record which queries dominated the
  p95 tail and whether the indexes covered them.
- **Caddy throughput** — at 100 RPS, is the edge a bottleneck or
  is the CPU mostly idle?

## Rebaseline cadence

- After every major release that touches a hot path.
- Quarterly.
- After any infrastructure change to the staging shape.

Each rebaseline replaces the "Last actual" column. Significant
regressions get a row in the regression-history section below.

## Regression history

(Empty until the first run completes. Format:
`YYYY-MM-DD — <scenario> — <metric> regressed from X to Y; root cause / fix.`)