markdown · 6088 bytes Raw Blame History

DB indexes — catalog

Per-domain catalog of every non-PK index, the query it backs, and the rationale. Each new index needs an entry here AND a sentence in its migration file justifying the write-throughput cost. Reviewed on every perf-pass sprint (S36 baseline; revisited on S39 beta hardening).

The columns are: Index, Covers query, Selectivity, Cost notes. Selectivity is rough — "high" means typically returns ≤ 100 rows for a 1M-row table; "medium" 1–10k; "low" most-of-the-table.

Identity (users, user_emails, user_*)

Index Covers query Selectivity Cost notes
users.username_uq (unique) login + profile lookup by username high implied by UNIQUE; PK-class
users_deleted_at_idx (partial WHERE deleted_at IS NOT NULL) restore listing + admin "deleted" filter high partial keeps it tiny
users_suspended_at_idx (partial WHERE suspended_at IS NOT NULL) admin "suspended" filter high partial
users_site_admin_idx (partial WHERE is_site_admin = true) admin elevation lookups high partial; few rows
user_emails (user_id) enumerate a user's emails high per-user
user_emails_address_uq (unique) reverse lookup on signup-collision check high UNIQUE

Repos

Index Covers query Selectivity Cost notes
repos (owner_user_id, name) PK-class per-owner repo lookup high composite
repos_owner_org_name_uq org-owner repo lookup high UNIQUE
repo_collaborators_user_id_idx "all my repos" enumeration medium parallel index to PK
repo_collaborators_repo_id_idx settings access tab high per-repo
repo_topics_topic_idx "repos with topic X" medium post-MVP topic filter

Issues / pulls

Index Covers query Selectivity Cost notes
issues (repo_id, number) PK issue-by-number high PK
issues_repo_state_updated_idx (repo_id, state, updated_at DESC) issues list with state filter (S21) medium the dominant list-page index
issues_repo_kind_state_idx (repo_id, kind, state) PR-vs-issue split medium composite
issues_author_idx (author_user_id) "issues authored by me" medium per-user-author
issues_search_* FTS lookup medium GIN tsv index

Jobs queue

Index Covers query Selectivity Cost notes
jobs_dispatch_idx (kind, run_at) WHERE completed_at IS NULL AND failed_at IS NULL worker claim — hot path high partial; only dispatchable rows
jobs_locked_idx (locked_by, locked_at) stuck-claim sweeper high rare scan

Notifications + domain events

Index Covers query Selectivity Cost notes
notifications_recipient_recent_idx (recipient_user_id, last_event_at DESC) inbox first page high per-user, recency-sorted
notifications_recipient_unread_idx unread badge high partial
notifications_thread_coalesce_idx (UNIQUE) dedup thread rows on fanout high UNIQUE
domain_events_created_at_idx (created_at) fanout cursor scan low covers polling
domain_events_repo_created_idx (repo_id, created_at DESC) WHERE repo_id IS NOT NULL per-repo activity feed medium partial
domain_events_actor_created_idx (actor_user_id, created_at DESC) WHERE actor_user_id IS NOT NULL per-user activity feed medium partial
domain_events_public_created_idx (created_at DESC) WHERE public = true public feed medium partial

Check runs

Index Covers query Selectivity Cost notes
check_runs_repo_head_idx (repo_id, head_sha) required-check eval per PR high composite
check_runs_required_lookup_idx (repo_id, head_sha, name) named-check exact lookup high composite
check_runs_external_id_idx (UNIQUE) upsert-by-(name+external_id) high UNIQUE

Auth audit log

Index Covers query Selectivity Cost notes
auth_audit_log_actor_id_idx "what did actor X do" medium per-actor
auth_audit_log_target_idx (target_type, target_id) "what happened to target Y" medium composite
auth_audit_log_action_idx filter by action prefix medium per-action (admin viewer)
auth_audit_log_created_at_idx (created_at DESC) recency scan low dominant index for the admin viewer

Webhooks (S33)

Index Covers query Selectivity Cost notes
webhooks_owner_idx (owner_kind, owner_id) settings list high composite
webhooks_active_idx (owner_kind, owner_id) WHERE active = true AND disabled_at IS NULL fanout subscriber lookup high partial; the hot path
webhook_deliveries_pending_due_idx (next_retry_at) WHERE status IN ('pending','failed_retry') deliverer claim high partial
webhook_deliveries_webhook_started_idx (webhook_id, started_at DESC) per-webhook delivery view medium composite

Rate-limit (S35)

Index Covers query Selectivity Cost notes
rate_limits PK (scope, key) per-scope-key bump high UPSERT hot path
rate_limits_window_started_idx (window_started_at) periodic prune low scan-friendly
signup_ip_throttle PK (cidr) per-/24 lookup high UPSERT
signup_ip_throttle_window_started_idx (window_started_at) periodic prune low scan-friendly

Future considerations (deferred)

  • pg_stat_statements extension. S37's deploy doc owns the install. Once the suite runs against a populated dataset we'll capture top-N slow queries here and decide on additional indexes.
  • gin_trgm_ops indexes for partial-match search on usernames / repo names. The existing FTS covers most needs; trigram lookup pays off only if profiling shows an LIKE-prefix hot path.
  • Covering indexes for the issue list. INCLUDE columns on issues_repo_state_updated_idx could turn the page into an index-only scan. Defer until the scan proves expensive on the 100k-issue fixture.
View source
1 # DB indexes — catalog
2
3 Per-domain catalog of every non-PK index, the query it backs, and the
4 rationale. Each new index needs an entry here AND a sentence in its
5 migration file justifying the write-throughput cost. Reviewed on
6 every perf-pass sprint (S36 baseline; revisited on S39 beta hardening).
7
8 The columns are: **Index**, **Covers query**, **Selectivity**, **Cost
9 notes**. Selectivity is rough — "high" means typically returns ≤ 100
10 rows for a 1M-row table; "medium" 1–10k; "low" most-of-the-table.
11
12 ## Identity (`users`, `user_emails`, `user_*`)
13
14 | Index | Covers query | Selectivity | Cost notes |
15 |---|---|---|---|
16 | `users.username_uq` (unique) | login + profile lookup by username | high | implied by UNIQUE; PK-class |
17 | `users_deleted_at_idx` (partial WHERE deleted_at IS NOT NULL) | restore listing + admin "deleted" filter | high | partial keeps it tiny |
18 | `users_suspended_at_idx` (partial WHERE suspended_at IS NOT NULL) | admin "suspended" filter | high | partial |
19 | `users_site_admin_idx` (partial WHERE is_site_admin = true) | admin elevation lookups | high | partial; few rows |
20 | `user_emails (user_id)` | enumerate a user's emails | high | per-user |
21 | `user_emails_address_uq` (unique) | reverse lookup on signup-collision check | high | UNIQUE |
22
23 ## Repos
24
25 | Index | Covers query | Selectivity | Cost notes |
26 |---|---|---|---|
27 | `repos (owner_user_id, name)` PK-class | per-owner repo lookup | high | composite |
28 | `repos_owner_org_name_uq` | org-owner repo lookup | high | UNIQUE |
29 | `repo_collaborators_user_id_idx` | "all my repos" enumeration | medium | parallel index to PK |
30 | `repo_collaborators_repo_id_idx` | settings access tab | high | per-repo |
31 | `repo_topics_topic_idx` | "repos with topic X" | medium | post-MVP topic filter |
32
33 ## Issues / pulls
34
35 | Index | Covers query | Selectivity | Cost notes |
36 |---|---|---|---|
37 | `issues (repo_id, number)` PK | issue-by-number | high | PK |
38 | `issues_repo_state_updated_idx (repo_id, state, updated_at DESC)` | issues list with state filter (S21) | medium | the dominant list-page index |
39 | `issues_repo_kind_state_idx (repo_id, kind, state)` | PR-vs-issue split | medium | composite |
40 | `issues_author_idx (author_user_id)` | "issues authored by me" | medium | per-user-author |
41 | `issues_search_*` | FTS lookup | medium | GIN tsv index |
42
43 ## Jobs queue
44
45 | Index | Covers query | Selectivity | Cost notes |
46 |---|---|---|---|
47 | `jobs_dispatch_idx (kind, run_at) WHERE completed_at IS NULL AND failed_at IS NULL` | worker claim — hot path | high | partial; only dispatchable rows |
48 | `jobs_locked_idx (locked_by, locked_at)` | stuck-claim sweeper | high | rare scan |
49
50 ## Notifications + domain events
51
52 | Index | Covers query | Selectivity | Cost notes |
53 |---|---|---|---|
54 | `notifications_recipient_recent_idx (recipient_user_id, last_event_at DESC)` | inbox first page | high | per-user, recency-sorted |
55 | `notifications_recipient_unread_idx` | unread badge | high | partial |
56 | `notifications_thread_coalesce_idx` (UNIQUE) | dedup thread rows on fanout | high | UNIQUE |
57 | `domain_events_created_at_idx (created_at)` | fanout cursor scan | low | covers polling |
58 | `domain_events_repo_created_idx (repo_id, created_at DESC) WHERE repo_id IS NOT NULL` | per-repo activity feed | medium | partial |
59 | `domain_events_actor_created_idx (actor_user_id, created_at DESC) WHERE actor_user_id IS NOT NULL` | per-user activity feed | medium | partial |
60 | `domain_events_public_created_idx (created_at DESC) WHERE public = true` | public feed | medium | partial |
61
62 ## Check runs
63
64 | Index | Covers query | Selectivity | Cost notes |
65 |---|---|---|---|
66 | `check_runs_repo_head_idx (repo_id, head_sha)` | required-check eval per PR | high | composite |
67 | `check_runs_required_lookup_idx (repo_id, head_sha, name)` | named-check exact lookup | high | composite |
68 | `check_runs_external_id_idx` (UNIQUE) | upsert-by-(name+external_id) | high | UNIQUE |
69
70 ## Auth audit log
71
72 | Index | Covers query | Selectivity | Cost notes |
73 |---|---|---|---|
74 | `auth_audit_log_actor_id_idx` | "what did actor X do" | medium | per-actor |
75 | `auth_audit_log_target_idx (target_type, target_id)` | "what happened to target Y" | medium | composite |
76 | `auth_audit_log_action_idx` | filter by action prefix | medium | per-action (admin viewer) |
77 | `auth_audit_log_created_at_idx (created_at DESC)` | recency scan | low | dominant index for the admin viewer |
78
79 ## Webhooks (S33)
80
81 | Index | Covers query | Selectivity | Cost notes |
82 |---|---|---|---|
83 | `webhooks_owner_idx (owner_kind, owner_id)` | settings list | high | composite |
84 | `webhooks_active_idx (owner_kind, owner_id) WHERE active = true AND disabled_at IS NULL` | fanout subscriber lookup | high | partial; the hot path |
85 | `webhook_deliveries_pending_due_idx (next_retry_at) WHERE status IN ('pending','failed_retry')` | deliverer claim | high | partial |
86 | `webhook_deliveries_webhook_started_idx (webhook_id, started_at DESC)` | per-webhook delivery view | medium | composite |
87
88 ## Rate-limit (S35)
89
90 | Index | Covers query | Selectivity | Cost notes |
91 |---|---|---|---|
92 | `rate_limits` PK `(scope, key)` | per-scope-key bump | high | UPSERT hot path |
93 | `rate_limits_window_started_idx (window_started_at)` | periodic prune | low | scan-friendly |
94 | `signup_ip_throttle` PK `(cidr)` | per-/24 lookup | high | UPSERT |
95 | `signup_ip_throttle_window_started_idx (window_started_at)` | periodic prune | low | scan-friendly |
96
97 ## Future considerations (deferred)
98
99 - **`pg_stat_statements` extension.** S37's deploy doc owns the
100 install. Once the suite runs against a populated dataset we'll
101 capture top-N slow queries here and decide on additional
102 indexes.
103 - **`gin_trgm_ops` indexes** for partial-match search on
104 usernames / repo names. The existing FTS covers most needs;
105 trigram lookup pays off only if profiling shows an LIKE-prefix
106 hot path.
107 - **Covering indexes for the issue list.** `INCLUDE` columns on
108 `issues_repo_state_updated_idx` could turn the page into an
109 index-only scan. Defer until the scan proves expensive on the
110 100k-issue fixture.