tenseleyflow/shithub / ad13327

S36: docs — DB indexes catalog with rationale

Authored by espadonne
SHA ad13327f32216c0a046c4808c7829d693bca13c8 · parent 8f82a9e · tree 74f77c2

1 changed file: added `docs/internal/db-indexes.md` (+110 / -0)

# DB indexes — catalog

Per-domain catalog of every non-PK index (plus the primary keys that
back hot lookups), the query it serves, and the rationale. Each new
index needs an entry here AND a sentence in its migration file
justifying the write-throughput cost. The catalog is reviewed in every
perf-pass sprint (S36 baseline; revisited during S39 beta hardening).

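The rule that every new index gets an entry here can be checked mechanically. A minimal sketch, assuming migrations are plain SQL text and that each catalog row names its index in backticks, as the tables below do; the regexes and `missing_from_catalog` are hypothetical, not existing project tooling. Wildcard rows such as `issues_search_*` would need extra handling.

```python
from __future__ import annotations

import re

# Hypothetical lint: every index a migration creates must be named
# (in backticks) somewhere in docs/internal/db-indexes.md.
CREATE_INDEX_RE = re.compile(
    r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+(?:CONCURRENTLY\s+)?"
    r"(?:IF\s+NOT\s+EXISTS\s+)?(?!ON\b)(\w+)",  # unnamed "CREATE INDEX ON t" is skipped
    re.IGNORECASE,
)
# A catalog row opens with the index name, e.g. `jobs_dispatch_idx (kind, run_at)`.
CATALOG_NAME_RE = re.compile(r"`(\w+)")


def created_indexes(migration_sql: str) -> set[str]:
    """Names declared by CREATE [UNIQUE] INDEX statements."""
    return set(CREATE_INDEX_RE.findall(migration_sql))


def catalogued_names(catalog_md: str) -> set[str]:
    """Identifiers that open a backticked span in the catalog markdown."""
    return set(CATALOG_NAME_RE.findall(catalog_md))


def missing_from_catalog(migration_sql: str, catalog_md: str) -> list[str]:
    """Indexes created by the migration but absent from the catalog, sorted."""
    return sorted(created_indexes(migration_sql) - catalogued_names(catalog_md))
```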
The columns are **Index**, **Covers query**, **Selectivity**, and **Cost
notes**. Selectivity is a rough band for a 1M-row table: "high" means a
lookup typically returns ≤ 100 rows, "medium" roughly 1k–10k rows, and
"low" most of the table.

## Identity (`users`, `user_emails`, `user_*`)

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `users.username_uq` (unique) | login + profile lookup by username | high | implied by UNIQUE; PK-class |
| `users_deleted_at_idx` (partial WHERE deleted_at IS NOT NULL) | restore listing + admin "deleted" filter | high | partial keeps it tiny |
| `users_suspended_at_idx` (partial WHERE suspended_at IS NOT NULL) | admin "suspended" filter | high | partial |
| `users_site_admin_idx` (partial WHERE is_site_admin = true) | admin elevation lookups | high | partial; few rows |
| `user_emails (user_id)` | enumerate a user's emails | high | per-user |
| `user_emails_address_uq` (unique) | reverse lookup on signup-collision check | high | UNIQUE |

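The partial indexes on `users` hold only the rare rows (deleted, suspended, site admins), and a planner can use one only when the query's `WHERE` clause implies the index predicate. A minimal check of that rule, using SQLite (via Python's `sqlite3`) in place of Postgres since both accept the same `CREATE INDEX ... WHERE` syntax; the trimmed schema is hypothetical and the plan text is SQLite's, not Postgres's.

```python
import sqlite3

# Hypothetical trimmed schema; only deleted_at matters for this check.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        deleted_at TEXT
    );
    CREATE INDEX users_deleted_at_idx ON users (deleted_at)
        WHERE deleted_at IS NOT NULL;
    """
)


def query_plan(sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Admin "deleted" filter: repeats the index predicate, so the index is eligible.
deleted_plan = query_plan(
    "SELECT id, deleted_at FROM users WHERE deleted_at IS NOT NULL "
    "ORDER BY deleted_at DESC LIMIT 50"
)
# Live users are not in the index at all, so it cannot serve this query.
live_plan = query_plan("SELECT id FROM users WHERE deleted_at IS NULL")
```

Postgres applies the same implication rule, so queries meant for these indexes should spell out the predicate, or a condition that implies it (a strict comparison such as `deleted_at > $1` typically suffices for `IS NOT NULL`).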
## Repos

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `repos (owner_user_id, name)` PK-class | per-owner repo lookup | high | composite |
| `repos_owner_org_name_uq` | org-owner repo lookup | high | UNIQUE |
| `repo_collaborators_user_id_idx` | "all my repos" enumeration | medium | parallel index to PK |
| `repo_collaborators_repo_id_idx` | settings access tab | high | per-repo |
| `repo_topics_topic_idx` | "repos with topic X" | medium | post-MVP topic filter |

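Scoping uniqueness to the owner lets two users each have a repo with the same name while `(owner_user_id, name)` still resolves to one row. A sketch with SQLite standing in for Postgres; the minimal schema and `create_repo` are hypothetical. In both databases a unique index treats NULLs as distinct by default (Postgres 15 adds an opt-in `NULLS NOT DISTINCT`), so rows with a NULL `owner_user_id` never collide here, which is presumably why org-owned repos get their own `repos_owner_org_name_uq`.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE repos (
        id INTEGER PRIMARY KEY,
        owner_user_id INTEGER,          -- assumed NULL for org-owned repos
        name TEXT NOT NULL,
        UNIQUE (owner_user_id, name)
    )
    """
)


def create_repo(owner_user_id: int | None, name: str) -> bool:
    """Insert a repo; return False if this owner already has one with that name."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO repos (owner_user_id, name) VALUES (?, ?)",
                (owner_user_id, name),
            )
    except sqlite3.IntegrityError:
        return False
    return True
```

Because `owner_user_id` leads the key, the same B-tree also serves a prefix scan over every repo a user owns.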
## Issues / pulls

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `issues (repo_id, number)` PK | issue-by-number | high | PK |
| `issues_repo_state_updated_idx (repo_id, state, updated_at DESC)` | issues list with state filter (S21) | medium | the dominant list-page index |
| `issues_repo_kind_state_idx (repo_id, kind, state)` | PR-vs-issue split | medium | composite |
| `issues_author_idx (author_user_id)` | "issues authored by me" | medium | per-user-author |
| `issues_search_*` | FTS lookup | medium | GIN tsv index |

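The column order of `issues_repo_state_updated_idx` matches the list page: equality on `repo_id` and `state`, then `updated_at DESC` for ordering. A sketch of one page query against it, assuming keyset (seek) pagination rather than OFFSET (the catalog does not say which the list page uses); SQLite stands in for Postgres, and `issue_page` is hypothetical. Each page starts from a position in the index instead of counting past skipped rows.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        repo_id INTEGER NOT NULL,
        number INTEGER NOT NULL,
        state TEXT NOT NULL,
        updated_at INTEGER NOT NULL,
        UNIQUE (repo_id, number)
    );
    CREATE INDEX issues_repo_state_updated_idx
        ON issues (repo_id, state, updated_at DESC);
    """
)


def issue_page(
    repo_id: int, state: str, limit: int, after: tuple[int, int] | None = None
) -> list[tuple[int, int]]:
    """One page of (updated_at, id) rows, newest first.

    `after` is the last row of the previous page. The id tie-breaker keeps rows
    sharing an updated_at from being skipped or repeated across pages.
    """
    sql = "SELECT updated_at, id FROM issues WHERE repo_id = ? AND state = ?"
    params: list = [repo_id, state]
    if after is not None:
        sql += " AND (updated_at < ? OR (updated_at = ? AND id < ?))"
        params += [after[0], after[0], after[1]]
    sql += " ORDER BY updated_at DESC, id DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Since `id` is not part of the index key, Postgres may add a small (incremental) sort step to order ties.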
## Jobs queue

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `jobs_dispatch_idx (kind, run_at) WHERE completed_at IS NULL AND failed_at IS NULL` | worker claim — hot path | high | partial; only dispatchable rows |
| `jobs_locked_idx (locked_by, locked_at)` | stuck-claim sweeper | high | rare scan |

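The worker claim is the hot path `jobs_dispatch_idx` exists for. A minimal single-process sketch, using SQLite in place of Postgres, that repeats the index's partial predicate in the claim query so only dispatchable rows are considered; `claim_next`, the `locked_by IS NULL` filter, and the column types are assumptions. With concurrent Postgres workers, a claim of this shape is commonly written with `SELECT ... FOR UPDATE SKIP LOCKED` so workers skip each other's rows instead of blocking; SQLite has no row locks, so `BEGIN IMMEDIATE` serializes claims here.

```python
from __future__ import annotations

import sqlite3

# Autocommit mode so the BEGIN/COMMIT below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,
        run_at REAL NOT NULL,
        locked_by TEXT,
        locked_at REAL,
        completed_at REAL,
        failed_at REAL
    );
    CREATE INDEX jobs_dispatch_idx ON jobs (kind, run_at)
        WHERE completed_at IS NULL AND failed_at IS NULL;
    """
)

# Repeats the partial-index predicate so the planner may use jobs_dispatch_idx.
CLAIM_SELECT = """
    SELECT id FROM jobs
    WHERE kind = ? AND run_at <= ?
      AND completed_at IS NULL AND failed_at IS NULL
      AND locked_by IS NULL
    ORDER BY run_at
    LIMIT 1
"""


def claim_next(kind: str, worker: str, now: float) -> int | None:
    """Lock the oldest due, unfinished, unclaimed job of `kind`; return its id."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's single write lock up front
    try:
        row = conn.execute(CLAIM_SELECT, (kind, now)).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET locked_by = ?, locked_at = ? WHERE id = ?",
                (worker, now, row[0]),
            )
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]
```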
## Notifications + domain events

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `notifications_recipient_recent_idx (recipient_user_id, last_event_at DESC)` | inbox first page | high | per-user, recency-sorted |
| `notifications_recipient_unread_idx` | unread badge | high | partial |
| `notifications_thread_coalesce_idx` (UNIQUE) | dedup thread rows on fanout | high | UNIQUE |
| `domain_events_created_at_idx (created_at)` | fanout cursor scan | low | covers polling |
| `domain_events_repo_created_idx (repo_id, created_at DESC) WHERE repo_id IS NOT NULL` | per-repo activity feed | medium | partial |
| `domain_events_actor_created_idx (actor_user_id, created_at DESC) WHERE actor_user_id IS NOT NULL` | per-user activity feed | medium | partial |
| `domain_events_public_created_idx (created_at DESC) WHERE public = true` | public feed | medium | partial |

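`notifications_thread_coalesce_idx` lets fanout keep one row per recipient and thread, however many events arrive. A sketch of that coalescing write as a single UPSERT, with SQLite (3.24+ for `ON CONFLICT ... DO UPDATE`) standing in for Postgres; the index columns `(recipient_user_id, thread_kind, thread_id)`, the `unread` flag behind `notifications_recipient_unread_idx`, and `fan_out`/`unread_count` are all hypothetical. Postgres spells the two-argument `max` below as `GREATEST`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notifications (
        id INTEGER PRIMARY KEY,
        recipient_user_id INTEGER NOT NULL,
        thread_kind TEXT NOT NULL,
        thread_id INTEGER NOT NULL,
        last_event_at INTEGER NOT NULL,
        unread INTEGER NOT NULL DEFAULT 1
    );
    CREATE UNIQUE INDEX notifications_thread_coalesce_idx
        ON notifications (recipient_user_id, thread_kind, thread_id);
    CREATE INDEX notifications_recipient_unread_idx
        ON notifications (recipient_user_id) WHERE unread = 1;
    """
)

# A repeat event for the same thread updates the existing row instead of adding one;
# max() keeps last_event_at from moving backwards if events arrive out of order.
FAN_OUT_SQL = """
    INSERT INTO notifications
        (recipient_user_id, thread_kind, thread_id, last_event_at)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (recipient_user_id, thread_kind, thread_id) DO UPDATE SET
        last_event_at = max(last_event_at, excluded.last_event_at),
        unread = 1
"""


def fan_out(recipient: int, thread_kind: str, thread_id: int, at: int) -> None:
    with conn:
        conn.execute(FAN_OUT_SQL, (recipient, thread_kind, thread_id, at))


def unread_count(recipient: int) -> int:
    """Badge count; the query repeats the partial predicate `unread = 1`."""
    return conn.execute(
        "SELECT count(*) FROM notifications WHERE recipient_user_id = ? AND unread = 1",
        (recipient,),
    ).fetchone()[0]
```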
## Check runs

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `check_runs_repo_head_idx (repo_id, head_sha)` | required-check eval per PR | high | composite |
| `check_runs_required_lookup_idx (repo_id, head_sha, name)` | named-check exact lookup | high | composite |
| `check_runs_external_id_idx` (UNIQUE) | upsert-by-(name+external_id) | high | UNIQUE |

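One plausible shape for required-check evaluation is to fetch every run for one `(repo_id, head_sha)` through `check_runs_repo_head_idx` and reduce them in application code. A sketch of that reduction; the conclusion strings, the latest-run-wins rule for re-runs, and treating any completed non-`success` conclusion as failing are assumptions, and `required_checks_state` is hypothetical.

```python
from __future__ import annotations

# Input rows as a query like this would return them (hypothetical SQL):
#   SELECT name, conclusion FROM check_runs
#   WHERE repo_id = $1 AND head_sha = $2 ORDER BY id


def required_checks_state(
    runs: list[tuple[str, str | None]], required: set[str]
) -> str:
    """Return 'success', 'failure' or 'pending' for one PR head.

    `runs` is (name, conclusion) in creation order; a later run with the same
    name (a re-run) replaces an earlier one. conclusion is None while running.
    Checks that are not required are ignored.
    """
    latest: dict[str, str | None] = {}
    for name, conclusion in runs:
        latest[name] = conclusion
    # A required check with no run yet reads as None, i.e. still pending.
    results = [latest.get(name) for name in sorted(required)]
    if any(r is not None and r != "success" for r in results):
        return "failure"
    if all(r == "success" for r in results):
        return "success"
    return "pending"
```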
## Auth audit log

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `auth_audit_log_actor_id_idx` | "what did actor X do" | medium | per-actor |
| `auth_audit_log_target_idx (target_type, target_id)` | "what happened to target Y" | medium | composite |
| `auth_audit_log_action_idx` | filter by action prefix | medium | per-action (admin viewer) |
| `auth_audit_log_created_at_idx (created_at DESC)` | recency scan | low | dominant index for the admin viewer |

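If the admin viewer's action-prefix filter is a `LIKE` query (an assumption; the catalog only says "filter by action prefix"), the prefix must be escaped, or an `_` in an action name matches any character. A minimal sketch with hypothetical action names and helpers (`like_prefix`, `actions_with_prefix`), using SQLite in place of Postgres. Two Postgres caveats: backslash is already the default `LIKE` escape there, and a B-tree such as `auth_audit_log_action_idx` serves `LIKE 'prefix%'` only under the C collation or when built with `text_pattern_ops`.

```python
from __future__ import annotations

import sqlite3


def like_prefix(prefix: str, escape: str = "\\") -> str:
    """LIKE pattern that matches `prefix` literally, then anything.

    Without escaping, a '_' or '%' inside the prefix would act as a wildcard.
    """
    escaped = (
        prefix.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return escaped + "%"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_audit_log (
        id INTEGER PRIMARY KEY,
        action TEXT NOT NULL,
        created_at INTEGER NOT NULL
    );
    CREATE INDEX auth_audit_log_action_idx ON auth_audit_log (action);
    """
)


def actions_with_prefix(prefix: str) -> list[str]:
    """Actions starting with `prefix`, newest first; ESCAPE names the backslash."""
    rows = conn.execute(
        "SELECT action FROM auth_audit_log WHERE action LIKE ? ESCAPE '\\' "
        "ORDER BY created_at DESC",
        (like_prefix(prefix),),
    ).fetchall()
    return [row[0] for row in rows]
```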
## Webhooks (S33)

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `webhooks_owner_idx (owner_kind, owner_id)` | settings list | high | composite |
| `webhooks_active_idx (owner_kind, owner_id) WHERE active = true AND disabled_at IS NULL` | fanout subscriber lookup | high | partial; the hot path |
| `webhook_deliveries_pending_due_idx (next_retry_at) WHERE status IN ('pending','failed_retry')` | deliverer claim | high | partial |
| `webhook_deliveries_webhook_started_idx (webhook_id, started_at DESC)` | per-webhook delivery view | medium | composite |

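The partial predicate on `webhook_deliveries_pending_due_idx` keeps the deliverer's claim scan small only if deliveries leave `pending`/`failed_retry` once they stop being retryable. A sketch of that state transition, assuming capped exponential backoff and a terminal `failed` status; `MAX_ATTEMPTS`, `BASE_DELAY_S`, `after_failed_attempt`, and the `failed` status are hypothetical.

```python
from __future__ import annotations

# Hypothetical retry policy. The catalog names the 'pending' and 'failed_retry'
# statuses and the next_retry_at column; the cap, the delays, and the terminal
# 'failed' status are illustrative assumptions.
MAX_ATTEMPTS = 5
BASE_DELAY_S = 30


def after_failed_attempt(attempts: int, now: float) -> tuple[str, float | None]:
    """(status, next_retry_at) once delivery attempt number `attempts` has failed.

    'failed_retry' rows stay in webhook_deliveries_pending_due_idx; a row that
    hits the cap becomes 'failed' and drops out of the partial index.
    """
    if attempts >= MAX_ATTEMPTS:
        return "failed", None
    # Exponential backoff: 30 s, 60 s, 120 s, 240 s after attempts 1 to 4.
    return "failed_retry", now + BASE_DELAY_S * 2 ** (attempts - 1)
```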
## Rate-limit (S35)

| Index | Covers query | Selectivity | Cost notes |
|---|---|---|---|
| `rate_limits` PK `(scope, key)` | per-scope-key bump | high | UPSERT hot path |
| `rate_limits_window_started_idx (window_started_at)` | periodic prune | low | scan-friendly |
| `signup_ip_throttle` PK `(cidr)` | per-/24 lookup | high | UPSERT |
| `signup_ip_throttle_window_started_idx (window_started_at)` | periodic prune | low | scan-friendly |

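The `rate_limits` rows read like a fixed-window counter: one UPSERT per hit keyed on the `(scope, key)` PK, and a periodic prune over `window_started_at`. A sketch under that assumption, using SQLite (3.24+) in place of Postgres, which accepts the same `ON CONFLICT (...) DO UPDATE` clause; the `hits` column and the helpers `bump`, `prune`, and `signup_cidr` are hypothetical. `signup_cidr` derives an IPv4 /24 key of the kind `signup_ip_throttle` is keyed on.

```python
import ipaddress
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rate_limits (
        scope TEXT NOT NULL,
        key TEXT NOT NULL,
        window_started_at INTEGER NOT NULL,
        hits INTEGER NOT NULL,
        PRIMARY KEY (scope, key)
    );
    CREATE INDEX rate_limits_window_started_idx ON rate_limits (window_started_at);
    """
)

# One statement per hit. Inside DO UPDATE, bare column names are the existing
# row's values, so both CASEs test the *old* window start.
BUMP_SQL = """
    INSERT INTO rate_limits (scope, key, window_started_at, hits)
    VALUES (:scope, :key, :now, 1)
    ON CONFLICT (scope, key) DO UPDATE SET
        hits = CASE WHEN window_started_at <= :now - :window
                    THEN 1 ELSE hits + 1 END,
        window_started_at = CASE WHEN window_started_at <= :now - :window
                                 THEN :now ELSE window_started_at END
"""


def bump(scope: str, key: str, now: int, window_s: int) -> int:
    """Count one hit for (scope, key) in a fixed window; return the hits so far."""
    params = {"scope": scope, "key": key, "now": now, "window": window_s}
    with conn:
        conn.execute(BUMP_SQL, params)
    row = conn.execute(
        "SELECT hits FROM rate_limits WHERE scope = ? AND key = ?", (scope, key)
    ).fetchone()
    return row[0]


def prune(now: int, window_s: int) -> int:
    """Delete expired windows; rate_limits_window_started_idx serves this range scan."""
    with conn:
        cur = conn.execute(
            "DELETE FROM rate_limits WHERE window_started_at <= ?", (now - window_s,)
        )
    return cur.rowcount


def signup_cidr(ip: str) -> str:
    """IPv4 /24 network containing `ip`, as a string."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))
```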
## Future considerations (deferred)

- **`pg_stat_statements` extension.** S37's deploy doc owns the
  install. Once the suite runs against a populated dataset we'll
  capture top-N slow queries here and decide on additional
  indexes.
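- **`gin_trgm_ops` indexes** (from the `pg_trgm` extension) for
  partial-match search on usernames / repo names. The existing FTS
  covers most needs; a trigram index pays off only if profiling shows
  a hot path of substring matches (`ILIKE '%term%'`). Anchored prefix
  matches (`LIKE 'term%'`) can use a B-tree with `text_pattern_ops`
  instead.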
- **Covering indexes for the issue list.** `INCLUDE` columns on
  `issues_repo_state_updated_idx` could turn the list-page query into an
  index-only scan. Defer until the scan proves expensive on the
  100k-issue fixture.
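The deferred covering-index idea can be previewed without Postgres. SQLite has no `INCLUDE` clause, so this sketch appends the columns as trailing key columns instead and compares the planner's output before and after; that the list page selects `number` and `title` is an assumption. In Postgres (11+), `INCLUDE (number, title)` keeps those columns out of the key, and an index-only scan also depends on the visibility map, so frequently updated issue rows may still need heap visits until VACUUM catches up.

```python
import sqlite3

# SQLite has no INCLUDE clause; trailing key columns stand in for it here.
# The Postgres form of the deferred idea would be roughly:
#   CREATE INDEX ... ON issues (repo_id, state, updated_at DESC) INCLUDE (number, title);
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        repo_id INTEGER NOT NULL,
        number INTEGER NOT NULL,
        state TEXT NOT NULL,
        title TEXT NOT NULL,
        body TEXT,
        updated_at INTEGER NOT NULL
    )
    """
)

# Hypothetical first page of the issue list.
LIST_SQL = (
    "SELECT number, title FROM issues WHERE repo_id = 1 AND state = 'open' "
    "ORDER BY updated_at DESC LIMIT 25"
)


def query_plan(sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn.execute(
    "CREATE INDEX issues_repo_state_updated_idx "
    "ON issues (repo_id, state, updated_at DESC)"
)
key_only_plan = query_plan(LIST_SQL)  # must visit the table for number, title

conn.execute("DROP INDEX issues_repo_state_updated_idx")
conn.execute(
    "CREATE INDEX issues_repo_state_updated_idx "
    "ON issues (repo_id, state, updated_at DESC, number, title)"
)
covering_plan = query_plan(LIST_SQL)  # every selected column is in the index
```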