markdown · 4829 bytes Raw Blame History

Database conventions

This document is the authoritative reference for shithub's PostgreSQL schema conventions. Every domain sprint (S05 onwards) follows these.

Engine and tooling

  • PostgreSQL 16. Production runs self-hosted on a dedicated block volume (S37). Local dev runs in docker-compose (make dev-db).
  • Driver: pgx/v5 (github.com/jackc/pgx/v5). Native *pgxpool.Pool for app code; the stdlib adapter is reserved for libraries that demand *sql.DB.
  • Migrations: goose (github.com/pressly/goose/v3), used as a library via shithubd migrate. Plain SQL up/down blocks; one file per migration.
  • Code generation: sqlc. Queries live under internal/<domain>/queries/*.sql; generated Go lives under internal/<domain>/sqlc/.

Naming

Concern Convention
Tables snake_case, plural (users, repos, pull_requests)
Columns snake_case (created_at, owner_user_id)
Primary keys id (column)
Indexes <table>_<columns>_idx
Unique indexes / constraints <table>_<columns>_key
Foreign keys <table>_<col>_fkey (Postgres default)
Enums snake_case_enum (e.g. repo_visibility)
Trigger functions tg_<purpose> (e.g. tg_set_updated_at)

ID strategy

  • bigserial id PRIMARY KEY for internal IDs. Compact, ordered, cheap.
  • UUID v7 for any ID exposed publicly via URL. Gives sortability without leaking tenant size. Stored as the canonical Postgres uuid type.
  • Per-repo issue/PR numbers are NEITHER bigserial NOR UUID — they are per-(repo_id) monotonic counters maintained by a small counter table (S21).

Timestamps

  • Always timestamptz. Never timestamp (without timezone) — that's a foot-gun.

  • created_at timestamptz NOT NULL DEFAULT now() on every table.

  • updated_at timestamptz NOT NULL DEFAULT now() on every table that receives updates, with a BEFORE UPDATE trigger that re-stamps it:

    CREATE TRIGGER set_updated_at BEFORE UPDATE ON <table>
        FOR EACH ROW EXECUTE FUNCTION tg_set_updated_at();
    

    The tg_set_updated_at() function is created in 0001_meta.sql and reused across every table.

Foreign keys

  • Always specify ON DELETE explicitly. RESTRICT (the default) is the safe choice; only use CASCADE when the lifecycle of the child IS the lifecycle of the parent.
  • Index every FK column unless the child is naturally indexed by a covering composite index that starts with the FK.

Soft deletes

  • No soft deletes by default. Hard delete + audit-log row when audit matters.
  • Exceptions exist where grace windows are required (user delete, repo delete, org delete) — these are explicit and documented per sprint.

Migrations

  • Forward-only after production deploy. Down blocks exist for dev convenience but are immutable post-deploy. Corrections come as new migrations.

  • One change per migration. Filename: NNNN_short_purpose.sql where NNNN is monotonically increasing and globally unique. scripts/lint-migration-versions.sh enforces this in CI because goose panics on duplicate numeric versions before it can run any migration.

  • Goose markers:

    -- +goose Up
    ...
    -- +goose Down
    ...
    

    Multi-statement DDL that needs PL/pgSQL must be wrapped in -- +goose StatementBegin / -- +goose StatementEnd.

citext

Use citext (case-insensitive text) for identifiers that should match case-insensitively but display case-preserved (users.username, user_emails.email, orgs.slug, etc.). The extension is enabled in 0001_meta.sql (post-S05) — for now, S01 enables only the function we need.

JSONB

  • Acceptable for genuinely polymorphic data (event metadata, audit log details, notification summaries).
  • Not a substitute for proper schema. If the shape is known and stable, model it as columns or a child table.

Connection pooling

  • App: pgxpool.Pool per process. Default sizing 25 in prod, 10 in dev.
  • Hooks (S07/S14): tiny pools (max 4) since they're short-lived and on the critical path of git operations.
  • Tests: per-test pool with max 2 connections.

Test harness

  • internal/testing/dbtest: dbtest.NewTestDB(t) creates a fresh database cloned from a template (which has all migrations applied) per test, then drops it on t.Cleanup. Parallel-safe.
  • Tests requiring DB access set SHITHUB_TEST_DATABASE_URL pointing at a Postgres server with permission to CREATE DATABASE.

Operational

  • pg_stat_statements extension is loaded by default in dev compose and prod (S37). Used by S36's perf pass.
  • archive_mode=on + WAL shipping to Spaces (cross-region) in prod (S37).
  • Daily logical backups via pg_dump --format=custom, restored weekly to validate the backup chain.
View source
1 # Database conventions
2
3 This document is the authoritative reference for shithub's PostgreSQL schema
4 conventions. Every domain sprint (S05 onwards) follows these.
5
6 ## Engine and tooling
7
8 - **PostgreSQL 16**. Production runs self-hosted on a dedicated block volume
9 (S37). Local dev runs in `docker-compose` (`make dev-db`).
10 - **Driver:** `pgx/v5` (`github.com/jackc/pgx/v5`). Native `*pgxpool.Pool` for
11 app code; the `stdlib` adapter is reserved for libraries that demand
12 `*sql.DB`.
13 - **Migrations:** `goose` (`github.com/pressly/goose/v3`), used as a library
14 via `shithubd migrate`. Plain SQL up/down blocks; one file per migration.
15 - **Code generation:** `sqlc`. Queries live under
16 `internal/<domain>/queries/*.sql`; generated Go lives under
17 `internal/<domain>/sqlc/`.
18
19 ## Naming
20
21 | Concern | Convention |
22 |---|---|
23 | Tables | snake_case, plural (`users`, `repos`, `pull_requests`) |
24 | Columns | snake_case (`created_at`, `owner_user_id`) |
25 | Primary keys | `id` (column) |
26 | Indexes | `<table>_<columns>_idx` |
27 | Unique indexes / constraints | `<table>_<columns>_key` |
28 | Foreign keys | `<table>_<col>_fkey` (Postgres default) |
29 | Enums | `snake_case_enum` (e.g. `repo_visibility`) |
30 | Trigger functions | `tg_<purpose>` (e.g. `tg_set_updated_at`) |
31
32 ## ID strategy
33
34 - **`bigserial id PRIMARY KEY`** for internal IDs. Compact, ordered, cheap.
35 - **UUID v7** for any ID exposed publicly via URL. Gives sortability without
36 leaking tenant size. Stored as the canonical Postgres `uuid` type.
37 - **Per-repo issue/PR numbers** are NEITHER `bigserial` NOR UUID — they are
38 per-`(repo_id)` monotonic counters maintained by a small counter table
39 (S21).
40
41 ## Timestamps
42
43 - Always **`timestamptz`**. Never `timestamp` (without timezone) — that's a
44 foot-gun.
45 - `created_at timestamptz NOT NULL DEFAULT now()` on every table.
46 - `updated_at timestamptz NOT NULL DEFAULT now()` on every table that
47 receives updates, with a `BEFORE UPDATE` trigger that re-stamps it:
48
49 ```sql
50 CREATE TRIGGER set_updated_at BEFORE UPDATE ON <table>
51 FOR EACH ROW EXECUTE FUNCTION tg_set_updated_at();
52 ```
53
54 The `tg_set_updated_at()` function is created in `0001_meta.sql` and
55 reused across every table.
56
57 ## Foreign keys
58
59 - **Always specify `ON DELETE` explicitly.** `RESTRICT` (the default) is the
60 safe choice; only use `CASCADE` when the lifecycle of the child IS the
61 lifecycle of the parent.
62 - Index every FK column unless the child is naturally indexed by a covering
63 composite index that starts with the FK.
64
65 ## Soft deletes
66
67 - **No soft deletes by default.** Hard delete + audit-log row when audit
68 matters.
69 - Exceptions exist where grace windows are required (user delete, repo
70 delete, org delete) — these are explicit and documented per sprint.
71
72 ## Migrations
73
74 - **Forward-only after production deploy.** `Down` blocks exist for dev
75 convenience but are immutable post-deploy. Corrections come as new
76 migrations.
77 - One change per migration. Filename: `NNNN_short_purpose.sql` where `NNNN`
78 is monotonically increasing and globally unique. `scripts/lint-migration-versions.sh`
79 enforces this in CI because goose panics on duplicate numeric versions before
80 it can run any migration.
81 - Goose markers:
82
83 ```
84 -- +goose Up
85 ...
86 -- +goose Down
87 ...
88 ```
89
90 Multi-statement DDL that needs PL/pgSQL must be wrapped in
91 `-- +goose StatementBegin` / `-- +goose StatementEnd`.
92
93 ## citext
94
95 Use `citext` (case-insensitive text) for identifiers that should match
96 case-insensitively but display case-preserved (`users.username`,
97 `user_emails.email`, `orgs.slug`, etc.). The extension is enabled in
98 `0001_meta.sql` (post-S05) — for now, S01 enables only the function we need.
99
100 ## JSONB
101
102 - Acceptable for genuinely polymorphic data (event metadata, audit log
103 details, notification summaries).
104 - Not a substitute for proper schema. If the shape is known and stable,
105 model it as columns or a child table.
106
107 ## Connection pooling
108
109 - App: `pgxpool.Pool` per process. Default sizing 25 in prod, 10 in dev.
110 - Hooks (S07/S14): tiny pools (max 4) since they're short-lived and on the
111 critical path of git operations.
112 - Tests: per-test pool with max 2 connections.
113
114 ## Test harness
115
116 - `internal/testing/dbtest`: `dbtest.NewTestDB(t)` creates a fresh database
117 cloned from a template (which has all migrations applied) per test, then
118 drops it on `t.Cleanup`. Parallel-safe.
119 - Tests requiring DB access set `SHITHUB_TEST_DATABASE_URL` pointing at a
120 Postgres server with permission to `CREATE DATABASE`.
121
122 ## Operational
123
124 - `pg_stat_statements` extension is loaded by default in dev compose and
125 prod (S37). Used by S36's perf pass.
126 - `archive_mode=on` + WAL shipping to Spaces (cross-region) in prod (S37).
127 - Daily logical backups via `pg_dump --format=custom`, restored weekly to
128 validate the backup chain.