tenseleyflow/shithub / 59c1236

docs(runbook): pgdata migration from root disk to block volume

Authored by espadonne
SHA      59c1236f415245c04b9ccfcbfc09383ea577c9f9
Parents  e2c3943
Tree     c225372

1 changed file

A  docs/internal/runbooks/db-on-block-volume.md  (+244, -0)
@@ -0,0 +1,244 @@
# Move Postgres data from root disk to block volume

One-time migration. After this lands, the droplet's root disk
holds only the OS + binaries; all stateful data (pgdata, repos,
tmp) lives on the attached block volume mounted at `/data`. The
goal is twofold: (1) the root disk can never fill up from
runaway DB growth and freeze the whole system, and (2) the
volume can be detached and reattached to a replacement droplet
without losing state if the host ever needs replacing.

## Preconditions (verify before scheduling)

- **Block volume mounted at `/data`** with enough free space for
  pgdata + headroom for several months of growth (a stricter
  mount check follows this list). Check:
  ```sh
  df -h /data
  ```
- **`/data/pgdata` does not contain a live cluster.** It may
  contain a stale `initdb` from when the volume was first
  provisioned — that's fine and we'll move it aside.
  ```sh
  ls -la /data/pgdata
  # Expect a PG_VERSION file dated to volume-attach time, NOT
  # to "minutes ago". If "minutes ago", STOP — something is
  # already running there.
  ```
- **Recent dump landed in Spaces** (not just locally). Worst-case
  rollback is restoring this dump:
  ```sh
  rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
         lsl spaces-prod:shithub-backups/daily/$(date -u +%Y/%m/%d)/ | tail -3
  ```
  If today's directory is empty, run a fresh dump first:
  ```sh
  /usr/local/bin/shithub-backup-daily
  ```
- **WAL archiver is healthy** — gives PITR coverage for changes
  between the last dump and migration time:
  ```sh
  sudo -u postgres psql -tAc "SELECT last_archived_time, last_failed_time FROM pg_stat_archiver;"
  # last_archived_time should be < 5 min ago, last_failed_time blank or older
  ```
- **DigitalOcean snapshot taken** via the DO dashboard
  (Droplets → shithub-prod → Snapshots → Take Snapshot). This
  is the panic-button rollback if everything else fails.
  Snapshots take a few minutes; DON'T start the migration
  until the snapshot completes.
- **Notice users** — site will be down for ~3 minutes for a
  ~115 MB pgdata. Scale the window with the current `du -sh
  /var/lib/postgresql/16/main`.
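
A quick extra check: confirm `/data` really is a separate mount
and not just a directory on the root disk (if nothing is mounted
there, `df -h /data` silently reports the root filesystem):

```sh
findmnt -no SOURCE,FSTYPE,SIZE /data
# Expect the block-volume device, not the root filesystem.
```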

## Migration

Total downtime: ~3 minutes for a 100 MB DB. Most of that is
postgres clean-shutdown + start; the rsync itself takes seconds.

```sh
ssh root@shithub.sh
```

### 1. Drain writes

```sh
systemctl stop shithubd-web
systemctl stop shithubd-cron 2>/dev/null   # if running
```

Verify nothing else has DB sessions open before stopping
postgres (background workers, manual psql sessions):

```sh
sudo -u postgres psql -tAc "SELECT pid, application_name, client_addr, state FROM pg_stat_activity WHERE datname = 'shithub';"
```

If this returns any rows (your own psql connects to the
`postgres` database, so it won't show up here), track down and
stop those processes first.
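
If a leftover session belongs to something you can't cleanly
stop, terminating it from the Postgres side also works (safe
here because writes are already drained; prints one `t` per
session terminated):

```sh
sudo -u postgres psql -tAc "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'shithub' AND pid <> pg_backend_pid();"
```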

### 2. Stop postgres

```sh
systemctl stop postgresql@16-main
systemctl status postgresql@16-main --no-pager | head -5  # should be "inactive (dead)"
```

### 3. Move the stale pre-init cluster aside (don't delete)

Keeping it (rather than deleting it) means the rollback below
can put `/data` back exactly as it was if something looks off:

```sh
mv /data/pgdata /data/pgdata.preinit-$(date -u +%Y%m%d)
```

### 4. Copy live data to the volume

`rsync -aHX --info=progress2` preserves perms, owners, and
timestamps (`-a`), hard links (`-H`), and xattrs (`-X`).
`--info=progress2` shows a single overall progress line:

```sh
rsync -aHX --info=progress2 \
  /var/lib/postgresql/16/main/ \
  /data/pgdata/
```

Verify the copy looks right:

```sh
ls -la /data/pgdata/ | head
diff <(cd /var/lib/postgresql/16/main && find . -printf '%p %s %m %u:%g\n' | sort) \
     <(cd /data/pgdata               && find . -printf '%p %s %m %u:%g\n' | sort) | head
# Empty diff = identical layout (paths, sizes, modes, owners).
```
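
One more check before repointing: postgres refuses to start on a
data directory that isn't owned by `postgres` with mode `700`
(or `750`). `rsync -a` should have preserved both, but it's
cheap to confirm:

```sh
stat -c '%U:%G %a' /data/pgdata
# Expect: postgres:postgres 700
```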

### 5. Repoint the cluster

Edit the active config:

```sh
sed -i.bak "s|^data_directory = .*|data_directory = '/data/pgdata'|" \
  /etc/postgresql/16/main/postgresql.conf
grep ^data_directory /etc/postgresql/16/main/postgresql.conf
# Expect: data_directory = '/data/pgdata'
```

(`.bak` lets you `cp postgresql.conf.bak postgresql.conf` to
revert in 1 second if needed.)
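
If you want postgres itself to confirm the edited config parses
and points where you expect before touching the service, `-C`
prints a single setting and exits (binary path assumed from the
Debian packaging behind the `postgresql@16-main` unit):

```sh
sudo -u postgres /usr/lib/postgresql/16/bin/postgres \
  -c config_file=/etc/postgresql/16/main/postgresql.conf -C data_directory
# Expect: /data/pgdata
```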

### 6. Start postgres on the new path

```sh
systemctl start postgresql@16-main
sleep 2
systemctl is-active postgresql@16-main      # active
sudo -u postgres pg_isready -h /var/run/postgresql
sudo -u postgres psql -tAc "SHOW data_directory;"   # should print /data/pgdata
sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM repos;"
sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM users;"
```

If any of those fail, jump to **Rollback** below.
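
If the unit won't start at all, the journal and the postgres log
usually name the reason (bad data_directory path, ownership or
permissions, stale pid file) before you commit to a rollback;
the log path below is the Debian default this install uses:

```sh
journalctl -u postgresql@16-main -n 50 --no-pager
tail -n 50 /var/log/postgresql/postgresql-16-main.log
```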

### 7. Bring the app back up

```sh
systemctl start shithubd-web
systemctl start shithubd-cron 2>/dev/null
systemctl is-active shithubd-web
curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/healthz   # 200
```

### 8. Smoke test

From your laptop (not the droplet):

```sh
curl -fsS -o /dev/null -w '%{http_code} %{time_total}s\n' https://shithub.sh/
# Walk the site briefly: load a repo, view an issue, log in.
```

Confirm the WAL archiver picked up where it left off (the
last_archived_time should move past the migration time within a
couple of minutes):

```sh
ssh root@shithub.sh 'sudo -u postgres psql -tAc "SELECT last_archived_time FROM pg_stat_archiver;"'
```
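
If you'd rather not wait for organic write traffic, forcing a
WAL segment switch (a no-op if nothing has been written since
the last switch) gives the archiver something to ship right
away:

```sh
ssh root@shithub.sh 'sudo -u postgres psql -tAc "SELECT pg_switch_wal();"'
```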

Run a fresh backup to confirm the new pgdata is durable end-to-end:

```sh
ssh root@shithub.sh /usr/local/bin/shithub-backup-daily
```

### 9. Cleanup (after a few days of healthy operation)

Don't do this until you've slept on at least one full daily
backup cycle from the new location and confirmed it landed in
Spaces. Then:

```sh
# Reclaim root-disk space: the old copy on the root disk is no longer used.
rm -rf /var/lib/postgresql/16/main/*
# (After confirming /var/lib/postgresql/16/main is empty)
rmdir /var/lib/postgresql/16/main 2>/dev/null
# Drop the stale initdb that was moved aside in step 3.
rm -rf /data/pgdata.preinit-*
```

The systemd unit's `RequiresMountsFor=/var/lib/postgresql/%I`
is a static path — if you remove the empty dir, recreate it as
`mkdir -p /var/lib/postgresql/16/main && chown postgres:postgres
…` before the next reboot, otherwise `pg_ctlcluster` will refuse
to start. Easier: leave the empty dir alone.

## Rollback

### Mid-migration (steps 4–6 failed)

```sh
systemctl stop postgresql@16-main
cp /etc/postgresql/16/main/postgresql.conf.bak /etc/postgresql/16/main/postgresql.conf
mv /data/pgdata /data/pgdata.failed-$(date -u +%Y%m%d_%H%M)
mv /data/pgdata.preinit-* /data/pgdata 2>/dev/null   # restore the stale init in case something checks for it
systemctl start postgresql@16-main
systemctl start shithubd-web
```

The original `/var/lib/postgresql/16/main` was never modified, so
postgres comes back up against unchanged data. Total recovery
time: ~30 seconds.

### Worst case (data corruption observed after step 6)

```sh
# 1. Stop everything that talks to the DB.
systemctl stop shithubd-web shithubd-cron postgresql@16-main

# 2. Find the latest dump in Spaces (sort on the date/time columns,
#    not the size column that rclone lsl prints first).
rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
       lsl spaces-prod:shithub-backups/daily/ | sort -k2,3 | tail -1

# 3. Drop and recreate the cluster from the dump. See restore.md
#    for the full pg_restore procedure.
```

Or, faster:

### Nuclear option

Restore the droplet from the DO snapshot taken in the
preconditions. Loses any user activity since the snapshot —
usually that's "the last few minutes" since the snapshot is
taken right before migration. Coordinate via the status page if
you do this.

## Why this layout

- **Root disk is 77 GB**; pgdata growth is unbounded. A
  runaway query log or WAL spike can fill it and freeze the
  whole droplet, including sshd.
- **Block volume is 100 GB**, separately managed, snapshotable
  in DO independently of the droplet, and detachable. If the
  droplet is unrecoverable, attaching the volume to a fresh
  droplet recovers state in minutes.
- **Repos and tmp already live on `/data`**. Moving pgdata
  finishes the layout the volume was provisioned for.
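
For reference, reattaching the volume to a replacement droplet
is roughly the following (a sketch: the volume name is a
placeholder, and ext4 is assumed since that's what DO formats
volumes as by default):

```sh
# After attaching the volume to the new droplet in the DO dashboard:
mkdir -p /data
mount -o defaults,noatime /dev/disk/by-id/scsi-0DO_Volume_<volume-name> /data
# Persist across reboots:
echo '/dev/disk/by-id/scsi-0DO_Volume_<volume-name> /data ext4 defaults,nofail,noatime 0 2' >> /etc/fstab
```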