shithub Public

Watch 1 Fork 0 Star 0

markdown · 7766 bytes Raw Blame History

Move Postgres data from root disk to block volume

One-time migration. After this lands, the droplet's root disk holds only the OS + binaries; all stateful data (pgdata, repos, tmp) lives on the attached block volume mounted at /data. The goal is twofold: (1) the root disk can never fill up from runaway DB growth and OOM the system, and (2) the volume can be detached and reattached to a replacement droplet without losing state if the host ever needs replacing.

Preconditions (verify before scheduling)

Block volume mounted at /data with enough free space for pgdata + headroom for several months of growth. Check:
```
df -h /data
```

/data/pgdata does not contain a live cluster. It may contain a stale initdb from when the volume was first provisioned — that's fine and we'll move it aside.

ls -la /data/pgdata
# Expect a PG_VERSION file dated to volume-attach time, NOT
# to "minutes ago". If "minutes ago", STOP — something is
# already running there.

Recent dump landed in Spaces (not just locally). Worst-case rollback is restoring this dump:

rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
       lsl spaces-prod:shithub-backups/daily/$(date -u +%Y/%m/%d)/ | tail -3

If today's directory is empty, run a fresh dump first:

/usr/local/bin/shithub-backup-daily

WAL archiver is healthy — gives PITR coverage for changes between the last dump and migration time:

sudo -u postgres psql -tAc "SELECT last_archived_time, last_failed_time FROM pg_stat_archiver;"
# last_archived_time should be < 5 min ago, last_failed_time blank or older

DigitalOcean snapshot taken via the DO dashboard (Droplets → shithub-prod → Snapshots → Take Snapshot). This is the panic-button rollback if everything else fails. Snapshots take a few minutes; DON'T start the migration until the snapshot completes.
Notice users — site will be down for ~3 minutes for a ~115 MB pgdata. Scale the window with current du -sh /var/lib/postgresql/16/main.

Migration

Total downtime: ~3 minutes for a 100 MB DB. Most of that is postgres clean-shutdown + start; the rsync itself is seconds.

ssh root@shithub.sh

1. Drain writes

systemctl stop shithubd-web
systemctl stop shithubd-cron 2>/dev/null   # if running

Verify nothing else has DB sessions open before stopping postgres (background workers, manual psql sessions):

sudo -u postgres psql -tAc "SELECT pid, application_name, client_addr, state FROM pg_stat_activity WHERE datname = 'shithub';"

If this returns rows besides your own psql, kill those processes first.

2. Stop postgres

systemctl stop postgresql@16-main
systemctl status postgresql@16-main --no-pager | head -5  # should be "inactive (dead)"

3. Rename the stale pre-init aside (don't delete)

Keeping it lets us undo step 4 instantly if something looks off:

mv /data/pgdata /data/pgdata.preinit-$(date -u +%Y%m%d)

4. Copy live data to the volume

rsync -aHX --info=progress2 preserves perms, owners, hard links, and xattrs. --info=progress2 shows a single overall progress line:

rsync -aHX --info=progress2 \
  /var/lib/postgresql/16/main/ \
  /data/pgdata/

Verify the copy looks right:

ls -la /data/pgdata/ | head
diff <(cd /var/lib/postgresql/16/main && find . -printf '%p %s %m %u:%g\n' | sort) \
     <(cd /data/pgdata               && find . -printf '%p %s %m %u:%g\n' | sort) | head
# Empty diff = byte-identical layout.

5. Repoint the cluster

Edit the active config:

sed -i.bak "s|^data_directory = .*|data_directory = '/data/pgdata'|" \
  /etc/postgresql/16/main/postgresql.conf
grep ^data_directory /etc/postgresql/16/main/postgresql.conf
# Expect: data_directory = '/data/pgdata'

(.bak lets you cp postgresql.conf.bak postgresql.conf to revert in 1 second if needed.)

6. Start postgres on the new path

systemctl start postgresql@16-main
sleep 2
systemctl is-active postgresql@16-main      # active
sudo -u postgres pg_isready -h /var/run/postgresql
sudo -u postgres psql -tAc "SHOW data_directory;"   # should print /data/pgdata
sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM repos;"
sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM users;"

If any of those fail, jump to Rollback below.

7. Bring the app back up

systemctl start shithubd-web
systemctl start shithubd-cron 2>/dev/null
systemctl is-active shithubd-web
curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/healthz   # 200

8. Smoke test

From your laptop (not the droplet):

curl -fsS -o /dev/null -w '%{http_code} %{time_total}s\n' https://shithub.sh/
# Walk the site briefly: load a repo, view an issue, log in.

Confirm the WAL archiver picked up where it left off (next archive timestamp should be > migration time within a couple of minutes):

ssh root@shithub.sh 'sudo -u postgres psql -tAc "SELECT last_archived_time FROM pg_stat_archiver;"'

Run a fresh backup to confirm the new pgdata is durable end-to-end:

ssh root@shithub.sh /usr/local/bin/shithub-backup-daily

9. Cleanup (after a few days of healthy operation)

Don't do this until you've slept on at least one full daily backup cycle from the new location and confirmed it landed in Spaces. Then:

# Reclaim root-disk space.
rm -rf /var/lib/postgresql/16/main.preinit-*
# (After confirming /var/lib/postgresql/16/main is empty)
rmdir /var/lib/postgresql/16/main 2>/dev/null
rm -rf /data/pgdata.preinit-*

The systemd unit's RequiresMountsFor=/var/lib/postgresql/%I is a static path — if you remove the empty dir, recreate it as mkdir -p /var/lib/postgresql/16/main && chown postgres:postgres … before the next reboot, otherwise pg_ctlcluster will refuse to start. Easier: leave the empty dir alone.

Rollback

Mid-migration (steps 4–6 failed)

systemctl stop postgresql@16-main
cp /etc/postgresql/16/main/postgresql.conf.bak /etc/postgresql/16/main/postgresql.conf
mv /data/pgdata /data/pgdata.failed-$(date -u +%Y%m%d_%H%M)
mv /data/pgdata.preinit-* /data/pgdata 2>/dev/null   # restore the stale init in case something checks for it
systemctl start postgresql@16-main
systemctl start shithubd-web

The original /var/lib/postgresql/16/main was never modified, so postgres comes back up against unchanged data. Total recovery time: ~30 seconds.

Worst case (data corruption observed after step 6)

# 1. Stop everything that talks to the DB.
systemctl stop shithubd-web shithubd-cron postgresql@16-main

# 2. Find the latest dump in Spaces.
rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
       lsl spaces-prod:shithub-backups/daily/ | sort | tail -1

# 3. Drop and recreate the cluster from the dump. See restore.md
#    for the full pg_restore procedure.

Or, faster:

Nuclear option

Restore the droplet from the DO snapshot taken in preconditions. Loses any user activity since the snapshot — usually that's "the last few minutes" since the snapshot is taken right before migration. Coordinate via status page if you do this.

Why this layout

Root disk is 77 GB; pgdata growth is unbounded. A runaway query log or WAL spike can fill it and freeze the whole droplet, including sshd.
Block volume is 100 GB, separately managed, snapshotable in DO independently of the droplet, and detachable. If the droplet is unrecoverable, attaching the volume to a fresh droplet recovers state in minutes.
Repos and tmp already live on /data. Moving pgdata finishes the layout the volume was provisioned for.

View source

  
        1
        # Move Postgres data from root disk to block volume
      
        2
        
        3
        One-time migration. After this lands, the droplet's root disk
      
        4
        holds only the OS + binaries; all stateful data (pgdata, repos,
      
        5
        tmp) lives on the attached block volume mounted at `/data`. The
      
        6
        goal is twofold: (1) the root disk can never fill up from
      
        7
        runaway DB growth and OOM the system, and (2) the volume can be
      
        8
        detached and reattached to a replacement droplet without losing
      
        9
        state if the host ever needs replacing.
      
        10
        
        11
        ## Preconditions (verify before scheduling)
      
        12
        
        13
        - **Block volume mounted at `/data`** with enough free space for
      
        14
          pgdata + headroom for several months of growth. Check:
      
        15
          ```sh
      
        16
          df -h /data
      
        17
          ```
      
        18
        - **`/data/pgdata` does not contain a live cluster.** It may
      
        19
          contain a stale `initdb` from when the volume was first
      
        20
          provisioned — that's fine and we'll move it aside.
      
        21
          ```sh
      
        22
          ls -la /data/pgdata
      
        23
          # Expect a PG_VERSION file dated to volume-attach time, NOT
      
        24
          # to "minutes ago". If "minutes ago", STOP — something is
      
        25
          # already running there.
      
        26
          ```
      
        27
        - **Recent dump landed in Spaces** (not just locally). Worst-case
      
        28
          rollback is restoring this dump:
      
        29
          ```sh
      
        30
          rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
      
        31
                 lsl spaces-prod:shithub-backups/daily/$(date -u +%Y/%m/%d)/ | tail -3
      
        32
          ```
      
        33
          If today's directory is empty, run a fresh dump first:
      
        34
          ```sh
      
        35
          /usr/local/bin/shithub-backup-daily
      
        36
          ```
      
        37
        - **WAL archiver is healthy** — gives PITR coverage for changes
      
        38
          between the last dump and migration time:
      
        39
          ```sh
      
        40
          sudo -u postgres psql -tAc "SELECT last_archived_time, last_failed_time FROM pg_stat_archiver;"
      
        41
          # last_archived_time should be < 5 min ago, last_failed_time blank or older
      
        42
          ```
      
        43
        - **DigitalOcean snapshot taken** via the DO dashboard
      
        44
          (Droplets → shithub-prod → Snapshots → Take Snapshot). This
      
        45
          is the panic-button rollback if everything else fails.
      
        46
          Snapshots take a few minutes; DON'T start the migration
      
        47
          until the snapshot completes.
      
        48
        - **Notice users** — site will be down for ~3 minutes for a
      
        49
          ~115 MB pgdata. Scale the window with current `du -sh
      
        50
          /var/lib/postgresql/16/main`.
      
        51
        
        52
        ## Migration
      
        53
        
        54
        Total downtime: ~3 minutes for a 100 MB DB. Most of that is
      
        55
        postgres clean-shutdown + start; the rsync itself is seconds.
      
        56
        
        57
        ```sh
      
        58
        ssh root@shithub.sh
      
        59
        ```
      
        60
        
        61
        ### 1. Drain writes
      
        62
        
        63
        ```sh
      
        64
        systemctl stop shithubd-web
      
        65
        systemctl stop shithubd-cron 2>/dev/null   # if running
      
        66
        ```
      
        67
        
        68
        Verify nothing else has DB sessions open before stopping
      
        69
        postgres (background workers, manual psql sessions):
      
        70
        
        71
        ```sh
      
        72
        sudo -u postgres psql -tAc "SELECT pid, application_name, client_addr, state FROM pg_stat_activity WHERE datname = 'shithub';"
      
        73
        ```
      
        74
        
        75
        If this returns rows besides your own psql, kill those processes
      
        76
        first.
      
        77
        
        78
        ### 2. Stop postgres
      
        79
        
        80
        ```sh
      
        81
        systemctl stop postgresql@16-main
      
        82
        systemctl status postgresql@16-main --no-pager | head -5  # should be "inactive (dead)"
      
        83
        ```
      
        84
        
        85
        ### 3. Rename the stale pre-init aside (don't delete)
      
        86
        
        87
        Keeping it lets us undo step 4 instantly if something looks off:
      
        88
        
        89
        ```sh
      
        90
        mv /data/pgdata /data/pgdata.preinit-$(date -u +%Y%m%d)
      
        91
        ```
      
        92
        
        93
        ### 4. Copy live data to the volume
      
        94
        
        95
        `rsync -aHX --info=progress2` preserves perms, owners, hard
      
        96
        links, and xattrs. `--info=progress2` shows a single overall
      
        97
        progress line:
      
        98
        
        99
        ```sh
      
        100
        rsync -aHX --info=progress2 \
      
        101
          /var/lib/postgresql/16/main/ \
      
        102
          /data/pgdata/
      
        103
        ```
      
        104
        
        105
        Verify the copy looks right:
      
        106
        
        107
        ```sh
      
        108
        ls -la /data/pgdata/ | head
      
        109
        diff <(cd /var/lib/postgresql/16/main && find . -printf '%p %s %m %u:%g\n' | sort) \
      
        110
             <(cd /data/pgdata               && find . -printf '%p %s %m %u:%g\n' | sort) | head
      
        111
        # Empty diff = byte-identical layout.
      
        112
        ```
      
        113
        
        114
        ### 5. Repoint the cluster
      
        115
        
        116
        Edit the active config:
      
        117
        
        118
        ```sh
      
        119
        sed -i.bak "s|^data_directory = .*|data_directory = '/data/pgdata'|" \
      
        120
          /etc/postgresql/16/main/postgresql.conf
      
        121
        grep ^data_directory /etc/postgresql/16/main/postgresql.conf
      
        122
        # Expect: data_directory = '/data/pgdata'
      
        123
        ```
      
        124
        
        125
        (`.bak` lets you `cp postgresql.conf.bak postgresql.conf` to
      
        126
        revert in 1 second if needed.)
      
        127
        
        128
        ### 6. Start postgres on the new path
      
        129
        
        130
        ```sh
      
        131
        systemctl start postgresql@16-main
      
        132
        sleep 2
      
        133
        systemctl is-active postgresql@16-main      # active
      
        134
        sudo -u postgres pg_isready -h /var/run/postgresql
      
        135
        sudo -u postgres psql -tAc "SHOW data_directory;"   # should print /data/pgdata
      
        136
        sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM repos;"
      
        137
        sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM users;"
      
        138
        ```
      
        139
        
        140
        If any of those fail, jump to **Rollback** below.
      
        141
        
        142
        ### 7. Bring the app back up
      
        143
        
        144
        ```sh
      
        145
        systemctl start shithubd-web
      
        146
        systemctl start shithubd-cron 2>/dev/null
      
        147
        systemctl is-active shithubd-web
      
        148
        curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/healthz   # 200
      
        149
        ```
      
        150
        
        151
        ### 8. Smoke test
      
        152
        
        153
        From your laptop (not the droplet):
      
        154
        
        155
        ```sh
      
        156
        curl -fsS -o /dev/null -w '%{http_code} %{time_total}s\n' https://shithub.sh/
      
        157
        # Walk the site briefly: load a repo, view an issue, log in.
      
        158
        ```
      
        159
        
        160
        Confirm the WAL archiver picked up where it left off (next
      
        161
        archive timestamp should be > migration time within a couple of
      
        162
        minutes):
      
        163
        
        164
        ```sh
      
        165
        ssh root@shithub.sh 'sudo -u postgres psql -tAc "SELECT last_archived_time FROM pg_stat_archiver;"'
      
        166
        ```
      
        167
        
        168
        Run a fresh backup to confirm the new pgdata is durable end-to-end:
      
        169
        
        170
        ```sh
      
        171
        ssh root@shithub.sh /usr/local/bin/shithub-backup-daily
      
        172
        ```
      
        173
        
        174
        ### 9. Cleanup (after a few days of healthy operation)
      
        175
        
        176
        Don't do this until you've slept on at least one full daily
      
        177
        backup cycle from the new location and confirmed it landed in
      
        178
        Spaces. Then:
      
        179
        
        180
        ```sh
      
        181
        # Reclaim root-disk space.
      
        182
        rm -rf /var/lib/postgresql/16/main.preinit-*
      
        183
        # (After confirming /var/lib/postgresql/16/main is empty)
      
        184
        rmdir /var/lib/postgresql/16/main 2>/dev/null
      
        185
        rm -rf /data/pgdata.preinit-*
      
        186
        ```
      
        187
        
        188
        The systemd unit's `RequiresMountsFor=/var/lib/postgresql/%I`
      
        189
        is a static path — if you remove the empty dir, recreate it as
      
        190
        `mkdir -p /var/lib/postgresql/16/main && chown postgres:postgres
      
        191
        …` before the next reboot, otherwise `pg_ctlcluster` will refuse
      
        192
        to start. Easier: leave the empty dir alone.
      
        193
        
        194
        ## Rollback
      
        195
        
        196
        ### Mid-migration (steps 4–6 failed)
      
        197
        
        198
        ```sh
      
        199
        systemctl stop postgresql@16-main
      
        200
        cp /etc/postgresql/16/main/postgresql.conf.bak /etc/postgresql/16/main/postgresql.conf
      
        201
        mv /data/pgdata /data/pgdata.failed-$(date -u +%Y%m%d_%H%M)
      
        202
        mv /data/pgdata.preinit-* /data/pgdata 2>/dev/null   # restore the stale init in case something checks for it
      
        203
        systemctl start postgresql@16-main
      
        204
        systemctl start shithubd-web
      
        205
        ```
      
        206
        
        207
        The original `/var/lib/postgresql/16/main` was never modified, so
      
        208
        postgres comes back up against unchanged data. Total recovery
      
        209
        time: ~30 seconds.
      
        210
        
        211
        ### Worst case (data corruption observed after step 6)
      
        212
        
        213
        ```sh
      
        214
        # 1. Stop everything that talks to the DB.
      
        215
        systemctl stop shithubd-web shithubd-cron postgresql@16-main
      
        216
        
        217
        # 2. Find the latest dump in Spaces.
      
        218
        rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
      
        219
               lsl spaces-prod:shithub-backups/daily/ | sort | tail -1
      
        220
        
        221
        # 3. Drop and recreate the cluster from the dump. See restore.md
      
        222
        #    for the full pg_restore procedure.
      
        223
        ```
      
        224
        
        225
        Or, faster:
      
        226
        
        227
        ### Nuclear option
      
        228
        
        229
        Restore the droplet from the DO snapshot taken in preconditions.
      
        230
        Loses any user activity since the snapshot — usually that's
      
        231
        "the last few minutes" since the snapshot is taken right before
      
        232
        migration. Coordinate via status page if you do this.
      
        233
        
        234
        ## Why this layout
      
        235
        
        236
        - **Root disk is 77 GB**; pgdata growth is unbounded. A
      
        237
          runaway query log or WAL spike can fill it and freeze the
      
        238
          whole droplet, including sshd.
      
        239
        - **Block volume is 100 GB**, separately managed, snapshotable
      
        240
          in DO independently of the droplet, and detachable. If the
      
        241
          droplet is unrecoverable, attaching the volume to a fresh
      
        242
          droplet recovers state in minutes.
      
        243
        - **Repos and tmp already live on `/data`**. Moving pgdata
      
        244
          finishes the layout the volume was provisioned for.

1	# Move Postgres data from root disk to block volume
2
3	One-time migration. After this lands, the droplet's root disk
4	holds only the OS + binaries; all stateful data (pgdata, repos,
5	tmp) lives on the attached block volume mounted at `/data`. The
6	goal is twofold: (1) the root disk can never fill up from
7	runaway DB growth and OOM the system, and (2) the volume can be
8	detached and reattached to a replacement droplet without losing
9	state if the host ever needs replacing.
10
11	## Preconditions (verify before scheduling)
12
13	- Block volume mounted at `/data` with enough free space for
14	pgdata + headroom for several months of growth. Check:
15	```sh
16	df -h /data
17	```
18	- `/data/pgdata` does not contain a live cluster. It may
19	contain a stale `initdb` from when the volume was first
20	provisioned — that's fine and we'll move it aside.
21	```sh
22	ls -la /data/pgdata
23	# Expect a PG_VERSION file dated to volume-attach time, NOT
24	# to "minutes ago". If "minutes ago", STOP — something is
25	# already running there.
26	```
27	- Recent dump landed in Spaces (not just locally). Worst-case
28	rollback is restoring this dump:
29	```sh
30	rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
31	lsl spaces-prod:shithub-backups/daily/$(date -u +%Y/%m/%d)/ \| tail -3
32	```
33	If today's directory is empty, run a fresh dump first:
34	```sh
35	/usr/local/bin/shithub-backup-daily
36	```
37	- WAL archiver is healthy — gives PITR coverage for changes
38	between the last dump and migration time:
39	```sh
40	sudo -u postgres psql -tAc "SELECT last_archived_time, last_failed_time FROM pg_stat_archiver;"
41	# last_archived_time should be < 5 min ago, last_failed_time blank or older
42	```
43	- DigitalOcean snapshot taken via the DO dashboard
44	(Droplets → shithub-prod → Snapshots → Take Snapshot). This
45	is the panic-button rollback if everything else fails.
46	Snapshots take a few minutes; DON'T start the migration
47	until the snapshot completes.
48	- Notice users — site will be down for ~3 minutes for a
49	~115 MB pgdata. Scale the window with current `du -sh
50	/var/lib/postgresql/16/main`.
51
52	## Migration
53
54	Total downtime: ~3 minutes for a 100 MB DB. Most of that is
55	postgres clean-shutdown + start; the rsync itself is seconds.
56
57	```sh
58	ssh root@shithub.sh
59	```
60
61	### 1. Drain writes
62
63	```sh
64	systemctl stop shithubd-web
65	systemctl stop shithubd-cron 2>/dev/null # if running
66	```
67
68	Verify nothing else has DB sessions open before stopping
69	postgres (background workers, manual psql sessions):
70
71	```sh
72	sudo -u postgres psql -tAc "SELECT pid, application_name, client_addr, state FROM pg_stat_activity WHERE datname = 'shithub';"
73	```
74
75	If this returns rows besides your own psql, kill those processes
76	first.
77
78	### 2. Stop postgres
79
80	```sh
81	systemctl stop postgresql@16-main
82	systemctl status postgresql@16-main --no-pager \| head -5 # should be "inactive (dead)"
83	```
84
85	### 3. Rename the stale pre-init aside (don't delete)
86
87	Keeping it lets us undo step 4 instantly if something looks off:
88
89	```sh
90	mv /data/pgdata /data/pgdata.preinit-$(date -u +%Y%m%d)
91	```
92
93	### 4. Copy live data to the volume
94
95	`rsync -aHX --info=progress2` preserves perms, owners, hard
96	links, and xattrs. `--info=progress2` shows a single overall
97	progress line:
98
99	```sh
100	rsync -aHX --info=progress2 \
101	/var/lib/postgresql/16/main/ \
102	/data/pgdata/
103	```
104
105	Verify the copy looks right:
106
107	```sh
108	ls -la /data/pgdata/ \| head
109	diff <(cd /var/lib/postgresql/16/main && find . -printf '%p %s %m %u:%g\n' \| sort) \
110	<(cd /data/pgdata && find . -printf '%p %s %m %u:%g\n' \| sort) \| head
111	# Empty diff = byte-identical layout.
112	```
113
114	### 5. Repoint the cluster
115
116	Edit the active config:
117
118	```sh
119	sed -i.bak "s\|^data_directory = .*\|data_directory = '/data/pgdata'\|" \
120	/etc/postgresql/16/main/postgresql.conf
121	grep ^data_directory /etc/postgresql/16/main/postgresql.conf
122	# Expect: data_directory = '/data/pgdata'
123	```
124
125	(`.bak` lets you `cp postgresql.conf.bak postgresql.conf` to
126	revert in 1 second if needed.)
127
128	### 6. Start postgres on the new path
129
130	```sh
131	systemctl start postgresql@16-main
132	sleep 2
133	systemctl is-active postgresql@16-main # active
134	sudo -u postgres pg_isready -h /var/run/postgresql
135	sudo -u postgres psql -tAc "SHOW data_directory;" # should print /data/pgdata
136	sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM repos;"
137	sudo -u postgres psql -d shithub -tAc "SELECT count(*) FROM users;"
138	```
139
140	If any of those fail, jump to Rollback below.
141
142	### 7. Bring the app back up
143
144	```sh
145	systemctl start shithubd-web
146	systemctl start shithubd-cron 2>/dev/null
147	systemctl is-active shithubd-web
148	curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8080/healthz # 200
149	```
150
151	### 8. Smoke test
152
153	From your laptop (not the droplet):
154
155	```sh
156	curl -fsS -o /dev/null -w '%{http_code} %{time_total}s\n' https://shithub.sh/
157	# Walk the site briefly: load a repo, view an issue, log in.
158	```
159
160	Confirm the WAL archiver picked up where it left off (next
161	archive timestamp should be > migration time within a couple of
162	minutes):
163
164	```sh
165	ssh root@shithub.sh 'sudo -u postgres psql -tAc "SELECT last_archived_time FROM pg_stat_archiver;"'
166	```
167
168	Run a fresh backup to confirm the new pgdata is durable end-to-end:
169
170	```sh
171	ssh root@shithub.sh /usr/local/bin/shithub-backup-daily
172	```
173
174	### 9. Cleanup (after a few days of healthy operation)
175
176	Don't do this until you've slept on at least one full daily
177	backup cycle from the new location and confirmed it landed in
178	Spaces. Then:
179
180	```sh
181	# Reclaim root-disk space.
182	rm -rf /var/lib/postgresql/16/main.preinit-*
183	# (After confirming /var/lib/postgresql/16/main is empty)
184	rmdir /var/lib/postgresql/16/main 2>/dev/null
185	rm -rf /data/pgdata.preinit-*
186	```
187
188	The systemd unit's `RequiresMountsFor=/var/lib/postgresql/%I`
189	is a static path — if you remove the empty dir, recreate it as
190	`mkdir -p /var/lib/postgresql/16/main && chown postgres:postgres
191	…` before the next reboot, otherwise `pg_ctlcluster` will refuse
192	to start. Easier: leave the empty dir alone.
193
194	## Rollback
195
196	### Mid-migration (steps 4–6 failed)
197
198	```sh
199	systemctl stop postgresql@16-main
200	cp /etc/postgresql/16/main/postgresql.conf.bak /etc/postgresql/16/main/postgresql.conf
201	mv /data/pgdata /data/pgdata.failed-$(date -u +%Y%m%d_%H%M)
202	mv /data/pgdata.preinit-* /data/pgdata 2>/dev/null # restore the stale init in case something checks for it
203	systemctl start postgresql@16-main
204	systemctl start shithubd-web
205	```
206
207	The original `/var/lib/postgresql/16/main` was never modified, so
208	postgres comes back up against unchanged data. Total recovery
209	time: ~30 seconds.
210
211	### Worst case (data corruption observed after step 6)
212
213	```sh
214	# 1. Stop everything that talks to the DB.
215	systemctl stop shithubd-web shithubd-cron postgresql@16-main
216
217	# 2. Find the latest dump in Spaces.
218	rclone --config /etc/rclone-shithub.conf --s3-no-check-bucket \
219	lsl spaces-prod:shithub-backups/daily/ \| sort \| tail -1
220
221	# 3. Drop and recreate the cluster from the dump. See restore.md
222	# for the full pg_restore procedure.
223	```
224
225	Or, faster:
226
227	### Nuclear option
228
229	Restore the droplet from the DO snapshot taken in preconditions.
230	Loses any user activity since the snapshot — usually that's
231	"the last few minutes" since the snapshot is taken right before
232	migration. Coordinate via status page if you do this.
233
234	## Why this layout
235
236	- Root disk is 77 GB; pgdata growth is unbounded. A
237	runaway query log or WAL spike can fill it and freeze the
238	whole droplet, including sshd.
239	- Block volume is 100 GB, separately managed, snapshotable
240	in DO independently of the droplet, and detachable. If the
241	droplet is unrecoverable, attaching the volume to a fresh
242	droplet recovers state in minutes.
243	- Repos and tmp already live on `/data`. Moving pgdata
244	finishes the layout the volume was provisioned for.