Cutover checklist

The S40 launch checklist. Walk it top-to-bottom on cutover day; do not skip steps. Each box has a verification command or a visual check.

Time-box. A clean run is ~45 min from "ssh in" to "signup open." Budget 90 min; stop and back out if you hit ~2 hours.

T-7 days

  • DNS A/AAAA for shithub.sh published with low TTL (300s) so cutover-day changes propagate fast. Verify: dig +short A shithub.sh and dig +short AAAA shithub.sh.
  • DNS CNAME for docs.shithub.sh published.
  • Postmark domain verified; SPF/DKIM/DMARC aligned. Verify: Postmark dashboard → Domains → green.
  • Signup-throttle config reviewed; per-IP and per-/24 ceilings tuned for the announcement bump.
  • Monitoring alerts wired to the on-call's Telegram + SMS. Test by triggering a synthetic BackupOverdue alert via Alertmanager API and confirming it pages.
  • Rollback rehearsed on staging: git checkout v0.999 && make deploy ANSIBLE_INVENTORY=staging.
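For the synthetic BackupOverdue test in the list above, a minimal sketch of posting an alert to Alertmanager's v2 API might look like the following. The Alertmanager address, the `severity` and `synthetic` labels, and the function names are assumptions for illustration; use whatever labels your routing tree actually matches on.

```shell
# Sketch only: ALERTMANAGER_URL and the label set are assumptions,
# not values taken from this project's config.
ALERTMANAGER_URL="${ALERTMANAGER_URL:-http://localhost:9093}"

# Build the JSON body for POST /api/v2/alerts (the API takes an array of alerts).
synthetic_alert_payload() {
  # $1 = alert name, e.g. BackupOverdue
  printf '[{"labels":{"alertname":"%s","severity":"page","synthetic":"true"},' "$1"
  printf '"annotations":{"summary":"Synthetic test alert, safe to ignore"}}]\n'
}

# Send the alert. curl -f turns an HTTP error response into a non-zero exit.
fire_synthetic_alert() {
  synthetic_alert_payload "$1" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data @- "${ALERTMANAGER_URL}/api/v2/alerts"
}
```

Run `fire_synthetic_alert BackupOverdue` and confirm that both Telegram and SMS fire. The payload sets no end time, so Alertmanager should resolve the alert on its own once its resolve timeout passes.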

T-48 hours

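  • Last DNS change committed. Holding cutover until 48h after the last change gives resolvers that cached the old records time to expire them, so propagation lag is not a factor on the day.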
  • S37 backup-restore drill green within last 7 days.
  • S38 docs deploy verified; https://docs.shithub.sh/ returns 200.
  • S39 P0/P1 bugs closed.
  • Tag the release commit:

    git tag -a v0.1.0 -m "v0.1.0 — launch"
    git push origin v0.1.0
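For the DNS item in this list, one way to confirm that resolvers actually see the low TTL is to read it from the dig answer section. This is a sketch, not part of the repo; the function names are hypothetical and the 300s ceiling is the value from the T-7 checklist.

```shell
# dig answer lines look like: "shithub.sh. 300 IN A 192.0.2.10"
# Read answer lines on stdin and print the largest TTL seen (0 if none).
max_ttl() {
  awk '$2 ~ /^[0-9]+$/ && ($2 + 0) > max { max = $2 + 0 } END { print max + 0 }'
}

# Succeed only if the record exists and its TTL is at or below the ceiling.
ttl_ok() {
  # $1 = name, $2 = record type, $3 = TTL ceiling in seconds
  ttl="$(dig +noall +answer "$1" "$2" | max_ttl)"
  [ "$ttl" -gt 0 ] && [ "$ttl" -le "$3" ]
}
```

Usage: `ttl_ok shithub.sh A 300 && ttl_ok shithub.sh AAAA 300`. A recursive resolver reports the remaining TTL, which counts down, so any value up to the ceiling passes.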

T-1 hour

  • On-call has phone + laptop reachable.
  • Status page updated to "Cutover in progress" (manual edit to docs/public/status.md, push, sync to docs bucket).
  • caddy_use_acme_staging=false in production inventory (so the cutover doesn't accidentally request certificates from the Let's Encrypt staging CA, which browsers don't trust).

T-0: cutover

# 1. Pull the v0.1.0 tag.
git fetch --tags
git checkout v0.1.0

# 2. Dry-run to confirm exactly what will change.
make deploy-check ANSIBLE_INVENTORY=production

# 3. Apply. Expect ~10s downtime as the web service restarts.
make deploy ANSIBLE_INVENTORY=production

The Ansible run includes shithubd migrate up as the web service's ExecStartPre. New migrations run as part of the restart; the unit stays in activating until they complete.

Watch:

ssh web-01
journalctl -fu shithubd-web
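If you would rather block on the result than watch the journal, a small polling loop over `systemctl is-active` is one option. This is a sketch: the function name, the 2s poll interval, and the 300s default timeout are arbitrary choices, not project settings.

```shell
# Wait until the unit leaves "activating" (migrations done) or give up.
# Returns 0 on active, 1 on failed, 2 on timeout.
wait_for_unit() {
  # $1 = unit name, $2 = timeout in seconds (default 300)
  unit="$1"
  deadline="${2:-300}"
  waited=0
  state="unknown"
  while [ "$waited" -lt "$deadline" ]; do
    # is-active exits non-zero for any state other than active, hence "|| true".
    state="$(systemctl is-active "$unit" 2>/dev/null || true)"
    case "$state" in
      active) echo "$unit is active after ${waited}s"; return 0 ;;
      failed) echo "$unit failed; check journalctl -u $unit" >&2; return 1 ;;
    esac
    sleep 2
    waited=$((waited + 2))
  done
  echo "$unit still '$state' after ${deadline}s" >&2
  return 2
}
```

Usage on web-01: `wait_for_unit shithubd-web 300`. A long-running migration shows up as a timeout (exit 2), not a failure, so check the journal before deciding to back out.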

Smoke

Run the smoke script as soon as the deploy reports ok=N changed=N failed=0:

deploy/cutover/smoke.sh https://shithub.sh

The script exercises the home page, signup form, login form, health endpoints, docs subdomain, and a representative API call. It exits non-zero on any 5xx or unexpected response shape.
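For a manual spot check without the script, the "fail on any 5xx" rule can be sketched in a few lines. This is not the contents of deploy/cutover/smoke.sh; the function names are hypothetical, and only the status code is checked, not the response shape.

```shell
# Succeed if the status code is a 5xx.
is_server_error() {
  case "$1" in
    5[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch one URL; fail on 5xx or on a transport error (curl reports 000).
check_url() {
  # $1 = URL to probe
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true)"
  if [ -z "$code" ] || [ "$code" = "000" ] || is_server_error "$code"; then
    echo "FAIL ${code:-none} $1" >&2
    return 1
  fi
  echo "ok   $code $1"
}
```

Usage: `check_url https://shithub.sh/ && check_url https://docs.shithub.sh/`.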

Bootstrap-admin

ssh web-01
sudo -u shithub /usr/local/bin/shithubd admin bootstrap-admin \
     --email you@yourdomain

The CLI prints a one-time password-reset link. Open it in a browser, set a password, and immediately enable 2FA (Settings → Account security).

Open signup

If signup was gated behind a feature flag during the pre-launch build:

ssh web-01
sudo systemctl edit shithubd-web --full
# remove SHITHUB_AUTH__SIGNUP_DISABLED=true (or set to false)
sudo systemctl restart shithubd-web

Otherwise signup is already on; verify via the signup form returning 200 + a valid CSRF token.
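The signup check can be scripted roughly as below. This is a sketch under assumptions: it treats any `<input>` whose name contains "csrf" as the token field, which may not match the real form markup. It confirms only that a token field is present, not that the token is valid. The signup URL is passed in because the path is not given here.

```shell
# Read HTML on stdin; succeed if it contains an <input> whose name mentions "csrf".
# The field-name convention is an assumption, not taken from the shithub templates.
has_csrf_field() {
  grep -Eiq '<input[^>]*name="[^"]*csrf[^"]*"'
}

# Fetch the signup page and check for the token field.
signup_open() {
  # $1 = full URL of the signup page
  # curl -f fails on 4xx/5xx, so a disabled or broken form returns non-zero here.
  body="$(curl -fsS --max-time 10 "$1")" || return 1
  printf '%s\n' "$body" | has_csrf_field
}
```

Usage: `signup_open "<signup page URL>" && echo "signup form is live"`.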

Mirror to GitHub

Set up the one-way mirror so the GitHub copy keeps receiving everything pushed to shithub.sh:

# On the web host, as the shithub user:
cd /data/repos/shithub/shithub.git
git remote add github https://github.com/tenseleyFlow/shithub.git
# Add the mirror push to the periodic worker job (covered by
# the worker config; the mirror job kind = "git.mirror_push").

Confirm a test push lands on both:

git clone https://shithub.sh/shithub/shithub.git /tmp/test-clone
cd /tmp/test-clone
echo "launch test" >> .launch-test
git add .launch-test
git commit -m "launch smoke push"
git push origin trunk
# Wait ~60s for the mirror job to run, then confirm on GitHub:
git ls-remote https://github.com/tenseleyFlow/shithub.git trunk
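Rather than comparing SHAs by eye, you can compare what each remote advertises for the branch. A minimal sketch follows, with hypothetical function names; the URLs and branch in the usage line are the ones from the steps above.

```shell
# Print the commit SHA a remote advertises for a branch (empty if the branch is missing).
remote_head() {
  # $1 = repository URL, $2 = branch name
  git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Succeed only if both remotes have the branch and point at the same commit.
mirror_in_sync() {
  # $1 = primary URL, $2 = mirror URL, $3 = branch
  primary="$(remote_head "$1" "$3")"
  mirror="$(remote_head "$2" "$3")"
  [ -n "$primary" ] && [ "$primary" = "$mirror" ]
}
```

Usage: `mirror_in_sync https://shithub.sh/shithub/shithub.git https://github.com/tenseleyFlow/shithub.git trunk`. If it fails right after the push, wait for the next mirror job run and retry before treating it as a fault.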

Status page

Update docs/public/status.md to "All systems normal." with the current timestamp; push, sync to docs bucket.

Announcement

Schedule the announcement post for Tuesday 09:00 ET (or your chosen window). Submit to:

  • Hacker News: title + URL only; first comment is the "What is shithub?" intro.
  • /r/programming, /r/selfhosted: link + summary, follow subreddit rules.
  • lobste.rs: title + URL.
  • Mastodon: short post + link.

Have the FAQ tab open; expect "is this Forgejo?" / "why not Codeberg?" / "where's CI?" within the first hour.

Day-zero monitoring

For the first 24h:

  • Refresh Grafana every 30 min.
  • Triage every alert immediately; no false positives should page in week 1 (we tuned for it).
  • Bug reports go to https://shithub.sh/shithub/shithub/issues (the project's own self-hosted issues — drink your own champagne).

Backout

If cutover goes sideways within the first hour:

  1. Stop the bleed. Put the site in read-only mode (docs/internal/runbooks/read-only-mode.md).
  2. Decide: roll back code, restore data, or wait?
  3. If rolling back code: deploy/cutover/rollback.sh v0.999.
  4. Status page → "Investigating" with what we know.
  5. Page the operator (yourself, by definition).

The 24h SLO is "report what we know, not promises about when it's fixed." Honesty wins trust; deadlines under stress lose it.

Day-one retro

After the first 24h, fill in docs/internal/retro/v0.1.0.md with: what worked, what surprised us, top 3 user-reported issues, and the next sprint's focus.
