S40: cutover artifacts (checklist + smoke + rollback)
- SHA: 824c62d9e6968842e7fb6c2b0336dfd1f9e81bca
- Parents: 902991c
- Tree: 6fbc533

| Status | File | + | - |
|---|---|---|---|
| A | deploy/cutover/checklist.md | 189 | 0 |
| A | deploy/cutover/rollback.sh | 101 | 0 |
| A | deploy/cutover/smoke.sh | 100 | 0 |
deploy/cutover/checklist.md (added) @@ -0,0 +1,189 @@
| 1 | +# Cutover checklist | |
| 2 | + | |
| 3 | +The S40 launch checklist. Walk it top-to-bottom on cutover day; | |
| 4 | +do not skip steps. Each box has a verification command or a | |
| 5 | +visual check. | |
| 6 | + | |
| 7 | +> **Time-box.** A clean run is ~45 min from "ssh in" to "signup | |
| 8 | +> open." Budget 90 min; stop and back out if you hit ~2 hours. | |
| 9 | + | |
| 10 | +## T-7 days | |
| 11 | + | |
| 12 | +- [ ] DNS A/AAAA for `shithub.example` published with low TTL | |
| 13 | + (300s) so cutover-day changes propagate fast. Verify: | |
| 14 | + `dig +short A shithub.example`. | |
| 15 | +- [ ] DNS CNAME for `docs.shithub.example` published. | |
| 16 | +- [ ] Postmark domain verified; SPF/DKIM/DMARC aligned. Verify: | |
| 17 | + Postmark dashboard → Domains → green. | |
| 18 | +- [ ] Signup-throttle config reviewed; per-IP and per-/24 | |
| 19 | + ceilings tuned for the announcement bump. | |
| 20 | +- [ ] Monitoring alerts wired to the on-call's Telegram + SMS. | |
| 21 | + Test by triggering a synthetic `BackupOverdue` alert via | |
| 22 | + Alertmanager API and confirming it pages. | |
| 23 | +- [ ] Rollback rehearsed on staging: | |
| 24 | + `git checkout v0.999 && make deploy ANSIBLE_INVENTORY=staging`. | |
| 25 | + | |
| 26 | +## T-48 hours | |
| 27 | + | |
| 28 | +- [ ] Last DNS change committed. Waiting 48h before cutover | |
| 29 | + gives resolvers time to drop stale records. | |
| 30 | +- [ ] S37 backup-restore drill green within last 7 days. | |
| 31 | +- [ ] S38 docs deploy verified; `https://docs.shithub.example/` | |
| 32 | + returns 200. | |
| 33 | +- [ ] S39 P0/P1 bugs closed. | |
| 34 | +- [ ] Tag the release commit: | |
| 35 | + ```sh | |
| 36 | + git tag -a v1.0.0 -m "v1.0.0 — launch" | |
| 37 | + git push origin v1.0.0 | |
| 38 | + ``` | |
| 39 | + | |
| 40 | +## T-1 hour | |
| 41 | + | |
| 42 | +- [ ] On-call has phone + laptop reachable. | |
| 43 | +- [ ] Status page updated to "Cutover in progress" (manual edit | |
| 44 | + to `docs/public/status.md`, push, sync to docs bucket). | |
| 45 | +- [ ] `caddy_use_acme_staging=false` in production inventory | |
| 46 | + (so the cutover doesn't accidentally fall back to LE | |
| 47 | + staging). | |
| 48 | + | |
| 49 | +## T-0: cutover | |
| 50 | + | |
| 51 | +```sh | |
| 52 | +# 1. Pull the v1.0.0 tag. | |
| 53 | +git fetch --tags | |
| 54 | +git checkout v1.0.0 | |
| 55 | + | |
| 56 | +# 2. Dry-run to confirm exactly what will change. | |
| 57 | +make deploy-check ANSIBLE_INVENTORY=production | |
| 58 | + | |
| 59 | +# 3. Apply. Expect ~10s downtime as the web service restarts. | |
| 60 | +make deploy ANSIBLE_INVENTORY=production | |
| 61 | +``` | |
| 62 | + | |
| 63 | +The Ansible run includes `shithubd migrate up` as the web | |
| 64 | +service's `ExecStartPre`. New migrations run as part of the | |
| 65 | +restart; the unit stays in `activating` until they complete. | |
| 66 | + | |
| 67 | +Watch: | |
| 68 | + | |
| 69 | +```sh | |
| 70 | +ssh web-01 | |
| 71 | +journalctl -fu shithubd-web | |
| 72 | +``` | |
| 73 | + | |
| 74 | +## Smoke | |
| 75 | + | |
| 76 | +Run the smoke script as soon as the deploy reports `ok=N | |
| 77 | +changed=N failed=0`: | |
| 78 | + | |
| 79 | +```sh | |
| 80 | +deploy/cutover/smoke.sh https://shithub.example | |
| 81 | +``` | |
| 82 | + | |
| 83 | +The script exercises: home page, signup and login forms, health | |
| 84 | +endpoints, security headers, docs subdomain, and (if `SHITHUB_SMOKE_PAT` | |
| 85 | +is set) one API call. Exits non-zero on any 4xx/5xx or failed check. | |
| 86 | + | |
| 87 | +## Bootstrap-admin | |
| 88 | + | |
| 89 | +```sh | |
| 90 | +ssh web-01 | |
| 91 | +sudo -u shithub /usr/local/bin/shithubd admin bootstrap-admin \ | |
| 92 | + --email you@yourdomain | |
| 93 | +``` | |
| 94 | + | |
| 95 | +The CLI prints a one-time password-reset link. Open in a browser, | |
| 96 | +set a password, **immediately enable 2FA** (Settings → Account | |
| 97 | +security). | |
| 98 | + | |
| 99 | +## Open signup | |
| 100 | + | |
| 101 | +If signup was gated behind a feature flag during the pre-launch | |
| 102 | +build: | |
| 103 | + | |
| 104 | +```sh | |
| 105 | +ssh web-01 | |
| 106 | +sudo systemctl edit shithubd-web --full | |
| 107 | +# remove SHITHUB_AUTH__SIGNUP_DISABLED=true (or set to false) | |
| 108 | +sudo systemctl restart shithubd-web | |
| 109 | +``` | |
| 110 | + | |
| 111 | +Otherwise signup is already on; verify via the signup form | |
| 112 | +returning 200 + a valid CSRF token. | |
| 113 | + | |
| 114 | +## Mirror to GitHub | |
| 115 | + | |
| 116 | +Set up the one-way mirror so the GitHub mirror keeps receiving | |
| 117 | +pushes: | |
| 118 | + | |
| 119 | +```sh | |
| 120 | +# On the web host, as the shithub user: | |
| 121 | +cd /data/repos/shithub/shithub.git | |
| 122 | +git remote add github https://github.com/tenseleyFlow/shithub.git | |
| 123 | +# Add the mirror push to the periodic worker job (covered by | |
| 124 | +# the worker config; the mirror job kind = "git.mirror_push"). | |
| 125 | +``` | |
| 126 | + | |
| 127 | +Confirm a test push lands on both: | |
| 128 | + | |
| 129 | +```sh | |
| 130 | +git clone https://shithub.example/shithub/shithub.git /tmp/test-clone | |
| 131 | +cd /tmp/test-clone | |
| 132 | +echo "launch test" >> .launch-test | |
| 133 | +git add .launch-test | |
| 134 | +git commit -m "launch smoke push" | |
| 135 | +git push origin trunk | |
| 136 | +# Wait ~60s for the mirror job to run, then confirm on GitHub: | |
| 137 | +git ls-remote https://github.com/tenseleyFlow/shithub.git trunk | |
| 138 | +``` | |
| 139 | + | |
| 140 | +## Status page | |
| 141 | + | |
| 142 | +Update `docs/public/status.md` to "All systems normal." with the | |
| 143 | +current timestamp; push, sync to docs bucket. | |
| 144 | + | |
| 145 | +## Announcement | |
| 146 | + | |
| 147 | +Schedule the announcement post for **Tuesday 09:00 ET** (or your | |
| 148 | +chosen window). Submit to: | |
| 149 | + | |
| 150 | +- [ ] Hacker News: title + URL only; first comment is the | |
| 151 | + "What is shithub?" intro. | |
| 152 | +- [ ] /r/programming, /r/selfhosted: link + summary, follow | |
| 153 | + subreddit rules. | |
| 154 | +- [ ] lobste.rs: title + URL. | |
| 155 | +- [ ] Mastodon: short post + link. | |
| 156 | + | |
| 157 | +Have the FAQ tab open; expect "is this Forgejo?" / "why not | |
| 158 | +Codeberg?" / "where's CI?" within the first hour. | |
| 159 | + | |
| 160 | +## Day-zero monitoring | |
| 161 | + | |
| 162 | +For the first 24h: | |
| 163 | + | |
| 164 | +- Refresh Grafana every 30 min. | |
| 165 | +- Triage every alert immediately; nothing false-positive should | |
| 166 | + page in week 1 (we tuned for it). | |
| 167 | +- Bug reports go to `https://shithub.example/shithub/shithub/issues` | |
| 168 | + (the project's own self-hosted issues — drink your own | |
| 169 | + champagne). | |
| 170 | + | |
| 171 | +## Backout | |
| 172 | + | |
| 173 | +If cutover goes sideways within the first hour: | |
| 174 | + | |
| 175 | +1. **Stop the bleed.** Put the site in read-only mode | |
| 176 | + (`docs/internal/runbooks/read-only-mode.md`). | |
| 177 | +2. **Decide:** roll back code, restore data, or wait? | |
| 178 | +3. If rolling back code: `deploy/cutover/rollback.sh v0.999`. | |
| 179 | +4. Status page → "Investigating" with what we know. | |
| 180 | +5. Page the operator (yourself, by definition). | |
| 181 | + | |
| 182 | +The 24h SLO is "report what we know, not promises about when it's | |
| 183 | +fixed." Honesty wins trust; deadlines under stress lose it. | |
| 184 | + | |
| 185 | +## Day-one retro | |
| 186 | + | |
| 187 | +After the first 24h, fill in `docs/internal/retro/v1.0.0.md` | |
| 188 | +with: what worked, what surprised us, top 3 user-reported | |
| 189 | +issues, and the next sprint's focus. | |
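Review note on the "Mirror to GitHub" step of the checklist: the fixed "wait ~60s" could be replaced by polling until the mirror's tip matches the primary's. The following is a minimal sketch, not part of this commit; `remote_tip`, `same_tip`, and `wait_for_mirror` are hypothetical helper names, and the attempt count and delay defaults are assumptions.

```sh
# Hypothetical helpers; names and defaults are illustrative only.

# Print the SHA that `git ls-remote <url> <ref>` lists first for the ref.
# Prints nothing if the ref does not exist on the remote.
remote_tip() {
  git ls-remote "$1" "$2" | awk 'NR == 1 { print $1 }'
}

# Succeed only when both SHAs are non-empty and identical.
same_tip() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Poll the mirror until its tip matches the primary or attempts run out.
# Args: primary URL, mirror URL, ref, attempts (default 12), delay in
# seconds between attempts (default 10).
wait_for_mirror() {
  local primary mirror ref tries delay want got i
  primary=$1
  mirror=$2
  ref=$3
  tries=${4:-12}
  delay=${5:-10}
  want=$(remote_tip "$primary" "$ref")
  i=0
  while [ "$i" -lt "$tries" ]; do
    got=$(remote_tip "$mirror" "$ref")
    if same_tip "$want" "$got"; then
      echo "mirror in sync at $want"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "mirror still behind after $tries attempts" >&2
  return 1
}
```

With the URLs from the checklist, a call would look like `wait_for_mirror https://shithub.example/shithub/shithub.git https://github.com/tenseleyFlow/shithub.git trunk`. A non-zero exit means the mirror job has not caught up and is worth checking in the worker logs.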
deploy/cutover/rollback.sh (added) @@ -0,0 +1,101 @@
| 1 | +#!/usr/bin/env bash | |
| 2 | +# SPDX-License-Identifier: AGPL-3.0-or-later | |
| 3 | +# | |
| 4 | +# S40 cutover rollback. Re-deploys a previous tag to production. | |
| 5 | +# Walks the operator through the data-safe path; do NOT use this | |
| 6 | +# blindly when migrations changed schema in non-additive ways | |
| 7 | +# (read docs/internal/runbooks/rollback.md before running). | |
| 8 | +# | |
| 9 | +# Usage: | |
| 10 | +# deploy/cutover/rollback.sh v0.999 | |
| 11 | +# | |
| 12 | +# What it does: | |
| 13 | +# 1. Verifies the named tag exists (fetching tags if needed). | |
| 14 | +# 2. Warns if migrations differ between the tag and HEAD. | |
| 15 | +# 3. Checks out the tag. | |
| 16 | +# 4. Runs make deploy-check (DRY-RUN) and prints the diff. | |
| 17 | +# 5. Asks for explicit confirmation, then runs make deploy. | |
| 18 | +# 6. Runs the smoke script post-deploy. | |
| 19 | +# | |
| 20 | +# Exit status: | |
| 21 | +# 0 — rollback completed and smoked | |
| 22 | +# 1 — operator aborted, deploy failed, or smoke failed | |
| 23 | +# 2 — usage error | |
| 24 | + | |
| 25 | +set -euo pipefail | |
| 26 | + | |
| 27 | +if [[ $# -lt 1 ]]; then | |
| 28 | + echo "usage: $0 <previous-tag>" >&2 | |
| 29 | + exit 2 | |
| 30 | +fi | |
| 31 | + | |
| 32 | +TAG="$1" | |
| 33 | +ROOT="$(cd "$(dirname "$0")/../.." && pwd)" | |
| 34 | +cd "$ROOT" | |
| 35 | + | |
| 36 | +confirm() { | |
| 37 | + local prompt="$1" | |
| 38 | + read -r -p "$prompt [yes/NO] " resp | |
| 39 | + if [[ "$resp" != "yes" ]]; then | |
| 40 | + echo "aborted." >&2 | |
| 41 | + exit 1 | |
| 42 | + fi | |
| 43 | +} | |
| 44 | + | |
| 45 | +echo "rollback target: $TAG" | |
| 46 | + | |
| 47 | +# 1. Verify the tag exists locally; fetch if needed. | |
| 48 | +if ! git rev-parse --verify "refs/tags/$TAG" >/dev/null 2>&1; then | |
| 49 | + echo "tag $TAG not found locally; fetching..." | |
| 50 | + git fetch --tags | |
| 51 | + if ! git rev-parse --verify "refs/tags/$TAG" >/dev/null 2>&1; then | |
| 52 | + echo "FAIL: tag $TAG does not exist on origin" >&2 | |
| 53 | + exit 1 | |
| 54 | + fi | |
| 55 | +fi | |
| 56 | + | |
| 57 | +# 2. Check whether the rollback crosses any new migration files. | |
| 58 | +# Forward-only migrations mean the schema is ahead of the rolled- | |
| 59 | +# back code. The operator must read the matching `down` migrations | |
| 60 | +# before continuing; we don't auto-rollback schema here. | |
| 61 | +NEW_MIGRATIONS=$(git diff --name-only "$TAG"..HEAD -- 'internal/migrationsfs/migrations/*.sql' || true) | |
| 62 | +if [[ -n "$NEW_MIGRATIONS" ]]; then | |
| 63 | + echo "" | |
| 64 | + echo "WARNING: migrations exist between $TAG and HEAD:" | |
| 65 | + echo "$NEW_MIGRATIONS" | sed 's/^/ /' | |
| 66 | + echo "" | |
| 67 | + echo "Rolling back code without rolling back schema is fine ONLY if" | |
| 68 | + echo "every migration above is purely additive (new columns/tables" | |
| 69 | + echo "the old code ignores). Read docs/internal/runbooks/rollback.md" | |
| 70 | + echo "before continuing." | |
| 71 | + echo "" | |
| 72 | + confirm "All migrations above are additive (the old code handles them)?" | |
| 73 | +fi | |
| 74 | + | |
| 75 | +# 3. Check out the tag. | |
| 76 | +echo "" | |
| 77 | +echo "checking out $TAG..." | |
| 78 | +git checkout "$TAG" | |
| 79 | + | |
| 80 | +# 4. Dry-run. | |
| 81 | +echo "" | |
| 82 | +echo "running ANSIBLE deploy-check (DRY-RUN)..." | |
| 83 | +make deploy-check ANSIBLE_INVENTORY=production | |
| 84 | + | |
| 85 | +# 5. Confirm + apply. | |
| 86 | +echo "" | |
| 87 | +confirm "Apply the rollback to production?" | |
| 88 | +make deploy ANSIBLE_INVENTORY=production | |
| 89 | + | |
| 90 | +# 6. Smoke. Reads the production base URL from the SHITHUB_PROD_URL | |
| 91 | +# environment variable; falls back to asking. | |
| 92 | +BASE="${SHITHUB_PROD_URL:-}" | |
| 93 | +if [[ -z "$BASE" ]]; then | |
| 94 | + read -r -p "Smoke base URL (e.g. https://shithub.example): " BASE | |
| 95 | +fi | |
| 96 | +echo "" | |
| 97 | +echo "running smoke against $BASE..." | |
| 98 | +deploy/cutover/smoke.sh "$BASE" | |
| 99 | + | |
| 100 | +echo "" | |
| 101 | +echo "rollback to $TAG complete and smoked. Update the status page." | |
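Review note on rollback.sh: the script body confirms that the tag exists but never checks a signature. If signed tags are in use, a check along these lines could run right after the existence check. This is a sketch under that assumption; `require_signed_tag` is a hypothetical helper, not something this commit contains.

```sh
# Hypothetical helper: succeed only if the tag carries a valid signature.
require_signed_tag() {
  local tag kind
  tag=$1
  # Annotated and signed tags are stored as "tag" objects; a lightweight
  # tag resolves straight to a commit and cannot hold a signature.
  kind=$(git cat-file -t "refs/tags/$tag") || return 1
  if [ "$kind" != "tag" ]; then
    echo "FAIL: $tag is a lightweight tag (no signature possible)" >&2
    return 1
  fi
  # git verify-tag exits non-zero when the signature is missing or invalid.
  if ! git verify-tag "$tag" >/dev/null 2>&1; then
    echo "FAIL: $tag has no valid signature" >&2
    return 1
  fi
  echo "tag $tag: signature verified"
}
```

In rollback.sh this would be called as `require_signed_tag "$TAG" || exit 1`, ideally behind an opt-in switch so unsigned release histories still work. The name of such a switch is left open here.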
deploy/cutover/smoke.sh (added) @@ -0,0 +1,100 @@
| 1 | +#!/usr/bin/env bash | |
| 2 | +# SPDX-License-Identifier: AGPL-3.0-or-later | |
| 3 | +# | |
| 4 | +# S40 cutover smoke test. Exercises the public-facing routes that | |
| 5 | +# matter at launch: landing page, signup/login forms render with | |
| 6 | +# a fresh CSRF token, health endpoints respond, the docs subdomain | |
| 7 | +# is reachable, the API authenticates a known PAT. | |
| 8 | +# | |
| 9 | +# Usage: | |
| 10 | +# deploy/cutover/smoke.sh https://shithub.example | |
| 11 | +# | |
| 12 | +# Optional env (SHITHUB_SMOKE_PAT, when set, also enables the API check): | |
| 13 | +# SHITHUB_SMOKE_PAT — a valid shp_ token for `user:read` | |
| 14 | +# SHITHUB_SMOKE_DOCS — docs subdomain URL (default: docs.<base>) | |
| 15 | +# | |
| 16 | +# Exit status: | |
| 17 | +# 0 — all green | |
| 18 | +# 1 — at least one check failed | |
| 19 | +# 2 — usage error | |
| 20 | + | |
| 21 | +set -euo pipefail | |
| 22 | + | |
| 23 | +if [[ $# -lt 1 ]]; then | |
| 24 | + echo "usage: $0 <base-url>" >&2 | |
| 25 | + exit 2 | |
| 26 | +fi | |
| 27 | + | |
| 28 | +BASE="$1" | |
| 29 | +DOCS="${SHITHUB_SMOKE_DOCS:-${BASE/shithub./docs.shithub.}}" | |
| 30 | +fail=0 | |
| 31 | + | |
| 32 | +say() { printf '\n=== %s ===\n' "$*"; } | |
| 33 | +ok() { printf ' PASS: %s\n' "$*"; } | |
| 34 | +bad() { printf ' FAIL: %s\n' "$*"; fail=$((fail + 1)); } | |
| 35 | + | |
| 36 | +# 1. Landing. | |
| 37 | +say "GET $BASE/" | |
| 38 | +body=$(curl -fsS -o - -w "\n%{http_code}\n" "$BASE/" 2>&1) || { bad "landing fetch"; body=""; } | |
| 39 | +if [[ "$body" == *"shithub"* ]]; then ok "body contains shithub"; else bad "body missing shithub"; fi | |
| 40 | + | |
| 41 | +# 2. Health endpoints. /readyz proves DB + storage are reachable. | |
| 42 | +say "GET $BASE/-/health" | |
| 43 | +curl -fsS "$BASE/-/health" >/dev/null && ok "/-/health 200" || bad "/-/health" | |
| 44 | +say "GET $BASE/healthz" | |
| 45 | +curl -fsS "$BASE/healthz" >/dev/null && ok "/healthz 200" || bad "/healthz" | |
| 46 | +say "GET $BASE/readyz" | |
| 47 | +curl -fsS "$BASE/readyz" >/dev/null && ok "/readyz 200" || bad "/readyz" | |
| 48 | + | |
| 49 | +# 3. Signup form renders. | |
| 50 | +say "GET $BASE/signup" | |
| 51 | +body=$(curl -fsS "$BASE/signup") || { bad "signup fetch"; body=""; } | |
| 52 | +if [[ "$body" == *"csrf_token"* ]]; then ok "CSRF token present"; else bad "no csrf_token in signup form"; fi | |
| 53 | + | |
| 54 | +# 4. Login form renders. | |
| 55 | +say "GET $BASE/login" | |
| 56 | +body=$(curl -fsS "$BASE/login") || { bad "login fetch"; body=""; } | |
| 57 | +if [[ "$body" == *"username"* ]] && [[ "$body" == *"password"* ]]; then | |
| 58 | + ok "login form fields present" | |
| 59 | +else | |
| 60 | + bad "login form missing username/password" | |
| 61 | +fi | |
| 62 | + | |
| 63 | +# 5. TLS posture. Strict-Transport-Security must be set. | |
| 64 | +say "TLS / HSTS" | |
| 65 | +hdrs=$(curl -fsS -I "$BASE/" 2>&1) || { bad "headers fetch"; hdrs=""; } | |
| 66 | +if grep -qi "strict-transport-security" <<<"$hdrs"; then | |
| 67 | + ok "HSTS header set" | |
| 68 | +else | |
| 69 | + bad "HSTS header missing" | |
| 70 | +fi | |
| 71 | +if grep -qi "x-content-type-options" <<<"$hdrs"; then | |
| 72 | + ok "X-Content-Type-Options set" | |
| 73 | +else | |
| 74 | + bad "X-Content-Type-Options missing" | |
| 75 | +fi | |
| 76 | + | |
| 77 | +# 6. Docs subdomain. | |
| 78 | +say "GET $DOCS/" | |
| 79 | +curl -fsS -o /dev/null "$DOCS/" && ok "docs site 200" || bad "docs site" | |
| 80 | + | |
| 81 | +# 7. API (only if a PAT is provided). | |
| 82 | +if [[ -n "${SHITHUB_SMOKE_PAT:-}" ]]; then | |
| 83 | + say "GET $BASE/api/v1/user (with PAT)" | |
| 84 | + body=$(curl -fsS -H "Authorization: Bearer $SHITHUB_SMOKE_PAT" "$BASE/api/v1/user") || { bad "api fetch"; body=""; } | |
| 85 | + if [[ "$body" == *'"username"'* ]]; then | |
| 86 | + ok "API returned a user object" | |
| 87 | + else | |
| 88 | + bad "API response unexpected: $body" | |
| 89 | + fi | |
| 90 | +else | |
| 91 | + printf ' SKIP: API check (set SHITHUB_SMOKE_PAT to run)\n' | |
| 92 | +fi | |
| 93 | + | |
| 94 | +printf '\n' | |
| 95 | +if [[ "$fail" -eq 0 ]]; then | |
| 96 | + echo "smoke: all checks passed" | |
| 97 | + exit 0 | |
| 98 | +fi | |
| 99 | +echo "smoke: $fail check(s) FAILED" | |
| 100 | +exit 1 | |
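Review note on smoke.sh: the landing check appends `%{http_code}` to the captured output but never reads it. If the status code should be part of the failure message, the body and status can be split as in the sketch below. `fetch_with_status`, `status_of`, `body_of`, and `check_page` are hypothetical names, and dropping `-f` for this one fetch is a deliberate assumption so that error responses still report their status.

```sh
# Fetch a URL and append the HTTP status on its own final line.
# No -f here, so 4xx/5xx responses still yield a body and a status.
fetch_with_status() {
  curl -sS -o - -w '\n%{http_code}' "$1"
}

# The status is the last line of the captured output.
status_of() { printf '%s\n' "$1" | tail -n 1; }

# The body is everything except that last line.
body_of() { printf '%s\n' "$1" | sed '$d'; }

# Pass only when the page returns 200 and the body contains the needle.
check_page() {
  local url needle out status body
  url=$1
  needle=$2
  out=$(fetch_with_status "$url") || { echo "FAIL: $url unreachable"; return 1; }
  status=$(status_of "$out")
  body=$(body_of "$out")
  if [ "$status" != "200" ]; then
    echo "FAIL: $url returned $status"
    return 1
  fi
  case "$body" in
    *"$needle"*) echo "PASS: $url contains $needle" ;;
    *) echo "FAIL: $url body missing $needle"; return 1 ;;
  esac
}
```

A call such as `check_page "$BASE/" shithub || fail=$((fail + 1))` would slot into the existing pass/fail accounting without changing the script's exit codes.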