
# Upgrades & migrations

Routine release deploys. Migrations apply automatically; the only place upgrades get exciting is around long migrations and the occasional config schema change.

## Standard release

```sh
git fetch --tags
git checkout v<version>
make deploy-check ANSIBLE_INVENTORY=staging
make deploy ANSIBLE_INVENTORY=staging
# ... canary period ...
make deploy ANSIBLE_INVENTORY=production
```

`shithubd migrate up` runs as the web service's `ExecStartPre=`, so the binary that needs the new schema is also the one that applies it. Order on each host: `ExecStartPre` runs migrations → web starts on the new schema.
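
For orientation, the relevant part of the unit looks roughly like this (a sketch, not the deployed unit: the unit name, `User=`, and the `web` subcommand are assumptions; `systemctl cat` the real one):

```ini
# Sketch of shithubd-web.service (names and subcommand are assumptions)
[Service]
User=shithub
# Runs to completion before ExecStart; the unit sits in "activating" meanwhile
ExecStartPre=/usr/local/bin/shithubd migrate up
ExecStart=/usr/local/bin/shithubd web
```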

If a migration is long (>30s), the release notes call it out. Schedule the deploy outside peak hours; the web service hangs in "activating" until ExecStartPre finishes.
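
While a long migration runs there is nothing to see on the deploy side; watching the unit from the host makes the hang legible (the unit name here is an assumption, so adjust to whatever `systemctl list-units | grep shithub` shows):

```sh
# "activating" means ExecStartPre (the migration) is still running; this is normal
watch -n 5 systemctl is-active shithubd-web
# Follow the migration's own log lines in the journal
journalctl -u shithubd-web -f
```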

## Canary

Deploy to staging first. Watch for 30 min in Grafana. Things to look at:

- p95 latency on top routes.
- DB call rate — a 10× jump usually means a regressed N+1.
- Job queue depth — a stuck migration reflects here.
- Error logs in Loki: `{service="shithubd"} |~ "panic|ERROR"`.

If anything looks off, **do not** promote. Rollback on staging is cheap; rollback on production is loud.
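
For a quick look without opening Grafana, the same Loki query works from a shell (a sketch, assuming `logcli` is installed and pointed at this stack):

```sh
# Last 30 minutes of panics/errors from the web service
logcli query --since=30m '{service="shithubd"} |~ "panic|ERROR"'
```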

## Major release (database)

If the release notes flag a major schema change (the full pass is sketched after this list):

1. Take a manual `pg_dump` immediately before the deploy: `sudo -u postgres /usr/local/bin/shithub-backup-daily`.
2. Confirm it landed in Spaces.
3. Deploy to staging, run `make restore-drill` against the *post-deploy* dump to confirm the new schema restores cleanly.
4. Then production.
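
The same pass as a shell transcript (a sketch; the Spaces listing command and bucket name are guesses, so check what the backup script actually uploads):

```sh
sudo -u postgres /usr/local/bin/shithub-backup-daily
s3cmd ls s3://shithub-backups/ | tail -n 3   # bucket name is a guess; verify it
make deploy ANSIBLE_INVENTORY=staging
make restore-drill                           # point it at the post-deploy dump
make deploy ANSIBLE_INVENTORY=production
```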

## Config schema changes

When a release adds a required config key, the binary refuses to start and complains in the journal. Update `deploy/ansible/roles/shithubd/templates/web.env.j2` (and `worker.env.j2`), bump the inventory vars, redeploy. There's no separate migration step for env files.
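
The shape of the fix is one template line plus the matching inventory var (key and var names below are hypothetical, purely for illustration):

```jinja
# deploy/ansible/roles/shithubd/templates/web.env.j2
# NEW_FEATURE_TOKEN / shithubd_new_feature_token are made-up names for illustration
NEW_FEATURE_TOKEN={{ shithubd_new_feature_token }}
```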

## Rolling back

See [Rollback (in-repo runbook)](https://github.com/tenseleyFlow/shithub/blob/main/docs/internal/runbooks/rollback.md).

Three rollback shapes, in preference order:

1. **Schema-compatible rollback (best).** If the migration only *added* columns/tables that the old code ignores, the old code runs against the new schema fine. Roll the code back; leave schema alone. Most of our migrations are deliberately additive for this reason.
2. **Roll forward to a hotfix.** If the migration changed semantics that the old code can't tolerate, ship a hotfix on top of the new release rather than reversing the migration.
3. **Migration `down` + code rollback.** Last resort; some `down`s drop columns and *will* lose data.
```sh
# (3) only when (1) and (2) won't work
ssh web-01
sudo -u shithub /usr/local/bin/shithubd migrate down  # ONE step
git checkout v<previous>
make deploy ANSIBLE_INVENTORY=production
```
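
Whichever shape you used, confirm the rolled-back service actually came up (unit name and health path are assumptions):

```sh
systemctl is-active shithubd-web              # unit name assumed
journalctl -u shithubd-web -n 20 --no-pager   # any schema errors on boot?
curl -fsS https://shithub.example/healthz     # health endpoint is a guess
```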