
# Upgrade

Routine release deploys. A deploy is one binary swap plus a systemd restart; the only places upgrades get exciting are DB migrations and the occasional config schema change.

## Standard release

```sh
# from a clean checkout of the release tag
git fetch --tags
git checkout v<version>
make deploy-check ANSIBLE_INVENTORY=staging
make deploy ANSIBLE_INVENTORY=staging
# ... canary period ...
make deploy ANSIBLE_INVENTORY=production
```

`shithubd migrate up` runs as the web service's `ExecStartPre`, so the binary that needs the new schema is also the one that applies it. Order on each host: `ExecStartPre` runs migrations → web starts on the new schema.
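
For orientation, the wiring looks roughly like this in the unit file (a sketch, not the authoritative template; the binary path and `web` subcommand are assumptions, so check the Ansible role for the real values):

```ini
# sketch of the web unit's [Service] section (paths/subcommands assumed)
[Service]
# systemd runs ExecStartPre to completion before ExecStart, holding the
# unit in "activating", so the web process never serves on an old schema
ExecStartPre=/usr/local/bin/shithubd migrate up
ExecStart=/usr/local/bin/shithubd web
```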

If a migration is long (>30s), call it out in the release notes and time the deploy outside peak hours. The web service hangs in `activating` until `ExecStartPre` finishes.
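
To watch that window on a host (the unit name `shithubd-web` is a guess; substitute the real one):

```sh
systemctl status shithubd-web   # shows "activating" while migrations run
journalctl -fu shithubd-web     # follow the migration output live
```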

## Canary

We deploy to staging first and watch for 30 min in Grafana. Things to look at:

- p95 latency on the top routes (`shithubd-overview` dashboard).
- DB call rate — a 10× jump usually means a regressed N+1.
- Job queue depth — a stuck migration shows up here.
- Error logs in Loki: `{service="shithubd"} |~ "panic|ERROR"` (a shell version below).
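
The same Loki query works from a shell via `logcli` (assuming `logcli` is installed and `LOKI_ADDR` points at our Loki; the endpoint below is illustrative):

```sh
export LOKI_ADDR=https://loki.example.internal   # illustrative endpoint
logcli query --since=30m '{service="shithubd"} |~ "panic|ERROR"'
```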

If anything looks off, **do not** promote to production. Rollback on staging is cheap; rollback on production is loud.

## Major version (database)

If the release notes flag a major schema change:

1. Take a manual `pg_dump` immediately before the deploy: `sudo -u postgres /usr/local/bin/shithub-backup-daily`.
2. Confirm it landed in Spaces (one way to check is sketched after this list).
3. Deploy to staging, then run `make restore-drill` against the *post-deploy* dump to confirm the new schema restores cleanly.
4. Then production.
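
A shell sketch of steps 1–3 (the bucket name is a placeholder, `s3cmd` as the Spaces client is an assumption, and `make restore-drill` may need flags to pick a specific dump):

```sh
# 1. manual dump immediately before the deploy
sudo -u postgres /usr/local/bin/shithub-backup-daily

# 2. sanity-check that the newest objects in Spaces include that dump
s3cmd ls s3://<backups-bucket>/ | tail -n 3

# 3. on staging, after the deploy, drill the post-deploy dump
make restore-drill
```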

## Config schema changes

When a release adds a required env var, the binary refuses to start and complains in the journal. Update `deploy/ansible/roles/shithubd/templates/web.env.j2` (and `worker.env.j2`), bump the inventory vars, redeploy. There's no separate migration step for env files.
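
For example, for a hypothetical new required var `SHITHUB_NEW_SETTING` (name invented for illustration), the whole change is one template line plus the matching inventory var:

```sh
# deploy/ansible/roles/shithubd/templates/web.env.j2 (and worker.env.j2)
SHITHUB_NEW_SETTING={{ shithubd_new_setting }}
```

Set `shithubd_new_setting` in the inventory vars for staging and production, then redeploy.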
