# Upgrades & migrations
Routine release deploys. Migrations apply automatically; the only place upgrades get exciting is around long migrations and the occasional config schema change.
## Standard release
```sh
git fetch --tags
git checkout v<version>
make deploy-check ANSIBLE_INVENTORY=staging
make deploy ANSIBLE_INVENTORY=staging
# ... canary period ...
make deploy ANSIBLE_INVENTORY=production
```
`shithubd migrate up` runs as the web service's
`ExecStartPre=`, so the binary that needs the new schema is also
the one that applies it. Order on each host: ExecStartPre runs
migrations → web starts on the new schema.
If a migration is long (>30s), the release notes call it out. Schedule the deploy outside peak hours; the web service hangs in "activating" until ExecStartPre finishes.
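A quick way to confirm the hook and watch a long migration from
the host — a minimal sketch; the `shithub-web` unit name is an
assumption, not confirmed by this doc:

```sh
# Confirm the migrate hook in the effective unit (unit name assumed):
systemctl cat shithub-web | grep -i exec
#   ExecStartPre=/usr/local/bin/shithubd migrate up
#   ExecStart=/usr/local/bin/shithubd web

# While ExecStartPre runs, the service sits in "activating";
# follow the migration output live:
journalctl -fu shithub-web
```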
## Canary
Deploy to staging first. Watch for 30 min in Grafana. Things to look at:
- p95 latency on top routes.
- DB call rate — a 10× jump usually means a regressed N+1.
- Job queue depth — a stuck migration reflects here.
- Error logs in Loki: `{service="shithubd"} |~ "panic|ERROR"`
  (see the sketch after this list).
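To tail that query from a terminal during the canary window — a
sketch assuming `logcli` is installed; the Loki address is
illustrative:

```sh
# Tail staging errors live for the canary period:
logcli --addr=http://loki.staging.internal:3100 \
  query --tail '{service="shithubd"} |~ "panic|ERROR"'
```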
If anything looks off, **do not** promote. Rollback on staging
is cheap; rollback on production is loud.
## Major release (database)
If the release notes flag a major schema change:
1. Take a manual `pg_dump` immediately before the deploy:
   `sudo -u postgres /usr/local/bin/shithub-backup-daily`.
2. Confirm it landed in Spaces (see the sketch after this list).
3. Deploy to staging, run `make restore-drill` against the
   *post-deploy* dump to confirm the new schema restores cleanly.
4. Then production.
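A minimal pass over steps 1–2, assuming Spaces is addressed as an
S3-compatible endpoint with `s3cmd` already configured; the
bucket and prefix names are illustrative:

```sh
# 1. Manual dump via the existing backup script:
sudo -u postgres /usr/local/bin/shithub-backup-daily

# 2. Confirm today's dump actually landed (bucket/prefix assumed):
s3cmd ls s3://shithub-backups/daily/ | tail -n 3
```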
## Config schema changes
When a release adds a required config key, the binary refuses to
start and complains in the journal. Update
`deploy/ansible/roles/shithubd/templates/web.env.j2` (and
`worker.env.j2`), bump the inventory vars, redeploy. There's no
separate migration step for env files.
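What that looks like in practice — a sketch; the unit name, key
name, and message wording are made up for illustration:

```sh
# Find the refusal in the journal (unit name assumed):
journalctl -u shithub-web -n 50 --no-pager | grep -i 'config'
#   shithubd: missing required config key: SHITHUB_EXAMPLE_KEY

# Add the key to the template, then redeploy:
#   deploy/ansible/roles/shithubd/templates/web.env.j2:
#     SHITHUB_EXAMPLE_KEY={{ shithubd_example_key }}
make deploy ANSIBLE_INVENTORY=staging
```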
## Rolling back
See [Rollback (in-repo runbook)](https://github.com/tenseleyFlow/shithub/blob/main/docs/internal/runbooks/rollback.md).
Three rollback shapes, in preference order:
1. **Schema-compatible rollback (best).** If the migration only
   *added* columns/tables that the old code ignores, the old code
   runs against the new schema fine. Roll the code back; leave
   schema alone. Most of our migrations are deliberately additive
   for this reason.
2. **Roll forward to a hotfix.** If the migration changed
   semantics that the old code can't tolerate, ship a hotfix on
   top of the new release rather than reversing the migration.
3. **Migration `down` + code rollback.** Last resort; some `down`s
   drop columns and *will* lose data.
```sh
# (3) only when (1) and (2) won't work
ssh web-01
sudo -u shithub /usr/local/bin/shithubd migrate down  # ONE step
git checkout v<previous>
make deploy ANSIBLE_INVENTORY=production
```
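Whichever shape you used, a quick sanity pass afterwards is
cheap; the unit name, version flag, and health endpoint below
are assumptions, not confirmed by this runbook:

```sh
# Confirm the service came back and is on the intended version:
systemctl is-active shithub-web            # unit name assumed
/usr/local/bin/shithubd --version          # flag assumed
curl -fsS https://shithub.example/healthz  # endpoint assumed
```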