
# Upgrade

Routine release deploys. The deploy is one binary swap + a systemd restart; the only place upgrades get exciting is around DB migrations and the occasional config schema change.

## Standard release

Pushes to `trunk` auto-deploy to production via the `deploy` GitHub Actions workflow once `ci` succeeds. The workflow SSHes to the app droplet and runs `deploy/redeploy.sh`, which fetches trunk, rebuilds the binary in place, runs `migrate up`, and restarts the web + worker units. There is no canary tier today (see "Canary" below).
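As a rough sketch of what that script does (the build target and the unit names `shithubd-web`/`shithubd-worker` are assumptions, not taken from the script itself):

```sh
#!/usr/bin/env bash
# Hedged sketch of deploy/redeploy.sh -- build target and unit names
# are assumptions; check the real script before relying on this.
set -euo pipefail

cd /root/src/shithub
git fetch origin trunk
git checkout --force origin/trunk

make build                # rebuild the binary in place (assumed target)
./shithubd migrate up     # also runs as ExecStartPre, but fail early here

systemctl restart shithubd-web shithubd-worker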

To redeploy current trunk without a push (e.g., after editing env files on the droplet), trigger the `deploy` workflow manually: `gh workflow run deploy.yml --ref trunk`. To deploy by hand from a console:

```sh
ssh root@shithub.sh 'bash /root/src/shithub/deploy/redeploy.sh'
```

For tagged releases on a staging-then-prod path (once we have a staging tier):

```sh
# from a clean checkout of the release tag
git fetch --tags
git checkout v<version>
make deploy-check ANSIBLE_INVENTORY=staging
make deploy ANSIBLE_INVENTORY=staging
# ... canary period ...
make deploy ANSIBLE_INVENTORY=production
```

### GitHub Actions secrets

The `deploy` workflow needs four repo secrets (Settings → Secrets and variables → Actions, in the `production` environment):

- `DEPLOY_HOST` — `shithub.sh` (or the app droplet's public IPv4)
- `DEPLOY_USER` — `root`
- `DEPLOY_SSH_KEY` — private half of an ed25519 key whose public half is in `/root/.ssh/authorized_keys` on the app droplet
- `DEPLOY_KNOWN_HOSTS` — output of `ssh-keyscan shithub.sh` on a trusted host, pinning the host key so the runner won't TOFU-trust a hijacked DNS answer

Generate a dedicated deploy key (don't reuse the operator's laptop key):

```sh
ssh-keygen -t ed25519 -C 'gh-actions-deploy' -f ./gh-deploy -N ''
ssh-copy-id -i ./gh-deploy.pub root@shithub.sh
ssh-keyscan shithub.sh > known_hosts.txt
# Paste ./gh-deploy            → DEPLOY_SSH_KEY
# Paste known_hosts.txt        → DEPLOY_KNOWN_HOSTS
# Then: rm gh-deploy gh-deploy.pub known_hosts.txt
```
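Instead of pasting in the web UI, the secrets can be set from the same console with standard `gh` usage (the `production` environment name comes from the section above):

```sh
gh secret set DEPLOY_HOST        --env production --body 'shithub.sh'
gh secret set DEPLOY_USER        --env production --body 'root'
gh secret set DEPLOY_SSH_KEY     --env production < ./gh-deploy
gh secret set DEPLOY_KNOWN_HOSTS --env production < known_hosts.txt
```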

`shithubd migrate up` runs as the web service's `ExecStartPre`, so the binary that needs the new schema is also the one that applies it. Order on each host: `ExecStartPre` runs migrations → web starts on the new schema.

If a migration is long (>30s), call it out in the release notes and time the deploy outside peak hours. The web service hangs in "activating" until ExecStartPre finishes.
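In the web unit, the wiring looks something like this (a hypothetical fragment — the unit name and binary path are assumptions; only the `ExecStartPre`/`ExecStart` ordering is the point):

```ini
# /etc/systemd/system/shithubd-web.service (hypothetical fragment)
[Service]
# ExecStartPre runs to completion before ExecStart; the unit stays
# "activating" until migrations finish.
ExecStartPre=/usr/local/bin/shithubd migrate up
ExecStart=/usr/local/bin/shithubd web
```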

## Canary

We deploy to staging first and watch Grafana for 30 minutes. Things to look at:

- p95 latency on the top routes (`shithubd-overview` dashboard).
- DB call rate — a 10× jump usually means a regressed N+1.
- Job queue depth — a stuck migration reflects here.
- Error logs in Loki: `{service="shithubd"} |~ "panic|ERROR"`.
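The same Loki query can be run from a console with `logcli` (assuming it is installed and pointed at the Loki instance):

```sh
logcli query --since=30m '{service="shithubd"} |~ "panic|ERROR"'
```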

If anything looks off, **do not** promote to production. Rollback on staging is cheap; rollback on production is loud.

## Major version (database)

If the release notes flag a major schema change:

1. Take a manual `pg_dump` immediately before the deploy: `sudo -u postgres /usr/local/bin/shithub-backup-daily`.
2. Confirm it landed in Spaces.
3. Deploy to staging, run `make restore-drill` against the *post-deploy* dump to confirm the new schema restores cleanly.
4. Then production.
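From a console, steps 1–2 might look like this (the `s3cmd` config and the bucket name are assumptions — check them against the backup script):

```sh
sudo -u postgres /usr/local/bin/shithub-backup-daily
# Confirm today's dump landed in Spaces (bucket name is hypothetical):
s3cmd ls s3://shithub-backups/ | tail -n 3
```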

## Config schema changes

When a release adds a required env var, the binary refuses to start and complains in the journal. Update `deploy/ansible/roles/shithubd/templates/web.env.j2` (and `worker.env.j2`), bump the inventory vars, redeploy. There's no separate migration step for env files.
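To see the complaint, tail the web unit's journal on the droplet (the unit name `shithubd-web` is an assumption):

```sh
journalctl -u shithubd-web -n 50 --no-pager
```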
