
# Upgrade

Routine release deploys. A deploy is one binary swap plus a systemd restart; the only places upgrades get exciting are DB migrations and the occasional config schema change.

## Standard release

```sh
# from a clean checkout of the release tag
git fetch --tags
git checkout v<version>
make deploy-check ANSIBLE_INVENTORY=staging
make deploy ANSIBLE_INVENTORY=staging
# ... canary period ...
make deploy ANSIBLE_INVENTORY=production
```

`shithubd migrate up` runs as the web service's `ExecStartPre`, so the binary that needs the new schema is also the one that applies it. Order on each host: `ExecStartPre` runs migrations → web starts on the new schema.
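
For orientation, the wiring looks roughly like this in the unit file (a sketch, not the authoritative template; the binary path and `web` subcommand are assumptions, so check the Ansible role for the real values):

```ini
# sketch of the web unit's [Service] section (paths/subcommands assumed)
[Service]
# systemd runs ExecStartPre to completion before ExecStart, holding the
# unit in "activating", so the web process never serves on an old schema
ExecStartPre=/usr/local/bin/shithubd migrate up
ExecStart=/usr/local/bin/shithubd web
```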

If a migration is long (>30s), call it out in the release notes and time the deploy outside peak hours. The web service hangs in `activating` until `ExecStartPre` finishes.
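
To watch that window on a host (the unit name `shithubd-web` is a guess; substitute the real one):

```sh
systemctl status shithubd-web   # shows "activating" while migrations run
journalctl -fu shithubd-web     # follow the migration output live
```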

## Canary

We deploy to staging first and watch for 30 min in Grafana. Things to look at:

- p95 latency on the top routes (`shithubd-overview` dashboard).
- DB call rate — a 10× jump usually means a regressed N+1.
- Job queue depth — a stuck migration shows up here.
- Error logs in Loki: `{service="shithubd"} |~ "panic|ERROR"` (a shell version below).
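
The same Loki query works from a shell via `logcli` (assuming `logcli` is installed and `LOKI_ADDR` points at our Loki; the endpoint below is illustrative):

```sh
export LOKI_ADDR=https://loki.example.internal   # illustrative endpoint
logcli query --since=30m '{service="shithubd"} |~ "panic|ERROR"'
```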

If anything looks off, **do not** promote to production. Rollback on staging is cheap; rollback on production is loud.

## Major version (database)

If the release notes flag a major schema change:

1. Take a manual `pg_dump` immediately before the deploy: `sudo -u postgres /usr/local/bin/shithub-backup-daily`.
2. Confirm it landed in Spaces (one way to check is sketched after this list).
3. Deploy to staging, then run `make restore-drill` against the *post-deploy* dump to confirm the new schema restores cleanly.
4. Then production.
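
A shell sketch of steps 1–3 (the bucket name is a placeholder, `s3cmd` as the Spaces client is an assumption, and `make restore-drill` may need flags to pick a specific dump):

```sh
# 1. manual dump immediately before the deploy
sudo -u postgres /usr/local/bin/shithub-backup-daily

# 2. sanity-check that the newest objects in Spaces include that dump
s3cmd ls s3://<backups-bucket>/ | tail -n 3

# 3. on staging, after the deploy, drill the post-deploy dump
make restore-drill
```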

## Config schema changes

When a release adds a required env var, the binary refuses to start and complains in the journal. Update `deploy/ansible/roles/shithubd/templates/web.env.j2` (and `worker.env.j2`), bump the inventory vars, redeploy. There's no separate migration step for env files.
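
For example, for a hypothetical new required var `SHITHUB_NEW_SETTING` (name invented for illustration), the whole change is one template line plus the matching inventory var:

```sh
# deploy/ansible/roles/shithubd/templates/web.env.j2 (and worker.env.j2)
SHITHUB_NEW_SETTING={{ shithubd_new_setting }}
```

Set `shithubd_new_setting` in the inventory vars for staging and production, then redeploy.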
