tenseleyflow/shithub / ffb0e19

docs(runbook): operator backup of /etc/shithub/web.env

Authored by espadonne
SHA: ffb0e19e90d64170afc7b67111cb69900714648d
Parents: e2c3943
Tree: 78add9a

1 changed file: `docs/internal/runbooks/env-backup.md` (added, 147 additions, 0 deletions)
# Backing up the production .env

`/etc/shithub/web.env` (and `/etc/shithub/worker.env`, when split)
holds every load-bearing secret on the droplet:

- DB password (`SHITHUB_DATABASE_URL`)
- Session signing key, TOTP AEAD key, webhook AEAD key
- Spaces access keys (object store)
- Postmark / SMTP creds
- SSH-git host key seed (when configured)

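For orientation, a minimal sketch of the file's shape. Only `SHITHUB_DATABASE_URL` appears elsewhere in this runbook; every other variable name below is a hypothetical placeholder, not the real key name.

```sh
# Illustrative shape of /etc/shithub/web.env; values redacted, and all
# names except SHITHUB_DATABASE_URL are hypothetical placeholders.
SHITHUB_DATABASE_URL=postgres://shithub:REDACTED@127.0.0.1:5432/shithub
SHITHUB_SESSION_SIGNING_KEY=REDACTED    # hypothetical name
SHITHUB_TOTP_AEAD_KEY=REDACTED          # hypothetical name
SHITHUB_WEBHOOK_AEAD_KEY=REDACTED       # hypothetical name
SHITHUB_SPACES_ACCESS_KEY_ID=REDACTED   # hypothetical name
SHITHUB_SMTP_PASSWORD=REDACTED          # hypothetical name
```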
The DB content is captured by [`backups.md`](backups.md); those
dumps and WAL segments rebuild the data. `web.env` is what
re-binds a fresh droplet to that data: without the same secrets,
the existing DB rows can't be decrypted (TOTP/webhook payloads),
sessions can't be signed, and S3 buckets can't be reached.

This doc covers backing up `web.env` itself.

## The mental model

Treat `web.env` like a master password. The DB backup chain is
fully automated and tested; the env file is operator-managed
because it has a different lifecycle:

- **Mostly stable**: changes only when secrets rotate (see
  `rotate-secrets.md`) or new config keys land.
- **Tiny**: a few KB.
- **Maximum sensitivity**: leaking it is a "rotate everything"
  incident, so it shouldn't ride alongside the DB dumps in
  Spaces; the blast radius is different.

That puts it in a different storage tier than DB dumps. Pick one
of the three options below.

## Option A (recommended): password manager

Keep an encrypted copy in your operator password manager
(1Password, Bitwarden, KeePassXC, …) as a "Secure Note" or
file attachment.

**When to update:**
- After every secret rotation per `rotate-secrets.md`
- After any change to `web.env` for new config keys
- Right after a fresh droplet provision (initial baseline)

**How:**
```sh
ssh root@shithub.sh 'cat /etc/shithub/web.env'
# Paste the output into a Secure Note titled e.g. "shithub-prod web.env (YYYY-MM-DD)"
# Tag it with the rotation date so you can identify the active copy.
```
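To audit that the stored note is still current, compare digests. This is a locally runnable sketch: in production the live digest would come from `ssh root@shithub.sh 'sha256sum /etc/shithub/web.env'`; here both files are simulated with temp files.

```shell
# Simulated staleness check: matching digests mean the PM copy is current.
# In production, compute live_sum via:
#   ssh root@shithub.sh 'sha256sum /etc/shithub/web.env'
live=$(mktemp) && stored=$(mktemp)
printf 'SHITHUB_DATABASE_URL=postgres://example\n' > "$live"
cp "$live" "$stored"                # stands in for the Secure Note copy
live_sum=$(sha256sum "$live" | cut -d' ' -f1)
stored_sum=$(sha256sum "$stored" | cut -d' ' -f1)
if [ "$live_sum" = "$stored_sum" ]; then
  echo "PM copy is current"
else
  echo "STALE: re-copy the file and re-date the note"
fi
rm -f "$live" "$stored"
```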

Pros: zero new infrastructure, auditable access (PM logs),
already part of your daily-use security tooling. Cons: a manual
step that's easy to skip after a rotation; set a calendar
reminder matching the quarterly rotation cadence.

## Option B: encrypted blob alongside DB backups

Include an encrypted copy of `web.env` in the daily backup
chain. Encrypt with `age` or `gpg` so that even a compromised
Spaces key can't decrypt it; the recipient key is held only by
operators (in their PM).

Sketch (NOT yet wired up; this would be a follow-up if we go
this route):

```sh
# In shithub-backup-daily, after the pg_dump succeeds:
age -R /etc/shithub/env-backup.recipients \
    -o "$LOCAL_DIR/web.env.${STAMP}.age" \
    /etc/shithub/web.env
rclone copyto "$LOCAL_DIR/web.env.${STAMP}.age" \
       "$BUCKET/env/$(date -u +%Y/%m/%d)/web.env.${STAMP}.age"
```
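Were we to adopt this, the restore side would be the inverse of the sketch above. This is equally hypothetical and not wired up; the object path follows the layout above, and `operator-key.txt` stands for the operator's age identity file kept in the PM, never on the droplet.

```sh
# Fetch the encrypted copy for a given day and decrypt it with the
# operator's identity key. Paths follow the hypothetical layout above.
rclone copyto "$BUCKET/env/$(date -u +%Y/%m/%d)/web.env.${STAMP}.age" ./web.env.age
age -d -i operator-key.txt -o web.env ./web.env.age
```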

Pros: automatic, captures rotations as they happen. Cons: adds
a dependency (`age`), a new key-management surface (the
recipient key), and a failure mode: if the recipient key is
lost, the backups become useless.

## Option C: DO snapshot

DO droplet snapshots include `/etc/shithub/web.env` by virtue of
including the whole filesystem. This is "free" coverage, but the
caveats are real:

- **Point-in-time only**: a snapshot taken before a secret
  rotation has the OLD secret. Restoring a stale snapshot
  desynchronizes you from any DB rows encrypted with the new
  key (TOTP, webhook payloads).
- **Snapshots are pruned** under DO's free policy (4 retained,
  oldest deleted first).
- **Snapshot restore replaces the droplet**, including the
  block volume's previous state (depending on snapshot type).

Suitable as a *belt-and-suspenders* layer on top of A or B. Not
sufficient as the only backup of the env file.
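The first caveat can be checked mechanically: if the newest snapshot predates the last rotation, the env inside it is stale. The dates below are illustrative placeholders; in practice the snapshot date would come from the DO snapshot listing and the rotation date from your PM note.

```shell
# Illustrative staleness check (GNU date); both dates are placeholders.
snapshot_date="2024-01-10"   # e.g. newest snapshot's created-at
last_rotation="2024-02-01"   # recorded when rotate-secrets.md was last run
if [ "$(date -d "$snapshot_date" +%s)" -lt "$(date -d "$last_rotation" +%s)" ]; then
  echo "snapshot predates last rotation: its web.env is stale"
fi
```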

## Restore procedure

If the live `web.env` is lost (droplet replaced, file deleted,
permissions wedged):

1. Stop the running services that need it:
   ```sh
   systemctl stop shithubd-web shithubd-cron
   ```
2. Recreate `/etc/shithub/web.env` with the right ownership and
   mode (see `deploy/ansible/roles/shithubd/tasks/main.yml` for
   the canonical perms; currently `root:shithub 0640`):
   ```sh
   install -o root -g shithub -m 0640 /dev/stdin /etc/shithub/web.env <<'EOF'
   <paste from your Secure Note>
   EOF
   ```
3. Restart:
   ```sh
   systemctl start shithubd-web shithubd-cron
   curl -fsS http://127.0.0.1:8080/healthz   # expect HTTP 200
   ```
4. If the secrets in the restored copy are stale (you rotated
   after the backup), follow `rotate-secrets.md` to re-apply
   the current values.
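After step 3, it is also worth confirming the restored file carries every expected variable name. A runnable sketch with temp files: the `expected` key list is a hypothetical convention (not something maintained in the repo today), and in production `restored` would be `/etc/shithub/web.env`.

```shell
# Simulated sanity check: every expected key name must appear in the
# restored file. In production: restored=/etc/shithub/web.env.
restored=$(mktemp) && expected=$(mktemp)
printf 'SHITHUB_DATABASE_URL=postgres://example\nOTHER_KEY=x\n' > "$restored"
printf 'OTHER_KEY\nSHITHUB_DATABASE_URL\n' > "$expected"   # sorted key names
if cut -d= -f1 "$restored" | sort | diff -u "$expected" - ; then
  echo "all expected keys present"
  ok=1
fi
rm -f "$restored" "$expected"
```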

## What goes wrong if you skip this

Concrete scenarios where lacking an env backup turns a recoverable
incident into a re-key-the-world incident:

| Scenario | With env backup | Without |
|---|---|---|
| Droplet kernel panic, fsck loses files | Restore `web.env`, restart, done | Rotate every secret in `rotate-secrets.md`, plus likely re-encrypt every TOTP secret + webhook payload |
| Operator deletes `/etc/shithub` by mistake | Same as above | Same as above |
| DO destroys droplet (account billing issue) | New droplet + restored env + restored DB → up | Same DB-restore work, plus full secret rotation |
| Provider breach forces DB password rotation | Update one line in PM, redeploy | Same (an env backup is neutral here), but you should still update the PM copy after the rotation |

## How this relates to the rest of the backup story

| What | Where it's backed up | Cadence |
|---|---|---|
| DB rows | `spaces-prod:shithub-backups/daily/...` (pg_dump) + `spaces-prod:shithub-wal/` (WAL) | Continuous + daily |
| Bare repos | `/data/repos` on the block volume + cross-region Spaces sync | Continuous |
| Object store contents | DO Spaces lifecycle handles versioning | Provider-managed |
| Operator secrets (`web.env`) | Operator password manager (this doc) | Per-rotation |
| Filesystem layout | DO droplet snapshots | Weekly or pre-major-change |