# Backing up the production .env

`/etc/shithub/web.env` (and `/etc/shithub/worker.env`, when split)
holds every load-bearing secret on the droplet:

- DB password (`SHITHUB_DATABASE_URL`)
- Session signing key, TOTP AEAD key, webhook AEAD key
- Spaces access keys (object store)
- Postmark / SMTP credentials
- SSH-git host key seed (when configured)

The DB content is captured by [`backups.md`](backups.md) — those
dumps and WAL segments rebuild the data. `web.env` is what
re-binds a fresh droplet to that data: without the same secrets,
the existing DB rows can't be decrypted (TOTP/webhook payloads),
sessions can't be signed, and the Spaces buckets can't be reached.

This doc covers backing up `web.env` itself.

## The mental model

Treat `web.env` like a master password. The DB backup chain is
fully automated and tested; the env file is operator-managed
because it has a different lifecycle:

- **Mostly stable**: it changes only when secrets rotate (see
  `rotate-secrets.md`) or when new config keys land.
- **Tiny**: a few KB.
- **Maximum sensitivity**: leaking it is a "rotate everything"
  incident, so it shouldn't ride alongside the DB dumps in
  Spaces — the blast radius is different.

That puts it in a different storage tier from the DB dumps. Pick
one of the three options below.

## Option A (recommended) — password manager

Keep an encrypted copy in your operator password manager
(1Password, Bitwarden, KeePassXC, …) as a "Secure Note" or
file attachment.

**When to update:**
- After every secret rotation per `rotate-secrets.md`
- After any change to `web.env` that adds new config keys
- Right after a fresh droplet provision (initial baseline)

**How:**
```sh
ssh root@shithub.sh 'cat /etc/shithub/web.env'
# Paste the output into a Secure Note titled e.g. "shithub-prod web.env (YYYY-MM-DD)"
# Tag it with the rotation date so you can identify the active copy.
```

Pros: zero new infrastructure, auditable access (the PM's access
log), and it lives in security tooling you already use daily.
Cons: a manual step that's easy to skip after a rotation — set a
calendar reminder matching the quarterly rotation cadence.
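
One way to catch a skipped update is a fingerprint check: record a
hash of the file inside the Secure Note, and compare it against the
live file before trusting the note. A minimal local sketch (the files
and `SESSION_KEY` name are stand-ins; in production the `live` hash
would come from `ssh root@shithub.sh 'sha256sum /etc/shithub/web.env'`):

```sh
tmp=$(mktemp -d)

# Stand-in for /etc/shithub/web.env as it was when the note was saved.
printf 'SESSION_KEY=old\n' > "$tmp/web.env"
recorded=$(sha256sum "$tmp/web.env" | awk '{print $1}')  # store this hash in the note

# A rotation happens that never made it into the password manager...
printf 'SESSION_KEY=new\n' > "$tmp/web.env"
live=$(sha256sum "$tmp/web.env" | awk '{print $1}')

if [ "$recorded" != "$live" ]; then
  echo "PM copy is stale: re-copy web.env into the Secure Note"
fi
```

If the hashes match, the note is current and no re-copy is needed.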

## Option B — encrypted blob alongside DB backups

Include an encrypted copy of `web.env` in the daily backup
chain. Encrypt with `age` or `gpg` so that even a compromised
Spaces key can't decrypt it; the recipient key is held only by
operators (in their PM).

Sketch (NOT yet wired up — would be a follow-up if we go this
route):

```sh
# In shithub-backup-daily, after the pg_dump succeeds:
age -R /etc/shithub/env-backup.recipients \
  -o "$LOCAL_DIR/web.env.${STAMP}.age" \
  /etc/shithub/web.env
rclone copyto "$LOCAL_DIR/web.env.${STAMP}.age" \
  "$BUCKET/env/$(date -u +%Y/%m/%d)/web.env.${STAMP}.age"
```

Pros: automatic, captures rotations as they happen. Cons: adds a
dependency (`age`), a new key-management surface (the recipient
key), and a failure mode (if the recipient key is lost, the
backups become useless).
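
The restore side of this option is a single decrypt with the
operator-held identity (`age -d -i <identity-file>`). As a
self-contained illustration of the encrypt-then-decrypt round trip,
here is a local sketch using `openssl` as a stand-in (the production
sketch above uses `age`; the passphrase file and contents below are
placeholders):

```sh
tmp=$(mktemp -d)
printf 'SHITHUB_DATABASE_URL=postgres://example\n' > "$tmp/web.env"
printf 'not-a-real-passphrase\n' > "$tmp/pass"

# Encrypt: stand-in for the age step in the daily job.
openssl enc -aes-256-cbc -pbkdf2 -salt -pass file:"$tmp/pass" \
  -in "$tmp/web.env" -out "$tmp/web.env.enc"

# Decrypt: what you'd do at restore time, with the key from your PM.
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:"$tmp/pass" \
  -in "$tmp/web.env.enc" -out "$tmp/web.env.out"

cmp -s "$tmp/web.env" "$tmp/web.env.out" && echo "round-trip OK"
```

The point of the round trip is the failure mode listed above: lose
the passphrase (or the `age` identity) and the ciphertext is
unrecoverable, so the decryption key belongs in the operator PM.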

## Option C — DO snapshot

DO droplet snapshots include `/etc/shithub/web.env` by virtue of
including the whole filesystem. This is "free" coverage, but the
caveats are real:

- **Point-in-time only**: a snapshot taken before a secret
  rotation holds the OLD secret. Restoring a stale snapshot
  desynchronizes you from any DB rows encrypted with the new
  key (TOTP, webhook payloads).
- **Limited retention**: under DO's free policy, snapshots are
  scheduled for deletion (4 retained, oldest pruned).
- **Snapshot restore replaces the droplet**, including the
  block volume's previous state (depending on snapshot type).

Suitable as *belt-and-suspenders* on top of A or B; not
sufficient as the only backup of the env file.

## Restore procedure

If the live `web.env` is lost (droplet replaced, file deleted,
permissions wedged):

1. Stop the running services that need it:
   ```sh
   systemctl stop shithubd-web shithubd-cron
   ```
2. Recreate `/etc/shithub/web.env` with the right ownership and
   mode (see `deploy/ansible/roles/shithubd/tasks/main.yml` for
   canonical perms — currently `root:shithub 0640`):
   ```sh
   install -o root -g shithub -m 0640 /dev/stdin /etc/shithub/web.env <<'EOF'
   <paste from your Secure Note>
   EOF
   ```
3. Restart:
   ```sh
   systemctl start shithubd-web shithubd-cron
   curl -fsS http://127.0.0.1:8080/healthz  # 200
   ```
4. If the secrets in the restored copy are stale (you rotated
   after the backup), follow `rotate-secrets.md` to re-apply
   the current values.
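
Before restarting in step 3, it's worth sanity-checking that the
pasted file actually contains the keys the services expect, since a
truncated paste is easy to miss. A sketch (of the key names below,
only `SHITHUB_DATABASE_URL` appears in this doc; `SESSION_SIGNING_KEY`
is a placeholder, so substitute the real variable names from your
`web.env`):

```sh
tmp=$(mktemp -d)

# Stand-in for the freshly restored /etc/shithub/web.env.
cat > "$tmp/web.env" <<'EOF'
SHITHUB_DATABASE_URL=postgres://example
SESSION_SIGNING_KEY=placeholder
EOF

missing=0
for key in SHITHUB_DATABASE_URL SESSION_SIGNING_KEY; do
  grep -q "^${key}=" "$tmp/web.env" || { echo "missing: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "env looks complete"
```

If any key is reported missing, fix the paste before starting the
services; starting with a partial env fails in less obvious ways
than staying stopped.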

## What goes wrong if you skip this

Concrete scenarios where lacking an env backup turns a recoverable
incident into a re-key-the-world incident:

| Scenario | With env backup | Without env backup |
|---|---|---|
| Droplet kernel panic, fsck loses files | Restore `web.env`, restart, done | Rotate every secret in `rotate-secrets.md`, plus likely re-encrypt every TOTP secret + webhook payload |
| Operator deletes `/etc/shithub` by mistake | Same as above | Same as above |
| DO destroys droplet (account billing issue) | New droplet + restored env + restored DB → up | Same DB-restore work, plus full secret rotation |
| Provider breach forces DB password rotation | Update one line in PM, redeploy | Same — the env backup is neutral here, but you should still update the PM copy after the rotation |

## How this relates to the rest of the backup story

| What | Where it's backed up | Cadence |
|---|---|---|
| DB rows | `spaces-prod:shithub-backups/daily/...` (pg_dump) + `spaces-prod:shithub-wal/` (WAL) | Continuous + daily |
| Bare repos | `/data/repos` on the block volume + cross-region Spaces sync | Continuous |
| Object store contents | DO Spaces lifecycle handles versioning | Provider-managed |
| Operator secrets (`web.env`) | Operator password manager (this doc) | Per-rotation |
| Filesystem layout | DO droplet snapshots | Weekly or pre-major-change |