tenseleyflow/shithub / e3013aa

docs: aide runbook — alert flow, re-baseline, exclusion rationale

Authored by espadonne
SHA: e3013aa925d2a748dc57387b4f89175b3c522704
Parents: 54853fe
Tree: 4d6441d

1 changed file

Status | File | + | -
A | docs/internal/runbooks/aide.md | 118 | 0
docs/internal/runbooks/aide.md (added)
@@ -0,0 +1,118 @@
+# AIDE: file-integrity monitoring
+
+AIDE (Advanced Intrusion Detection Environment) hashes a chosen set
+of system files at install time and re-checks them nightly. We use
+it to catch the post-compromise persistence pattern: someone with
+root replaces `/usr/local/bin/shithubd`, drops a systemd unit in
+`/etc/systemd/system/`, modifies `/root/.ssh/authorized_keys`, etc.
+The daily check produces no output when nothing's changed and a
+loud journal entry when something has.
+
+## Where alerts surface
+
+```sh
+journalctl -t shithub-aide -n 200 --no-pager
+tail -n 100 /var/log/shithub/aide.log
+```
+
+The wrapper at `/usr/local/bin/shithub-aide-check` writes to both:
+
+- `/var/log/shithub/aide.log`: append-only, persists across reboots.
+- The journal (`journalctl -t shithub-aide`): structured, queryable,
+  and picked up by whatever log shipper we add later.
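The dual write described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `record`, the `JOURNAL_TAG` variable, and the timestamped line format are hypothetical, and the real `shithub-aide-check` wrapper may be structured differently.

```sh
# Hypothetical sketch: the real shithub-aide-check wrapper may differ.
LOG_FILE="${LOG_FILE:-/var/log/shithub/aide.log}"
JOURNAL_TAG="${JOURNAL_TAG:-shithub-aide}"

# record PRIORITY MESSAGE
# Append a timestamped line to the log file and mirror the message to the journal.
record() {
    priority="$1"
    message="$2"
    printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$priority" "$message" >> "$LOG_FILE"
    # systemd-cat is absent on non-systemd hosts; the file copy is still written.
    if command -v systemd-cat >/dev/null 2>&1; then
        printf '%s\n' "$message" | systemd-cat -t "$JOURNAL_TAG" -p "$priority" 2>/dev/null || true
    fi
}
```

A call such as `record warning "AIDE reported changes"` would then show up in both places listed above.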
+
+A `/var/run/shithub-aide.last-clean` heartbeat file is updated on
+every clean run so the operator can confirm the cron job actually
+fires:
+
+```sh
+stat /var/run/shithub-aide.last-clean
+# Modify: 2026-05-10 03:30:12 +0000 UTC   ← yesterday's clean run
+```
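Reading the `stat` output by eye works, but the same check can be scripted. The sketch below assumes GNU `stat` (for `-c %Y`) and a 26-hour freshness window; both the function names and the threshold are illustrative choices, not part of the runbook.

```sh
# heartbeat_age FILE
# Print the age of FILE in whole seconds. Assumes GNU stat.
heartbeat_age() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1
    echo $((now - mtime))
}

# heartbeat_fresh FILE [MAX_AGE_SECONDS]
# Succeed if FILE exists and was modified within the window.
# The default of 93600 s (26 h) is an assumed margin over the nightly schedule.
heartbeat_fresh() {
    max_age="${2:-93600}"
    age=$(heartbeat_age "$1") || return 1
    [ "$age" -le "$max_age" ]
}
```

For example, `heartbeat_fresh /var/run/shithub-aide.last-clean || echo "no clean AIDE run in the last 26h"`. Because `/var/run` is a tmpfs on Debian, a missing file right after a reboot is expected until the next clean run.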
+
+Email delivery is **not yet wired**. The droplet has no MTA and
+the project's outbound SMTP (Postmark) is approval-gated. Once
+Postmark is approved, swap the `systemd-cat` call in the wrapper
+for a `curl -X POST https://api.postmarkapp.com/email …` invocation
+using the existing `SHITHUB_AUTH__POSTMARK__SERVER_TOKEN` (read
+the env file from inside the wrapper).
+
+## When alerts fire
+
+1. Look at the journal entry. Each diff line is one of:
+   - `f`: file content changed (size, mtime, hash)
+   - `+` / `-`: file added / removed
+   - `d`: directory metadata changed
+2. **Match the diff against an authorized change**:
+   - `apt` / `unattended-upgrades` ran → expect changes under
+     `/usr/lib/`, `/usr/sbin/`, `/etc/apt/`. Cross-check against
+     `journalctl -u unattended-upgrades` for the same timeframe.
+   - A deploy ran → expect `/usr/local/bin/shithubd` to change.
+     Cross-check the SHA against `gh run list --workflow=deploy.yml`.
+   - A manual config edit → match against the operator's notes.
+3. **No authorized change matches** → treat as an incident. Open
+   `runbooks/incidents.md`. Don't re-baseline AIDE until the
+   investigation closes: a new baseline would record the
+   unexplained change as trusted.
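Step 2 above amounts to sorting each flagged path into a bucket. A minimal sketch of that first pass follows, using only the paths named in the list; the function name and the bucket labels are hypothetical, and a path landing in a bucket only tells you which record to cross-check, not that the change is benign.

```sh
# classify_change PATH
# Print which authorized-change bucket a flagged path would fall under.
classify_change() {
    case "$1" in
        /usr/lib/*|/usr/sbin/*|/etc/apt/*) echo "package-upgrade" ;;  # check the unattended-upgrades journal
        /usr/local/bin/shithubd)           echo "deploy" ;;           # check the deploy workflow runs
        *)                                 echo "unexplained" ;;      # operator notes, else incident
    esac
}
```

Feeding it the paths from an alert gives a quick triage list; anything printed as `unexplained` needs either a matching operator note or an incident.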
+
+## Re-baselining after an authorized change
+
+Whenever you make an intentional change to a watched path (apt
+upgrade, manual config edit, ansible-driven config change), the
+next nightly run will flag it. Re-baseline once the change is
+confirmed good:
+
+```sh
+sudo aideinit -y -f
+sudo mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
+sudo rm -f /var/lib/aide/.config-changed
+```
+
+The `-y` is "answer yes to all prompts," `-f` is "overwrite an
+existing new database." The run takes 1–3 minutes on a 4 GB droplet.
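If `aideinit` fails partway, the `mv` above can leave the host without a usable database. One way to guard the promotion step is sketched below; `promote_db` is a hypothetical helper, not something the repo is known to ship, and it takes the directory as an argument only so it can be exercised outside `/var/lib/aide`.

```sh
# promote_db [DIR]
# Move a freshly built database into place and clear the config-changed marker.
# Refuses to touch the current database if the new one is missing or empty.
promote_db() {
    dir="${1:-/var/lib/aide}"
    if [ ! -s "$dir/aide.db.new.gz" ]; then
        echo "no new database at $dir/aide.db.new.gz; leaving aide.db.gz alone" >&2
        return 1
    fi
    mv "$dir/aide.db.new.gz" "$dir/aide.db.gz"
    rm -f "$dir/.config-changed"
}
```

Run as root, this would replace the last two commands of the block above: `aideinit -y -f && promote_db`.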
+
+## Re-baselining after an Ansible config change
+
+When `deploy/ansible/roles/base/files/aide-shithub.conf` is edited
+and the playbook re-runs, the `rebuild aide database` handler
+creates the marker file `/var/lib/aide/.config-changed`.
+Re-baseline as above to clear it.
+
+## Disabling temporarily
+
+If you're about to do a large planned change (OS upgrade, big
+ansible re-run) and don't want a flood of alerts:
+
+```sh
+# Pause the nightly check; it stays off until cron is started again
+sudo systemctl stop cron     # blunt: stops every cron job; you may prefer to disable just the AIDE entry
+# ... make changes ...
+sudo aideinit -y -f && \
+sudo mv /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz
+sudo systemctl start cron
+```
+
+Or, surgically, comment out the `30 3 * * * /usr/local/bin/shithub-aide-check`
+line in the crontab (`crontab -l` lists it, `crontab -e` edits it).
+Re-baseline and re-enable when done.
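The surgical option can also be done without an interactive editor. The filters below are a sketch: the function names are hypothetical, and they assume the entry lives in a crontab that `crontab -l` prints (root's, if the check runs as root).

```sh
# disable_aide_line
# Read a crontab on stdin and comment out the shithub-aide-check entry.
# Lines that are already commented are left untouched.
disable_aide_line() {
    sed 's|^\([^#].*/usr/local/bin/shithub-aide-check.*\)$|#\1|'
}

# enable_aide_line
# Reverse of the above: strip one leading "#" from the entry.
enable_aide_line() {
    sed 's|^#\(.*/usr/local/bin/shithub-aide-check.*\)$|\1|'
}
```

Assuming root's crontab, usage would look like `sudo crontab -l | disable_aide_line | sudo crontab -`, with `enable_aide_line` in the same pipeline once the re-baseline is done.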
+
+## What's watched, what isn't
+
+The default Debian config (`/etc/aide/aide.conf` and the snippets
+in `/etc/aide/aide.conf.d/`) covers `/etc`, `/bin`, `/sbin`,
+`/usr/{bin,sbin,lib,libexec,local}`, `/root`, `/boot`, `/lib*`.
+Our exclusions (`/etc/aide/aide.conf.d/99_shithub_exclude`):
+
+| Path | Why excluded |
+|---|---|
+| `/data` | Repo data root, write-heavy by design |
+| `/var/lib/postgresql` | Postgres rewrites these constantly |
+| `/var/lib/shithub*` | Application state |
+| `/var/lib/caddy`, `/var/log/caddy` | Cert renewals + access log churn |
+| `/var/log/shithub` | App logs (incl. our own aide.log) |
+| `/root/src/shithub` | Source tree fetched by every deploy |
+| `/usr/local/share/shithub` | Restore-drill scratch |
+| `/var/backups/shithub` | Nightly pg_dump |
+| `/var/lib/aide` | AIDE's own DB |
+| `/tmp/shithubd-new` | Deploy step's binary swap path |
+
+If you add a new system path that legitimately churns, add it
+to the exclusions above, commit, re-run ansible, then re-baseline.