
AIDE — file-integrity monitoring

AIDE (Advanced Intrusion Detection Environment) hashes a chosen set of system files at install time and re-checks them nightly. We use it to catch the post-compromise persistence pattern — someone with root replaces /usr/local/bin/shithubd, drops a systemd unit in /etc/systemd/system/, modifies /root/.ssh/authorized_keys, etc. The daily check produces no output when nothing's changed and a loud journal entry when something has.

Where alerts surface

journalctl -t shithub-aide -n 200 --no-pager
tail -100 /var/log/shithub/aide.log

The wrapper at /usr/local/bin/shithub-aide-check writes to both:

  • /var/log/shithub/aide.log — append-only, persists across reboots.
  • journalctl -t shithub-aide — structured, queryable, ships with whatever log shipper we add later.
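The dual-sink behaviour described above could look roughly like the sketch below. It is not the real wrapper: `log_aide_alert` is a hypothetical name, the log path is passed in as an argument, and the `warning` priority is an assumption. Only the `shithub-aide` tag and the use of `systemd-cat` come from this runbook.

```sh
# Sketch: send one AIDE report (read from stdin) to both sinks.
# log_aide_alert is a hypothetical name; the real wrapper may differ.
log_aide_alert() (
  logfile=$1
  report=$(cat)

  # Sink 1: the append-only log file.
  printf '%s\n' "$report" >> "$logfile"

  # Sink 2: the journal, under the tag that the journalctl query filters on.
  # Skipped on hosts where systemd-cat is not installed.
  if command -v systemd-cat >/dev/null 2>&1; then
    printf '%s\n' "$report" | systemd-cat -t shithub-aide -p warning || true
  fi
)
```

Writing the file first means the report still lands on disk if the journal is unavailable.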

A /var/run/shithub-aide.last-clean heartbeat file is updated on every clean run so the operator can confirm the cron actually fires:

stat /var/run/shithub-aide.last-clean
# Modify: 2026-05-10 03:30:12 +0000 UTC   ← yesterday's clean run
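Reading the timestamp by eye works, but the same check can be scripted. The sketch below is one way to do it, assuming GNU `stat` (as shipped on Ubuntu); the function name `heartbeat_is_fresh` and the 26-hour default threshold are assumptions, not part of the existing tooling. Because the heartbeat is only touched on clean runs, a stale file means either the job did not fire or the last run reported changes. On systemd hosts `/var/run` is a tmpfs, so the file is also missing after a reboot until the next clean run.

```sh
# Sketch: succeed only if the heartbeat file was touched recently.
# heartbeat_is_fresh and the 26-hour default are assumptions.
heartbeat_is_fresh() (
  file=$1
  max_age=${2:-93600}            # seconds; 26h = 24h schedule + 2h slack

  [ -e "$file" ] || exit 1       # a missing heartbeat counts as stale

  now=$(date +%s)
  mtime=$(stat -c %Y "$file")    # GNU stat: mtime as epoch seconds
  [ $(( now - mtime )) -le "$max_age" ]
)
```

A possible use is `heartbeat_is_fresh /var/run/shithub-aide.last-clean || echo "no clean AIDE run in the last 26h"`.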

Email delivery is not yet wired. The droplet has no MTA and the project's outbound SMTP (Postmark) is approval-gated. Once Postmark is approved, swap the systemd-cat call in the wrapper for a curl -X POST https://api.postmarkapp.com/email … invocation using the existing SHITHUB_AUTH__POSTMARK__SERVER_TOKEN (read the env file from inside the wrapper).
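For when that approval lands, here is a minimal sketch of what the Postmark call might look like. Nothing here is wired into the wrapper today. The function names are hypothetical, the sender and recipient addresses are left to the caller, and the JSON escaping is deliberately minimal (backslashes, double quotes and newlines only). The endpoint, the `X-Postmark-Server-Token` header and the `From`/`To`/`Subject`/`TextBody` fields are Postmark's documented email API.

```sh
# Sketch only: not part of the real wrapper. Function names are hypothetical.

# Escape backslashes and double quotes, then join lines with a literal \n,
# so the text can sit inside a JSON string. Other control characters
# (tabs, for example) are not handled.
json_escape() {
  printf '%s\n' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Build the request body. Arguments: from, to, subject, text body.
postmark_payload() {
  printf '{"From":"%s","To":"%s","Subject":"%s","TextBody":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "$(json_escape "$3")" "$(json_escape "$4")"
}

# Send the alert. Aborts if the token is not in the environment, so the
# env file must be sourced first.
send_aide_alert() {
  curl --silent --show-error --fail \
    --request POST 'https://api.postmarkapp.com/email' \
    --header 'Accept: application/json' \
    --header 'Content-Type: application/json' \
    --header "X-Postmark-Server-Token: ${SHITHUB_AUTH__POSTMARK__SERVER_TOKEN:?not set}" \
    --data "$(postmark_payload "$@")"
}
```

Keeping the payload builder separate from the `curl` call lets the escaping be checked without sending mail.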

When alerts fire

  1. Look at the journal entry. Each diff line is one of:
    • f — file content changed (size, mtime, hash)
    • + / - — file added / removed
    • d — directory metadata changed
  2. Match the diff against an authorized change:
    • apt / unattended-upgrades ran → expect changes under /usr/lib/, /usr/sbin/, /etc/apt/. Cross-check against journalctl -u unattended-upgrades for the same timeframe.
    • A deploy ran → expect /usr/local/bin/shithubd to change. Cross-check the SHA against gh run list --workflow=deploy.yml.
    • A manual config edit → match against the operator's notes.
  3. No authorized change matches → treat as an incident. Open runbooks/incidents.md. Don't re-baseline AIDE until the investigation closes.
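When a diff is long, a first pass over the changed paths can be scripted. The sketch below only encodes the path hints from step 2. `likely_source` is a hypothetical helper, and its label is a pointer to which log to cross-check, not evidence that a change was authorized.

```sh
# Sketch: first-pass label for one changed path, following step 2 above.
# likely_source is a hypothetical helper; the label says where to look,
# it does not prove the change was authorized.
likely_source() {
  case "$1" in
    /usr/local/bin/shithubd)            echo deploy ;;
    /usr/lib/*|/usr/sbin/*|/etc/apt/*)  echo apt ;;
    *)                                  echo unexplained ;;
  esac
}
```

Manual config edits cannot be recognised by path alone, so they come out as `unexplained` and still need to be matched against the operator's notes.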

Re-baselining after an authorized change

Whenever you make an intentional change to a watched path (apt upgrade, manual config edit, ansible-driven config change), the next nightly run will flag it. Re-baseline once the change is confirmed-good:

# 1. Generate a new baseline (10–15 min on shithub-prod). Use
#    aide --init directly — Ubuntu's aideinit wrapper prompts
#    interactively to confirm the post-init copy and won't be
#    auto-answered by stdin redirection.
sudo aide --config=/etc/aide/aide.conf --init

# 2. Keep the previous baseline as a dated backup so you can
#    roll back with a single mv if the new baseline turns out
#    to capture unwanted state.
sudo mv /var/lib/aide/aide.db "/var/lib/aide/aide.db.bak-$(date -u +%Y%m%d)"

# 3. Promote. The exact filename suffix depends on the install:
#    Ubuntu 24's aide-common produces uncompressed aide.db.new
#    (no .gz). Adjust if your install differs (check ls
#    /var/lib/aide/ before this step).
sudo mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db
sudo chown root:root /var/lib/aide/aide.db
sudo chmod 0600 /var/lib/aide/aide.db

# 4. Clear the marker left by the Ansible handler, if present.
sudo rm -f /var/lib/aide/.config-changed

Avoid aideinit: it prompts twice (Overwrite existing aide.db.new [Yn]? and Overwrite /var/lib/aide/aide.db [yN]?), and the second prompt defaults to N. Any non-interactive invocation (cron, nohup, ssh without -t) therefore generates the new database and then silently exits without promoting it.
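One hazard in the manual sequence is that a failed `aide --init` leaves no `aide.db.new`. The backup `mv` then still succeeds, the promote step fails, and the host is left without a baseline. The sketch below wraps steps 2 onward in a guard against that. The function name `promote_baseline` and the directory argument are assumptions added for illustration; the file names and permissions are the ones used above.

```sh
# Sketch: promote aide.db.new to aide.db, but only if the new database
# actually exists and is non-empty. promote_baseline is a hypothetical name.
promote_baseline() (
  dir=${1:-/var/lib/aide}
  new=$dir/aide.db.new
  cur=$dir/aide.db

  if [ ! -s "$new" ]; then
    echo "refusing to promote: $new is missing or empty" >&2
    exit 1
  fi

  # Keep the previous baseline as a dated backup.
  if [ -e "$cur" ]; then
    mv -- "$cur" "$cur.bak-$(date -u +%Y%m%d)" || exit 1
  fi

  mv -- "$new" "$cur" || exit 1
  chmod 0600 "$cur" || exit 1

  # Ownership can only be set when running as root.
  if [ "$(id -u)" -eq 0 ]; then
    chown root:root "$cur" || exit 1
  fi

  # Clear the Ansible marker, if present.
  rm -f -- "$dir/.config-changed"
)
```

Called as root with no argument it operates on `/var/lib/aide`. Note that a second run on the same UTC day overwrites that day's backup, exactly as the manual commands do.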

Re-baselining after an Ansible config change

When deploy/ansible/roles/base/files/aide-shithub.conf is edited and the playbook re-runs, the rebuild aide database handler creates the marker file /var/lib/aide/.config-changed. Re-baseline as above; the final rm -f step clears the marker.

Disabling temporarily

If you're about to do a large planned change (OS upgrade, big ansible re-run) and don't want a flood of alerts:

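# Pause the nightly check. Nothing re-enables it automatically, and
# this stops every cron job on the host, not just AIDE, so keep the
# window short (or disable only the AIDE entry, see below).
sudo systemctl stop cron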
# ... make changes ...
# Re-baseline (see the "Re-baselining" section above for why we
# call aide --init directly instead of aideinit).
sudo aide --config=/etc/aide/aide.conf --init
sudo mv /var/lib/aide/aide.db /var/lib/aide/aide.db.bak-$(date -u +%Y%m%d)
sudo mv /var/lib/aide/aide.db.new /var/lib/aide/aide.db
sudo chown root:root /var/lib/aide/aide.db
sudo chmod 0600 /var/lib/aide/aide.db
sudo systemctl start cron

Or, more surgically, comment out the 30 3 * * * /usr/local/bin/shithub-aide-check line with crontab -e (crontab -l only lists the entries). Re-baseline and re-enable when done.
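The surgical variant can also be done without opening an editor. The sketch below assumes the entry lives in a user crontab (the five-field format quoted above suggests so) rather than in a file under /etc/cron.d. `disable_aide_job` and `enable_aide_job` are hypothetical names for two filters that read a crontab on stdin and write the edited one to stdout.

```sh
# Sketch: comment out, or restore, the AIDE entry in a crontab stream.
# Both function names are hypothetical.

# Prefix the AIDE line with "# " unless it is already commented out.
disable_aide_job() {
  sed -e 's|^\([^#].*/usr/local/bin/shithub-aide-check.*\)$|# \1|'
}

# Strip the leading "#" (and any spaces after it) from the AIDE line.
enable_aide_job() {
  sed -e 's|^#[[:space:]]*\(.*/usr/local/bin/shithub-aide-check.*\)$|\1|'
}
```

Run as the user that owns the entry, `crontab -l | disable_aide_job | crontab -` pauses the job and `crontab -l | enable_aide_job | crontab -` restores it. Other cron jobs keep running, unlike with `systemctl stop cron`.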

What's watched, what isn't

The default Debian config (/etc/aide/aide.conf and the snippets in /etc/aide/aide.conf.d/) covers /etc, /bin, /sbin, /usr/{bin,sbin,lib,libexec,local}, /root, /boot, /lib*. Our exclusions (/etc/aide/aide.conf.d/99_shithub_exclude):

| Path | Why excluded |
|---|---|
| /data | Repo data root, write-heavy by design |
| /var/lib/postgresql | Postgres rewrites these constantly |
| /var/lib/shithub* | Application state |
| /var/lib/caddy, /var/log/caddy | Cert renewals + access log churn |
| /var/log/shithub | App logs (incl. our own aide.log) |
| /root/src/shithub | Source tree fetched by every deploy |
| /usr/local/share/shithub | Restore-drill scratch |
| /var/backups/shithub | Nightly pg_dump |
| /var/lib/aide | AIDE's own DB |
| /tmp/shithubd-new | Deploy step's binary swap path |

If you add a new system path that legitimately churns, add it to the exclusion file and to the table above, commit, re-run Ansible, then re-baseline.
