# Regenerate AKC cache

The `AuthorizedKeysCommand` (`shithubd ssh-authkeys %f`) resolves an offered SSH key fingerprint to a user via the `user_ssh_keys` table. It does not maintain a write-through cache today (every call hits the DB), so there's nothing to "regenerate" in the sense of `cache.invalidate(...)`.

What this runbook actually covers: the **postgres-side** caching that the AKC subprocess depends on, and the operational state you might need to reset around it.

## When this matters

You may need to do something here if:

- A user's just-added SSH key is being rejected on push despite showing in their Settings.
- A removed SSH key is still being accepted (this would be a bug; don't shrug it off).
- The AKC subprocess is timing out under load and you need to understand what's slow.

## Diagnose first

```sh
# What's the AKC subcommand seeing?
sudo -u shithub-ssh /usr/local/bin/shithubd ssh-authkeys SHA256:abc...
# This prints what sshd would have read; empty output = no match.
```
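If all you have is the user's public key file, the fingerprint argument has to be derived first. A minimal sketch, assuming the usual OpenSSH `ssh-keygen -l` output layout (bits, fingerprint, comment, key type); the helper name `fingerprint_field` and the key path are illustrative and not part of shithub:

```sh
# Hypothetical helper: pull the fingerprint (second field) out of
# `ssh-keygen -l` output read on stdin.
fingerprint_field() {
  awk '{ print $2 }'
}

# Usage on a machine with OpenSSH installed (path is a placeholder):
#   ssh-keygen -l -E sha256 -f /path/to/key.pub | fingerprint_field
```

The printed value is what you would pass to `shithubd ssh-authkeys` and use in the DB query.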

Compare against the DB:

```sh
psql -d shithub -c "
  SELECT k.fingerprint, u.username
    FROM user_ssh_keys k
    JOIN users u ON u.id = k.user_id
   WHERE k.fingerprint = 'SHA256:abc...';"
```

Four possibilities:

| psql says | AKC says | Diagnosis |
|-----------|----------|-----------|
| match | match | sshd is broken, not us; check sshd logs. |
| match | empty | The AKC subcommand isn't reading what we think; check `EnvironmentFile`, the binary path, and that the `shithub-ssh` user can read `/etc/shithub/ssh.env`. |
| empty | empty | Key isn't in the DB; user needs to re-add. |
| empty | match | Should be impossible today: there is no cache and no read replica. The AKC is reading a different DB than psql; see below. |
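The comparison can be scripted so the two outputs are judged the same way every time. This is a sketch only: `classify` is a hypothetical helper, and it assumes you capture the psql result with `-At` (unaligned, tuples only) so that zero rows really produce empty output.

```sh
# Hypothetical helper: classify PSQL_OUTPUT AKC_OUTPUT
# Both arguments are raw command output; empty means "no match".
classify() {
  psql_out=$1
  akc_out=$2
  if [ -n "$psql_out" ] && [ -n "$akc_out" ]; then
    echo "both match: check sshd logs"
  elif [ -n "$psql_out" ]; then
    echo "db only: check AKC environment and binary path"
  elif [ -n "$akc_out" ]; then
    echo "akc only: AKC is reading a different DB"
  else
    echo "neither: key is not in the DB"
  fi
}

# Usage on the host (fingerprint left elided as in the commands above):
#   classify "$(psql -d shithub -At -c "SELECT 1 FROM user_ssh_keys WHERE fingerprint = 'SHA256:abc...'")" \
#            "$(sudo -u shithub-ssh /usr/local/bin/shithubd ssh-authkeys SHA256:abc...)"
```

Without `-At`, psql prints a header and a row count even for zero rows, which this helper would misread as a match.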

## "Empty in psql, match in AKC": actually impossible today

We don't run a read replica for the AKC. If you ever see this, the AKC subcommand is reading from a different DB than psql is: check `db.url` in `ssh.env` against the operator's psql connection.

## The remove-key-but-still-accepted case

A removed SSH key being accepted is **not** caused by a stale cache (we don't have one); it's caused by one of:

- The key wasn't actually removed (check the DB, not the UI).
- sshd is using its own `authorized_keys` file in addition to AKC. Check `/etc/ssh/sshd_config` and per-user `~/.ssh/authorized_keys`. The shipped sshd config disables per-user files for the `git` user; if someone's customized, that's the leak.
- The session was already authenticated and is being held open. Kill it: `sudo systemctl status sshd`, find the per-conn process, `kill <pid>`.
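For the second cause, reading `sshd_config` by eye can miss `Match` blocks and included files, so it may be easier to ask sshd for its effective settings. A sketch, assuming OpenSSH's extended test mode (`sshd -T`, which prints lowercase `keyword value` lines) and assuming the shipped config disables per-user files via `AuthorizedKeysFile none`; the helper name is hypothetical:

```sh
# Hypothetical helper: reads `sshd -T` output on stdin and succeeds only
# if the effective AuthorizedKeysFile setting is exactly "none".
authkeys_file_disabled() {
  value=$(awk '$1 == "authorizedkeysfile" { $1 = ""; sub(/^ /, ""); print }')
  [ "$value" = "none" ]
}

# Usage on the host (needs root; -C applies Match blocks for that user):
#   sudo sshd -T -C user=git | authkeys_file_disabled \
#     && echo "per-user files disabled" \
#     || echo "sshd may still read authorized_keys files"
```

If the check fails, the paths printed by `sshd -T` on the `authorizedkeysfile` line are the files to inspect for the removed key.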

## Future: add a cache here

If we ever add an in-process cache to AKC (we'd want to, to shave the per-push DB call), invalidation becomes load-bearing:

- Cache key: SHA-256 fingerprint.
- Invalidate on: key add (insert) and key remove (delete).
- TTL: 60 seconds is the longest acceptable window between removing a key and the AKC ceasing to honor it.
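To make the three rules concrete, here is a file-backed stand-in written in shell. It is not how the cache would be built (that would live inside the AKC process); every function name and the on-disk layout are assumptions made for illustration only.

```sh
# Hypothetical sketch: one file per fingerprint, holding the store time
# on the first line and the cached AKC output after it.
CACHE_DIR="${CACHE_DIR:-$(mktemp -d)}"
CACHE_TTL=60  # seconds; the ceiling stated in the list above

# Hex-encode the fingerprint so base64 characters like '/' are safe in a filename.
cache_path() {
  printf '%s/%s' "$CACHE_DIR" "$(printf '%s' "$1" | od -An -tx1 | tr -d ' \n')"
}

# cache_put FINGERPRINT VALUE NOW_EPOCH
cache_put() {
  printf '%s\n%s\n' "$3" "$2" > "$(cache_path "$1")"
}

# cache_get FINGERPRINT NOW_EPOCH
# Prints the value and returns 0 only if the entry is younger than the TTL.
cache_get() {
  f=$(cache_path "$1")
  [ -f "$f" ] || return 1
  stored_at=$(head -n 1 "$f")
  [ $(( $2 - stored_at )) -lt "$CACHE_TTL" ] || return 1
  tail -n +2 "$f"
}

# cache_invalidate FINGERPRINT: call on both key add and key remove.
cache_invalidate() {
  rm -f "$(cache_path "$1")"
}
```

Passing the current time in explicitly keeps expiry testable. A miss (expired or invalidated) would fall through to the DB lookup that happens on every call today.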

When that lands, this runbook gets a real "regenerate" section.
