tenseleyflow/shithub / 618280b

Browse files

Add docs/internal/2fa.md, extend config reference, add SHITHUB_TOTP_KEY env example

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA
618280bf2ff74cbe79f0485be5c3a049eaf42363
Parents
639aace
Tree
a207127

3 changed files

StatusFile+-
M .env.example 4 0
A docs/internal/2fa.md 202 0
M docs/internal/config.md 1 0
.env.examplemodified
@@ -43,3 +43,7 @@ SHITHUB_AUTH__EMAIL_BACKEND=stdout
4343
 SHITHUB_AUTH__SMTP__ADDR=127.0.0.1:1025
4444
 # When email_backend=postmark:
4545
 # SHITHUB_AUTH__POSTMARK__SERVER_TOKEN=...
46
+
47
+# AEAD key for at-rest TOTP secrets (S06). Generate once and persist —
48
+# rotating without re-encrypting every row breaks every existing 2FA login.
49
+# SHITHUB_TOTP_KEY=$(openssl rand -base64 32)
docs/internal/2fa.mdadded
@@ -0,0 +1,202 @@
1
+# Two-factor authentication
2
+
3
+S06 ships TOTP-based 2FA: enrollment, login challenge, recovery codes, encrypted-at-rest secrets, audit-log writes, and an admin escape hatch. WebAuthn and SMS are intentionally out of scope (WebAuthn is its own future sprint; SMS is phishable and we won't ship it).
4
+
5
+## What's wired
6
+
7
+- `user_totp` (one row per user, encrypted secret + nonce + last-used counter).
8
+- `user_recovery_codes` (10 per enrollment, sha256-stored, single-use).
9
+- `auth_audit_log` (generic schema reused by future sprints).
10
+- `internal/auth/secretbox` — chacha20poly1305 wrapper, 32-byte key from config.
11
+- `internal/auth/totp` — secret/URI/Verify/recovery codes/SVG QR.
12
+- `internal/auth/audit` — typed action recorder.
13
+- `/login/2fa` challenge step.
14
+- `/settings/security/2fa/{enable,disable,regenerate}`.
15
+- `shithubd admin clear-2fa <username>` operator escape hatch.
16
+
17
+## Login flow with 2FA enrolled
18
+
19
+```
20
+POST /login (password)
21
+   │
22
+   ├─ password OK, no 2FA confirmed → set Session.UserID, redirect /
23
+   │
24
+   └─ password OK, 2FA confirmed → set Session.Pre2FAUserID,
25
+                                   redirect /login/2fa (?next=...)
26
+                                       │
27
+                                       ▼
28
+                                  POST /login/2fa (TOTP or recovery code)
29
+                                       │
30
+                                       ├─ accepted → drop Pre2FAUserID,
31
+                                       │             set UserID, save (re-issued cookie)
32
+                                       │             redirect to next or /
33
+                                       │
34
+                                       └─ rejected → form re-rendered;
35
+                                                     after 5 wrong tries / 5 min,
36
+                                                     pre-2FA marker is dropped
37
+                                                     and user must restart.
38
+```
39
+
40
+`Session.Pre2FAUserID` is the marker that says "password OK, 2FA pending." It carries no privileges — handlers that check `CurrentUser.IsAnonymous()` still see this as anonymous because `UserID == 0`. Only the 2FA challenge handler reads `Pre2FAUserID`.
41
+
42
+## Enrollment flow
43
+
44
+```
45
+GET /settings/security/2fa/enable
46
+   │  Server generates 20-byte secret. Encrypts with AEAD. Stores
47
+   │  user_totp row with confirmed_at NULL. Renders QR SVG inline.
48
+   ▼
49
+POST /settings/security/2fa/enable (current TOTP code)
50
+   │  Decrypt stored secret. Verify code with ±1 step skew.
51
+   │  Atomically:
52
+   │    - ConfirmUserTOTP (sets confirmed_at, last_used_counter).
53
+   │    - DeleteUserRecoveryCodes (defensive — clears any leftover codes).
54
+   │    - InsertRecoveryCode × 10.
55
+   │    - audit.Action2FAEnabled, audit.ActionRecoveryCodesIssued.
56
+   │  Render recovery codes ONCE.
57
+   │  notifyUser("2fa_enabled") best-effort.
58
+   ▼
59
+User saves the codes. From this point /login goes through /login/2fa.
60
+```
61
+
62
+Race protection: `ConfirmUserTOTP` is `UPDATE ... WHERE confirmed_at IS NULL`. If two concurrent confirms arrive, exactly one wins (rows-affected==1); the loser falls through to a 303 to `/disable`.
63
+
64
+## Disable / regenerate flow
65
+
66
+Both require password + current TOTP. This is non-negotiable: a hijacked session must not be able to disable 2FA without proving knowledge of the password AND the authenticator. The `confirmPasswordAndTOTP` helper enforces this in one place.
67
+
68
+- **Disable** clears `user_totp`, clears recovery codes, audits `2fa_disabled`, sends a notification email.
69
+- **Regenerate** keeps the TOTP row, deletes + re-issues recovery codes, audits `recovery_codes_regenerated`, sends a notification email. Used when the user has saved their codes somewhere they no longer trust.
70
+
71
+## Counter anti-replay
72
+
73
+`last_used_counter` stores the highest TOTP step we've accepted for the user. On every successful Verify, the handler calls `BumpTOTPCounter` which atomically:
74
+
75
+```sql
76
+UPDATE user_totp
77
+SET last_used_counter = $2
78
+WHERE user_id = $1 AND $2::bigint > last_used_counter
79
+```
80
+
81
+If rows-affected == 0, the code is from a step ≤ the stored counter — i.e. a replay — and the handler rejects it. Combined with the 30-second TOTP period, this caps the replay window at the `Verify` skew tolerance (±1 step → 30 seconds either side).
82
+
83
+The integration test `TestTwoFactor_CounterReplayRejected` exercises this: enroll, log in once with code from step T, log out, log in again with the SAME code — expects rejection.
84
+
85
+## Recovery codes
86
+
87
+Format: `XXXX-XXXX-XXXX` using a reduced alphabet (`ACDEFGHJKMNPQRTUVWXYZ234`) to avoid easily-confused glyphs (no `0/O`, no `1/I/L`, no `8/B`, no `S`).
88
+
89
+- Generated 10 at a time at enrollment or regenerate.
90
+- Stored as sha256 of the normalized form (uppercase, no dashes/spaces).
91
+- Single-use via `used_at`.
92
+- Shown to the user **once** at generation time. Handlers must not store them in the session.
93
+
94
+The login-challenge handler tells codes apart by shape (`LooksLikeRecoveryCode` checks length + alphabet). 6-digit numerics route to `verifyTOTPCode`; everything else routes to `consumeRecoveryCode`.
95
+
96
+## Encryption at rest
97
+
98
+`internal/auth/secretbox` wraps `chacha20poly1305`. Every TOTP row stores `(secret_encrypted, secret_nonce)`. The nonce is 12 random bytes per row — well below the birthday bound at our scale.
99
+
100
+The 32-byte key comes from config:
101
+
102
+```toml
103
+[auth]
104
+totp_key_b64 = "<base64-encoded 32 bytes>"
105
+```
106
+
107
+Or env: `SHITHUB_AUTH__TOTP_KEY_B64=...` (also accepts the alias `SHITHUB_TOTP_KEY` for symmetry with `SHITHUB_SESSION_KEY`).
108
+
109
+If the key isn't set, the `/settings/security/2fa/*` routes are NOT registered — startup logs a warning. Login through `/login/2fa` still works for users who haven't enrolled (they never go through that path).
110
+
111
+### Key rotation
112
+
113
+Rotating `auth.totp_key_b64` without re-encrypting every row breaks every existing 2FA login. The rotation procedure:
114
+
115
+1. Generate the new key: `openssl rand -base64 32`.
116
+2. Run a one-off migration that decrypts each row with the OLD key and re-seals with the NEW key. The migration must complete inside a single deploy window.
117
+3. Swap `auth.totp_key_b64` to the new value.
118
+4. Restart shithubd.
119
+
120
+A future sprint will ship this as a `shithubd admin rotate-totp-key` command. Until then, treat the key as effectively permanent.
121
+
122
+## Audit log
123
+
124
+`auth_audit_log` is intentionally generic. Every 2FA state change writes a row:
125
+
126
+| Action | Who writes | Notes |
127
+|---|---|---|
128
+| `2fa_enabled` | enroll handler | meta: `{}` |
129
+| `recovery_codes_issued` | enroll + regen | meta: `{count: 10}` |
130
+| `recovery_code_used` | challenge handler | meta: `{}` |
131
+| `recovery_codes_regenerated` | regen handler | meta: `{count: 10}` |
132
+| `2fa_disabled` | disable handler | meta: `{}` |
133
+| `admin_cleared_2fa` | `shithubd admin clear-2fa` | meta: `{admin: "cli"}` |
134
+
135
+Future sprints will reuse the same table (S07 SSH key changes, S15 permissions, S30 org membership, S34 admin actions). When adding a new action, append a constant in `internal/auth/audit/audit.go` and document it here.
136
+
137
+Audit retention: indefinite for security-relevant events. Revisit at S37.
138
+
139
+## Admin escape hatch
140
+
141
+```sh
142
+shithubd admin clear-2fa <username>
143
+```
144
+
145
+For support cases where the user has lost both their authenticator AND their recovery codes. Wipes `user_totp` + `user_recovery_codes`, writes `admin_cleared_2fa` to the audit log, and emails the user a notification (best-effort). The user must re-enroll 2FA after this.
146
+
147
+By policy this should only be invoked after manual identity verification through a support channel — there's no automated bypass.
148
+
149
+## QR rendering
150
+
151
+Server-side SVG. The otpauth URI is encoded via `boombuler/barcode` and emitted as a sequence of `<rect>` elements inside an `<svg>`. No client-side QR library, so no third-party JS sees the secret.
152
+
153
+The otpauth URI carries the secret in plaintext. Defense in depth:
154
+
155
+- The slog redactor (`internal/infra/log/log.go`) scrubs strings containing `otpauth://`.
156
+- The SVG is generated server-side and never persisted.
157
+- `TestQRSVG_Renders` asserts the rendered SVG does NOT contain the secret's base32 form (only the encoded QR modules).
158
+
159
+## Rate limiting
160
+
161
+The 2FA challenge step uses the existing `auth_throttle` table:
162
+
163
+| Scope | Identifier | Max | Window |
164
+|---|---|---|---|
165
+| `2fa` | `ip:<client-ip>\|uid:<user-id>` | 5 | 5 minutes |
166
+
167
+After 5 failed attempts, the pre-2FA marker is dropped and the user has to start over from `/login`.
168
+
169
+## Security properties
170
+
171
+- **Password reset does NOT bypass 2FA.** After a reset, the user is still anonymous (or pre-2FA). They must complete the challenge to get a full session.
172
+- **Disable requires password + TOTP.** A hijacked session can't strip 2FA.
173
+- **TOTP secrets are encrypted at rest.** A DB dump alone can't generate codes.
174
+- **Counter anti-replay** caps a stolen code's reuse window at ±30s of when it was first burned.
175
+- **Audit log** preserves enable/disable/regenerate/clear events for forensic review.
176
+- **otpauth URI never logged.** Slog redactor + handler discipline.
177
+
178
+## Testing
179
+
180
+- **Unit tests** (no DB): `internal/auth/totp`, `internal/auth/secretbox`. Run with `go test ./...`.
181
+- **DB-backed tests** (skip when `SHITHUB_TEST_DATABASE_URL` is unset): `internal/auth/audit`, `internal/web/handlers/auth`.
182
+
183
+Integration tests in `internal/web/handlers/auth/twofactor_test.go`:
184
+
185
+- `TestTwoFactor_Enroll_Logout_Login_Challenge_FullSession`: full happy path.
186
+- `TestTwoFactor_RecoveryCode_OneTimeUse`: recovery code consumed, second use rejected.
187
+- `TestTwoFactor_CounterReplayRejected`: same TOTP step rejected on second use.
188
+- `TestTwoFactor_DisableRequiresPasswordAndTOTP`: wrong creds rejected; correct creds succeed.
189
+
190
+## Pitfalls / what to remember
191
+
192
+- **Don't show recovery codes more than once.** The recovery template only renders the codes when `RecoveryCodes` is set on the page data; handlers set it exactly once at generation time.
193
+- **Don't bypass 2FA on password reset.** Post-reset flow must still go through `/login/2fa` if the user has 2FA enrolled. Reset just changes `users.password_hash` — `user_totp.confirmed_at` is untouched.
194
+- **Don't put secrets in `meta` JSON.** Audit rows can leak via DB-dump or admin queries.
195
+- **Authenticator clock drift** is bounded at ±30s by the SkewSteps=1 setting. Document this in user help when the settings page lands (S10).
196
+- **Lost authenticator + lost recovery codes = locked out.** That's by design. Admin escape hatch is the only recovery; treat it as a high-friction, identity-verified path.
197
+
198
+## Related docs
199
+
200
+- `docs/internal/auth.md` — email/password auth (S05).
201
+- `docs/internal/config.md` — full config reference (incl. `auth.totp_key_b64`).
202
+- `docs/internal/observability.md` — slog redaction (otpauth scrub).
docs/internal/config.mdmodified
@@ -67,6 +67,7 @@ shithubd version # includes a one-line summary of which sinks are confi
6767
 | `auth.argon2.memory_kib` | uint32 | `65536` | argon2id memory cost (KiB). |
6868
 | `auth.argon2.time` | uint32 | `3` | argon2id iterations. |
6969
 | `auth.argon2.threads` | uint8 | `2` | argon2id parallelism. |
70
+| `auth.totp_key_b64` | string | `""` | Base64 32-byte AEAD key for at-rest TOTP secrets. Aliased by `SHITHUB_TOTP_KEY`. Empty disables 2FA enrollment routes. |
7071
 
7172
 ## Env-var examples
7273