gardesk/garcard / a8becdc

Add sprint 08 operational runbooks

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: a8becdcaa89ddf4688a71b0521a31363909e9ecd
Parents: d9e7a02
Tree: c466120

4 changed files

Status  File                                          +   -
M       README.md                                     5   0
A       docs/runbooks/garcardctl-cookbook.md          45  0
A       docs/runbooks/incident-triage.md              62  0
A       docs/runbooks/migrate-from-external-agent.md  41  0

README.md (modified)
@@ -75,6 +75,11 @@ gartk prompt path with a persistent in-process modal session and falls back to
    - run with debug logs: `RUST_LOG=garcard=debug cargo run -p garcard -- daemon`
    - verify fallback path by setting `GARCARD_PROMPT_COMMAND` explicitly
 
+## Runbooks
+1. `docs/runbooks/migrate-from-external-agent.md`
+2. `docs/runbooks/incident-triage.md`
+3. `docs/runbooks/garcardctl-cookbook.md`
+
 ## Known Limitations
 1. Policy results are host-specific; some actions may auto-authorize and not trigger prompts.
 2. Current implementation targets logged-in user sessions on X11.

docs/runbooks/garcardctl-cookbook.md (added)
@@ -0,0 +1,45 @@
+# Garcardctl Operator Cookbook
+
+## Reachability And Runtime
+1. Ping daemon:
+   - `garcardctl ping`
+2. Runtime status and health surface:
+   - `garcardctl status`
+3. Extended diagnostics and remediation hints:
+   - `garcardctl diagnose`
+4. Version/protocol handshake:
+   - `garcardctl version`
+
+## Auth Lifecycle
+1. Inspect current auth state:
+   - `garcardctl auth-summary`
+2. Trigger policy challenge manually:
+   - `pkcheck --allow-user-interaction --process $$ --action-id com.mesonbuild.install.run`
+
+## Temporary Authorization Controls
+1. List temporary authorizations:
+   - `garcardctl temp-list`
+2. Revoke one authorization by id:
+   - `garcardctl temp-revoke <authorization-id>`
+3. Revoke all temporary authorizations:
+   - `garcardctl temp-revoke-all`
+
+## Service Control
+1. Request daemon shutdown:
+   - `garcardctl quit`
+2. Start daemon (workspace run):
+   - `cargo run -p garcard -- daemon`
+3. Restart user service deployment:
+   - `systemctl --user restart garcard.service`
+
+## Standard Troubleshooting Sequence
+1. `garcardctl ping`
+2. `garcardctl status`
+3. `garcardctl diagnose`
+4. `garcardctl auth-summary`
+5. `garcardctl temp-list`
+
+## Operational Notes
+1. IPC controls are same-UID restricted.
+2. `status` now includes authority and subject health fields for control-surface consumers.
+3. `diagnose` includes remediation hints for no-agent and denied-flow scenarios.
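
Not part of the commit: the "Standard Troubleshooting Sequence" in this cookbook could be wrapped in a small shell function so operators run the five subcommands in the documented order. This is a minimal sketch. The function name `garcard_triage` is hypothetical, and it assumes `garcardctl` exits non-zero when a subcommand fails, which the runbook does not state.

```shell
#!/bin/sh
# Sketch only: run the cookbook's standard troubleshooting sequence in order.
# Assumption (not confirmed by the runbook): garcardctl returns a non-zero
# exit status when a subcommand fails.
garcard_triage() {
    failed=0
    for sub in ping status diagnose auth-summary temp-list; do
        printf '== garcardctl %s ==\n' "$sub"
        if ! garcardctl "$sub"; then
            # Keep going so later steps still produce output for the report.
            printf 'garcardctl %s failed\n' "$sub" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Sourcing the snippet and calling `garcard_triage` prints a header before each step and returns 1 if any step failed. It continues past failures on purpose, so a single run still shows every surface the sequence lists.
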
docs/runbooks/incident-triage.md (added)
@@ -0,0 +1,62 @@
+# Incident Triage Playbook
+
+## 1) No Agent Available
+Symptom:
+1. `pkcheck` returns: `Authorization requires authentication but no agent is available.`
+
+Checklist:
+1. Verify daemon socket/control path:
+   - `garcardctl ping`
+   - `garcardctl status`
+2. Check diagnostics:
+   - `garcardctl diagnose`
+3. If daemon is down, start/restart:
+   - `systemctl --user restart garcard.service`
+4. If polkit was restarted, restart garcard after polkit recovery:
+   - `garcardctl quit`
+   - relaunch daemon/service
+
+Escalate if:
+1. `diagnose.authority_connected` remains `false` after daemon restart.
+
+## 2) Repeated Denied Authentication
+Symptom:
+1. Prompt loops with denied results despite retries.
+
+Checklist:
+1. Inspect last outcome and phase:
+   - `garcardctl auth-summary`
+2. Verify operator account authorization policy (wheel/admin membership and action policy rules).
+3. Trigger a controlled challenge and inspect logs:
+   - `RUST_LOG=garcard=debug garcard daemon`
+   - `pkcheck --allow-user-interaction --process $$ --action-id com.mesonbuild.install.run`
+4. Confirm deny path vs transport failure in logs (`FAILURE` vs transport errors).
+
+Escalate if:
+1. Correct credentials consistently map to denied outcomes across multiple actions.
+
+## 3) Authority Disconnect / DBus Errors
+Symptom:
+1. `diagnose` reports authority connectivity errors.
+2. Temp-authorization commands fail with dbus/authority errors.
+
+Checklist:
+1. Check diagnostics:
+   - `garcardctl diagnose`
+2. Restart daemon:
+   - `garcardctl quit`
+   - relaunch daemon/service
+3. Re-check:
+   - `garcardctl status`
+   - `garcardctl temp-list`
+4. If still failing, restart polkit (system policy permitting) and restart daemon.
+
+Escalate if:
+1. Authority connectivity remains down after polkit + daemon restart.
+
+## Evidence To Capture For Any Incident
+1. `garcardctl status`
+2. `garcardctl auth-summary`
+3. `garcardctl diagnose`
+4. Relevant daemon logs with timestamps
+5. Exact `pkcheck` command and stderr/stdout
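
Not part of the commit: the "Evidence To Capture For Any Incident" list in this playbook lends itself to a small collection helper. The sketch below is one possible shape. The function name `capture_garcard_evidence`, the directory name, and the file names are all hypothetical. It also assumes the daemon runs as `garcard.service` under `systemd --user`, so its logs are in the user journal; a daemon started with `cargo run` logs to the terminal instead.

```shell
#!/bin/sh
# Sketch only: collect the playbook's evidence items into one directory.
# Assumptions: output layout is made up for illustration; daemon logs are
# read from the user journal, which only applies to the systemd deployment.
capture_garcard_evidence() {
    outdir="${1:-garcard-evidence-$(date -u +%Y%m%dT%H%M%SZ)}"
    mkdir -p "$outdir" || return 1
    for sub in status auth-summary diagnose; do
        # Record stdout and stderr together; note the exit status on failure.
        garcardctl "$sub" >"$outdir/$sub.txt" 2>&1 \
            || echo "exit status: $?" >>"$outdir/$sub.txt"
    done
    # short-iso output keeps a timestamp on every log line.
    journalctl --user -u garcard.service --since "1 hour ago" -o short-iso \
        >"$outdir/daemon.log" 2>&1 || true
    printf '%s\n' "$outdir"
}
```

Calling `capture_garcard_evidence` with no argument writes into a UTC-timestamped directory and prints its path. The exact `pkcheck` command and its output are not captured here, because re-running it would trigger a new prompt; that item still needs to be recorded by hand.
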
docs/runbooks/migrate-from-external-agent.md (added)
@@ -0,0 +1,41 @@
+# Migration Runbook: External Agent To Garcard
+
+## Scope
+Move from another desktop polkit agent (for example `polkit-kde-agent-1`, `lxqt-policykit-agent`, `mate-polkit`) to `garcard`.
+
+## Preconditions
+1. `garcard` and `garcardctl` are installed or runnable from this workspace.
+2. You have a user session with access to polkit prompts.
+3. Existing external polkit agent service is known.
+
+## Steps
+1. Stop and disable the existing agent for the user session.
+   - Example:
+     - `systemctl --user disable --now lxqt-policykit-agent.service`
+     - `systemctl --user disable --now polkit-kde-authentication-agent-1.service`
+2. Install and enable `garcard` user service.
+   - `install -Dm644 garcard.service ~/.config/systemd/user/garcard.service`
+   - `systemctl --user daemon-reload`
+   - `systemctl --user enable --now garcard.service`
+3. Verify control plane reachability.
+   - `garcardctl ping`
+   - `garcardctl status`
+   - `garcardctl diagnose`
+4. Trigger an auth challenge.
+   - `pkcheck --allow-user-interaction --process $$ --action-id com.mesonbuild.install.run`
+5. Validate lifecycle controls.
+   - `garcardctl auth-summary`
+   - `garcardctl temp-list`
+   - `garcardctl temp-revoke-all`
+
+## Success Criteria
+1. Only garcard owns the active auth-agent role in session.
+2. Prompt appears and accepts valid credentials.
+3. `garcardctl` lifecycle commands respond without permission/socket errors.
+
+## Rollback
+1. Disable garcard user service:
+   - `systemctl --user disable --now garcard.service`
+2. Re-enable prior agent service:
+   - `systemctl --user enable --now <previous-agent>.service`
+3. Re-run `pkcheck --allow-user-interaction ...` to confirm prior behavior is restored.
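
Not part of the commit: part of the migration runbook's success criteria can be checked mechanically after the steps are done. The sketch below is a rough proxy, not an authoritative test. The function name `check_garcard_migration` is hypothetical. It uses systemd unit state as a stand-in for "only garcard owns the auth-agent role", so it will not see an agent started some other way (for example via desktop autostart), and it assumes `garcardctl ping` exits non-zero when the daemon is unreachable.

```shell
#!/bin/sh
# Sketch only: post-migration sanity check.
# Usage: check_garcard_migration <previous-agent>.service
# Assumption: unit state approximates agent ownership; garcardctl ping
# fails with a non-zero status when the control socket is unreachable.
check_garcard_migration() {
    prev_unit="${1:-}"
    if [ -z "$prev_unit" ]; then
        echo "usage: check_garcard_migration <previous-agent>.service" >&2
        return 2
    fi
    if ! systemctl --user is-active --quiet garcard.service; then
        echo "garcard.service is not active" >&2
        return 1
    fi
    if systemctl --user is-active --quiet "$prev_unit"; then
        echo "$prev_unit is still active" >&2
        return 1
    fi
    if ! garcardctl ping >/dev/null; then
        echo "garcardctl ping failed" >&2
        return 1
    fi
    echo "migration checks passed"
}
```

For example, `check_garcard_migration lxqt-policykit-agent.service` returns 0 only when garcard is active, the old unit is not, and the control socket answers. The prompt itself still has to be verified by hand with the `pkcheck` step.
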