Add sprint 08 operational runbooks
Authored by mfwolffe <wolffemf@dukes.jmu.edu>
- SHA: a8becdcaa89ddf4688a71b0521a31363909e9ecd
- Parents: d9e7a02
- Tree: c466120

| Status | File | + | - |
|---|---|---|---|
| M | README.md | 5 | 0 |
| A | docs/runbooks/garcardctl-cookbook.md | 45 | 0 |
| A | docs/runbooks/incident-triage.md | 62 | 0 |
| A | docs/runbooks/migrate-from-external-agent.md | 41 | 0 |

README.md modified
@@ -75,6 +75,11 @@ gartk prompt path with a persistent in-process modal session and falls back to
| 75 | 75 | - run with debug logs: `RUST_LOG=garcard=debug cargo run -p garcard -- daemon` |
| 76 | 76 | - verify fallback path by setting `GARCARD_PROMPT_COMMAND` explicitly |
| 77 | 77 | |
| 78 | +## Runbooks | |
| 79 | +1. `docs/runbooks/migrate-from-external-agent.md` | |
| 80 | +2. `docs/runbooks/incident-triage.md` | |
| 81 | +3. `docs/runbooks/garcardctl-cookbook.md` | |
| 82 | + | |
| 78 | 83 | ## Known Limitations |
| 79 | 84 | 1. Policy results are host-specific; some actions may auto-authorize and not trigger prompts. |
| 80 | 85 | 2. Current implementation targets logged-in user sessions on X11. |
docs/runbooks/garcardctl-cookbook.md added
@@ -0,0 +1,45 @@
| 1 | +# Garcardctl Operator Cookbook | |
| 2 | + | |
| 3 | +## Reachability And Runtime | |
| 4 | +1. Ping daemon: | |
| 5 | + - `garcardctl ping` | |
| 6 | +2. Runtime status and health surface: | |
| 7 | + - `garcardctl status` | |
| 8 | +3. Extended diagnostics and remediation hints: | |
| 9 | + - `garcardctl diagnose` | |
| 10 | +4. Version/protocol handshake: | |
| 11 | + - `garcardctl version` | |
| 12 | + | |
| 13 | +## Auth Lifecycle | |
| 14 | +1. Inspect current auth state: | |
| 15 | + - `garcardctl auth-summary` | |
| 16 | +2. Trigger policy challenge manually: | |
| 17 | + - `pkcheck --allow-user-interaction --process $$ --action-id com.mesonbuild.install.run` | |
| 18 | + | |
| 19 | +## Temporary Authorization Controls | |
| 20 | +1. List temporary authorizations: | |
| 21 | + - `garcardctl temp-list` | |
| 22 | +2. Revoke one authorization by id: | |
| 23 | + - `garcardctl temp-revoke <authorization-id>` | |
| 24 | +3. Revoke all temporary authorizations: | |
| 25 | + - `garcardctl temp-revoke-all` | |
| 26 | + | |
| 27 | +## Service Control | |
| 28 | +1. Request daemon shutdown: | |
| 29 | + - `garcardctl quit` | |
| 30 | +2. Start daemon (workspace run): | |
| 31 | + - `cargo run -p garcard -- daemon` | |
| 32 | +3. Restart user service deployment: | |
| 33 | + - `systemctl --user restart garcard.service` | |
| 34 | + | |
| 35 | +## Standard Troubleshooting Sequence | |
| 36 | +1. `garcardctl ping` | |
| 37 | +2. `garcardctl status` | |
| 38 | +3. `garcardctl diagnose` | |
| 39 | +4. `garcardctl auth-summary` | |
| 40 | +5. `garcardctl temp-list` | |
| 41 | + | |
| 42 | +## Operational Notes | |
| 43 | +1. IPC controls are same-UID restricted. | |
| 44 | +2. `status` now includes authority and subject health fields for control-surface consumers. | |
| 45 | +3. `diagnose` includes remediation hints for no-agent and denied-flow scenarios. | |
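
The Standard Troubleshooting Sequence added in this file could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the function name `troubleshoot` is hypothetical, and it assumes each `garcardctl` subcommand exits non-zero on failure, which the cookbook does not state.

```shell
#!/bin/sh
# Sketch: run the cookbook's standard troubleshooting sequence in order and
# stop at the first failing step.
# Assumption (not confirmed by the runbook): every garcardctl subcommand
# exits non-zero when it fails.
troubleshoot() {
    for step in ping status diagnose auth-summary temp-list; do
        echo "== garcardctl $step"
        if ! garcardctl "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
    echo "all steps passed"
}
```

Source the file and call `troubleshoot`; the first failing step is named on stderr, which narrows down which cookbook section to consult next.
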
docs/runbooks/incident-triage.md added
@@ -0,0 +1,62 @@
| 1 | +# Incident Triage Playbook | |
| 2 | + | |
| 3 | +## 1) No Agent Available | |
| 4 | +Symptom: | |
| 5 | +1. `pkcheck` returns: `Authorization requires authentication but no agent is available.` | |
| 6 | + | |
| 7 | +Checklist: | |
| 8 | +1. Verify daemon socket/control path: | |
| 9 | + - `garcardctl ping` | |
| 10 | + - `garcardctl status` | |
| 11 | +2. Check diagnostics: | |
| 12 | + - `garcardctl diagnose` | |
| 13 | +3. If daemon is down, start/restart: | |
| 14 | + - `systemctl --user restart garcard.service` | |
| 15 | +4. If polkit was restarted, restart garcard after polkit recovery: | |
| 16 | + - `garcardctl quit` | |
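| 17 | + - relaunch the daemon: `systemctl --user restart garcard.service` (service deployment) or `cargo run -p garcard -- daemon` (workspace run) | |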
| 18 | + | |
| 19 | +Escalate if: | |
| 20 | +1. `diagnose.authority_connected` remains `false` after daemon restart. | |
| 21 | + | |
| 22 | +## 2) Repeated Denied Authentication | |
| 23 | +Symptom: | |
| 24 | +1. Prompt loops with denied results despite retries. | |
| 25 | + | |
| 26 | +Checklist: | |
| 27 | +1. Inspect last outcome and phase: | |
| 28 | + - `garcardctl auth-summary` | |
| 29 | +2. Verify operator account authorization policy (wheel/admin membership and action policy rules). | |
| 30 | +3. Trigger a controlled challenge and inspect logs: | |
| 31 | + - `RUST_LOG=garcard=debug garcard daemon` | |
| 32 | + - `pkcheck --allow-user-interaction --process $$ --action-id com.mesonbuild.install.run` | |
| 33 | +4. Determine from the logs whether the denial came from the deny path or from a transport failure (`FAILURE` vs transport errors). | |
| 34 | + | |
| 35 | +Escalate if: | |
| 36 | +1. Correct credentials consistently map to denied outcomes across multiple actions. | |
| 37 | + | |
| 38 | +## 3) Authority Disconnect / DBus Errors | |
| 39 | +Symptom: | |
| 40 | +1. `diagnose` reports authority connectivity errors. | |
| 41 | +2. Temp-authorization commands fail with dbus/authority errors. | |
| 42 | + | |
| 43 | +Checklist: | |
| 44 | +1. Check diagnostics: | |
| 45 | + - `garcardctl diagnose` | |
| 46 | +2. Restart daemon: | |
| 47 | + - `garcardctl quit` | |
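| 48 | + - relaunch the daemon: `systemctl --user restart garcard.service` (service deployment) or `cargo run -p garcard -- daemon` (workspace run) | |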
| 49 | +3. Re-check: | |
| 50 | + - `garcardctl status` | |
| 51 | + - `garcardctl temp-list` | |
| 52 | +4. If still failing, restart polkit (system policy permitting), then restart the daemon once polkit has recovered. | |
| 53 | + | |
| 54 | +Escalate if: | |
| 55 | +1. Authority connectivity remains down after polkit + daemon restart. | |
| 56 | + | |
| 57 | +## Evidence To Capture For Any Incident | |
| 58 | +1. `garcardctl status` | |
| 59 | +2. `garcardctl auth-summary` | |
| 60 | +3. `garcardctl diagnose` | |
| 61 | +4. Relevant daemon logs with timestamps | |
| 62 | +5. Exact `pkcheck` command and stderr/stdout | |
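
The first three items of the evidence list above could be collected with a small helper. This is a sketch under stated assumptions: the function name `capture_evidence`, the output file names, and the `exit=` trailer are all hypothetical, and daemon logs are left out because the playbook does not say where they are written.

```shell
#!/bin/sh
# Sketch: save the garcardctl outputs named in the evidence list into a
# directory, one file per command. Names and layout are illustrative only.
capture_evidence() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1
    for step in status auth-summary diagnose; do
        # Keep going on failure: a failing command is itself evidence,
        # so record its exit code next to whatever it printed.
        garcardctl "$step" >"$out_dir/$step.txt" 2>&1 && rc=0 || rc=$?
        echo "exit=$rc" >>"$out_dir/$step.txt"
    done
    # Record a UTC timestamp so the capture can be matched to daemon logs.
    date -u +"%Y-%m-%dT%H:%M:%SZ" >"$out_dir/captured-at.txt"
}
```

For example, `capture_evidence "$HOME/garcard-incident"` leaves `status.txt`, `auth-summary.txt`, `diagnose.txt`, and `captured-at.txt` in that directory. The exact `pkcheck` command and its output still need to be saved by hand.
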
docs/runbooks/migrate-from-external-agent.md added
@@ -0,0 +1,41 @@
| 1 | +# Migration Runbook: External Agent To Garcard | |
| 2 | + | |
| 3 | +## Scope | |
| 4 | +Move from another desktop polkit agent (for example `polkit-kde-agent-1`, `lxqt-policykit-agent`, `mate-polkit`) to `garcard`. | |
| 5 | + | |
| 6 | +## Preconditions | |
| 7 | +1. `garcard` and `garcardctl` are installed or runnable from this workspace. | |
| 8 | +2. You have a user session with access to polkit prompts. | |
| 9 | +3. Existing external polkit agent service is known. | |
| 10 | + | |
| 11 | +## Steps | |
| 12 | +1. Stop and disable the existing agent for the user session. | |
| 13 | + - Example: | |
| 14 | + - `systemctl --user disable --now lxqt-policykit-agent.service` | |
| 15 | + - `systemctl --user disable --now polkit-kde-authentication-agent-1.service` | |
| 16 | +2. Install and enable `garcard` user service. | |
| 17 | + - `install -Dm644 garcard.service ~/.config/systemd/user/garcard.service` | |
| 18 | + - `systemctl --user daemon-reload` | |
| 19 | + - `systemctl --user enable --now garcard.service` | |
| 20 | +3. Verify control plane reachability. | |
| 21 | + - `garcardctl ping` | |
| 22 | + - `garcardctl status` | |
| 23 | + - `garcardctl diagnose` | |
| 24 | +4. Trigger an auth challenge. | |
| 25 | + - `pkcheck --allow-user-interaction --process $$ --action-id com.mesonbuild.install.run` | |
| 26 | +5. Validate lifecycle controls. | |
| 27 | + - `garcardctl auth-summary` | |
| 28 | + - `garcardctl temp-list` | |
| 29 | + - `garcardctl temp-revoke-all` | |
| 30 | + | |
| 31 | +## Success Criteria | |
| 32 | +1. Only garcard owns the active auth-agent role in the session. | |
| 33 | +2. Prompt appears and accepts valid credentials. | |
| 34 | +3. `garcardctl` lifecycle commands respond without permission/socket errors. | |
| 35 | + | |
| 36 | +## Rollback | |
| 37 | +1. Disable garcard user service: | |
| 38 | + - `systemctl --user disable --now garcard.service` | |
| 39 | +2. Re-enable prior agent service: | |
| 40 | + - `systemctl --user enable --now <previous-agent>.service` | |
| 41 | +3. Re-run `pkcheck --allow-user-interaction ...` to confirm prior behavior is restored. | |
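
When checking the auth challenge in step 4 or the rollback confirmation, the exit code of `pkcheck` indicates which outcome occurred. The helper below is a sketch, not part of the commit: `describe_pkcheck_rc` is a hypothetical name, and the code-to-meaning mapping follows the `pkcheck(1)` manual page as commonly shipped, so it should be verified against the installed polkit version.

```shell
#!/bin/sh
# Sketch: translate a pkcheck exit code into a short description.
# Mapping taken from the pkcheck(1) manual page; treat it as an assumption
# and confirm it on the target host.
describe_pkcheck_rc() {
    case "$1" in
        0)   echo "authorized" ;;
        1)   echo "not authorized" ;;
        2)   echo "no agent available or interaction not allowed" ;;
        3)   echo "authentication dismissed by user" ;;
        127) echo "error while checking authorization" ;;
        *)   echo "unexpected exit code: $1" ;;
    esac
}
```

A typical use is to run the runbook's `pkcheck` command and then call `describe_pkcheck_rc $?` on the next line. Under this mapping, exit code 2 would correspond to the "no agent is available" symptom, which suggests the agent handover did not complete.
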