tenseleyflow/shithub / b22a29e


Add docs/internal/observability.md and docs/internal/config.md

Authored by mfwolffe <wolffemf@dukes.jmu.edu>
SHA: b22a29ef1e2bd39f2b43215dbf6893f416923497
Parents: 653dfcb
Tree: 542335a

2 changed files

| Status | File | Additions | Deletions |
|---|---|---|---|
| A | docs/internal/config.md | 94 | 0 |
| A | docs/internal/observability.md | 75 | 0 |

docs/internal/config.md (added)

# Configuration

shithub uses a layered configuration loader. Sources, in order of increasing precedence:

1. **Built-in defaults.** Encoded in `Defaults()` in `internal/infra/config`.
2. **TOML file.** The path comes from the `SHITHUB_CONFIG` env var, falling back to `/etc/shithub/config.toml`. A missing file is fine; bad syntax is a hard error.
3. **Environment variables.** `SHITHUB_<area>__<key>` (a double underscore separates nested keys). Examples below.
4. **CLI flag overrides.** Passed in by the caller (mostly `--addr` from `shithubd web`).

After all four layers merge, `config.Load` applies a small set of named **aliases** (e.g. `SHITHUB_DATABASE_URL` → `db.url`) for backward compatibility, then runs `Validate`. Any validation failure causes `shithubd` to exit non-zero with a one-line error pointing at the offending key.
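
A minimal Go sketch of the precedence rules above. It assumes a flat `map[string]string` keyed by dotted paths; the real loader in `internal/infra/config` decodes into typed structs. `envKeyToPath`, `merge`, and `load` are hypothetical names. Only three things come from this doc: the `SHITHUB_<area>__<key>` naming, the four-layer order, and the `SHITHUB_DATABASE_URL` → `db.url` alias.

```go
package main

import "strings"

// aliases holds the one alias named in this doc; the real table may contain more.
var aliases = map[string]string{"SHITHUB_DATABASE_URL": "db.url"}

// envKeyToPath maps SHITHUB_<area>__<key> to "area.key", lower-cased.
// Names without a "__" separator (SHITHUB_CONFIG, aliases) are rejected;
// this sketch does not model top-level keys set from the environment.
func envKeyToPath(name string) (string, bool) {
	if !strings.HasPrefix(name, "SHITHUB_") {
		return "", false
	}
	parts := strings.Split(strings.TrimPrefix(name, "SHITHUB_"), "__")
	if len(parts) < 2 {
		return "", false
	}
	for i, p := range parts {
		if p == "" {
			return "", false
		}
		parts[i] = strings.ToLower(p)
	}
	return strings.Join(parts, "."), true
}

// merge copies layers into one map; later layers overwrite earlier ones.
func merge(layers ...map[string]string) map[string]string {
	out := make(map[string]string)
	for _, layer := range layers {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

// load merges defaults < file < env < flags, then applies aliases.
// environ uses the "NAME=value" form that os.Environ returns.
func load(defaults, file map[string]string, environ []string, flags map[string]string) map[string]string {
	env := make(map[string]string) // dotted path -> value, from SHITHUB_<area>__<key>
	raw := make(map[string]string) // variable name -> value, for alias lookup
	for _, kv := range environ {
		name, value, ok := strings.Cut(kv, "=")
		if !ok {
			continue
		}
		raw[name] = value
		if path, ok := envKeyToPath(name); ok {
			env[path] = value
		}
	}
	cfg := merge(defaults, file, env, flags)
	// Aliases run after the merge, so here a set alias overwrites the merged
	// value; that is one reading of "applied after all four merge".
	for name, path := range aliases {
		if v, ok := raw[name]; ok {
			cfg[path] = v
		}
	}
	return cfg
}
```

In a real binary `environ` would be `os.Environ()`. Keeping env parsing separate from merging makes each layer testable on its own.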

## Inspecting the active configuration

```sh
shithubd config print     # writes the resolved config as TOML, with secrets redacted
shithubd config validate  # exits non-zero if the resolved config is invalid
shithubd version          # includes a one-line summary of which sinks are configured
```

`config print` redacts any field whose name contains `password`, `pass`, `secret`, `key`, `token`, `dsn`, or `url` (URL fields are redacted because they often carry credentials in the userinfo component).
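
A sketch of that field-name rule. It assumes a flat dotted-key view of the config and uses `***` as the placeholder; the placeholder `config print` actually uses is not specified here. Matching is on the last path segment (the field name), so `db.url` and `session.key_b64` are caught while `web.addr` is not. `isSecretField` and `redactConfig` are hypothetical names.

```go
package main

import "strings"

// secretFieldNames mirrors the substrings documented above; the authoritative
// list is secretFieldNames in internal/infra/config/redact.go.
var secretFieldNames = []string{"password", "pass", "secret", "key", "token", "dsn", "url"}

// isSecretField reports whether a field name contains any secret substring.
func isSecretField(field string) bool {
	lower := strings.ToLower(field)
	for _, s := range secretFieldNames {
		if strings.Contains(lower, s) {
			return true
		}
	}
	return false
}

// redactConfig returns a copy of a flat, dotted-key config in which every
// secret-bearing field is replaced by "***" (placeholder is an assumption).
func redactConfig(cfg map[string]string) map[string]string {
	out := make(map[string]string, len(cfg))
	for path, value := range cfg {
		field := path[strings.LastIndex(path, ".")+1:] // last segment = field name
		if isSecretField(field) {
			value = "***"
		}
		out[path] = value
	}
	return out
}
```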

## Reference

| Key | Type | Default | Notes |
|---|---|---|---|
| `env` | string | `dev` | One of `dev`, `staging`, `prod`. Drives the default log format and the Sentry environment. |
| `web.addr` | string | `:8080` | Listen address. |
| `web.read_timeout` | duration | `30s` | Per-request read timeout. |
| `web.write_timeout` | duration | `30s` | Per-request write timeout. |
| `web.shutdown_timeout` | duration | `10s` | Graceful drain window on SIGTERM. |
| `db.url` | string | `""` | Postgres DSN. Aliased by `SHITHUB_DATABASE_URL`. |
| `db.max_conns` | int | `10` | pgxpool maximum connections. |
| `db.min_conns` | int | `0` | pgxpool minimum connections. |
| `db.connect_timeout` | duration | `5s` | Timeout for establishing a Postgres connection. |
| `log.level` | string | `info` | One of `debug`, `info`, `warn`, `error`. |
| `log.format` | string | `text` | One of `text`, `json`. |
| `metrics.enabled` | bool | `true` | Mounts `/metrics`. |
| `metrics.basic_auth_user` | string | `""` | When set together with `metrics.basic_auth_pass`, gates `/metrics` behind HTTP Basic auth. |
| `metrics.basic_auth_pass` | string | `""` | Password half of the `/metrics` Basic auth pair. |
| `tracing.enabled` | bool | `false` | When true, `tracing.endpoint` is required. |
| `tracing.endpoint` | string | `""` | OTLP HTTP endpoint, e.g. `http://otel-collector:4318`. |
| `tracing.sample_rate` | float | `0.05` | Ratio for the parent-based sampler, in [0, 1]. |
| `tracing.service_name` | string | `shithubd` | OTel resource attribute. |
| `error_reporting.dsn` | string | `""` | Sentry-protocol DSN (works against GlitchTip). Empty disables error reporting. |
| `error_reporting.environment` | string | `""` | Tag for filtering events. |
| `error_reporting.release` | string | `""` | Tag for filtering events. |
| `session.key_b64` | string | `""` | Base64-encoded 32-byte AEAD key. Aliased by `SHITHUB_SESSION_KEY`. |
| `session.max_age` | duration | `720h` | Cookie session lifetime (30 days). |
| `session.secure` | bool | `false` | Sets the `Secure` cookie attribute. Enable under TLS (S37 deploy). |
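
The invariants stated in the table could be enforced along these lines: the `env` and `log.*` enums, `tracing.endpoint` being required when tracing is on, and `tracing.sample_rate` staying in [0, 1]. This is a hypothetical, trimmed-down `Config`; the real structs and error wording in `internal/infra/config/config.go` may differ. Whether the real `Validate` checks the sample-rate range is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical subset of the real structs; toml tags follow the documented keys.
type Config struct {
	Env     string        `toml:"env"`
	Log     LogConfig     `toml:"log"`
	Tracing TracingConfig `toml:"tracing"`
}

type LogConfig struct {
	Level  string `toml:"level"`
	Format string `toml:"format"`
}

type TracingConfig struct {
	Enabled    bool    `toml:"enabled"`
	Endpoint   string  `toml:"endpoint"`
	SampleRate float64 `toml:"sample_rate"`
}

// Defaults returns the documented defaults for this subset.
func Defaults() Config {
	return Config{
		Env:     "dev",
		Log:     LogConfig{Level: "info", Format: "text"},
		Tracing: TracingConfig{SampleRate: 0.05},
	}
}

// oneOf reports whether v equals one of the allowed values.
func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

// Validate reports the first broken invariant as a one-line error that names the key.
func (c Config) Validate() error {
	switch {
	case !oneOf(c.Env, "dev", "staging", "prod"):
		return fmt.Errorf("env: %q is not one of dev, staging, prod", c.Env)
	case !oneOf(c.Log.Level, "debug", "info", "warn", "error"):
		return fmt.Errorf("log.level: %q is not one of debug, info, warn, error", c.Log.Level)
	case !oneOf(c.Log.Format, "text", "json"):
		return fmt.Errorf("log.format: %q is not one of text, json", c.Log.Format)
	case c.Tracing.Enabled && c.Tracing.Endpoint == "":
		return errors.New("tracing.endpoint: required when tracing.enabled is true")
	case !(c.Tracing.SampleRate >= 0 && c.Tracing.SampleRate <= 1): // also rejects NaN
		return fmt.Errorf("tracing.sample_rate: %v is outside [0, 1]", c.Tracing.SampleRate)
	}
	return nil
}
```

Returning the first failure keeps the error to one line, which matches how `shithubd` is described as reporting bad config.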

## Env-var examples

```sh
# Listen elsewhere
export SHITHUB_WEB__ADDR=:9090

# Connect to Postgres (quoted so the shell does not treat ? as a glob)
export SHITHUB_DATABASE_URL='postgres://shithub:dev@127.0.0.1:5432/shithub?sslmode=disable'
# (equivalent: export SHITHUB_DB__URL=...)

# JSON logs for prod
export SHITHUB_LOG__FORMAT=json
export SHITHUB_LOG__LEVEL=info

# Enable tracing
export SHITHUB_TRACING__ENABLED=true
export SHITHUB_TRACING__ENDPOINT=http://otel-collector.bare-metal:4318
export SHITHUB_TRACING__SAMPLE_RATE=0.05

# Error reporting via GlitchTip (a Sentry DSN carries the project's public key as userinfo;
# quoted because unquoted < and > are shell redirections)
export SHITHUB_ERROR_REPORTING__DSN='https://<public-key>@glitchtip.bare-metal/<project-id>'

# Session key: 32 random bytes, base64-encoded. In prod, generate it once and
# store it so it stays the same across restarts.
export SHITHUB_SESSION_KEY=$(openssl rand -base64 32)

# Gate /metrics behind Basic auth
export SHITHUB_METRICS__BASIC_AUTH_USER=prom
export SHITHUB_METRICS__BASIC_AUTH_PASS='<long-random>'
```

## Secrets

- **Never** commit secrets. `.env` is gitignored; production keys live in a systemd `EnvironmentFile=` with mode `0600`.
- The redaction behavior of `config print` is documented above and tested in `internal/infra/config/config_test.go`. If you add a secret-bearing field, name it so the redactor matches it (the name must contain `password`, `pass`, `secret`, `key`, `token`, `dsn`, or `url`), or extend `secretFieldNames` in `internal/infra/config/redact.go`.
- Log-line redaction is independent (see `docs/internal/observability.md`). Both layers exist on purpose: secrets in env-loaded config and secrets in handler-emitted logs travel different paths.

## Adding a new key

1. Add the field to the appropriate config struct in `internal/infra/config/config.go` with a `toml:` tag.
2. Set its default in `Defaults()`.
3. Add validation in `Validate()` if it has invariants.
4. If it's secret-bearing, confirm its name matches the redactor.
5. Document it in this file.
6. Update `.env.example` if the env-var form is the typical usage.

docs/internal/observability.md (added)

# Observability

shithub ships four sinks: structured logging, Prometheus metrics, OpenTelemetry tracing, and a Sentry-protocol error reporter. All four are governed by the layered config loader (`docs/internal/config.md`) and degrade to no-ops when not configured.

## Logging

- Library: `log/slog` (stdlib).
- Format: `text` (default in dev; human-friendly key=value) or `json` (default in prod; one object per line).
- Level: `debug`, `info`, `warn`, or `error` (config: `log.level`).
- Standard fields on every line:
  - `time`, `level`, `msg`
  - `request_id`: when a request is in flight (set by the request_id middleware).
  - `user_id`: when known (post-S05).
  - `component`: set by the package emitting the line.
  - `error`, `stack`: on error-level lines, where applicable.
- **Redaction.** `internal/infra/log` wraps the chosen handler so each record passes through a redactor before output:
  - Attributes whose keys match `password|pass|secret|key|token|authorization|dsn|otpauth` have their values redacted to `***`.
  - String attribute values containing `shithub_pat_`, `otpauth://`, `Bearer `, or `Basic ` are redacted to `***` regardless of the key.
  - Tested in `internal/infra/log/log_test.go`.

## Metrics

- Library: `github.com/prometheus/client_golang`.
- Endpoint: `GET /metrics` (Prometheus text exposition format).
- Access control: HTTP Basic auth when `metrics.basic_auth_user` and `metrics.basic_auth_pass` are both set; otherwise unauthenticated. S35 will tighten this further (IP allow-list plus per-deployment policy).
- Standard metrics:
  - `shithub_http_requests_total{route,method,status}` (counter)
  - `shithub_http_request_duration_seconds{route,method}` (histogram, exponential buckets)
  - `shithub_http_in_flight` (gauge)
  - `shithub_panics_total` (counter, incremented by the recover middleware)
  - `shithub_db_pool_acquired`, `shithub_db_pool_idle`, `shithub_db_pool_total` (gauges)
  - `shithub_db_pool_acquire_wait_seconds_total` (counter)
  - Standard Go runtime and process metrics (registered automatically).
- **Cardinality discipline.** Route labels come from chi's `RoutePattern()`, so we get `/{owner}/{repo}` instead of per-repo concrete paths. Never label by `user_id` or `repo_id`.
- Per-domain metrics (added in later sprints) MUST register against `metrics.Registry` so a single `/metrics` scrape sees everything.

## Tracing

- Library: OpenTelemetry SDK with the OTLP-HTTP exporter.
- Disabled by default. To enable:
  ```toml
  [tracing]
  enabled = true
  endpoint = "http://otel-collector.bare-metal:4318"
  sample_rate = 0.05
  service_name = "shithubd"
  ```
- The HTTP middleware (`tracing.Middleware`) emits one span per request when enabled.
- The pgx tracer hook lands when domain queries arrive (post-S05); the bare DB Open does not yet attach it.
- Sampling is parent-based, with `TraceIDRatioBased(sample_rate)` as the root sampler. Incoming requests that carry a parent trace context follow the parent's sampling decision; new root traces are sampled at `sample_rate`.
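
Below is a stdlib-only model of that decision, useful for reasoning about which requests get traced. It is not the SDK code: in the server the sampler would be built with `sdktrace.ParentBased(sdktrace.TraceIDRatioBased(rate))` from `go.opentelemetry.io/otel/sdk/trace`. The ratio step mirrors the approach the OTel Go SDK takes (the low 8 bytes of the trace ID compared against a threshold), so every service seeing the same trace reaches the same verdict. The parent step collapses the SDK's separate remote/local parent options into "follow the parent", which is their default behavior. `TraceID`, `Parent`, `ratioSampled`, and `shouldSample` are hypothetical names.

```go
package main

import "encoding/binary"

// TraceID is a 16-byte W3C trace ID.
type TraceID [16]byte

// Parent describes an incoming trace context; a nil *Parent means a root span.
type Parent struct {
	Sampled bool
}

// ratioSampled compares the low 8 bytes of the trace ID, shifted right by one,
// against rate * 2^63. The result depends only on the ID, so it is deterministic.
func ratioSampled(id TraceID, rate float64) bool {
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	bound := uint64(rate * (1 << 63))
	x := binary.BigEndian.Uint64(id[8:16]) >> 1
	return x < bound
}

// shouldSample models ParentBased(TraceIDRatioBased(rate)): honor the parent's
// decision when there is one, otherwise fall back to the ratio sampler.
func shouldSample(parent *Parent, id TraceID, rate float64) bool {
	if parent != nil {
		return parent.Sampled
	}
	return ratioSampled(id, rate)
}
```

Because the parent decision wins, a low local `sample_rate` does not drop spans from traces an upstream service already chose to keep.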

## Error reporting

- Library: `github.com/getsentry/sentry-go`. The wire format is Sentry-protocol-compatible, so a self-hosted GlitchTip works as a drop-in receiver (the lean option per S03's sprint spec).
- The DSN comes from `error_reporting.dsn`. When it is empty, the package is a no-op.
- Two integration points:
  1. **Recover middleware** calls `errrep.CapturePanic(recovered, requestID)` for every recovered panic. The request_id is attached as a tag for forensic correlation with logs.
  2. **`errrep.SlogHandler`** wraps the slog handler so any record at error level is also reported as a Sentry event. Tags: `request_id`, `user_id`, `component`, `route`. Other attrs land under the `shithub` context.
- Flush: `errrep.Init` returns a `flush(ctx)` callback that the server invokes on shutdown to drain queued events.

## Request ID flow

The `request_id` is the correlation key that ties logs, metrics, traces, and error reports together:

1. `middleware.RequestID` accepts an inbound `X-Request-Id` if it passes a strict allow-list check; otherwise it generates a 16-byte hex value.
2. The ID is stored in the request context and echoed in the `X-Request-Id` response header.
3. `middleware.AccessLog` includes it as `request_id` on the per-request log line.
4. `middleware.Recover` includes it on panic logs and on the Sentry/GlitchTip event.
5. The styled error pages (`errors/{404,403,429,500}.html`) display it for end-user support reference.

## Operational notes

- `pg_stat_statements` is loaded by the dev compose Postgres (S01) and required in prod (S37).
- GlitchTip and the OTLP collector run on bare metal in our prod topology (S37); the droplet's `/metrics` endpoint is scraped by Prometheus over WireGuard.
- Configuration is documented in `docs/internal/config.md`.