# Observability

shithub ships four sinks: structured logging, Prometheus metrics, OpenTelemetry tracing, and a Sentry-protocol error reporter. All four are governed by the layered config loader (`docs/internal/config.md`) and degrade to no-ops when not configured.

## Logging

- Library: `log/slog` (stdlib).
- Format: `text` (default in dev, human-friendly key=value) or `json` (default in prod, one object per line).
- Level: `debug | info | warn | error` (config: `log.level`).
- Standard fields on every line:
  - `time`, `level`, `msg`
  - `request_id`: when a request is in flight (set by the request_id middleware).
  - `user_id`: when known (post-S05).
  - `component`: set by the package emitting the line.
  - `error`, `stack`: on error-level lines, where applicable.
- **Redaction.** `internal/infra/log` wraps the chosen handler so each record passes through a redactor before output:
  - Attribute keys matching `password|pass|secret|key|token|authorization|dsn|otpauth` are redacted to `***`.
  - Attribute string values containing `shithub_pat_`, `otpauth://`, `Bearer `, or `Basic ` are redacted to `***` regardless of the key.
  - Tested in `internal/infra/log/log_test.go`.

## Metrics

- Library: `github.com/prometheus/client_golang`.
- Endpoint: `GET /metrics` (Prometheus text exposition).
- Access control: HTTP Basic auth when `metrics.basic_auth_user`/`metrics.basic_auth_pass` are set; otherwise unauthenticated. S35 will tighten further (IP allow-list + per-deployment policy).
- Standard metrics:
  - `shithub_http_requests_total{route,method,status}` (counter)
  - `shithub_http_request_duration_seconds{route,method}` (histogram, exponential buckets)
  - `shithub_http_in_flight` (gauge)
  - `shithub_panics_total` (counter, incremented by the recover middleware)
  - `shithub_db_pool_acquired`, `shithub_db_pool_idle`, `shithub_db_pool_total` (gauges)
  - `shithub_db_pool_acquire_wait_seconds_total` (counter)
  - Standard Go runtime + process metrics (registered automatically).
- **Cardinality discipline.** Route labels come from chi's `RoutePattern()` so we get `/owner/{repo}` instead of per-repo concrete paths. Never label by `user_id` or `repo_id`.
- Per-domain metrics (added in later sprints) MUST register against `metrics.Registry` so a single `/metrics` scrape sees everything.

## Tracing

- Library: OpenTelemetry SDK with the OTLP-HTTP exporter.
- Disabled by default. To enable:
  ```toml
  [tracing]
  enabled = true
  endpoint = "http://otel-collector.bare-metal:4318"
  sample_rate = 0.05
  service_name = "shithubd"
  ```
- The HTTP middleware (`tracing.Middleware`) emits one span per request when enabled.
- The pgx tracer hook lands when domain queries arrive (post-S05); the bare DB Open does not yet attach it.
- Sampling is parent-based with `TraceIDRatioBased(sample_rate)` as the root sampler: incoming requests that carry a parent trace context honor the parent's sampling decision, and only new root traces are subject to `sample_rate`.
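
To make the parent-based rule concrete, here is a stdlib-only illustration of the decision logic. It is not the OpenTelemetry SDK's implementation and not the project's code; in the Go SDK the same policy is normally composed as `sdktrace.ParentBased(sdktrace.TraceIDRatioBased(rate))`. The `parent` type, the `shouldSample` function, and the choice of which trace ID bytes to hash are assumptions for the sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parent is a hypothetical summary of an inbound trace context.
type parent struct {
	present bool // the request carried a parent span context
	sampled bool // the parent's sampling flag
}

// shouldSample illustrates parent-based sampling with a ratio fallback:
// a present parent always wins; root traces are sampled deterministically
// from the trace ID so every service makes the same choice for one trace.
func shouldSample(p parent, traceID [16]byte, rate float64) bool {
	if p.present {
		return p.sampled
	}
	if rate >= 1 {
		return true
	}
	if rate <= 0 {
		return false
	}
	// Map the low 8 bytes to a 63-bit value and compare it to a
	// rate-scaled upper bound.
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	bound := uint64(rate * float64(uint64(1)<<63))
	return x < bound
}

func main() {
	var low, high [16]byte
	high[8] = 0xff // low 8 bytes near the top of the range

	fmt.Println(shouldSample(parent{}, low, 0.05))                              // true: root trace, small ID
	fmt.Println(shouldSample(parent{}, high, 0.05))                             // false: root trace, large ID
	fmt.Println(shouldSample(parent{present: true, sampled: true}, high, 0.05)) // true: parent decision wins
}
```

With `sample_rate = 0.05`, roughly one in twenty root traces is kept, while a request that arrives already sampled by an upstream service is always traced.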

## Error reporting

- Library: `github.com/getsentry/sentry-go`. The wire format is Sentry-protocol-compatible, so a self-hosted GlitchTip works as a drop-in receiver (the preferred option per S03's sprint spec).
- DSN comes from `error_reporting.dsn`. When empty, the package is a no-op.
- Two integration points:
  1. **Recover middleware** calls `errrep.CapturePanic(recovered, requestID)` for every recovered panic. The request_id is attached as a tag for forensic correlation with logs.
  2. **`errrep.SlogHandler`** wraps the slog handler so any record at error level is also reported as a Sentry event. Tags: `request_id`, `user_id`, `component`, `route`. Other attrs land under the `shithub` context.
- Flush: `errrep.Init` returns a `flush(ctx)` callback that the server invokes on shutdown to drain queued events.

## Request ID flow

The `request_id` is the correlation key tying logs, metrics, traces, and error reports together:

1. `middleware.RequestID` accepts an inbound `X-Request-Id` if it matches a strict whitelist; otherwise generates a 16-byte hex value.
2. The id is stored in the request context and echoed in the response `X-Request-Id` header.
3. `middleware.AccessLog` includes it as `request_id` on the per-request log line.
4. `middleware.Recover` includes it on panic logs and on the Sentry/GlitchTip event.
5. The styled error pages (`errors/{404,403,429,500}.html`) display it for end-user support reference.

## Operational notes

- `pg_stat_statements` is loaded by the dev compose Postgres (S01) and required in prod (S37).
- GlitchTip and the OTLP collector run on bare metal in our prod topology (S37); the droplet's `/metrics` is scraped by Prometheus over WireGuard.
- Configuration is documented in `docs/internal/config.md`.