Observability
Logging, metrics, and alerting expectations.
Purpose
Ensure systems are observable enough to detect and resolve issues quickly.
Expectations
- Add logging and alerts to new services and risky changes wherever a failure would otherwise go unnoticed.
- Prefer structured logs and avoid logging sensitive data.
- Validate dashboards and alerts after significant releases.
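As an illustration of structured logging with sensitive fields kept out of the output, here is a minimal sketch using only Python's standard `logging` and `json` modules. It assumes a team convention of passing context as `extra={"fields": {...}}`; the `SENSITIVE_KEYS` set, the `JsonFormatter` class, and the `make_logger` helper are hypothetical names chosen for this example, not an existing internal library.

```python
import json
import logging

# Hypothetical set of field names treated as sensitive; adapt to your own data model.
SENSITIVE_KEYS = {"card_number", "cvv", "password", "api_key", "secret", "token"}


def redact(fields: dict) -> dict:
    """Return a copy of fields with sensitive values masked (top-level keys only)."""
    return {k: ("[REDACTED]" if k.lower() in SENSITIVE_KEYS else v) for k, v in fields.items()}


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured context is assumed to arrive via extra={"fields": {...}}.
        payload.update(redact(getattr(record, "fields", {})))
        return json.dumps(payload, sort_keys=True)


def make_logger(name: str) -> logging.Logger:
    """Create a logger that emits redacted JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger("payments")
    # The card_number value is replaced with "[REDACTED]" before it is written.
    log.info("charge attempted", extra={"fields": {"order_id": "o-123", "card_number": "not-a-real-card"}})
```

Masking by key name is a simple first line of defense; it does not catch secrets embedded in free-text messages or nested structures, so reviews of what gets logged are still needed.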
Incident response expectations
- Alerts should point responders toward the owning service, likely impact, and a relevant dashboard or runbook.
- During incidents, responders should post relevant graphs, logs, traces, hypotheses, and verification steps into the incident channel.
- Keep evidence sanitized. Do not log or share payment/card details, credentials, keys, raw secrets, or unredacted customer-sensitive data.
- Dashboards used for incident decisions should be accurate and clearly labeled, so that someone outside the owning team can interpret the signal during a handoff.
- If an incident exposes missing telemetry, create a Linear follow-up rather than relying on memory.
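The alert expectation in the first bullet above (owning service, likely impact, and a dashboard or runbook) lends itself to an automated check. The sketch below assumes alerts are described with free-form annotations; the `AlertRule` shape and the annotation keys `owning_service`, `impact`, `dashboard_url`, and `runbook_url` are hypothetical and would need to be mapped onto whatever alerting tool is actually in use.

```python
from dataclasses import dataclass, field

# Hypothetical alert definition shape; real alerting tools define their own schemas.
@dataclass
class AlertRule:
    name: str
    annotations: dict = field(default_factory=dict)


# Hypothetical annotation keys this sketch treats as required.
REQUIRED_KEYS = ("owning_service", "impact")
# At least one of these links must be present.
LINK_KEYS = ("dashboard_url", "runbook_url")


def missing_alert_context(rule: AlertRule) -> list[str]:
    """List the responder context an alert rule is missing (empty list means OK)."""
    problems = [f"missing {k}" for k in REQUIRED_KEYS if not rule.annotations.get(k)]
    if not any(rule.annotations.get(k) for k in LINK_KEYS):
        problems.append("missing dashboard_url or runbook_url")
    return problems


if __name__ == "__main__":
    rule = AlertRule("HighCheckoutErrorRate", {"impact": "customers cannot complete checkout"})
    for problem in missing_alert_context(rule):
        print(f"{rule.name}: {problem}")
```

Running a check like this in CI when alert definitions change is one way to keep the expectation from depending on reviewers remembering it.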