Observability

Logging, metrics, and alerting expectations.

Purpose

Ensure systems are observable enough to detect and resolve issues quickly.

Expectations

  • Add logging, metrics, and alerts for new services and for risky changes to existing ones.
  • Prefer structured logs and avoid logging sensitive data.
  • After significant releases, validate that dashboards still show the expected data and that alerts still fire.
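
As an illustration of the structured-logging expectation, here is a minimal Python sketch using only the standard library. It is not a Nest standard: the `SENSITIVE_KEYS` deny-list, the `fields` attribute name, and the `build_logger` helper are assumptions made for this example, and redaction only covers top-level keys.

```python
import json
import logging

# Hypothetical deny-list; replace with the fields your service actually handles.
SENSITIVE_KEYS = {"password", "token", "api_key", "card_number", "secret"}


def redact(fields: dict) -> dict:
    """Return a copy of fields with sensitive top-level values masked."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in fields.items()
    }


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Structured context is passed via extra={"fields": {...}}.
            "fields": redact(getattr(record, "fields", {})),
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("checkout").info("charge failed", extra={"fields": {"order_id": "o-123", "card_number": "4111"}})` would emit the order ID as-is and mask the card number. A deny-list is the simplest approach to show here; an allow-list of known-safe fields is stricter and may be a better fit for services that handle payment data.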

Incident response expectations

  • Alerts should point responders toward the owning service, likely impact, and a relevant dashboard or runbook.
  • During incidents, responders should post relevant graphs, logs, traces, hypotheses, and verification steps into the incident channel.
  • Keep evidence sanitized. Do not log or share payment/card details, credentials, keys, raw secrets, or unredacted customer-sensitive data.
  • Dashboards used for incident decisions should be reliable and clearly labeled, so that someone outside the owning team can understand the signal during a handoff.
  • If an incident exposes missing telemetry, create a Linear follow-up rather than relying on memory.
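
The first expectation above (alerts should name the owning service, likely impact, and a dashboard or runbook) can be checked mechanically. The sketch below is one possible shape for such a check, not an existing Nest tool: the `AlertDefinition` fields and the `missing_context` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertDefinition:
    """Hypothetical alert metadata; field names are assumptions."""
    name: str
    owning_service: str = ""
    impact: str = ""
    dashboard_url: Optional[str] = None
    runbook_url: Optional[str] = None


def missing_context(alert: AlertDefinition) -> List[str]:
    """List the responder-facing context this alert definition lacks."""
    problems = []
    if not alert.owning_service.strip():
        problems.append("owning_service")
    if not alert.impact.strip():
        problems.append("impact")
    # Either a dashboard or a runbook is enough to orient a responder.
    if not (alert.dashboard_url or alert.runbook_url):
        problems.append("dashboard_url or runbook_url")
    return problems
```

A check like this could run in CI over alert definitions, failing the build when `missing_context` returns a non-empty list, so gaps are caught before an incident rather than during one.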
