Incident Management

How we respond to production incidents.

Purpose

Restore service quickly while preserving customer trust.

Severity & ownership

  • Severity levels: Low, Medium, High.
  • Declare an incident when an outage cannot be resolved immediately.
  • Assign an Incident Commander and confirm roles.
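As an illustration only, the severity levels and ownership check above could be modeled as a small data structure. This is a minimal sketch: the `Incident` class, its field names, and `ready_to_proceed` are hypothetical and are not part of incident.io or any tooling named in this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    """The three severity levels listed above."""
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass
class Incident:
    # Hypothetical record; field names are illustrative assumptions.
    summary: str
    severity: Severity
    commander: Optional[str] = None   # the assigned Incident Commander, if any
    roles_confirmed: bool = False     # set once roles have been confirmed

    def ready_to_proceed(self) -> bool:
        """True once a commander is assigned and roles are confirmed."""
        return self.commander is not None and self.roles_confirmed
```

For example, an `Incident("Checkout errors", Severity.HIGH)` is not ready to proceed until `commander` is set and `roles_confirmed` is `True`.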

Tooling

  • Incidents are managed in incident.io with Slack.
  • Join the Zoom or Google Meet created by incident.io when an incident is declared.

Procedure

  1. Triage and declare the incident.
  2. Mitigate and stabilize service impact.
  3. Communicate status updates in the incident channel.
  4. Escalate and pull in all required parties.
  5. Run post-incident review in incident.io and track follow-ups.
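The five steps above can be sketched as an ordered checklist. This is a simplified illustration, assuming the steps run strictly in sequence; the `Step` enum and `next_step` helper are hypothetical names introduced here, not an incident.io API.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    """Procedure steps, numbered as in the list above."""
    TRIAGE_AND_DECLARE = 1
    MITIGATE_AND_STABILIZE = 2
    COMMUNICATE_STATUS = 3
    ESCALATE = 4
    POST_INCIDENT_REVIEW = 5


def next_step(current: Step) -> Optional[Step]:
    """Return the step that follows `current`, or None after the last step."""
    if current is Step.POST_INCIDENT_REVIEW:
        return None
    return Step(current + 1)
```

Calling `next_step` repeatedly from `Step.TRIAGE_AND_DECLARE` walks the procedure in order and returns `None` once the post-incident review is reached.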
