
Incident response

Quick responder checklist for incidents

The full policy lives in Incident Management. This page is the short checklist for responders.

Declare early

  • An incident is any urgent reactive work, not only a full outage.
  • Anyone at Nest can declare an incident in incident.io.
  • Use Minor, Major, or Critical. If unsure, start higher and adjust later.

Tooling

  • Incidents are managed in incident.io with Slack integration.
  • Work in the per-incident Slack channel and keep #incidents updated.
  • Use Google Meet or Zoom when voice/video is faster, then summarize decisions and owners back into the channel.
  • Track follow-up actions in Linear after impact is stable.

First five minutes

  1. Confirm the Incident Lead. Default leads are Akansh or John until a formal rotation exists.
  2. Post the current summary: impact, severity, status, lead, and next action.
  3. Assign an investigation owner if the Incident Lead is not driving diagnosis.
  4. If customers can see the issue, ask Support/Ops to own comms updates.
  5. For Critical incidents, escalate through the incident.io app and notify the relevant executives.
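
The summary in step 2 can be kept consistent with a small template. The following is a minimal sketch, not part of incident.io or any Nest tooling: the IncidentSummary class, its field names, and the example values are hypothetical, and only the five fields and the three severity names come from this page.

```python
from dataclasses import dataclass

# Severity names used on this page.
SEVERITIES = ("Minor", "Major", "Critical")


@dataclass
class IncidentSummary:
    """Hypothetical container for the five fields listed in step 2."""

    impact: str
    severity: str
    status: str
    lead: str
    next_action: str

    def __post_init__(self) -> None:
        # Reject severities that are not one of the three documented levels.
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")

    def render(self) -> str:
        """Return the summary as plain text, one field per line."""
        return "\n".join(
            [
                f"Impact: {self.impact}",
                f"Severity: {self.severity}",
                f"Status: {self.status}",
                f"Lead: {self.lead}",
                f"Next action: {self.next_action}",
            ]
        )


if __name__ == "__main__":
    # Example values are made up for illustration.
    summary = IncidentSummary(
        impact="Checkout errors for some users",
        severity="Major",
        status="Investigating",
        lead="Akansh",
        next_action="Roll back the latest deploy",
    )
    print(summary.render())
```

Pasting the rendered text into the incident channel keeps each update in the same order, which makes later updates easy to compare.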

During response

  • Mitigate impact before deeper diagnosis.
  • Post commands, dashboards, errors, hypotheses, and verification steps in the channel. Summarize meeting decisions in Slack.
  • Do not post payment/card details, credentials, keys, raw secrets, or unredacted customer-sensitive data.
  • While impact is visible to customers, send customer/status page updates every 30 minutes, or whenever a meaningful update is available.
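
The 30-minute update cadence can be tracked with a simple timer check. This is a rough sketch under stated assumptions: the function names are hypothetical, timestamps are assumed to be timezone-aware, and nothing here calls incident.io or a status page API.

```python
from datetime import datetime, timedelta, timezone

# Cadence from the checklist: one update every 30 minutes while impact is visible.
UPDATE_INTERVAL = timedelta(minutes=30)


def next_update_due(last_update: datetime) -> datetime:
    """Return the time by which the next customer update should go out."""
    return last_update + UPDATE_INTERVAL


def is_update_overdue(last_update: datetime, now: datetime) -> bool:
    """Return True once the 30-minute window since the last update has elapsed."""
    return now >= next_update_due(last_update)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up timestamp
    now = datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc)
    print("Next update due:", next_update_due(last).isoformat())
    print("Overdue:", is_update_overdue(last, now))
```

A meaningful change in impact or status still warrants an update before the timer runs out; the check only guards against going quiet for too long.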

Closing

  • Move remaining work into Linear with owners.
  • Use an Incident Debrief for Critical incidents and a short debrief for Major incidents.
  • Focus debriefs on contributors, mitigators, risks, and learnings rather than a single root cause.
