Incident Management
How we respond to production incidents.
Purpose
Restore service quickly while preserving customer trust.
Severity & ownership
- Severity levels: Low, Medium, High.
- Declare an incident when an outage cannot be resolved immediately.
- Assign an Incident Commander and confirm roles.
Tooling
- Incidents are managed in incident.io through its Slack integration.
- When an incident is declared, join the Zoom or Google Meet call that incident.io creates.
Procedure
- Triage and declare the incident.
- Mitigate customer impact and stabilize the service.
- Communicate status updates in the incident channel.
- Escalate and pull in all required parties.
- Run a post-incident review in incident.io and track its follow-up actions to completion.