Organizations try to be proactive about incidents and downtime. An outage can be devastating to a business’s bottom line, not to mention a poor experience for its customers and users.
Organizations need protocols and automation in place to prevent small incidents from becoming significant ones. Teams commonly struggle with disjointed handling of events: not knowing who is on call, and slow or missing responses when it comes to acknowledging and resolving an alert.
In this article, we will discuss how to put systems and tools in place to respond smoothly to alerts and incidents, reduce mean time to acknowledge and resolve, and along the way delight your customers. We will focus on two critical areas tied to a successful response to alerts and incidents: on-call schedules and escalation policies.
Spreadsheets and whiteboards are an inefficient and cumbersome way to keep track of your team’s on-call schedules. Zenduty allows you to automate on-call schedules easily. You can create, manage, and track who is on call and for how long.
On-call scheduling is highly configurable. Rotations can be customized based on daily, weekly, or custom shifts. You can even specify rotations based on day of the week and time of day.
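To make the idea of a rotation concrete, here is a minimal Python sketch of how a fixed-length rotation can be resolved to a single person at a given moment. It is an illustration only and does not use or represent Zenduty’s API; the function name `on_call_at`, its parameters, and the engineer names are all hypothetical.

```python
from datetime import datetime, timedelta


def on_call_at(engineers, rotation_start, shift_length, when):
    """Return who is on call at `when` for a simple round-robin rotation.

    Hypothetical helper for illustration: `engineers` take turns in list
    order, each covering one shift of `shift_length`, starting at
    `rotation_start`.
    """
    if not engineers:
        raise ValueError("the rotation needs at least one engineer")
    if when < rotation_start:
        raise ValueError("`when` is before the rotation starts")
    # Number of complete shifts that have passed since the rotation began.
    shifts_elapsed = (when - rotation_start) // shift_length
    # Wrap around so the rotation repeats once everyone has had a turn.
    return engineers[shifts_elapsed % len(engineers)]


# Example: a weekly rotation with handoff on Monday at 09:00.
team = ["asha", "ben", "chen"]
start = datetime(2024, 1, 1, 9, 0)
weekly = timedelta(weeks=1)
print(on_call_at(team, start, weekly, datetime(2024, 1, 10, 14, 30)))
```

Changing `shift_length` to `timedelta(days=1)` gives a daily rotation with the same logic. Rotations restricted to certain days of the week or times of day would need additional rules on top of this sketch.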
What if your on-call engineer is in a basement, a parking lot, or otherwise in an area without cell service? If something goes wrong, they probably won’t get the alert until it’s too late.
Escalation policies ensure that after a set amount of time without acknowledgement, an escalation path is triggered so those critical alerts are never missed.
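The escalation logic described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Zenduty’s implementation or API: the `EscalationLevel` class, the `responder_to_page` function, the responder names, and the wait times are all hypothetical. Each level has a wait time, and if the alert is still unacknowledged when that time runs out, the next level is paged.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class EscalationLevel:
    responder: str
    # How long this level has to acknowledge before the alert escalates.
    wait: timedelta


def responder_to_page(levels, unacknowledged_for):
    """Return the responder who should be paged for an unacknowledged alert.

    Hypothetical helper: walks the levels in order and stops at the first
    one whose acknowledgement window has not yet expired.
    """
    if not levels:
        raise ValueError("the policy needs at least one level")
    deadline = timedelta(0)
    for level in levels:
        deadline += level.wait
        if unacknowledged_for < deadline:
            return level.responder
    # Every window has expired: stay with the final level.
    return levels[-1].responder


policy = [
    EscalationLevel("primary-on-call", timedelta(minutes=5)),
    EscalationLevel("secondary-on-call", timedelta(minutes=10)),
    EscalationLevel("engineering-manager", timedelta(minutes=15)),
]
print(responder_to_page(policy, timedelta(minutes=7)))
```

In this sketch the alert stays with the final level once every window has expired. A real policy might instead repeat the whole path or notify a wider group at that point.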