Providing customers with a world-class and seamless user experience is critical for the success of any business. It is therefore important that you have a robust on-call strategy that optimizes the availability of the right subject matter experts, on-call engineers, and support engineers to resolve critical, user-impacting incidents as soon as possible.
Providing on-call support can be stressful at times, especially for folks at companies with a constantly evolving set of services and a fast-growing customer base. For that reason, you should have good on-call compensation packages in place so that your on-call staff stay motivated while carrying out an undoubtedly challenging role.
There are several models adopted by companies to compensate for on-call work. We spoke to a number of companies and some of our customers and compiled some of the most popular on-call comp models that you can explore within your organization:
Provide an extra half-day off for each week someone is on primary on-call, and generally lower the “productive work” expectation for whoever is on call. Companies should cultivate a good management culture by making sure people take some extra time off after particularly rough shifts. Of course, this time off must be accounted for during the project planning phase (so that a day off does not feel like less time to get your work done).
Weekly or daily compensation for primary and secondary on-calls. Comp can be a flat amount for the entire week (irrespective of the number of incidents) or extra pay for actual incident work.
If you can map out your on-call schedules a year or a few quarters in advance, the on-call comp can also be included in annual comp, with a fixed number of weeks, days, or hours stipulated in the employment contract.
Providing both monetary compensation (weekly or daily) and time-off compensation. For companies with unlimited PTO, people prefer the monetary component over the time-off component. Take Google’s on-call comp structure, for example:
For any hour outside of 08:00-18:00 your local time, where you are on-call:
- If your response SLA is 30 (or 60) mins or less, you get 1/3 time-in-lieu
- If your response SLA is 5 mins or less, you get 2/3 time-in-lieu
- Time-in-lieu can be used for vacation, or it can be paid out to you at the end of the quarter.
- Also, there is a hard cap on accruing 80 hours per quarter.
So, if you're on a team with a strict response time it would not be uncommon to essentially have 8 additional weeks of vacation or pay per year.
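To make the arithmetic in that structure concrete, here is a minimal Python sketch of how accrual under such a policy could be calculated. The function names, the `standard_tier_minutes` parameter, and the choice to count every hour outside 08:00-18:00 (weekends included) are assumptions made for illustration. This is not an official implementation of any company's policy.

```python
from datetime import datetime, timedelta
from fractions import Fraction

# Hard cap on accrued time-in-lieu per quarter, as described in the text.
QUARTERLY_CAP_HOURS = 80


def accrual_rate(response_sla_minutes: int, standard_tier_minutes: int = 30) -> Fraction:
    """Hours of time-in-lieu earned per off-hours on-call hour.

    standard_tier_minutes is hypothetical: the text says "30 (or 60)",
    so the threshold is left configurable here.
    """
    if response_sla_minutes <= 5:
        return Fraction(2, 3)
    if response_sla_minutes <= standard_tier_minutes:
        return Fraction(1, 3)
    # Assumption: slower SLAs accrue nothing; the text does not cover them.
    return Fraction(0)


def off_hours_in_shift(start: datetime, end: datetime) -> int:
    """Count whole on-call hours that begin outside 08:00-18:00 local time.

    Assumes the shift starts and ends on the hour, in local time.
    """
    hours = 0
    current = start
    while current < end:
        if not (8 <= current.hour < 18):
            hours += 1
        current += timedelta(hours=1)
    return hours


def time_in_lieu(off_hours: int, response_sla_minutes: int,
                 already_accrued: Fraction = Fraction(0),
                 standard_tier_minutes: int = 30) -> Fraction:
    """Time-in-lieu earned for a shift, limited by the quarterly cap."""
    earned = off_hours * accrual_rate(response_sla_minutes, standard_tier_minutes)
    remaining = max(Fraction(0), QUARTERLY_CAP_HOURS - Fraction(already_accrued))
    return min(earned, remaining)


if __name__ == "__main__":
    # One week of primary on-call, Monday 00:00 to the following Monday 00:00.
    week = off_hours_in_shift(datetime(2024, 1, 1), datetime(2024, 1, 8))
    print("off-hours in the week:", week)
    print("time-in-lieu at a 5-minute SLA:", float(time_in_lieu(week, 5)), "hours")
```

The cap is what produces the "8 additional weeks" figure: 80 hours per quarter over four quarters is 320 hours, or eight 40-hour weeks.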
Compensation in itself might not help with “feeling ownership to solve problems” directly. It does, however, surface the cost to the company in a way that is easily explained to management, and the payouts make it easier to spread the on-call load around. You know you got it right when people are neither clamoring to be on-call more nor trying to get rid of as many shifts as possible, because the disruption to their life is about equivalent to the money they get.
On-call is a heavy mental tax, especially when you support a vast number of services; your brain is fried after that week. Most companies realize that the mental and physical strain of on-call should be compensated. Having said that, irrespective of your comp model, it is important that you have regular sessions with your on-call staff and keep track of their physical and mental health. Most importantly, aggressively tune your on-call alerts to minimize non-critical pages during non-business hours. Heaven forbid someone YOLO-mutes a spurious monitor at some ungodly hour.