How To Handle Major Incidents (6 Steps)

Lessons from managing large-scale operational incidents.

Jul 13, 2024

If you have worked in software engineering for any length of time, you’ll know how often things go wrong.

When your pager goes off at 2 a.m., the last thing you want someone to ask is, “What should we do?”

Prolonged downtime of software systems can damage customer trust and sometimes cost thousands or even millions of dollars. Many things can cause downtime, such as h…
