How To Root Cause Problems
The 5 Whys is a valuable technique for clarifying the root cause of a problem within your team.
If you work in software engineering for any length of time, one thing is certain: things will go wrong. Processes will fail, humans will make mistakes, and services will suffer major outages that impact customers.
When things go wrong, understanding the real root cause of a problem allows us to learn from failure and avoid the same mistakes in the future.
5 Whys
Our first answer to “why” something happened is usually a symptom, not the root cause. We have to keep digging until we reach an actionable fix that stops the problem from happening again.
The 5 Whys is a recursive problem-solving method: you repeatedly ask “Why?”, each time about the previous answer, until you can’t go further. It's a quick way to get from a symptom to the root cause of a situation.
How Do You Use This Technique?
Write down the specific problem you need to understand. For example, “Why did our service have a major outage?” Writing it down sharpens your thinking and ensures everyone is working on the same problem.
Ask yourself why the problem happened and write down the answer. If that answer doesn’t identify the real root cause, ask “why” again about the answer you just gave and write that down too. Continue until you reach the underlying root cause of your problem.
Once you have identified the root cause, decide on one or more countermeasures (actions) to prevent the problem from happening again.
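The steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not a standard tool: the names `FiveWhysRecord` and `run_five_whys`, the `ask` callback, and the `max_depth` limit are all hypothetical. In practice `ask` would be your team answering each question; here it is a scripted stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class FiveWhysRecord:
    """Written record of one analysis (hypothetical structure)."""
    problem: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer)
    root_cause: Optional[str] = None
    countermeasures: List[str] = field(default_factory=list)


def run_five_whys(
    problem: str,
    ask: Callable[[str], Tuple[str, bool]],
    max_depth: int = 10,
) -> FiveWhysRecord:
    """Ask "why" repeatedly, writing down every answer.

    `ask` takes a question and returns (answer, is_root_cause).
    `max_depth` is a safety limit, not part of the technique itself.
    """
    record = FiveWhysRecord(problem=problem)
    question = problem
    for _ in range(max_depth):
        answer, is_root_cause = ask(question)
        record.steps.append((question, answer))
        if is_root_cause:
            record.root_cause = answer
            break
        # The next "why" is asked about the answer we just wrote down.
        question = f"Why? (previous answer: {answer})"
    return record


# Scripted answers standing in for a team discussion (illustrative only).
scripted_answers = iter([
    ("A deployed change contained a bug.", False),
    ("The change was not tested before release.", False),
    ("Our builds do not run automated tests.", True),
])

record = run_five_whys(
    "Why did our service have a major outage?",
    lambda question: next(scripted_answers),
)
record.countermeasures.append("Run automated tests in every build.")
```

Keeping every question and answer in `record.steps` mirrors the advice to write each answer down, and leaving `root_cause` empty when the loop runs out of depth makes it obvious that the analysis is unfinished.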
Example
Problem: We lost $10,000 in sales this weekend due to a technical problem.
“Why did we lose $10,000 in sales?” Because our payment service had a major outage.
“Why did the payment service have a major outage?” Because there was a bug in our code.
“Why was there a bug in our code?” Because we deployed a change that contained a bug.
“Why did we deploy a change that contained a bug?” Because we didn’t test our code before pushing it to production.
“Why didn’t we test our code?” Because we don’t include automated tests in our builds.
Action: Add automated tests to catch errors before releasing code to production.
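As a rough sketch of what that action could look like, the example below uses Python's built-in `unittest` module. The payment helper `order_total_cents` and the `run_build_checks` function are hypothetical names invented for illustration; your real code and build tooling will differ.

```python
import unittest


def order_total_cents(unit_price_cents: int, quantity: int) -> int:
    """Hypothetical payment helper: total charge in cents."""
    if unit_price_cents < 0 or quantity < 0:
        raise ValueError("price and quantity must be non-negative")
    return unit_price_cents * quantity


class OrderTotalTest(unittest.TestCase):
    def test_multiplies_price_by_quantity(self):
        self.assertEqual(order_total_cents(1999, 3), 5997)

    def test_rejects_negative_quantity(self):
        with self.assertRaises(ValueError):
            order_total_cents(1999, -1)


def run_build_checks() -> bool:
    """Run the tests and report whether every one of them passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

A build script could call `run_build_checks()` and refuse to deploy when it returns `False`, so a failing test blocks the release instead of reaching customers.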
Best Practices
Here are some critical best practices for applying the 5 Whys technique.
5 Is A Rule Of Thumb
Treat the “5” as a rule of thumb rather than a strict rule. Sometimes you might need to ask “why” more than five times to get to the root of a problem; sometimes you get there in fewer than five.
Your Root Cause Should Be Fixable
Make sure you end up with a root cause that a concrete change can fix. Your fix should be actionable, for example: “add automated testing before deploying to production.” Then follow through and make the change. Good intentions such as “try harder next time” are not a fix.
There should be a real action you can take that reduces the probability of the problem happening again.
Don’t Give Up
Sometimes, you’ll get stuck and want to give up. Keep going until you know you have a root cause, not a symptom. Stop only when you’ve reached the root cause or when further answers stop being useful.
Summary
The 5 Whys is a valuable technique for clarifying the root cause of a problem within your team.
Write down a specific problem: “Why did our site experience an outage last weekend?”
Recursively ask “why” it happened and write down each answer until you get to the root cause.
Identify actions to prevent the problem from happening again.
Thanks for reading.