A Repeatable System For Making Things Better
A framework for systematically fixing problems in your team.
Making things better is a key part of your job as a manager. Recently I’ve been helping managers clarify how to fix problems more systematically.
There are endless problems to solve at work, but it’s not always clear which steps to take, so people get “stuck” and take no action. Improving systems and processes is also a great way to increase your visibility and impact as a manager.
Here are seven steps you can take to improve things within your team.
1. Define And Clarify The Problem
Before fixing a problem, we must first define it clearly, in a way that everyone can agree on. Sometimes it’s easy to see problems in your team; other times they are more subtle. Try thinking about what’s causing the most frustration for your team.
What are the engineers in your team most frequently frustrated by?
What process areas are causing you the most problems?
Example: "On-call engineers are miserable and burned out because they get > 50 pages a week." or "Our deployment process is ineffective, and it takes days to roll out a release to our customers."
2. Use Data To Understand The Problem 📊
Once you have identified a problem, you need a way to quantify it and ask questions using data. Exploring data will often give you unexpected insights.
When discussing problems, managers can typically articulate the general problem, but few can give specific details. For example, “On-call is getting bad” is something I might hear a manager say. Problem statements like this lack clarity. What specifically about on-call is bad? How many alarms do you get per day? How long does it take engineers to resolve them? What specifically does “good” look like?
You need to talk about a problem in a quantifiable way before you can fix it.
For example, consider the following two statements:
“On-call is bad.”
“On-call is bad because engineers are paged on average ten times per day. For 90% of the pages they receive, there is no action to take because it’s a false alarm.”
The latter gives a clear picture of the problem and even clues about how to fix it.
As part of looking at the data, you should identify the key metrics that answer the question “How do we measure this problem?”
For the on-call example above, you might look at the “number of high severity alarms per day”, “average time to resolve on-call issues”, “number of pages without a runbook for resolution”, or “number of pages that were false alarms”.
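As a rough illustration, the four metrics above could be computed from a log of pages along these lines. This is only a sketch: the Page record and its field names are hypothetical, and the sample data is made up rather than taken from a real on-call rotation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Page:
    """One on-call page. All fields are hypothetical, for illustration only."""
    day: str                   # calendar day the page fired, e.g. "2024-01-01"
    high_severity: bool        # was this a high severity alarm?
    minutes_to_resolve: float  # time from page to resolution
    has_runbook: bool          # did a runbook exist for resolving it?
    false_alarm: bool          # was there no action to take?


def summarize(pages: list[Page]) -> dict[str, float]:
    """Compute the four example metrics from a list of page records."""
    if not pages:
        raise ValueError("need at least one page to summarize")
    # Only days with at least one page are counted, which is a simplification.
    days = {p.day for p in pages}
    return {
        "high_severity_per_day": sum(p.high_severity for p in pages) / len(days),
        "avg_minutes_to_resolve": mean(p.minutes_to_resolve for p in pages),
        "pages_without_runbook": sum(not p.has_runbook for p in pages),
        "false_alarm_pages": sum(p.false_alarm for p in pages),
    }


# Made-up sample data, not real measurements.
sample = [
    Page("2024-01-01", True, 30.0, True, False),
    Page("2024-01-01", False, 10.0, False, True),
    Page("2024-01-02", True, 20.0, False, True),
    Page("2024-01-02", False, 20.0, True, True),
]
metrics = summarize(sample)
```

In practice the records would likely come from an export of whatever paging tool your team uses. The point is that each metric becomes a number you can track week over week.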
3. Define Tenets
Engineering tenets are core values or principles that serve as a mental model to guide thinking. They are a powerful tool for ensuring your team thinks about a problem similarly.
Using our example, we might define tenets such as:
Few pages. Engineers can only deal with a few out-of-hours pages before they become overwhelmed, and “pager fatigue” sets in.
All pages should be actionable. If a page is not actionable, there is no point in paging an on-call engineer.
Only page in an emergency. Engineers should only be paged if there is a real emergency.
4. Set A Goal 🎯
Now that we understand the problem and have data, we can set a goal for improving our situation. Make your goal ambitious enough to inspire creative thinking but realistic enough that you can make progress.
Example: "Reduce on-call pages by more than half, from 50 to < 25 per week, within one month".
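A goal phrased this way is easy to check mechanically. Here is a minimal sketch, assuming you already have a weekly page count; the function names and the default target of 25 are illustrative choices based on the example goal, not part of any tool.

```python
def reduction_percent(baseline: float, current: float) -> float:
    """Percentage drop from baseline to current (negative if it got worse)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


def goal_met(current_pages_per_week: int, target: int = 25) -> bool:
    """True once weekly pages are strictly below the target."""
    return current_pages_per_week < target


BASELINE = 50  # pages per week, taken from the example goal
print(reduction_percent(BASELINE, 25))  # 50.0
print(goal_met(25))                     # False: the goal asks for fewer than 25
```

Note that hitting exactly 25 pages is a 50% reduction but does not yet satisfy "< 25", which is worth spelling out with the team when you set the goal.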
5. Brainstorm And Plan
Work with your team to explore different ideas, and encourage creative thinking about how to make progress towards your goal. Once you know what your options are, you can focus on doing the work to improve.
Example: "Turn off noisy alarms, reduce the sensitivity of alarms, add HTTP retries to client calls to avoid being paged for transient network blips".
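To make the retry idea concrete, here is a minimal sketch of retrying a client call with exponential backoff, using only the Python standard library. The helper names (call_with_retries, fetch_status), the retry count, the delays, and the choice of which errors count as transient are all assumptions for illustration; your client library may already offer its own retry support.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")

# Errors treated as transient here; which errors qualify is an assumption.
TRANSIENT = (urllib.error.URLError, TimeoutError, ConnectionError)


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except urllib.error.HTTPError:
            raise  # the server answered; this sketch does not retry HTTP error statuses
        except TRANSIENT:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller (and the alarm) see it
            sleep(base_delay * 2 ** attempt)  # 0.5s, then 1s, then 2s, ...
    raise AssertionError("unreachable")


def fetch_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code, retrying network blips."""
    def once() -> int:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    return call_with_retries(once)
```

With something like this in place, a single dropped connection is absorbed by a retry, and an alarm only fires when the failure persists across all attempts.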
6. Find Quick Wins
If you want early momentum on a goal, quick wins help build energy and belief within the team. Look for the ones that have a big impact and do them as soon as possible.
What could you do this week that would progress you towards your goal?
Example: "One of our alarms is noisy and unnecessary. Let’s disable it now."
7. Review Progress Weekly
Once you have an established goal, the final step is to build a regular cadence for reviewing your goal and how you’re progressing towards it. This is important to retain momentum.
This might be a weekly meeting or an offline written report you share with your team. Either way, a regular cadence lets you check progress and reminds your team that the goal is still important.
Summary
You can follow a systematic, repeatable process to fix problems within your team.
The steps are:
Define and clarify the problem
Use data to explore the problem in depth
Define tenets to guide your thinking
Set a goal
Brainstorm and plan
Find quick wins to build momentum
Set up a cadence to regularly review progress
Get in touch
Thanks for reading. If you want to talk about engineering management, I’d love to hear from you. You can contact me directly and connect on LinkedIn.