On Developer Metrics - Time to Debug
Is it common, when setting up a new engineer on your team, that at least one expert is needed to help them get
the code running? Does your team often fix code without running it locally, testing on your staging
environment instead? Does your team have no way to run components of your codebase in isolation without running the
entire application? This post explores what these situations mean, and how you can make measurable
improvements that increase the satisfaction and productivity of your engineering team.
Background
Engineering Productivity, or software engineering effectiveness, is an engineering domain focused on improving
the overall organizational output and the satisfaction that engineers experience while doing their job. Commonly,
the decision to create a focused and dedicated investment in engineering productivity happens due to a
collection of signals:
- The output of engineers no longer scales alongside the growth of the organization or software.
- 1:1s and exit-interviews expose frustration with tools and the inability to achieve “flow” while working.
- Engineers in the company are increasingly investing their time in bespoke tooling to improve their
local team’s productivity. These individuals are a self-selecting cohort who initially built solutions
to scratch their own itch, but their work (rooted in developer
empathy) is quickly leveraged by the larger team and becomes part of the default toolkit for
work.
In my personal observations, signal 3 represents the most visible symptom leading to this decision;
most likely your team or larger group already has multiple people filling this role in a partial capacity.
As organizational resources and investments in this engineering productivity domain grow, being able to
measure the improvements becomes critical for justifying the resource allocation and demonstrating the impact to the company.
But what should you measure? Software-as-a-service products and research
within this domain suggest looking at the following:
| Metric | Humanized Metric |
|---|---|
| Time spent waiting for pre-submit verification | How long does it take for my CI to tell me if this is passing or failing? |
| Volume of changes sent per engineer | How many pull requests does each engineer make each day? |
| Code-review latency | How long does an engineer have to wait for someone to peer review their code? |
| Number of rollouts per day | How often are we able to deploy to production? |
| Number of defects that reach production | How often do we ship a major regression or bug that needs to be rolled back or that pages someone at a very undesirable hour? |
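As a concrete example of how one of these output-side metrics might be computed, here is a minimal sketch (not taken from any particular product) that derives code-review latency from pull-request timestamps. The PullRequest shape and the sample data are assumptions for illustration only.

```ts
// Hypothetical shape of pull-request data pulled from your code-review system.
interface PullRequest {
  openedAt: Date;        // when the author requested review
  firstReviewAt?: Date;  // when the first peer review arrived (undefined if none yet)
}

// Median hours an engineer waits for a first peer review ("code-review latency").
function codeReviewLatencyHours(prs: PullRequest[]): number {
  const waits = prs
    .filter((pr): pr is Required<PullRequest> => pr.firstReviewAt !== undefined)
    .map((pr) => (pr.firstReviewAt.getTime() - pr.openedAt.getTime()) / 3_600_000)
    .sort((a, b) => a - b);
  if (waits.length === 0) return 0;
  const mid = Math.floor(waits.length / 2);
  return waits.length % 2 === 1 ? waits[mid] : (waits[mid - 1] + waits[mid]) / 2;
}

// Example usage with two hypothetical pull requests (one still waiting for review).
console.log(
  codeReviewLatencyHours([
    { openedAt: new Date("2023-01-01T09:00:00Z"), firstReviewAt: new Date("2023-01-01T13:00:00Z") },
    { openedAt: new Date("2023-01-02T09:00:00Z") },
  ])
); // -> 4
```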
Here is where things get challenging (and where I disagree with the status quo). When looking to
improve the productivity of an organization, the metrics above focus too much on the output of the system
and not the input. If you don’t select the correct measures, you risk optimizing only for the output and
overfitting solutions for measurable outcomes instead of fixing the deeper challenges that lead to actual
improvements for the people in your organization.
The flaw in these metrics
I am not here to say that these metrics are bad or even wrong; rather, I want to make the case that these
measures alone miss an essential measurement of the input to the system: Time To Debug.
Time To Debug [TTD]
Definition: The time it takes from observing a problem in production to when someone can debug the suspect
code. This metric, "time to debug", is a composition of multiple user flows joined together into an
end-to-end user (engineer) journey. Listed below is a common but not exclusive version of the TTD flow:
- Time to checkout
- Time to configure your environment to run the application
- Time to build
- Time to run/boot
- Time to trigger the targeted line of code
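To make the composition concrete, here is a minimal sketch (my illustration, not part of the formal definition) that treats TTD for a single debugging journey as the sum of the stage durations listed above. The stage names and the minute values in the example are assumptions.

```ts
// Stages of the TTD flow, mirroring the list above (names are illustrative).
type TtdStage =
  | "checkout"
  | "configureEnvironment"
  | "build"
  | "runBoot"
  | "triggerTargetCode";

// Minutes spent in each stage for a single debugging journey.
type TtdSample = Record<TtdStage, number>;

// Time To Debug is simply the end-to-end sum of the stages.
function timeToDebugMinutes(sample: TtdSample): number {
  return Object.values(sample).reduce((total, minutes) => total + minutes, 0);
}

// Example: a journey dominated by environment setup and build time.
const sample: TtdSample = {
  checkout: 4,
  configureEnvironment: 35,
  build: 18,
  runBoot: 6,
  triggerTargetCode: 3,
};
console.log(`TTD: ${timeToDebugMinutes(sample)} minutes`); // -> TTD: 66 minutes
```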
In practice, what does this look like?
Consider the case where a bug is filed about WidgetFoo in your app: when tapped, the widget is expected to
show a value of 7 but is instead showing 6.9997, breaking the customer flow. An engineer is assigned
the bug and needs to fix it.
The engineer first uses some kind of code-search tooling to locate the suspect logic and identify potential
steps to debug the unexpected behavior:
WidgetFoo.ts
```ts
class WidgetFoo {
  public calculateValue(num: number): number {
    //
    // "BUSINESS LOGIC" (elided) that produces calc1Num and finalNum
    //
    const calc1Num = num;      // placeholder intermediate result
    const finalNum = calc1Num; // placeholder final result
    // Debug statement added by the engineer to trace the unexpected value
    console.log("input:", num, "calculation 1:", calc1Num);
    return finalNum;
  }
}
```
To continue the debugging workflow, the engineer must now build and run this code, reproducing the bug and
seeing their debug log output so they can continue their journey.
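If the codebase supports running components in isolation (one of the pain points from the introduction), this build-and-run loop can sometimes shrink to a single focused test. Below is a minimal sketch that assumes a Jest-style test runner and that WidgetFoo is exported from ./WidgetFoo; both are assumptions for illustration.

```ts
// WidgetFoo.test.ts — assumes a Jest-style runner and that WidgetFoo is
// exported from ./WidgetFoo; both are assumptions for this illustration.
import { WidgetFoo } from "./WidgetFoo";

test("WidgetFoo shows 7 when tapped", () => {
  const widget = new WidgetFoo();
  // The input value is illustrative; the assertion reproduces the reported bug
  // (expected 7, currently showing 6.9997).
  expect(widget.calculateValue(7)).toBe(7);
});
```

Whether or not that kind of isolation exists, the full journey still starts from checkout and environment setup, and that is what TTD captures.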
The goal and challenge of this metric is to understand how long the debug cycle takes. For this metric
specifically, we are not only talking about the time from save to re-run; rather, we are talking about the
entire process of checking out the code, changing a line, and then seeing that log in your development
environment. Boiled down further: how long does it take from pointing at a screenshot of the issue until
someone has that code running with a new debug statement?
My pitch is simple:
Optimizing the time to debug workflow is critical to ensure that your organization is able to move
quickly, onboard more engineers, and have fluidity within the organization.
But the execution is hard
First, let me acknowledge that this metric is not easy to measure. It cuts across enough tooling and
processes that it would be rare to have instrumentation in enough locations for your organization to easily
see it as a datapoint. Additionally, this workflow is not static: as your organization grows and changes
over time, the tooling and landscape of the workflow will keep changing, as will the types of bugs and
development happening. Rather than assume that you can quantify this through programmatic measurement, I
recommend leaning into the human aspect of the problem.
Recommendation:
1. Generate a document that outlines the steps required to go from zero to debugging a sample bug within your
codebase. Add this document to your onboarding material / readme for your project.
2. Measure the time and number of steps it takes for you to go through this workflow based on the document
from point 1 (a lightweight timing sketch follows this list).
3. Revisit points 1 & 2 every 3 months, measuring the time and steps it takes. Specifically, when revisiting
point 1, attempt to start from as close to zero as possible; the initial machine setup cost is often quite
high and easy to ignore after the first time you go through point 1.
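For point 2, a lightweight stopwatch script can make the manual measurement a little more consistent between revisits. This is only a sketch; the step names, class name, and output format are illustrative rather than prescribed.

```ts
// ttd-stopwatch.ts — a sketch for timing a walkthrough of the onboarding/debug
// document (point 2). Step names and output format are illustrative.
interface StepTiming {
  step: string;
  minutes: number;
}

class TtdStopwatch {
  private readonly timings: StepTiming[] = [];
  private lastMark = Date.now();

  // Call once each time you finish a step from the document.
  finishStep(step: string): void {
    const now = Date.now();
    this.timings.push({ step, minutes: (now - this.lastMark) / 60_000 });
    this.lastMark = now;
  }

  report(): void {
    const total = this.timings.reduce((sum, t) => sum + t.minutes, 0);
    for (const t of this.timings) {
      console.log(`${t.step}: ${t.minutes.toFixed(1)} min`);
    }
    console.log(`Total time to debug: ${total.toFixed(1)} min over ${this.timings.length} steps`);
  }
}

// Usage while following the document:
const watch = new TtdStopwatch();
// ...check out the code...
watch.finishStep("checkout");
// ...configure the environment, build, run, trigger the target line...
watch.finishStep("configure environment");
watch.report();
```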
This workflow is the critical user journey; it is a cycle (at least steps 2 to 4) that
repeats daily for most engineers. It should be seamless and fast to go from zero to debugging.
Omitting Time to Debug from your success metrics carries with it a risk of ignoring the input to
your system, and ignoring the experience that your team feels every day.
Thanks to Susie Lu, Paul Irish, Sebastian McKenzie, Fabian Canas, and Craig Jolicoeur for helping to review
and edit.
Are you interested in this concept? Want to discuss further? Let’s continue the conversation on Twitter.
Addendum
Does time to debug assume a local checkout / build is required?
For many software projects, it is possible for debugging to be done entirely through an interactive debugger
using domain-specific tooling. The TTD metric continues to be a valid measure even in this case.
Specifically, time to debug in this case would be more in line with the following steps:
Time to debug:
- Time to deobfuscate the stack
- Time to attach the debugger to the code
- Time to trigger the line of code that is exhibiting the incorrect behavior.
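As one illustration of the first step, here is a sketch of mapping a minified stack frame back to its original source using the source-map npm package (assuming v0.7+); the raw source map and frame position are placeholders, and the helper name is hypothetical.

```ts
// deobfuscate.ts — a sketch of the "deobfuscate the stack" step using the
// source-map npm package (v0.7+ assumed). The source map and frame values
// are placeholders for illustration.
import { SourceMapConsumer, RawSourceMap } from "source-map";

async function deobfuscateFrame(
  rawSourceMap: RawSourceMap,
  line: number,
  column: number
): Promise<string> {
  return SourceMapConsumer.with(rawSourceMap, null, (consumer) => {
    const original = consumer.originalPositionFor({ line, column });
    return `${original.source ?? "?"}:${original.line}:${original.column} (${original.name ?? "?"})`;
  });
}
```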
The point of this metric is not to impose rigid steps and a framework around the process, but rather to
highlight the workflow and the need to invest in the time to debug experience.