On Developer Metrics - Time to Debug

When setting up a new engineer on your team, does it commonly take at least one expert to help them get the code running? Does your team often fix code without running it locally, testing on your staging environment instead? Does your team have no way to run components of your codebase in isolation without running the entire application? This post explores what these situations mean and how you can make measurable improvements that increase the satisfaction and productivity of your engineering team.

Background

Engineering Productivity (or software engineering effectiveness) is an engineering domain focused on improving both the overall output of an organization and the satisfaction engineers experience while doing their job. Commonly, the decision to make a focused and dedicated investment in engineering productivity happens due to a collection of signals:

  1. The output of engineers no longer scales alongside the growth of the organization or software.
  2. 1:1s and exit-interviews expose frustration with tools and the inability to achieve “flow” while working.
  3. Engineers in the company are increasingly investing their time in bespoke tooling to improve their local team’s productivity. These individuals are a self-selecting cohort who initially built solutions to scratch their own itch, but their work (rooted in developer empathy) is quickly leveraged by the larger team and becomes part of the default toolkit.

In my personal observations, signal 3 represents the most visible symptom leading to this decision; most likely your team or larger group already has multiple people filling this role in a partial capacity.

As organizational resources and investments in this engineering productivity domain grow, being able to measure the improvements becomes critical for justifying the resource allocation and its impact to the company. But what should you measure? Software-as-a-service products and research within this domain suggest looking at the following metrics, each paired with its humanized form:

  • Time spent waiting for pre-submit verification: How long does it take for my CI to tell me if this is passing or failing?
  • Volume of changes sent per engineer: How many pull requests does each engineer make each day?
  • Code-review latency: How long does an engineer have to wait for someone to peer review their code?
  • Number of rollouts per day: How often are we able to deploy to production?
  • Number of defects that reach production: How often do we ship a major regression or bug that needs to be rolled back or that pages someone at a very undesirable hour?

Here is where things get challenging (and where I disagree with the status quo). When looking to improve the productivity of an organization, the metrics above focus too much on the output of the system and not the input. If you don’t select the correct measures, you risk optimizing only for the output, overfitting solutions to measurable outcomes instead of fixing the deeper challenges that lead to actual improvements for the people in your organization.

The flaw in these metrics

I am not here to say that these metrics are bad or even wrong; rather, I want to make the case that these measures alone miss an essential measure of the input to the system - it’s called Time To Debug.

Time To Debug [TTD]

Definition:
The time it takes from observing a problem in production to when someone can debug the suspect code.

This metric “time to debug” is a composition of multiple user flows joined together into an end-to-end user (engineer) journey. Listed below is a common but not exclusive version of the TTD flow.

  1. Time to checkout
    • Time to configure your environment to run the application
  2. Time to build
  3. Time to run/boot
  4. Time to trigger the targeted line of code

In practice, what does this look like?

Consider the case where a bug is filed about WidgetFoo in your app: the widget, when tapped, is expected to show a value of 7 but instead shows 6.9997, breaking the customer flow. An engineer is assigned the bug and needs to fix it.

The engineer first uses some type of code-search tooling to locate the suspect logic and identify potential steps to debug the unexpected behavior:

WidgetFoo.ts
class WidgetFoo {
  public calculateValue(num: number) {
    //
    // “BUSINESS LOGIC” (elided; this is where calc1Num and finalNum come from)
    //
    console.log("input:", num, "calculation 1:", calc1Num);
    return finalNum;
  }
}

To continue the debugging workflow, the engineer must now build and run this code, reproducing the bug and seeing their new debug log so they can continue their journey.
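
When components can run in isolation (one of the questions at the top of this post), the “trigger the targeted line of code” step can shrink to a focused test rather than a full application boot. As a minimal sketch, assuming WidgetFoo is exported from WidgetFoo.ts and a TypeScript-aware runner for Node’s built-in test module is available (both assumptions on my part, not part of the original example):

WidgetFoo.test.ts
import test from "node:test";
import assert from "node:assert/strict";
import { WidgetFoo } from "./WidgetFoo";

test("calculateValue returns 7 for the reported input", () => {
  const widget = new WidgetFoo();
  // The input value is hypothetical; use whatever the bug report captured.
  // Running this test also fires the console.log added above, without booting the whole app.
  assert.equal(widget.calculateValue(7), 7); // currently fails: returns 6.9997
});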

The goal and challenge of this metric is to understand how long the debug cycle takes. For this metric specifically, we are not only talking about the time from save to re-run; rather, we are talking about the entire process of checking out the code, changing a line, and then seeing that log in your development environment. Boiled down further: how long does it take from pointing at a screenshot of the issue until someone has that code running with a new debug statement?

My pitch is simple:

Optimizing the time to debug workflow is critical to ensure that your organization is able to move quickly, onboard more engineers, and have fluidity within the organization.

But the execution is hard

First, let me acknowledge that this metric is not easy to measure. It cuts across enough tooling and processes that it would be rare to have instrumentation in enough places for your organization to easily see it as a single datapoint. Additionally, this workflow is not static: as your organization grows and changes over time, the tooling and landscape of the workflow will keep changing, as will the kinds of bugs and development work happening. Rather than assume you can quantify this through programmatic measurement, I recommend leaning into the human aspect of the problem.

Recommendation:

  1. Generate a document that outlines the steps required to go from zero to debug within your codebase for a sample bug. Add this document to your onboarding material / readme for your project.
  2. Measure the time and number of steps it takes for you to go through this workflow based on the document from point 1 (a rough timing sketch follows this list).
  3. Revisit points 1 & 2 every 3 months, measuring the time and steps it takes. When revisiting point 1, it is important to start from as close to zero as possible, because the initial machine setup cost is often quite high and easy to forget about after the first time you go through point 1.
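
If you want a rough number to attach to point 2, here is a minimal sketch of how that timing might be scripted. The repository URL and npm commands are hypothetical placeholders for your project’s real checkout, build, and run steps, not a prescription:

measure-ttd.ts
import { execSync } from "node:child_process";

// Hypothetical commands; substitute your project's real checkout, build, and run steps.
const phases: Array<[string, string]> = [
  ["checkout + setup", "git clone git@example.com:org/app.git app && cd app && npm ci"],
  ["build", "cd app && npm run build"],
  ["run + trigger", "cd app && npm start"], // stop the app once the debug log appears; the timer stops when the process exits
];

for (const [name, cmd] of phases) {
  const start = Date.now();
  execSync(cmd, { stdio: "inherit" });
  console.log(`${name}: ${Math.round((Date.now() - start) / 1000)}s`);
}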

This workflow is the critical user journey; it is a cycle (at least steps 2 to 4 of the TTD flow) that repeats daily for most engineers. It should be seamless and fast to go from zero to debugging.

Omitting Time to debug from your success metrics carries with it a risk of ignoring the input to your system, and ignoring the experience that your team feels every day.


Thanks to Susie Lu, Paul Irish, Sebastian McKenzie, Fabian Canas, and Craig Jolicoeur for helping to review and edit.

Are you interested in this concept? Want to discuss further? Let’s continue the conversation on Twitter


Addendum

Does time to debug assume a local checkout / build is required?

For many software projects it is possible for debugging to be done entirely through an interactive debugger using domain-specific tooling. The TTD metric continues to be a valid measure even in this case. Specifically, time to debug in this case would be more in line with the following steps (a small deobfuscation sketch follows the list):

Time to debug:

  1. Time to deobfuscate the stack
  2. Time to attach the debugger to the code
  3. Time to trigger the line of code that is exhibiting the incorrect behavior.
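
As one hedged illustration of step 1: if your production bundle ships a source map, the Mozilla source-map package can translate a minified stack frame back to the original file and line. A minimal sketch, assuming Node, that package, and hypothetical file names and stack positions:

deobfuscate.ts
import { readFileSync } from "node:fs";
import { SourceMapConsumer } from "source-map";

async function main() {
  // Hypothetical paths and positions: the bundle's source map plus a frame from the production stack trace.
  const rawMap = JSON.parse(readFileSync("dist/app.js.map", "utf8"));
  const consumer = await new SourceMapConsumer(rawMap);

  const original = consumer.originalPositionFor({ line: 1, column: 48213 });
  console.log(original); // ideally points back at something like src/WidgetFoo.ts and a line number
  consumer.destroy();
}

main();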

The point of this metric is not to impose rigid steps or a framework around the process, but rather to highlight the time to debug workflow and the need to invest in it.