Measuring for Performance in Software Delivery

When it comes to the delivery of software projects, adoption of the scientific method is becoming more and more prevalent. We are increasingly trying to base our decisions on data, and to do that, we need to collect data. Lots of it. We generally do this by tracking metrics produced as byproducts of the work we perform, and almost inevitably, some of these metrics end up being used to measure our performance.

According to Goodhart’s law, “When a measure becomes a target, it ceases to be a good measure”. One fantastic example of this is using a team’s velocity — something which is traditionally measured — as a target. Following Goodhart’s law, velocity is very likely to rapidly become meaningless, as teams provide artificially bloated estimates so that their velocity is perceived as being higher than it would be with more “honest” estimates. If you are measuring something which can be manipulated, and treating it as a target, it will invariably be manipulated.

Another example of a potentially lousy target metric is utilisation. Of course, you don’t want your team to be under-utilised, but utilisation above a certain point has a negative effect. Queueing theory tells us that as utilisation approaches 100%, lead time approaches infinity, and this holds for software teams as well. If your planned work consumes 100% of your team’s capacity, what happens when unplanned work pops up? When your ultimate goal is to get software out, an arbitrarily long lead time is not ideal.
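The shape of this relationship is easy to see with the simplest queueing model, M/M/1, where expected lead time is the average service time divided by (1 − utilisation). Real teams are not Poisson servers, so treat this as an illustration of the curve rather than a forecast:

```python
# M/M/1 sketch: expected time in system W = S / (1 - rho),
# where S is average service time and rho is utilisation.

def average_lead_time(service_time: float, utilisation: float) -> float:
    """Expected lead time for an M/M/1 queue at a given utilisation."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1 - utilisation)

# Lead time relative to a task's base service time:
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {rho:.0%}: lead time x{average_lead_time(1.0, rho):.0f}")
```

Doubling from 50% to near-100% utilisation doesn’t double lead time — it multiplies it by orders of magnitude, which is why leaving slack for unplanned work matters.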

What, then, are we to measure? And what are we to use as targets? The following is a set of metrics which, through research conducted by Forsgren et al. (2018), have proven to be reliable indicators of high-performing teams, without falling prey to Goodhart’s law.

  • Lead Time: The amount of time it takes to get a piece of work from the backlog into production.
  • Deployment Frequency: How often your team releases new code into a production environment.
  • Time to Restore: In the event of an outage, the amount of time it takes for all systems to be restored to working order.
  • Change Fail Rate: The frequency with which changes pushed to production result in a fault.
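All four can be derived from data most teams already have: deployment records and incident records. A minimal sketch, assuming hypothetical record shapes (the field names here are illustrative, not from any particular tool):

```python
from datetime import datetime, timedelta

# Hypothetical records -- field names are assumptions for illustration.
deployments = [
    {"started": datetime(2024, 1, 1, 9), "deployed": datetime(2024, 1, 3, 9), "failed": False},
    {"started": datetime(2024, 1, 2, 9), "deployed": datetime(2024, 1, 5, 9), "failed": True},
    {"started": datetime(2024, 1, 4, 9), "deployed": datetime(2024, 1, 6, 9), "failed": False},
]
incidents = [
    {"opened": datetime(2024, 1, 5, 10), "resolved": datetime(2024, 1, 5, 12)},
]
days_observed = 7

# Lead Time: average of (deployed - started) across deployments.
lead_time = sum((d["deployed"] - d["started"] for d in deployments),
                timedelta()) / len(deployments)

# Deployment Frequency: deployments per day over the observation window.
deploy_frequency = len(deployments) / days_observed

# Change Fail Rate: fraction of deployments that caused a fault.
change_fail_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Time to Restore: average time from incident opened to resolved.
time_to_restore = sum((i["resolved"] - i["opened"] for i in incidents),
                      timedelta()) / len(incidents)
```

The point is less the arithmetic than the fact that none of these numbers depend on estimates, so they are harder to game than velocity.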

Another essential step in moving towards a more data-driven culture for your team is to make these metrics clearly visible for all to see. Put your metrics up on the wall, post them in Slack on a schedule, do whatever you have to do to get the data into the spotlight. Constant visibility of data informs decisions. If your data is trending in the wrong direction, this is an indicator that something needs to change.
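Posting on a schedule can be as simple as a scheduled job that formats a snapshot and POSTs it to a Slack incoming webhook. A minimal sketch, with a placeholder webhook URL you would replace with your own:

```python
import json
from urllib import request

# Placeholder -- substitute your team's Slack incoming webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def format_metrics_message(metrics: dict) -> str:
    """Render a metrics snapshot as a Slack-friendly text block."""
    lines = [f"*{name}*: {value}" for name, value in metrics.items()]
    return "Delivery metrics for this week:\n" + "\n".join(lines)

def post_to_slack(metrics: dict) -> None:
    """Send the formatted snapshot to the configured webhook."""
    payload = json.dumps({"text": format_metrics_message(metrics)}).encode()
    req = request.Request(WEBHOOK_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)
```

Run it from cron, CI, or whatever scheduler your team already uses; the mechanism matters far less than the routine.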

There is a strong correlation between teams determined to be “high performers” and teams that adopted Lean management practices. Forsgren notes, “At the heart of Lean management is giving employees the necessary time and resources to improve their own work. This means creating a work environment that supports experimentation, failure, and learning, and allows employees to make decisions that affect their jobs.”

It is worth noting that the targets described above are made more attainable by Agile and Lean approaches, such as reducing batch sizes. Reduced batch sizes offer multiple benefits, including shortened cycle times, more frequent feedback, decreased risk, and easier reverting if something does go wrong.

When it comes to software, what do you measure? What have you had success with, and what have you moved away from? Leave your thoughts in the comments.