Measuring and optimising productivity has been the goal of much of what management does since Frederick Taylor’s Scientific Management.
Thankfully we have come far since then — at least in some companies! — and no longer think of engineering efforts in such a reductionist way. Of course, that doesn’t prevent organisations from trying to measure engineering output — but most of the time this does more harm than good.
The same can be said about broader product measures known as vanity metrics — they look good on a slide but don’t really move the needle.
Charles Goodhart said it best:
When a measure becomes a target, it ceases to be a good measure.
— Goodhart’s Law
Let’s look at an example.
In Australia, the Commonwealth Bank ran a program called Dollarmites, “aimed at” increasing financial literacy in children (in quotes because there was plenty of controversy about its actual goal).
At any rate, in 2018 a big scandal broke: bank staff had been opening fraudulent accounts for kids in order to hit aggressive performance targets.
You get what you measure — staff incentives were tied to the number of accounts opened with a deposit made within 30 days of opening, so staff found a way to hit the target: by misappropriating bank funds or making deposits with their own money.
Let’s bring it back to software products. We probably know of, or have worked at, organisations which value output over all else: either by measuring story points, number of features released in a quarter or — gasp — lines of code!
What these metrics all have in common is that in isolation they can easily be gamed, resulting in a poorer overall experience for the end customer. Pushing for story points or more features? Quality will likely suffer. Rewarding more lines of code? Good luck maintaining that hot mess of unnecessarily complex software.
That’s why we need a set of metrics which can keep one another in check.
Self-balancing Metrics
Caring about speed to market isn’t bad. It really should be the focus at various stages in the lifecycle of a product or company. What we need is to go one level deeper and ask what we are really trying to achieve — what is the real desired outcome?
For example, when we ship this product or feature, what do we want to happen? Increased retention and adoption? A higher NPS? More MAUs? Reduced churn?
Once you decide what that is, you can use this metric to balance against the measure you’re trying to optimise — time to market, in this case.
As you change and optimise processes and incentives to increase your velocity, keep an eye on this balancing metric — if you push for speed too hard, quality is likely to drop past a certain point. If that happens, you know you need to go back and assess your approach.
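This balancing act can be sketched in a few lines of code. The metric names, values, and guardrail below are hypothetical, purely for illustration — the point is that the primary metric is only optimised while the balancing metric stays within an agreed threshold.

```python
# A minimal sketch of a self-balancing metric check. All names and
# thresholds here are illustrative assumptions, not a real framework.

def check_balance(primary_value: float, balancing_value: float,
                  guardrail: float) -> str:
    """Recommend whether to keep optimising the primary metric or pull back."""
    if balancing_value < guardrail:
        return "pull back: balancing metric breached its guardrail"
    return "ok: keep optimising the primary metric"

# Example: velocity (primary) balanced by a release-quality rate.
velocity = 48              # story points per sprint, pushed ever higher
quality_rate = 0.91        # share of releases with no escaped defects
quality_guardrail = 0.95   # agreed minimum before we reassess our approach

print(check_balance(velocity, quality_rate, quality_guardrail))
# prints the pull-back message, since 0.91 < 0.95
```

The guardrail value itself is a team decision: it encodes how much quality you are willing to trade for speed before the trade stops being worth it.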
The DORA research is a great example of self-balancing metrics in software engineering. It consists of four metrics, two of which illustrate this concept incredibly well:
Deployment Frequency
Change Failure Rate
A deployment represents a change. That change can introduce new features, bug fixes, UX updates — anything you want to get into your customers’ hands. The more deployments you have, the more value you are able to ship.
This is where Change Failure Rate acts as the balancing metric — if we deploy too quickly too often, we risk shipping half-baked features which could lead to incidents, which in turn increase the Change Failure Rate. That’s our cue to pull back, assess, and improve for next time.
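To make the pairing concrete, here is a small sketch that computes both metrics from a deployment log. The data shape (a date plus a flag for whether the change caused an incident) and the reporting period are my own assumptions for illustration, not DORA’s official tooling.

```python
from datetime import date

# Hypothetical deployment log: (deploy_date, caused_incident).
deployments = [
    (date(2024, 5, 1), False),
    (date(2024, 5, 2), False),
    (date(2024, 5, 3), True),   # this change led to an incident
    (date(2024, 5, 6), False),
    (date(2024, 5, 7), False),
]

days_in_period = 7  # length of the reporting window

# Deployment Frequency: how often we ship value.
deployment_frequency = len(deployments) / days_in_period

# Change Failure Rate: what fraction of those changes went wrong.
change_failure_rate = sum(failed for _, failed in deployments) / len(deployments)

print(f"Deployment frequency: {deployment_frequency:.2f} deploys/day")
print(f"Change failure rate:  {change_failure_rate:.0%}")  # 20% here
```

Tracked side by side, a rising deployment frequency with a flat failure rate means the speed is genuinely sustainable; a rising failure rate is the pull-back signal.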
Second-order thinking
The key skill that helps to define self-balancing metrics is second-order thinking. With this decision-making approach, we are encouraged to think through the long-term consequences of our actions.
If we push to increase deployment frequency, the direct and indirect consequences might look like this:
First-order effect: More features shipped to customers
Second-order effect: Potential increase in bugs and technical debt
Third-order effect: Decreased customer trust and team morale
It takes a little getting used to but by starting with questions such as “What will happen after I make this decision?” and then going further with “What will happen after that?”, you’ll start building the habit of thinking ahead and will be one step closer to making better long-term decisions.
From Metrics to Meaning
Metrics in isolation can create perverse incentives, but the solution isn't to stop measuring. We have to be smart about it. The trick is building a system of checks and balances that keeps your organisation honest and aligned with its true goals.
Start small:
Pick one critical metric you're currently tracking
Map out its potential second and third-order effects
Identify a counterbalancing metric that could help prevent unintended consequences
Test and learn, adjusting based on real outcomes
The goal is awareness of trade-offs and their implications. Your metrics should tell a story about what your organisation is doing and whether its actions are creating the outcomes you actually want.
What metrics are you currently tracking? More importantly, what story (or lie!) are they telling you?
I'd love to hear about your experiences with self-balancing metrics. What unexpected consequences have you found? How did you adjust? Share your thoughts in the comments below.