Most Product Improvement Attempts Fail

What should product organizations do about that?

Jens-Fabian Goetzmann

2 March 2019

One of the most fascinating and least discussed facts about product development is that most attempts to improve a product fail to deliver value. Depending on the source, between two thirds and three quarters of A/B tests do not improve the target metric, even at organizations like Google, where there is no shortage of smart product managers. This is somewhat astonishing: imagine if only one in three bridges a structural engineer designed stayed standing! For product organizations, however, this is a fact of life.

Accepting that most improvement efforts fail leads to a number of key principles for running a successful product organization, which I will expand on below:

Validation is mandatory (and so is killing features)
Invalidation is not failure
The only failure is failure to learn
The earlier you invalidate, the better
You should prioritize problems, not solutions
Goals need to be (more) focused
You need stakeholder buy-in

1 | Validation is mandatory

Perhaps the most obvious takeaway is that if most improvement efforts fail to deliver value, you need to identify the ones that do. The ones that don't might destroy value and should not be shipped. Even those with a "flat" result (neither positive nor negative) still add product complexity, so they should not be shipped either. At Yammer, we called this the "ship it or kill it" decision, and every feature went through it.
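As an illustration, here is a minimal sketch of a "ship it or kill it" rule. It assumes the validation is an A/B test on a conversion-style metric and uses a pooled two-proportion z-test with a hypothetical significance level of 0.05. The function names and thresholds are illustrative, not the process Yammer actually used. The key design choice mirrors the text: only a significant *positive* result ships, and both flat and negative results are killed.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation).

    Arm A is the control, arm B the variant with the new feature.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no variance at all (0% or 100% conversion everywhere)
        return 1.0
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def ship_or_kill(conv_a: int, n_a: int, conv_b: int, n_b: int, alpha: float = 0.05) -> str:
    """Ship only a statistically significant improvement; kill flat and negative results."""
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    improved = conv_b / n_b > conv_a / n_a
    return "ship" if improved and p_value < alpha else "kill"


# Hypothetical test results: (control conversions, control users, variant conversions, variant users)
print(ship_or_kill(1000, 10_000, 1200, 10_000))  # clear improvement -> ship
print(ship_or_kill(1000, 10_000, 1005, 10_000))  # flat result -> kill
print(ship_or_kill(1200, 10_000, 1000, 10_000))  # negative result -> kill
```

In practice the metric, test and threshold would come from criteria agreed before the test started, not chosen after looking at the data.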

To make this decision, you obviously need information about whether the effort delivered the value it was meant to deliver. This is the "measure" step of the Lean Startup's Build - Measure - Learn cycle. The gold standard for validation is of course an A/B test, which is as close to scientific as we get in product development. However, not all product changes can be A/B tested (or reaching a sufficient sample size would take far too long). In such cases, qualitative measures of value (like user testing or surveys) can be used, but they must be applied carefully to avoid bias. There is also a category of improvements that are deemed valuable, and therefore shipped, even without such evidence. The obvious example is compliance features (such as the recent GDPR-related adjustments), where the value is "not getting sued". There are also "strategic" features that unlock future improvements; however, that classification should be used sparingly.

To keep bias from creeping in and to avoid "moving the goal posts", the validation criteria and method need to be defined up front. For an A/B test, this means naming the key success metric and stating a hypothesis about which change in user behavior the feature will cause, and how that change will improve the metric. What is generally not required is the expected size of the metric movement (although an estimate of it is useful for calculating the required sample size).
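To show how the expected movement feeds into the sample size, here is a minimal sketch using the common normal-approximation formula for a two-sided two-proportion z-test. The 10% baseline, the hoped-for 12%, and the conventional `alpha = 0.05` and `power = 0.8` defaults are all hypothetical inputs, not figures from this article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in EACH arm to detect a change from p_control to p_treatment
    with a two-sided two-proportion z-test (normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return ceil(numerator / (p_treatment - p_control) ** 2)


# Hypothetical: 10% baseline conversion, hoping to detect a lift to 12%.
print(sample_size_per_arm(0.10, 0.12))  # about 3,800 users per arm
```

Because the effect size enters the denominator squared, halving the expected lift roughly quadruples the required sample, which is why some tests simply take too long to run.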

2 | Invalidation is not failure

The product organization needs a culture that accepts the fact that most improvement efforts do not deliver value. This has been called "acceptance of failure", but I would go further than that. While it's important to accept occasional failure, it's very demotivating for team members if most of their work ends in failure. Therefore, invalidation should not be called failure. Rather, we successfully concluded the validation, and the outcome was that our hypothesis wasn't true.

This culture needs to be instilled not only in the product team, but also among engineers: it can be difficult for an engineer to see the fruit of their labor thrown out because the A/B test came back negative, but it's better for the code base and the product if it is. (This has implications for refactoring, namely that it should be kept separate from feature work so that it can be shipped independently, but that's generally a topic for engineering leadership to figure out.)

3 | The only failure is failure to learn

How can we call an improvement effort that did not deliver value a success? By making sure we learn from it (the "Learn" step of Build - Measure - Learn). This learning needs to go beyond "well, this didn't work"; it needs to produce actionable insights for future product development. The most important factor in making that work is writing hypotheses that predict a specific change in user behavior. If such a hypothesis is invalidated, you have learned something about user behavior. It also means that incremental changes, which alter one aspect at a time, are easier to learn from than complex changes: with a complex change, you often don't know where you "went wrong", so you learn little beyond the fact that this specific experience didn't work.

A useful tool for making sure an improvement effort is set up for learning is to run a pre-mortem beforehand and ask yourself: "How might this fail to deliver value? What outcomes (e.g. metric movements) might we see, and what would they mean? How would this inform what we do afterwards?" If the answers to those questions aren't satisfactory, then the scope, setup or hypothesis of the improvement effort may need to change. Given that more than two thirds of the time the improvement effort won't deliver the hypothesized value, this time is very well invested.

4 | The earlier you invalidate, the better

Time and engineering capacity are always scarce, so if an idea for a product improvement is bound to not deliver value, you should try to identify that as early as possible. In practice, this means two things. Firstly, where possible, you should test hypotheses without shipping code. This can be done in many ways, but the most common are user interviews, surveys and prototypes (paper or click prototypes). Of course, as qualitative research methods, these are more prone to bias (on both the researcher and the user side) than A/B testing, but if you try to correct for that, you can often shorten the time to learning.

Secondly, you should scope improvement efforts as small as possible while still allowing you to (in)validate the hypothesis. This is often called a Minimum Viable Product (MVP). The term MVP has drawn a lot of criticism, and alternatives like "Riskiest Assumption Test" (RAT) have been suggested, but the core concept is the same: figure out what you need to learn, and then build the minimal thing that will let you learn it. This could mean taking shortcuts with the UX (such as disregarding some edge cases and accounting for that while analyzing results) or with the implementation. As an example of the latter, the first time we tested a machine-learning-powered feature at 8fit, we did not integrate it deeply into our existing architecture. Instead, we hacked it in by standing up a temporary service that fed its results into the experiment system. It wasn't a pretty solution, and it wasn't scalable since the model wouldn't continue learning, but it allowed us to test the hypothesis cheaply, and once the hypothesis was validated, we could productionize it.

5 | You should prioritize problems, not solutions

I am paraphrasing this point from an excellent article by Teresa Torres, which you should read for suggestions on how to prioritize problems and opportunities rather than solutions. While her article starts from a different premise, the prescription also follows directly from the insight that most improvement efforts do not deliver value: you don't know which solutions will or won't work, and you might have to try several times before you find the "right" solution to a problem, so it simply doesn't make sense to prioritize solutions.

6 | Goals need to be (more) focused

I've witnessed the following fallacy far too many times (and fallen for it myself): "We have time for 3 bigger projects this quarter, so we should have 3 different goals: 1. increase new user activation, 2. increase conversion, 3. increase engagement". Keeping in mind that only about one in three improvement efforts will pan out, this sets you up for failure. Once the first idea (say, for goal number 1) fails, you face a difficult decision: do we learn from this and try something new? Or do we move on to the next goal (and mostly give up on reaching the first one)?

The only solution is to set fewer goals than you would initially think, and to leave room for multiple attempts or iterations toward each of them. One of the hallmarks of good product leadership is recognizing both this need for focus and the tendency of product teams to be overly optimistic and therefore over-commit, and correcting for both in the goal-setting process.
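The arithmetic behind this can be sketched in a few lines. This is a simplified model under a loudly stated assumption: every attempt succeeds independently with probability 1/3, matching the rough "one in three" figure in the text. In reality, later attempts may succeed more often because they build on what earlier ones taught you.

```python
from fractions import Fraction

# Assumption (simplified model): each improvement attempt succeeds
# independently with probability 1/3, the rough rate cited in the text.
P_SUCCESS = Fraction(1, 3)


def p_goal_hit(attempts: int, p: Fraction = P_SUCCESS) -> Fraction:
    """Chance that at least one of `attempts` independent tries at a goal succeeds."""
    return 1 - (1 - p) ** attempts


# Plan A: three different goals, one project each.
spread_each = p_goal_hit(1)       # 1/3 chance of hitting any particular goal
spread_all = p_goal_hit(1) ** 3   # 1/27 chance of hitting all three

# Plan B: one goal, three attempts at it.
focused = p_goal_hit(3)           # 19/27, roughly 70%

print(spread_each, spread_all, focused, float(focused))
```

Note that in this model the chance of *at least one* success is 19/27 under both plans. What focus buys is that the success lands on the goal you actually committed to, instead of on whichever of the three goals happens to come up.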

7 | You need stakeholder buy-in

This last point almost goes without saying, since "I need stakeholder buy-in" might as well be a PM bumper sticker. However, if you follow these principles, a number of things may work fundamentally differently than before. Most notably, it gets harder to commit to delivering specific features (much less by a certain date), because they might never be validated. Marketing and sales tend not to like that very much. Also, if you frequently A/B test features only to kill them later, support and customer success have a much harder time reproducing the exact context a user or customer is in.

Setting very narrow goals might raise some eyebrows among senior management, but it is absolutely critical to push for. The goals can be ambitious, but if they point in too many different directions, you are essentially playing the lottery as to which goals you will hit and which ones you won't.

Finally, and perhaps most importantly, if you shift to shipping only validated improvements, you will likely slow down the pace at which features are added to the product. However, that's a good thing (though convincing stakeholders, from engineering to marketing and sales to senior management, that it is a good thing will take a lot of effort)! If you simply ship everything you build without validation, you are only creating the appearance of progress, while in reality some of your changes at best add complexity and at worst destroy value.

In conclusion, it's absolutely critical for product teams to accept the fact that most improvement attempts fail, and to act accordingly. Product leadership has the important role of setting up goals and processes that account for this fact and ensure everything is validated before shipping, and of instilling a culture of expecting invalidation and learning from it (and communicating that to stakeholders across the organization).

I hope this article was helpful. If it was, feel free to follow me on Twitter where I share interesting product management articles I come across daily.