Flaky Tests: Are You Sure You Want to Rerun Them?
Different teams have different approaches to dealing with flaky tests. Some even go as far as using the “Let’s run each test 10 times and if it passes at least once, it passed” approach.
I personally think that rerunning failed tests is poisonous — it legitimizes and encourages entropy, and rots the test suite in the long run.
The half-cocked approach
Some teams see rerunning failed tests as a very convenient short-term solution. In my experience, there is unfortunately no such thing as ‘a short-term solution’. All temporary solutions tend to become permanent.
Along with some other techniques that are efficient in the short term, but are otherwise devastating, rerunning tests is very popular with a certain category of managers. It’s particularly common in corporate environments: there are company goals, and then there are personal goals (ladders to climb). In such environments, some people tend to focus only on what needs to happen until the end of the current quarter or year. What happens later is often seen as someone else’s concern. Looking from that perspective, test rerunning is both fast and efficient, which makes it a desirable and convenient solution.
Keeping flaky tests and brute-forcing them to pass defeats the purpose of testing. There is an unspoken assumption that something is wrong with the tests, and that it’s fine to just rerun them. This assumption is dangerous. Who’s to say that the race or the time-out that causes the flakiness is in the test, and not in the production code? And that it’s not affecting the customer?
The sustainable solution
The long-term solution is to either fix or replace the flaky tests. If one developer cannot fix them, another one should try. If a test cannot be fixed, it should be deleted and written from scratch, preferably by somebody who didn’t see the flaky one. Test coverage tools can be used as a kind of a safety net, showing if some tests have been deleted without being adequately replaced.
Not being able to develop stable tests for some part of the code usually means one of these two things — either that something is wrong with the test and/or the testing approach, or that something is wrong with the code being tested. If we are reasonably certain that the tests are fine, it’s time to take a deeper look at the code itself.
Our position on flaky tests
Deleting and fixing flaky tests is a pretty aggressive measure, and rewriting tests can be time consuming. However, not taking care of flaky tests leads to certain long-term test suite degradation.
On the other hand, there are some legitimate use-cases for flaky test reruns. For example, when time-to-market is of essential importance, and when technical debt is deliberately accumulated with the intention and a clear plan on paying it off in the near future.
As a CI/CD tool vendor, we feel that our choice whether to support rerunning failing flaky tests affects numerous customers. Not just the way they work, but, much more importantly, the way they perceive flaky tests and the testing process itself. At this point, we are choosing not to support rerunning failed tests, since our position is that this approach is harmful much more often than it is useful.