To say that flaky tests are annoying is to put it mildly. They can decimate both a developer’s time and productivity, are notoriously difficult to deal with, and are unfortunately a reality of software development.
Our colleagues have written more general posts in the past about dealing with flaky tests and why it is important. I would encourage you to read them. Here, I will focus on sharing the experience I have gathered after about a year of dealing with flaky tests while working on a Rails application.
While there can be various sources of inconsistent behaviour, in this article I will focus on those with which I have the most practical experience, and deal specifically with FactoryGirl, timestamps, external network requests, and Ajax.
Flaky tests and FactoryGirl
When using FactoryGirl to create persisted records in tests, be aware that you are usually creating many more records than is obvious from the test itself. Even if you know the codebase fairly well, it is safe to assume that you are creating a few more than you realize.
While this is usually harmless aside from slowing down tests, it is capable of creating inconsistent behaviour in your tests. Some scenarios where this is probable are tests that count the number of records and tests that expect a certain record and assume that it is the only one that exists. This is an antipattern called incidental database state, and it’s described in our colleague Marko’s post on Rails testing antipatterns.
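To make the counting problem concrete, here is a minimal sketch in plain Ruby. The `FakeTable` class, the `create_project` factory, and `register_user` are hypothetical stand-ins invented for illustration, so that the example runs without Rails or FactoryGirl. The point is only that the setup creates a record the test never mentions, and that asserting on the change in count, or looking a record up by attribute, is not affected by it.

```ruby
# Hypothetical in-memory stand-in for a database table, so the sketch runs
# without Rails or FactoryGirl.
class FakeTable
  def initialize
    @rows = []
  end

  def insert(attrs)
    row = attrs.merge(id: @rows.size + 1)
    @rows << row
    row
  end

  def count
    @rows.size
  end

  def find_by(attrs)
    @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
  end
end

USERS = FakeTable.new

# Mimics a factory with an association: creating a project also persists an owner.
def create_project(name:)
  owner = USERS.insert(name: "owner of #{name}")
  { name: name, owner_id: owner[:id] }
end

# Code under test: registers exactly one new user.
def register_user(name)
  USERS.insert(name: name)
end

# Setup that silently creates an extra user through the association.
create_project(name: "demo")

# Fragile: asserting `USERS.count == 1` would fail here, because the setup
# already added an owner.
# Robust: measure the change caused by the code under test.
count_before = USERS.count
register_user("alice")
users_added = USERS.count - count_before

# Robust: look the record up by an attribute instead of assuming it is the only row.
alice = USERS.find_by(name: "alice")
```

In an RSpec suite, the same idea is usually expressed with the `change` matcher, for example `expect { ... }.to change { User.count }.by(1)`, where `User` is whatever model the test is about.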
Flaky tests and timestamps
When working with records in a way that implicitly sets timestamps with the intention of comparing them, it is possible to create tests that are dependent on code running faster or slower than a certain threshold. This is troublesome, since background processes can have an influence on execution speed. Also, running your tests on different machines (e.g. a local machine and a CI server) can produce different execution times.
This can be avoided by either setting timestamps explicitly or using Timecop or an equivalent tool that allows you to freeze and travel through time in your tests.
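As a rough illustration of the first option, the sketch below pins timestamps explicitly. The `Post` struct and the `newest_first` method are hypothetical and stand in for a model and the ordering logic under test, so the example runs in plain Ruby.

```ruby
# Hypothetical record type standing in for an ActiveRecord model.
Post = Struct.new(:title, :created_at)

# Returns posts newest first; this is the behaviour we want to test.
def newest_first(posts)
  posts.sort_by(&:created_at).reverse
end

# Fragile setup: both timestamps come from the wall clock, so on a fast machine
# (or with a database that stores second-level precision) they can be identical.
# older = Post.new("older", Time.now)
# newer = Post.new("newer", Time.now)

# Robust setup: pin the timestamps explicitly, an hour apart.
base  = Time.utc(2020, 1, 1, 12, 0, 0)
older = Post.new("older", base - 3600)
newer = Post.new("newer", base)

sorted_titles = newest_first([older, newer]).map(&:title)
```

With Timecop, a similar effect can be had by wrapping record creation in `Timecop.freeze(some_time) { ... }`, which keeps the clock fixed for the duration of the block.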
Flaky tests and external requests
Any request hitting an external service is inherently prone to being flaky. Whenever there are issues with the service, the service is unavailable, or there are network issues in general, the test will fail. Worst of all, you as a developer have no control over this, which can result in long periods when it is impossible to test your code properly.
These kinds of requests need to be properly stubbed. We use vcr for this purpose, but there are other options. If you would like to learn more about handling external requests and stubbing in general, I would recommend reading the following tutorials:
- Stubbing External Services in Rails,
- Mocking with RSpec: Doubles and Expectations, and
- Mocking in Ruby with Minitest.
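For a sense of what stubbing buys you, here is a hand-rolled sketch in plain Ruby. It is not how vcr works (vcr records real HTTP responses and replays them); it only shows the underlying idea of replacing the network with a canned response. The `WeatherReport` class, the `FakeHttpClient` double, and the `weather.example.com` URL are all made up for illustration.

```ruby
require "json"

# Hypothetical service object that depends on an HTTP client passed in by the caller.
class WeatherReport
  def initialize(http_client)
    @http_client = http_client
  end

  # Fetches the current temperature for a city from a (made-up) JSON endpoint.
  def temperature(city)
    body = @http_client.get("https://weather.example.com/current?city=#{city}")
    JSON.parse(body).fetch("temperature")
  end
end

# Test double: returns a canned response and records the requested URLs,
# so the test never touches the network.
class FakeHttpClient
  attr_reader :requested_urls

  def initialize(canned_body)
    @canned_body = canned_body
    @requested_urls = []
  end

  def get(url)
    @requested_urls << url
    @canned_body
  end
end

fake_client = FakeHttpClient.new('{"temperature": 21}')
temperature = WeatherReport.new(fake_client).temperature("Belgrade")
```

Because the response is fixed, the test result no longer depends on the availability of the service or the network, and the recorded URLs let the test check that the right request would have been made.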
Flaky tests and Ajax
I feel that I should first mention that my experience primarily comes from working with Selenium, but I assume that other tools can have similar issues.
Ajax calls can prove problematic when integration testing your web application. Code that follows an Ajax request might not give it enough time to finish. Selenium is good at waiting for your synchronous requests to finish, but if they are followed by additional asynchronous requests, your test will continue executing regardless of their state. As with timestamps, this can cause your test to be dependent on code execution speed.
Problems like these are most commonly solved by adding sleep statements, but this is a messy solution that should be avoided if at all possible. Ideally, in a test environment your Ajax requests should leave a mark on your HTML document. This will allow you to be aware of their status from inside your steps.
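```coffeescript
# pageAttr is the application's own helper for reading a page attribute.
window.inTestEnv = ->
  pageAttr("rails-env") == "test"

# Only install the markers in the test environment.
return unless inTestEnv()

# Create the marker paragraph on first use, then update it in place.
$(document).on "js-state", (event, data) ->
  if $("#js-state").length == 0
    $("body").prepend("<p id='js-state'>#{data.message}</p>")
  else
    $("p#js-state").html(data.message)

$(document).ajaxStart ->
  $(document).trigger("js-state", {message: "Ajax started"})

$(document).ajaxStop ->
  $(document).trigger("js-state", {message: "Ajax finished"})
```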
This enables you to write code that waits for those markers to indicate that everything is ready before you proceed.
```ruby
expect(page).to have_content("Ajax finished")
```
Taking this idea a step further, you can have an active Ajax request counter. With that counter in place, your test shouldn’t proceed until its value is 0:
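```coffeescript
window.inTestEnv = ->
  pageAttr("rails-env") == "test"

return unless inTestEnv()

activeAjaxCount = 0

# Create the marker paragraph on first use, then update it in place.
$(document).on "js-state", ->
  if $("#js-state").length == 0
    $("body").prepend("<p id='js-state'>Active Ajax count: #{activeAjaxCount}</p>")
  else
    $("p#js-state").html("Active Ajax count: #{activeAjaxCount}")

# ajaxSend and ajaxComplete fire once per request. ajaxStart and ajaxStop
# fire only when the first request starts and the last one finishes, so
# they cannot be used to count individual requests.
$(document).ajaxSend ->
  activeAjaxCount += 1
  $(document).trigger("js-state")

$(document).ajaxComplete ->
  activeAjaxCount -= 1
  $(document).trigger("js-state")
```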
And the expectation would change to:
```ruby
expect(page).to have_content("Active Ajax count: 0")
```
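Capybara's `have_content` matcher already retries for a while before failing, so the expectation above needs no extra waiting code. For conditions that have no waiting matcher, the same "poll for a marker instead of sleeping for a fixed time" idea can be sketched as a small helper. The `wait_until` helper and the `FakePage` class below are hypothetical and exist only so the example runs in plain Ruby.

```ruby
# Raised when the condition does not become true in time.
class WaitTimeout < StandardError; end

# Polls the block until it returns a truthy value or the timeout elapses.
def wait_until(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeout, "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Hypothetical stand-in for the page: reports an active request for the first
# few polls, then reports none.
class FakePage
  def initialize(polls_until_idle)
    @remaining = polls_until_idle
  end

  def text
    @remaining -= 1 if @remaining > 0
    "Active Ajax count: #{@remaining.zero? ? 0 : 1}"
  end
end

page = FakePage.new(3)
became_idle = wait_until(timeout: 1.0, interval: 0.01) do
  page.text.include?("Active Ajax count: 0")
end
```

Unlike a fixed `sleep`, this returns as soon as the marker shows up and fails with a clear error if it never does, so a slow machine makes the test slower rather than flaky.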
This can slow down some tests — for instance, if your test only depends on one Ajax request and doesn’t care about the others that may or may not be running. Waiting for these requests might seem like a waste of time. I would argue that this makes your tests more consistent, since, even if you don’t explicitly wait for them, your Ajax requests will finish sometimes, and other times they won’t. This can be another source of flakiness later down the road.
Closing
Like I mentioned in the Ajax section, my experience primarily comes from the tools and frameworks I have used while working on Semaphore. However, I believe that if not all, then most of these problems aren’t exclusive to these tools and frameworks.
While working on fixing flaky tests has been a frustrating experience, I personally find it very rewarding and would go so far as to call it fun at times. Many times it pushed me to dig a little deeper and better understand the tools and code I’m working with. Usually, this is what is necessary to get to the source of the flakiness.
I would again encourage you to read our colleague Predrag’s post on the importance of factoring flaky test maintenance into your daily/monthly routine. As flaky tests are a reality of development, it is only logical that their maintenance should be as well.
I would love to hear some of your tips on dealing with flaky tests, so feel free to share them in the comments below. Also, if you found this article useful, you can share it so others can find it as well.
A good way to detect flaky tests is to randomize the order of their execution, and while you’re at it you might as well parallelize your test suite to make the cycle even faster. If you’re working on a Rails application, we’ve developed a tool to automate test parallelization for you and cut its runtime to just a few minutes — Semaphore Boosters. Learn about it here.