Rails Testing Antipatterns: Fixtures and Factories

In the upcoming series of posts, we’ll explore some common antipatterns in writing tests for Rails applications. The opinions presented here come from our experience building web applications with Rails (we’ve been doing it since 2007) and are biased towards RSpec and Cucumber. Developers working with other technologies will probably benefit from reading as well.

Antipattern zero: no tests at all

If your app has at least some tests, congratulations: you’re among the better developers out there. If you think that writing tests is hard — it is, but you just need a little more practice. I recommend reading Rails Testing Handbook and Effective Testing with RSpec 3 if you haven’t already.

If you don’t know how to add more tests to a large system you inherited, I recommend going through Working Effectively with Legacy Code. If you have no one else to talk to about testing in your company, there are many great people to meet at events such as CITCON.

If you recognize some of the practices discussed here in your own code, don’t worry. The methodology is evolving, and many of us have “been there and done that”. And finally, this is all just advice. If you disagree, feel free to share your thoughts in the comment section below. Now, onwards with the code.

Fixtures and factories

Using fixtures

Fixtures are Rails’ default way to prepare and reuse test data. Do not use fixtures.

Let’s take a look at a simple fixture:

# spec/fixtures/users.yml
marko:
  first_name: Marko
  last_name: Anastasov
  phone: 555-123-6788

You can use it in a test like this:

RSpec.describe User do
  fixtures :all

  describe "#full_name" do
    it "is composed of first and last name" do
      user = users(:marko)
      expect(user.full_name).to eql "Marko Anastasov"
    end
  end
end

There are a few problems with this test code:

– It is not clear where the user came from and how it is set up.
– We are testing against a “magic value” — implying something was defined in some code, somewhere else.

In practice, these shortcomings tend to be patched over with comment essays:

RSpec.describe Dashboard do

  fixtures :all

  describe "#show" do
    # User with preferences to view posts about kittens
    # and in the group with special access to Burmese cats
    # with 4 friends that like ridgeback dogs.
    let(:user) { users(:kitten_fan) }
  end
end

Maintaining fixtures for more complex records can be tedious. I recall working on an app with a record that had dozens of attributes. Whenever a column was added or changed in the schema, every fixture had to be updated by hand. Of course, I only remembered to do so after a few test failures.

A common solution is to use factories. As in the Factory design pattern, a factory is responsible for creating objects; here, those objects are records. Factory Bot is a good choice.

Factories let you maintain simple definitions in a single place, but manage all data related to the current test in the test itself when you need to. For example:

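FactoryBot.define do
  factory :user do
    # Values go in blocks (dynamic attributes); FactoryBot 5 removed
    # the static `first_name "Marko"` form.
    first_name { "Marko" }
    last_name  { "Anastasov" }
    phone { "555-123-6788" }
  end
end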

Now your test can set the related attributes before checking for the expected outcome:

RSpec.describe User do
  describe "#full_name" do
    let(:user) { build(:user, first_name: "Johnny", last_name: "Bravo") }

    it "is composed of first and last name" do
      expect(user.full_name).to eql "Johnny Bravo"
    end
  end
end

A good factory library will let you not just create records, but easily generate unsaved model instances, stubbed models, attribute hashes, define types of records and more — all from a single definition source. Factory Bot’s getting started guide has more examples, and I also recommend you take a look at Working Effectively with Data Factories Using FactoryBot.
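To make the idea of one definition source serving several strategies concrete, here is a minimal plain-Ruby sketch. MiniFactory and its define, build and attributes_for methods are hypothetical and only loosely mirror Factory Bot: defaults live in one place, each default is a lambda evaluated on every call (much like a dynamic attribute block), and per-test overrides win.

```ruby
# Toy model for the example; a plain Struct, not an ActiveRecord model.
User = Struct.new(:first_name, :last_name, keyword_init: true) do
  def full_name
    "#{first_name} #{last_name}"
  end
end

# Hypothetical, minimal factory registry (not Factory Bot's implementation).
class MiniFactory
  def initialize
    @definitions = {}
  end

  # Store the model class and its default attributes, given as lambdas.
  def define(name, model, **defaults)
    @definitions[name] = { model: model, defaults: defaults }
  end

  # Evaluate every default lambda, then let per-test overrides win.
  def attributes_for(name, **overrides)
    @definitions.fetch(name)[:defaults].transform_values(&:call).merge(overrides)
  end

  # Build an unsaved instance from the same single definition.
  def build(name, **overrides)
    @definitions.fetch(name)[:model].new(**attributes_for(name, **overrides))
  end
end

factory = MiniFactory.new
factory.define(:user, User,
               first_name: -> { "Marko" },
               last_name: -> { "Anastasov" })

user = factory.build(:user, first_name: "Johnny", last_name: "Bravo")
puts user.full_name # prints Johnny Bravo
```

The test states only the attributes it cares about; everything else comes from the shared definition.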

Factories pulling too many dependencies

Factories let you specify associations, which get automatically created. For example, this is how we say that creating a new Comment should automatically create a Post that it belongs to:

FactoryBot.define do
  factory :comment do
    post # implicit association: uses the :post factory
    body { "groundbreaking insight" }
  end
end

Ask yourself if creating or instantiating that post in every call to the Comment factory is really necessary. It might be if your tests require a record that was saved in the database, and you have a validation on Comment#post_id. But that may not be the case with all associations.

In a large system, calling one factory may silently create many associated records, which accumulates to make the whole test suite slow (more on that later). As a guideline, always try to create the smallest amount of data needed to make the test work.
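A plain-Ruby sketch shows how quickly implicit associations add up. The create_user, create_post and create_comment helpers below are hypothetical and write to a toy in-memory table; each one uses a default keyword argument to create its parent, which mimics what an implicit association like post does in a Factory Bot definition. Passing an existing parent skips the cascade.

```ruby
Record = Struct.new(:type, :parent)

# Toy in-memory "database" that just remembers every inserted row.
class ToyDatabase
  attr_reader :rows

  def initialize
    @rows = []
  end

  def insert(type, parent = nil)
    record = Record.new(type, parent)
    @rows << record
    record
  end
end

def create_user(db)
  db.insert(:user)
end

# Default keyword values are evaluated only when the caller omits them,
# so a post creates its own author unless one is passed in.
def create_post(db, author: create_user(db))
  db.insert(:post, author)
end

def create_comment(db, post: create_post(db))
  db.insert(:comment, post)
end

db = ToyDatabase.new
create_comment(db)
puts db.rows.map(&:type).inspect # prints [:user, :post, :comment]

# Ten comments, each dragging in its own post and user: 30 rows.
db = ToyDatabase.new
10.times { create_comment(db) }
puts db.rows.size # prints 30

# Sharing one post keeps it to 2 + 10 = 12 rows.
db = ToyDatabase.new
post = create_post(db)
10.times { create_comment(db, post: post) }
puts db.rows.size # prints 12
```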

Factories that contain unnecessary data

A spec is effectively a specification of behavior. That is how we look at it when we open one. Similarly, we look at factories as definitions of data necessary for a model to function.

In the first factory example above, including phone in the User factory is unnecessary unless the model validates its presence. If the data is not critical, just remove it.

Factories depending on database records

Adding a hard dependency on specific database records in factory definitions leads to build failures in the CI environment. Consider the following example:

factory :active_schedule do
  start_date Date.current - 1.month
  end_date 1.month.since(Date.current)
  processing_status "processed"
  schedule_duration ScheduleDuration.find_by_name("Custom")
end

It is important to know that attribute values written without a block, as in this example (the static attribute syntax, which FactoryBot 5 removed), are evaluated once, when the factory definitions are loaded along with the Rails test environment. Locally, this may go unnoticed because your test database was created, and possibly seeded, some time in the past. In the CI environment, however, builds start from a blank database, so you will get an error before the test suite starts to run. Wrapping the lookup in a block would defer it, but the factory would still depend on a record that a fresh database doesn't contain. To reproduce and identify such an issue locally, run db:drop followed by db:setup, and then run your tests again.
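The failure comes from ordinary Ruby evaluation order: an argument is computed before the method receiving it runs, whereas a block runs only when something calls it. The sketch below illustrates this with a hypothetical ToyDefinition DSL (loosely modeled on how factory definitions read attributes, not Factory Bot's actual code) and a FakeTable standing in for a blank CI database.

```ruby
# Stand-in for a database table that starts empty, as on a fresh CI run.
class FakeTable
  attr_reader :queries

  def initialize
    @rows = []
    @queries = 0
  end

  def add(name)
    @rows << name
  end

  # Counts lookups and returns nil when the record is missing.
  def find_by_name(name)
    @queries += 1
    @rows.include?(name) ? name : nil
  end
end

# Hypothetical definition DSL: `attr value` stores a value Ruby has already
# computed, while `attr { ... }` stores the block for later.
class ToyDefinition
  def initialize(&body)
    @attributes = {}
    instance_eval(&body)
  end

  def method_missing(name, *args, &block)
    return super if args.empty? && block.nil?

    @attributes[name] = block || args.first
  end

  # Stored blocks run now, at build time; stored values come back as-is.
  def build
    @attributes.transform_values { |value| value.respond_to?(:call) ? value.call : value }
  end
end

durations = FakeTable.new

eager = ToyDefinition.new { schedule_duration durations.find_by_name("Custom") }
lazy = ToyDefinition.new { schedule_duration { durations.find_by_name("Custom") } }
puts durations.queries # prints 1: only the eager lookup ran while defining

durations.add("Custom") # the record appears later, e.g. created by a test
p eager.build[:schedule_duration] # prints nil: fixed at definition time
p lazy.build[:schedule_duration]  # prints "Custom"
```

Note that deferring the lookup does not remove the dependency: the lazy version still returns nil until something creates the record.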

One way to fix this is to use Factory Bot’s traits:

factory :schedule_duration do
  name { "Test Duration" }

  trait :custom do
    name { "Custom" }
  end
end

factory :active_schedule do
  association :schedule_duration, :custom
end

Keep in mind that a new Custom schedule_duration will be created for every active_schedule, so this strategy will not work if schedule_duration names have a uniqueness constraint and a test creates more than one active_schedule.

Another way is to defer the lookup to a callback. This, however, adds an implicit requirement: the test must create the Custom schedule duration before it creates an active_schedule:

factory :active_schedule do
  before :create do |schedule|
    schedule.schedule_duration = ScheduleDuration.find_by(name: 'Custom')
  end
end

If your application *really* requires a dependency on such records and there’s no way around the issue, consider using Rails seeds. You’ll need to configure RSpec to load them before the test suite runs:

# rails_helper.rb
RSpec.configure do |config|
  config.before :suite do
    Rails.application.load_seed
  end
end

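In that case, you might want to make sure your seeds clean up after themselves by deleting existing records before creating new ones. Records created in a before(:suite) hook are committed rather than rolled back, so without the cleanup each test run may add another copy: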

# db/seeds.rb
ScheduleDuration.delete_all
ScheduleDuration.create! name: 'Custom'

Factories with random data instead of sequences

When used alongside factories, random data generators such as Faker may compromise the reliability of your test suite. Suppose the following factory definition got committed into your application:

FactoryBot.define do
  factory :category do
    name { Faker::Lorem.word.capitalize }
  end
end

Your test suite runs smoothly for months, but you suddenly start to see random exceptions in CI that look like this:

ActiveRecord::RecordNotUnique:
  SQLite3::ConstraintException: UNIQUE constraint failed: categories.name: INSERT INTO "categories" ("name", "created_at", "updated_at") VALUES (?, ?, ?)

The stack trace points somewhere near the following line:

name { Faker::Lorem.word.capitalize }

You’ve collected a few stack traces related to this error, and by looking at the git log you notice that some problematic specs recently sneaked into the codebase. Unlike the other specs, these make quite a lot of calls to create(:category).

It turns out that the categories.name column has a unique constraint, but random data generated by Faker isn’t guaranteed to be unique. In other words, an exception will be raised whenever the same category name gets generated twice during a test.
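How likely is a clash? If each generated name is treated as a uniform draw from a fixed word list, this is the classic birthday problem. The sketch below computes that probability; the 200-word pool is an assumed figure for illustration, not Faker's actual list size, and real generators may not be uniform.

```ruby
# Probability that `draws` values picked uniformly at random from
# `pool_size` distinct words contain at least one duplicate.
def collision_probability(pool_size, draws)
  return 1.0 if draws > pool_size # pigeonhole: a duplicate is certain

  # Probability that every draw so far hit a word not seen before.
  all_distinct = (0...draws).reduce(1.0) do |acc, i|
    acc * (pool_size - i) / pool_size
  end
  1.0 - all_distinct
end

# Assumed pool of 200 words; substitute the real size of your generator's list.
[10, 20, 50].each do |n|
  puts format("%2d names: %5.1f%% chance of a duplicate", n, 100 * collision_probability(200, n))
end
```

With the assumed 200-word pool, about 17 names are already enough for even odds of a duplicate, which is why specs calling create(:category) many times surface the problem first.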

Factory Bot provides a solution to this problem: sequences. A sequence keeps an incrementing counter and uses it to generate each value, so you can rest assured that every generated name is unique. Here’s how to fix the above factory with a sequence:

FactoryBot.define do
  factory :category do
    sequence(:name) { |n| "Category number #{n}" }
  end
end
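Conceptually, a sequence is just a counter that advances on every call. The sketch below uses a hypothetical ToySequence class (not Factory Bot's implementation; starting at 1 is a choice of this sketch) to show why the values cannot repeat within one run.

```ruby
# Hypothetical sequence: formats an ever-increasing counter.
class ToySequence
  def initialize(start = 1, &formatter)
    @counter = start
    @formatter = formatter
  end

  # Format the current counter value, then advance it.
  def next_value
    value = @formatter.call(@counter)
    @counter += 1
    value
  end
end

category_names = ToySequence.new { |n| "Category number #{n}" }
names = Array.new(1_000) { category_names.next_value }

puts names.first     # prints Category number 1
puts names.uniq.size # prints 1000
```

Because the counter lives in the test process, values stay unique for the lifetime of that process, which covers any single test.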

While mixing unique keys with random data can be dangerous, it is not the only danger lurking in the dark: the generated data may fail other validation requirements, or an obscure combination of values may trigger an error that happens 1 out of 100 times, and you can’t easily figure out which combination it is.

Given that fragile tests are among the worst enemies of a test suite, and that dropping random data also spares you an extra dependency such as Faker, you might want to avoid using random data in your factories altogether.

Noisy setup

This antipattern is commonly found in growing and legacy applications. Suppose you are testing a database query that needs to traverse a deep object graph. To make sure it returns the expected data, you wire together a few objects with the help of Factory Bot:

let(:product_1) { create(:product, name: 'iPad') }
let(:product_sale_1) { create(:product_sale, retail_price: 500, product: product_1) }
let(:product_2) { create(:product, name: 'iPhone') }
let(:product_sale_2) { create(:product_sale, retail_price: 500, product: product_2) }
let(:product_sales) { [product_sale_1, product_sale_2] }
let(:sale) { create(:sale, name: 'Apple Bundle', product_sales: product_sales) }
let(:user) { create(:user, name: 'Thiago') }
let!(:line_item) { create(:order_line_item, order: order, sale: sale) }
let(:order) { create(:order, user: user) }

it 'retrieves the expected data' do
  # Run the query and make assertions
end

As you can see, there is a lot of complexity here, and it screams at the reader. We are creating a verbose sequence of low-level records that is difficult to comprehend, and that complexity is inherent to the data model and to the range of cases the query needs to cover. Since the point of our test is to interact with the database, mocking and stubbing would not help at all. If other examples need to insert more than one bundle of the same structure, copying and pasting this setup would only add more noise and make matters even worse.

However, that doesn’t mean we can’t express our intent clearly. To help understand what we are dealing with, we should first of all lay out the hierarchical structure of the data, which has not been made clear by the above example. We can rewire the setup like so:

# Only set attributes that the test will actually verify;
# anything else is redundant noise.
let(:order_line_item) do
  create(
    :order_line_item,
    order: create(
      :order,
      user: create(:user, name: 'Thiago')
    ),
    sale: create(
      :sale,
      name: 'Apple Bundle',
      product_sales: [
        build(
          :product_sale,
          retail_price: 500,
          product: create(:product, name: 'iPad')
        ),
        build(
          :product_sale,
          retail_price: 500,
          product: create(:product, name: 'iPhone')
        )
      ]
    )
  )
end

let!(:order) { order_line_item.order }

This code is easier to understand, but the verbosity still remains. And there is a subtle problem: we are being forced to obtain the order through the order_line_item because dependencies exist between records, which means record creation needs to follow a strict order. The complexity of the data model is nakedly exposed, and we are forced to deal with it every time a similar arrangement is required.

We can make our setup look more natural by creating a helper method. First, let’s imagine what an ideal call to that helper would look like:

order = create_full_order(
  line_item: {
    sale: {
      name: 'Apple Bundle',
      products: [
        { name: 'iPad',   retail_price: 500 },
        { name: 'iPhone', retail_price: 500 }
      ]
    }
  },
  user: { name: 'Thiago' }
)

This is easier on the eyes, and it hides the complexity away by centering the attributes and relations around the order as a single abstraction. Also note that the product_sales relation has disappeared and become an internal detail. Here is a simplified implementation of the helper:

def create_full_order(line_item:, user:)
  # Read the nested attributes without mutating the caller's hashes
  # (Hash#except is core Ruby 3.0+ and is also provided by ActiveSupport).
  sale_attrs = line_item[:sale].except(:products)
  products_attrs = line_item[:sale].fetch(:products)

  order = create(:order, user: create(:user, user))
  sale = create(:sale, sale_attrs)

  products_attrs.each do |product_attrs|
    create :product_sale,
           retail_price: product_attrs[:retail_price],
           product: create(:product, product_attrs.except(:retail_price)),
           sale: sale
  end

  create :order_line_item, order: order, sale: sale

  order
end

Because this is the first occurrence of such a setup in our test suite, we can define the helper within the spec file itself. Now we can go on creating any number of orders in a very readable fashion:

before do
  create_full_order(
    line_item: {
      sale: {
        name: 'Apple Bundle',
        products: [
          { name: 'iPad',   retail_price: 500 },
          { name: 'iPhone', retail_price: 500 }
        ]
      }
    },
    user: { name: 'Thiago' }
  )

  create_full_order(
    line_item: {
      sale: {
        name: 'Special Software Bundle',
        products: [
          { name: 'Alfred',       retail_price: 50 },
          { name: 'TextExpander', retail_price: 25 }
        ]
      }
    },
    user: { name: 'Thiago' }
  )
end

Our helper is currently limited to a single order_line_item per call, but we can make it more flexible on an as-needed basis. We can even turn the method into a class if required:

class CreateFullOrder
  include FactoryBot::Syntax::Methods

  def call(line_item:, user:)
    # do some work
  end

  # ...
end
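One way to make the class version runnable without a database is to pass in the object that performs the inserts. In the sketch below, RecordingCreator is a hypothetical stand-in for Factory Bot's create that only logs each call, and CreateFullOrder mirrors the helper's steps using Hash#except (core Ruby 3.0+) so the caller's nested hashes stay untouched. In a real suite you would more likely keep include FactoryBot::Syntax::Methods as shown above.

```ruby
# Hypothetical stand-in for Factory Bot's `create`: logs each call and
# returns a hash instead of inserting a database row.
class RecordingCreator
  attr_reader :log

  def initialize
    @log = []
  end

  def create(type, attrs = {})
    record = { type: type, **attrs }
    @log << record
    record
  end
end

class CreateFullOrder
  def initialize(creator)
    @creator = creator
  end

  def call(line_item:, user:)
    sale_spec = line_item.fetch(:sale)
    order = create(:order, user: create(:user, user))
    sale = create(:sale, sale_spec.except(:products))

    # One product plus one product_sale joining it to the sale, per entry.
    sale_spec.fetch(:products).each do |product|
      create(:product_sale,
             sale: sale,
             retail_price: product.fetch(:retail_price),
             product: create(:product, product.except(:retail_price)))
    end

    create(:order_line_item, order: order, sale: sale)
    order
  end

  private

  def create(type, attrs = {})
    @creator.create(type, attrs)
  end
end

creator = RecordingCreator.new
spec = {
  line_item: {
    sale: {
      name: "Apple Bundle",
      products: [
        { name: "iPad", retail_price: 500 },
        { name: "iPhone", retail_price: 500 }
      ]
    }
  },
  user: { name: "Thiago" }
}

CreateFullOrder.new(creator).call(**spec)
p creator.log.map { |record| record[:type] }
# prints [:user, :order, :sale, :product, :product_sale, :product, :product_sale, :order_line_item]
```

Injecting the creator keeps the class usable both inside specs and in isolation, at the cost of one extra constructor argument.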

This smell does not have a definitive answer, but in most cases resorting to a local helper is a good first step. The key idea is not to worry about a grand architecture from the beginning, but to do the simplest thing that improves your spec and makes your intent clear. Over time, if you notice other specs repeating a similar arrangement, you can always extract, reuse, and evolve the helper.
