Faster Rails: How to Check if a Record Exists
Ruby and Rails are slow — this argument is often used to downplay the worth of the language and the framework. This statement in itself is not false. Generally speaking, Ruby is slower than its direct competitors such as Node.js and Python. Yet, many businesses from small startups to platforms with millions of users use it as the backbone of their operations. How can we explain these contradictions?
What makes your application slow?
While there can be many reasons behind making your application slow, database queries usually play the biggest role in an application’s performance footprint. Loading too much data into memory, N+1 queries, lack of cached values, and the lack of proper databases indexes are the biggest culprits that cause slow requests.
There are some legitimate domains where Ruby is simply too slow. However, most of the slow responses in our applications usually boil down to unoptimized database calls and the lack of proper caching.
Even if your application is blazing fast today, it can become much slower in only several months. API calls that worked just fine can suddenly start killing your service with a dreaded HTTP 502 response. After all, working with a database table with several hundred records is very different from working with a table that has millions of records.
Existence checks in Rails
Existence checks are probably the most common calls that you send to your database. Every request handler in your application probably starts with a lookup, followed by a policy check that uses multiple dependent lookups in the database.
However, there are multiple ways to check the existence of a database
record in Rails. We have
exists?, and various other
counting-based approaches, and they all have vastly different performance
In general, when working on Semaphore, I always prefer to use
I’ll use our production database to illustrate why I prefer
.exists? over the
alternatives. We will try to look up if there has been a
passed build in the
Let’s observe the database calls produced by our calls.
Build.where(:created_at => 7.days.ago..1.day.ago).passed.present? # SELECT "builds".* FROM "builds" WHERE ("builds"."created_at" BETWEEN # '2017-02-22 21:22:27.133402' AND '2017-02-28 21:22:27.133529') AND # "builds"."result" = $1 [["result", "passed"]] Build.where(:created_at => 7.days.ago..1.day.ago).passed.any? # SELECT COUNT(*) FROM "builds" WHERE ("builds"."created_at" BETWEEN # '2017-02-22 21:22:16.885942' AND '2017-02-28 21:22:16.886077') AND # "builds"."result" = $1 [["result", "passed"]] Build.where(:created_at => 7.days.ago..1.day.ago).passed.empty? # SELECT COUNT(*) FROM "builds" WHERE ("builds"."created_at" BETWEEN # '2017-02-22 21:22:16.885942' AND '2017-02-28 21:22:16.886077') AND # "builds"."result" = $1 [["result", "passed"]] Build.where(:created_at => 7.days.ago..1.day.ago).passed.exists? # SELECT 1 AS one FROM "builds" WHERE ("builds"."created_at" BETWEEN # '2017-02-22 21:23:04.066301' AND '2017-02-28 21:23:04.066443') AND # "builds"."result" = $1 LIMIT 1 [["result", "passed"]]
The first call that uses
.present? is very inefficient. It loads all the
records from the database into memory, constructs the Active Record objects,
and then finds out if the array is empty or not. In a huge database table, this
can cause havoc and potentially load millions of records, that can even lead to
downtimes in your service.
The second and third approaches,
empty?, are optimized in Rails and
COUNT(*) into the memory.
COUNT(*) queries are usually
efficient, and you can use them even on semi-large tables without any dangerous
The third approach,
exists?, is even more optimized, and it should be your first
choice when checking the existence of a record. It uses the
... LIMIT 1 approach, which is very fast.
Here are some numbers from our production database for the above queries:
present? => 2892.7 ms any? => 400.9 ms empty? => 403.9 ms exists => 1.1 ms
This small tweak can make your code up to 400 times faster in some cases.
If you take into account that 200ms is considered the upper limit for an acceptable response time, you will realize that this tweak can spell the difference between a good, sluggish, and bad user experience.
Should I always use exists?
exists? a good sane default that usually has the best performance
footprint. However, there are some exceptions.
For example, if we are checking for the existence of an association record
without any scope,
empty? will also produce a very optimized
query that uses
SELECT 1 FROM ... LIMIT 1 form, but
any? fill not hit the
database again if the records are already loaded into memory.
any? faster by one whole database call when the records are
already loaded into memory:
project = Project.find_by_name("semaphore") project.builds.load # eager loads all the builds into the association cache project.builds.any? # no database hit project.builds.exists? # hits the database # if we bust the association cache project.builds(true).any? # hits the database project.builds(true).exists? # hits the database
As a conclusion, my general advice is to always use
exists? and improve the
code based on metrics.
At Semaphore, we’re all about speed. We’re on a mission to make continuous integration fast and easy. Driven by numerous conversations with our customers and our own experiences, we’ve built a new CI feature that can automatically parallelize any test suite and cut its runtime to just a few minutes - Semaphore Boosters. Learn more and try it out.