Faster Rails: Indexing Large Database Tables Without Downtime

21 Jun 2017 · Semaphore Engineering Blog


This article is part of our Faster Rails series. Check out the previous article about proper database indexing.

As a Rails project grows in scope and size, actions that were once blazingly fast can become slow, or even downright unacceptable. A common cause is rapid growth of your database tables, which makes lookups and updates much slower. In that case, adding the missing indexes to your database is a cheap and easy way to drastically improve your application's performance.
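To make the difference concrete, here is a toy in-memory model in Ruby. It is not how PostgreSQL stores a B-tree; the `Row` struct, the `branch_id` column, and the helper names are all hypothetical. It simply counts how many rows a sequential scan examines compared with a lookup through a prebuilt index.

```ruby
# Toy model only: a "table" is an Array of rows, and an "index" is a Hash
# from a column value to the positions of the rows holding that value.
Row = Struct.new(:id, :branch_id)

# Sequential scan: every row in the table must be examined.
def seq_scan(rows, branch_id)
  examined = 0
  matches = rows.select do |row|
    examined += 1
    row.branch_id == branch_id
  end
  [matches, examined]
end

# Build the index once: branch_id => positions of the matching rows.
def build_index(rows)
  index = Hash.new { |hash, key| hash[key] = [] }
  rows.each_with_index { |row, pos| index[row.branch_id] << pos }
  index
end

# Index lookup: only the matching rows are touched.
def index_lookup(rows, index, branch_id)
  positions = index.fetch(branch_id, [])
  [positions.map { |pos| rows[pos] }, positions.size]
end

rows = (1..100_000).map { |id| Row.new(id, id % 1_000) }
index = build_index(rows)
puts seq_scan(rows, 42).last            # prints 100000 (rows examined)
puts index_lookup(rows, index, 42).last # prints 100 (rows touched)
```

Both calls return the same 100 rows, but the scan's cost grows with the whole table while the lookup's cost grows only with the number of matches. That gap widens as the table grows.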

However, adding a new index to a table that is already big can be dangerous. By default, building an index is a blocking operation: INSERT, UPDATE, and DELETE statements on the table have to wait until the whole index has been created. On a live production database this can have severe effects, because indexing a very large table can take many hours. For a system like Semaphore, even short periods of blocked writes are unacceptable. If this happens during a deployment, we risk causing downtime for the whole system.
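The blocking behavior can be illustrated with a small Ruby toy model. This is not a real database: a `Mutex` stands in for the table lock that a plain index build holds, and `simulate_blocking_build` is a hypothetical helper. The writer thread starts only after the "build" holds the lock, so its insert is always recorded after the index finishes.

```ruby
# Toy model (not real PostgreSQL): a Mutex stands in for the lock that a
# plain CREATE INDEX holds, which makes writers wait until the build ends.
def simulate_blocking_build(build_seconds = 0.2)
  table_lock = Mutex.new
  events = []               # only appended while holding table_lock
  lock_taken = Queue.new    # signals that the build holds the lock

  builder = Thread.new do
    table_lock.synchronize do
      lock_taken << true
      sleep build_seconds   # pretend the index build is slow
      events << :index_built
    end
  end

  lock_taken.pop            # block until the build holds the lock

  writer = Thread.new do
    # The "INSERT" cannot proceed until the index build releases the lock.
    table_lock.synchronize { events << :row_inserted }
  end

  [builder, writer].each(&:join)
  events
end

p simulate_blocking_build # => [:index_built, :row_inserted]
```

In production, `build_seconds` is the part that can stretch to hours on a large table, and every write to that table waits for the whole duration.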

Note: Some database vendors may not lock the table by default. We are most familiar with PostgreSQL and MySQL, and both of them can block writes to a table while an index is being created.

Building Indexes Concurrently

PostgreSQL, our database of choice for Semaphore, has a handy option that lets us build indexes concurrently, without blocking writes to the table.

For example, let's concurrently build an index on the branch_id column of the builds table:

CREATE INDEX CONCURRENTLY idx_builds_branch ON builds USING btree (branch_id);
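If you generate such statements from scripts, the structure is simple: the CONCURRENTLY keyword goes between INDEX and the index name. The Ruby helper below is a hypothetical sketch (not part of Rails or PostgreSQL); when no name is given, it falls back to the index_<table>_on_<column> pattern that Rails uses for default index names.

```ruby
# Hypothetical helper: builds a CREATE INDEX CONCURRENTLY statement.
# Identifiers are not quoted or escaped, so pass only trusted names.
def concurrent_index_sql(table, column, name: nil, using: "btree")
  # Default to Rails' usual naming pattern: index_<table>_on_<column>.
  name ||= "index_#{table}_on_#{column}"
  "CREATE INDEX CONCURRENTLY #{name} ON #{table} USING #{using} (#{column});"
end

puts concurrent_index_sql("builds", "branch_id", name: "idx_builds_branch")
# CREATE INDEX CONCURRENTLY idx_builds_branch ON builds USING btree (branch_id);
```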

The main benefit of building an index concurrently is that PostgreSQL does not take a lock that blocks writes to the table while it builds the index, so we avoid accidental downtime.

Keep in mind that although concurrent index building is a safe option for a production system, the build itself can take significantly longer to complete. The database must perform two scans of the table, and it must wait for all existing transactions that could modify or use the index to finish. A concurrent build also imposes extra CPU and I/O load, which can slow down other database operations.
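Why two scans? Here is a deliberately simplified Ruby toy model, not PostgreSQL's actual algorithm: it ignores snapshots, updates, deletes, invalid indexes, and the wait for existing transactions. The first pass indexes a snapshot while writers keep inserting rows, and the second pass picks up whatever arrived in the meantime. The `Build` struct and `concurrent_build` helper are hypothetical.

```ruby
# Toy model of a two-pass concurrent index build. It is greatly simplified:
# PostgreSQL also handles snapshots, updates, deletes and invalid indexes.
Build = Struct.new(:id, :branch_id)

# `table` is an Array of Build rows; the optional block simulates writers
# that keep inserting rows while the first pass is still running.
def concurrent_build(table)
  index = Hash.new { |hash, key| hash[key] = [] }
  seen = {}

  # Pass 1: index the rows in a snapshot taken when the build starts.
  snapshot = table.dup
  snapshot.each_with_index do |row, i|
    # Writers are not blocked, so a new row can arrive mid-scan.
    yield table if block_given? && i == snapshot.size / 2
    index[row.branch_id] << row.id
    seen[row.id] = true
  end

  # Pass 2: scan the table again and add rows written during pass 1.
  table.each do |row|
    next if seen[row.id]
    index[row.branch_id] << row.id
    seen[row.id] = true
  end

  index
end

table = (1..4).map { |id| Build.new(id, id.odd? ? 7 : 8) }
index = concurrent_build(table) { |t| t << Build.new(5, 7) }
p index[7] # => [1, 3, 5]
```

Without the second pass, row 5 would be missing from the index. The extra scan is the price of never blocking the writer, which is one reason a concurrent build takes longer.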

Concurrent Index Creation in Rails

In Rails migrations, you can pass the algorithm: :concurrently option to add_index to build an index concurrently.

For example, we recently noticed that we were missing a database index for accessing our build_metrics table from our Build model. Through a snowball effect, the missing index slowed down job creation on Semaphore.

Our build_metrics table is huge, with many millions of rows, and it is also accessed very frequently. We could not risk a migration that would lock this table and potentially block build processing on Semaphore.

We took the safe route and triggered a concurrent index build:

def change
  add_index :builds, :build_metric_id, algorithm: :concurrently
end

However, we immediately learned that you can't run this from inside a transaction: PostgreSQL rejects CREATE INDEX CONCURRENTLY inside a transaction block, and Active Record wraps every migration in a transaction by default. To avoid this, we used disable_ddl_transaction!, introduced in Rails 4, to run this one migration without a transaction wrapper:

class AddIndexToBuildMetricIdOnBuilds < ActiveRecord::Migration
  # CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
  disable_ddl_transaction!

  def change
    add_index :builds, :build_metric_id, algorithm: :concurrently
  end
end
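Forgetting disable_ddl_transaction! is easy, so a cheap pre-deploy check can help. The Ruby sketch below is hypothetical (not a Rails feature) and only does plain text matching: it flags migration source that asks for a concurrent index build but never disables the DDL transaction. It accepts both the `algorithm: :concurrently` and `:algorithm => :concurrently` spellings.

```ruby
# Hypothetical pre-deploy check: flags migration source that requests a
# concurrent index build but would still run inside a DDL transaction.
def unsafe_concurrent_migration?(source)
  concurrent = source.match?(/algorithm:\s*:concurrently|:algorithm\s*=>\s*:concurrently/)
  concurrent && !source.include?("disable_ddl_transaction!")
end

migration = <<~RUBY
  def change
    add_index :builds, :build_metric_id, algorithm: :concurrently
  end
RUBY

p unsafe_concurrent_migration?(migration)                                # => true
p unsafe_concurrent_migration?("disable_ddl_transaction!\n" + migration) # => false
```

You could run it over every file in db/migrate with Dir.glob and File.read and fail the build on any match. Text matching can be fooled by comments or unusual formatting, so treat it as a safety net rather than a guarantee.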

The results were phenomenal. With this one small tweak, our job processing became about 2.5 times faster.

2.5x Job Processing

Small tweaks can sometimes bring great improvements. Premature optimization can be a serious anti-pattern, but investing in metrics and gaining a deep understanding of your system never is.

Keep building and tweaking!

At Semaphore, we're all about speed. We're on a mission to make continuous integration fast and easy. Driven by numerous conversations with our customers and our own experiences, we've built Semaphore Boosters, a new CI feature that can automatically parallelize any test suite and cut its runtime to just a few minutes.

© 2009-2017 Rendered Text. All rights reserved.