25 Nov 2020 · Software Engineering

    Faster Rails: Indexing Large Database Tables Without Downtime

    4 min read
    Contents

    This article is part of our Faster Rails series. Check out the previous article about proper database indexing.

    As the scope and size of a Rails project grows, actions that were blazingly fast can become slow, and even downright unacceptable. The cause behind this issue can be an exponential growth of your database tables that makes lookups and updates much slower. If this is the case, adding missing indexes to your database is a cheap and easy way to drastically improve the performance of your application.

    However, adding a new index to a database table that’s already big can be dangerous. Don’t forget, index creation on a database table is a synchronous action that prevents INSERT, UPDATE, and DELETE operations until the full index is created. If the system is a live production database, this can have severe effects. Indexing very large tables can take many hours. For a system like Semaphore, even short periods are unacceptable. If this happens during deployment, we can potentially cause an unwanted downtime for the whole system.

    note: There might be a database vendor that doesn’t lock the table by default. We are mostly familiar with PostgreSQL and MySQL. Both of them lock write access on your table while the index is being created.

    Building Indexes Concurrently

    PostgreSQL – our database of choice while developing Semaphore – has a handy option that enables us to build indexes concurrently without locking up our database.

    For example, let’s build an index concurrently for branches on the build model:

    sql
    CREATE INDEX CONCURRENTLY idx_builds_branch ON builds USING btree (branch_id);
    

    The main benefit of concurrent index creation is that it does not require a lock on the table to build the index tree so we can avoid the issue of accidental downtimes.

    Keep in mind that while concurrent index building is a safe option for your production system, the build itself takes up to several times longer to complete. The database must perform two scans of the table, and it must wait for all existing transactions that could modify or use the index to terminate. The concurrent index build also imposes extra CPU and I/O load that might slow down other database operations.

    Concurrent Index Creation in Rails

    In Rails Migrations, you can use the algorithm option to trigger a concurrent index build on your database table.

    For example, we recently noticed that we miss a database index for accessing our build_metrics database table from our build models, which in a snowball effect slowed down job creation on Semaphore.

    Our build_metrics table is huge, counting many millions of elements, and it’s also accessed very frequently. We could not risk introducing a migration that would lock this table and potentially block build processing on Semaphore.

    We used the safe route, and triggered a concurrent index build:

    def change
      add_index :builds, :build_metric_id, :algorithm => :concurrently
    end
    

    However, we immediately learned that you can’t run the above from inside of a transaction. Active Record creates a transition around every migration step. To avoid this, we used the disable_ddl_transaction! introduced in Rails 4 to run this one migration without a transaction wrapper:

    class AddIndexToBuildMetricIdOnBuilds < ActiveRecord::Migration
      disable_ddl_transaction!
    
      def change
        add_index :builds, :build_metric_id, :algorithm => :concurrently
      end
    end
    

    The results were phenomenal. With this simple little tweak, our job processing capabilities got around 2.5 times faster.

    Small tweaks can sometimes bring great improvements. Premature optimization can be a huge anti-pattern, however investing in metrics and gaining a deep understanding of your system never is.

    Keep building and tweaking!

    At Semaphore, we’re all about speed. We’re on a mission to make continuous integration fast and easy. Driven by numerous conversations with our customers and our own experiences, we’ve built a new CI feature that can automatically parallelize any test suite and cut its runtime to just a few minutes – Semaphore Boosters. Learn more and try it out.

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Avatar
    Writen by:
    Chief Architect at Semaphore. A decade of experience in dev productivity, helping close to 50,000 organizations with operational excellence.