As any application grows in features, running all automated tests in the
continuous integration (CI) environment begins to take a significant amount of
time. A slow CI build — anything longer than 10
minutes — takes
a toll on everyone’s focus, flow and productivity. How do you move fast when
even a trivial update or hotfix takes 15 minutes to reach production? Half an
hour? Forty-five minutes?
Today, we’re announcing Semaphore Boosters, a new CI feature that can cut a build’s
runtime from over an hour down to just a few minutes by automatically
parallelizing your test suite. It drastically speeds up
testing, and helps your team get faster feedback, save time, be more productive,
and deliver updates to users much more frequently.
What can you expect from using Semaphore Boosters?
Early adopter customers have seen some amazing improvements in productivity:
“Before we started working with Boosters, test times were one of our biggest
development bottlenecks. Running our test suite took over 90 minutes, and forced
us to try a whole host of clunky workarounds just to try to get quicker
feedback. Semaphore Boosters have given us a tremendous time savings — our full
suite now runs in just under 16 minutes. Our productivity has increased
tremendously, almost doubling our output in terms of tickets we close each week.
Using Boosters has really helped our large project feel a lot more nimble to
develop!” says Bryce Senz, CIO at Credda.
How does it work?
Watch the video below to see Semaphore Boosters in action:
In the video, we started with an application whose test suite took 8 minutes to
run. In only a few clicks, without any change in source code, we configured a CI
build that runs in only a minute and a half.
Semaphore Boosters monitor your test suite and dynamically distribute your
test files across parallel jobs. This ensures the best possible performance
regardless of how your code changes over time. The only thing that you as a user
need to do is select the number of parallel jobs you’d like to run.
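The scheduling itself is internal to Semaphore, but the underlying idea can be sketched in a few lines of Ruby: a greedy partition that assigns each test file, longest first, to the currently least-loaded job. The file names and durations below are made up, and this is an illustration of the concept, not Semaphore's actual algorithm:

```ruby
# Sketch of duration-based test distribution across parallel jobs.
# Assign the longest-running files first, each to the least-loaded job.
def distribute(files_with_durations, job_count)
  jobs = Array.new(job_count) { { files: [], total: 0.0 } }
  files_with_durations.sort_by { |_file, duration| -duration }.each do |file, duration|
    job = jobs.min_by { |j| j[:total] }
    job[:files] << file
    job[:total] += duration
  end
  jobs
end

# Hypothetical suite: recorded runtimes in seconds per spec file.
suite = {
  "user_spec.rb"  => 120.0,
  "build_spec.rb" => 90.0,
  "api_spec.rb"   => 60.0,
  "auth_spec.rb"  => 30.0
}

jobs = distribute(suite, 2)
jobs.map { |j| j[:total] }  # => [150.0, 150.0], an even split
```

As runtimes drift over time, re-running the partition with fresh measurements keeps the jobs balanced, which is what "dynamically distribute" refers to.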
Semaphore Boosters currently support Ruby on Rails applications via RSpec
and Cucumber. We plan to support other languages and frameworks in the future.
500px has been our customer since 2014, and they have been growing and evolving
along with Semaphore.
Moving fast is crucial to the 500px team. The less time they spend on testing,
the more value they can create for their users. They put new code into
production several times per day, and automated testing allows them to ensure
that new features work, while spending less time reviewing previously-tested
functionality. In order to accomplish this, they rely on Semaphore to
automatically run their tests in parallel and speed up their test suite.
Recently, Shreya Khasnis wrote a post on the 500px blog about what happens
behind the scenes and how they maintain 500px:
“Pull requests are reviewed by developers, but also reviewed by machines! We
have a large suite of automated tests, which run when new pull requests are
opened. These tests are a great way to ensure that new features work as expected
and verify that these new changes do not break existing functionality. Given the
variety of features on our site, it would be time-consuming to test all aspects
by hand on every code change. Currently, we have about 4,000 automated tests
that are separated into threads which run simultaneously. We use a continuous
integration framework called Semaphore CI that runs these tests on every
proposed change. The tests are randomly executed, which encourages the
development of independent tests to ensure the order of execution does not
impact the expected result. This helps us parallelize the test suite into
different threads. Semaphore can also be integrated with Slack to inform
developers about tests that have passed or failed. From this, developers are
able to triage through and fix the code that broke things.”
This article is part of our Faster Rails series. Check out
the previous article about fast existence checks.
My Rails app used to be fast and snappy, and everything was working just fine for several
months. Then, slowly, as my product grew and users started to flock in, web
requests became slow and my database’s CPU usage started hitting the roof. I hadn’t
changed anything, why was my app getting slower?
Is there any cure for the issues I’m having with my application, or is Rails simply
not able to scale?
What makes your Rails application slow?
While there can be many reasons behind an application’s slowness, database
queries usually play the biggest role in an application’s performance footprint.
Loading too much data into memory, N+1 queries, lack of cached values, and the
lack of proper database indexes are the biggest culprits that can cause slow
requests.
Missing database indexes on foreign keys, commonly searched columns, or values
that need to be sorted can make a huge difference. A missing index is an
issue that is not even noticeable for tables with several thousand records.
However, when you start hitting millions of records, the lookups in the table
become painfully slow.
The role of database indexes
When you create a database column, it’s vital to consider if you will need to
find and retrieve records based on that column.
For example, let’s take a look at the internals of Semaphore. We have a
Project model, and every project has a name attribute. When someone visits a
project on Semaphore, e.g. https://semaphoreci.com/renderedtext/test-boosters,
the first thing we need to do in the projects controller is to find the project
based on its name — test-boosters.
Without an index, the database engine would need to check every record in the
projects table, one by one, until a match is found.
However, if we introduce an index on the projects table, for example:

add_index :projects, :name

the lookup will be much, much faster.
A good way to think about indexes is to imagine them as the index section at the
end of a book. If you want to find a word in a book, you can either read the
whole book and find the word, or you can open the index section that contains
an alphabetically sorted list of important words with a locator that points to
the page that defines the word.
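The same idea can be sketched in plain Ruby, with a Hash playing the role of the index. Real database indexes are B-trees, not hash tables, but the access pattern is the point:

```ruby
# A "table" of records, and a lookup by name with and without an index.
Project = Struct.new(:id, :name)
projects = (1..100_000).map { |i| Project.new(i, "project-#{i}") }

# Without an index: check every record, one by one, until a match is found.
found_by_scan = projects.find { |p| p.name == "project-99999" }

# With an "index": build it once, then jump straight to the record.
index_by_name = projects.each_with_object({}) { |p, h| h[p.name] = p }
found_by_index = index_by_name["project-99999"]

found_by_scan == found_by_index  # same record, far fewer comparisons
```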
What needs to be indexed?
A good rule of thumb is to create database indexes for everything that is
referenced in the WHERE, HAVING and ORDER BY parts of your SQL queries.
Indexes for unique lookups
Any lookup based on a unique column value should have an index.
When a lookup uses two columns together, such as the owner_id and owner_type
pair of a polymorphic association, we need to make sure that we create a
composite index:

# Bad: This will not improve the lookup speed
add_index :projects, :owner_id
add_index :projects, :owner_type

# Good: This will create the proper index
add_index :projects, [:owner_id, :owner_type]
Indexes for ordered values
Any frequently used sorting can be improved by using a dedicated index.
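As a toy picture of why this works, a sorted Ruby array can stand in for a B-tree index: finding a range boundary becomes a binary search instead of a full scan. The values below are illustrative:

```ruby
# A sorted array standing in for a B-tree index on a created_at-like
# column. With sorted data, a range query is a binary search for the
# boundary plus a slice, instead of scanning and sorting every row.
timestamps = (1..1_000_000).to_a

lo = timestamps.bsearch_index { |t| t >= 999_990 }  # O(log n) boundary search
recent = timestamps[lo..]

recent.size  # => 11 matching "rows", found without a full scan
```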
Should I always use indexes?
While using indexes for important fields can immensely improve the performance of
your application, sometimes the effect can be negligible, or it can even make your
application slower.
For example, on tables whose records are frequently inserted and deleted, every
write also has to update the indexes, which can negatively impact the
performance of your database. Huge tables with many millions of records also
require more storage for your indexes.
Always be conscious about the changes you introduce in your database, and if in
doubt, be sure to base your decisions on real world data and measurements.
One of the most important things we teach the junior programmers who join the
Semaphore team is the mindset of shipping
in small iterations. This is a simple concept; however, there’s an inevitable misunderstanding
that stems from the subjective idea of “small”. Thus, in practice we need to teach
by example what we really mean by small.
When you’re inexperienced, the desire to do and show your best work often leads to
perfectionism. In programming, perfectionism manifests itself as “I haven’t
submitted my pull request because I haven’t completed everything yet”.
Perfectionism is at odds with the goals of developing business software — giving
something useful to users, preferably sooner rather than later. Perfectionists
create imaginary obstacles and never end up building anything.
Recently, a pair of junior programmers was building a new reporting screen for our marketing team. The
screen needed to combine two sources of data for a given time range and present
a paginated view of results. The team that needed the report had never seen the
data this screen would provide. Would it hurt if the first version of the report
did not include a date picker and pagination of results beyond the top 25? Hell
no. So, we encouraged them to ship the screen without the date range and pagination. The
initial results provided more than enough value and ideas for improvement.
The marketing team had some data they could work with while the developers
continued working on the remaining tasks.
The crux of the matter lies in decomposing a task into minimal useful pieces.
Next, you estimate the complexity of each piece and communicate expectations with
the “stakeholder” (customer, client, product manager, or feature user).
Say a designer has recently updated several details that affect four distinct
screens. Would it be best to integrate these changes in four separate pull
requests, or one? This is where complexity, i.e. the time it would take to
complete each one, needs to be considered. If they would take a day each, four separate
pull requests are probably best. If all of them together would take you less than an hour to complete, go ahead and
combine them all into one pull request. Are three tasks really easy, but the fourth one requires
additional input from the designer who’s having a day off, as well as more time than all
others combined? Best to please your users with what you can finish soon, and
then do the last thing separately.
Shipping early will often provide you with surprising feedback. Perhaps the initial
version is so good that nobody really needs the stuff that’s “missing”. Or, the
whole idea didn’t really deliver what was expected and needs to be reconsidered.
The goal is to learn and help others. Just keep moving.
Fast feedback on the work we’ve done minimizes developer context switching and keeps us in the state of flow. Waiting for all
the jobs to finish in order to see that a job has failed can waste a lot of
time. If a job fails, the developer should have the option to
be notified right away, rather than wait until all the tests are run.
In order to make you more productive when building on Semaphore, we bring you fast failing as a feature.
The fast failing approach
On Semaphore, fast failing means that the developers get instant feedback when a job fails. All the running jobs of a build are stopped as soon as
one of the jobs fails. This means that you don’t need to wait for all the
other jobs to finish in order to get build feedback.
For example, suppose a build takes 10 minutes, the fast failing approach gives you
feedback in 1 minute, and fixing the issue takes 1 minute. The entire process,
along with re-building, then takes 12 minutes in total. With the approach that
does not allow fast failing, the entire process would take 21 minutes.
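The arithmetic from this example, spelled out:

```ruby
# Fast failing vs. waiting for the full build, using the example numbers.
build_minutes    = 10 # full build duration
feedback_minutes = 1  # time until the failing job is reported with fast fail
fix_minutes      = 1  # time to fix the issue

# With fast failing: get feedback, fix, then re-run the full build.
with_fast_fail = feedback_minutes + fix_minutes + build_minutes    # => 12

# Without fast failing: wait out the whole build, fix, then re-run it.
without_fast_fail = build_minutes + fix_minutes + build_minutes    # => 21
```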
The fast failing approach also minimizes developer context switching due to faster feedback cycles.
How to enable fast failing on Semaphore
You can select the type of fast failing in the branch settings of your project.
You can either enable it for all branches, or for all branches except the
default branch.
Different teams have different approaches to dealing with flaky tests.
Some even go as far as using the “Let’s run each test 10 times and
if it passes at least once, it passed” approach.
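That quoted approach boils down to a retry loop. A minimal sketch of the anti-pattern follows; the helper name is made up, and gems such as rspec-retry implement the same idea:

```ruby
# "Run it up to 10 times; if it passes at least once, it passed."
# Hypothetical helper illustrating the anti-pattern described above.
def flaky_pass?(attempts: 10)
  attempts.times { return true if yield }
  false
end

# A test that deterministically fails twice before passing still counts
# as green, so the underlying race or time-out is never investigated.
calls = 0
result = flaky_pass? { (calls += 1) >= 3 }
result  # => true, after 3 runs
```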
I personally think that rerunning failed tests is poisonous —
it legitimizes and encourages entropy, and rots the test suite in the long run.
The half-cocked approach
Some teams see rerunning failed tests as a very convenient short-term solution.
In my experience, there is unfortunately no such thing as ‘a short-term solution’.
All temporary solutions tend to become permanent.
Along with some other techniques that are efficient in the short term, but are
otherwise devastating, rerunning tests is very popular with a certain category of managers.
It’s particularly common in corporate environments:
there are company goals, and then there are personal goals (ladders to climb).
In such environments, some people tend to focus only on what needs to happen until the end
of the current quarter or year.
What happens later is often seen as someone else’s concern.
Looking from that perspective, test rerunning is both fast and efficient, which makes it
a desirable and convenient solution.
Keeping flaky tests and brute-forcing them to pass defeats the purpose of testing.
There is an unspoken assumption that something is wrong with the tests, and that it’s fine to just rerun them.
This assumption is dangerous.
Who’s to say that the race or the time-out that causes the flakiness is in the test, and
not in the production code?
And that it’s not affecting the customer?
The sustainable solution
The long-term solution is to either fix or replace the flaky tests.
If one developer cannot fix them, another one should try.
If a test cannot be fixed, it should be deleted and written from scratch,
preferably by somebody who didn’t see the flaky one.
Test coverage tools can be used as a kind of a safety net,
showing if some tests have been deleted without being adequately replaced.
Not being able to develop stable tests for some part of the code usually means one of
these two things — either that something is wrong with the test and/or the testing approach,
or that something is wrong with the code being tested.
If we are reasonably certain that the tests are fine, it’s time to take a deeper
look at the code itself.
Our position on flaky tests
Deleting and fixing flaky tests is a pretty aggressive measure, and rewriting tests can be time consuming.
However, not taking care of flaky tests leads to certain long-term test suite degradation.
On the other hand, there are some legitimate use-cases for flaky test reruns.
For example, when time-to-market is of essential importance, and
when technical debt is deliberately accumulated with the intention and
a clear plan on paying it off in the near future.
As a CI/CD tool vendor, we feel that our choice of whether to support
rerunning failed flaky tests affects numerous customers.
Not just the way they work, but, much more importantly, the way they perceive
flaky tests and the testing process itself.
At this point, we are choosing not to support rerunning failed tests,
since our position is that this approach is harmful much more often than it is useful.
For a long time, Semaphore has been limiting your build command execution time
to a fixed 60 minutes. This restriction worked great for the majority of builds on
Semaphore; however, there are some cases when this limit is simply not enough.
Tasks such as compiling large binaries, provisioning big infrastructures, or running tests
that are difficult to parallelize sometimes require more than 60 minutes to complete.
On the other hand, developers like to restrict build duration to
prevent their test suites from getting stuck because of an accidental debug statement
or a network call that will never complete.
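Conceptually, the limit is a timeout wrapped around the build commands. Ruby's stdlib Timeout gives the flavor; the 1-second cap here is a stand-in for the real build limit:

```ruby
require "timeout"

# A "build" that hangs forever, e.g. a network call that never completes.
result =
  begin
    Timeout.timeout(1) do  # 1-second cap standing in for the build limit
      sleep                # blocks indefinitely
      "finished"
    end
  rescue Timeout::Error
    "terminated: time limit exceeded"
  end

puts result  # => terminated: time limit exceeded
```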
For this reason, we have introduced a new configuration option in the admin
section of your project’s configuration that allows you to choose the
best-suited timeout for your project:
The microservice architecture has recently been gaining traction, with many
companies sharing their positive experiences with applying it. The early
adopters have been tech behemoths such as Amazon and Netflix, or companies with
huge user bases like SoundCloud. Based on the profiles of these companies and
the assumption that there’s more complexity to running and deploying many things
than to deploying a single application, many people understand microservices as
an interesting idea that does not apply to them. It’s something
that mere mortals could qualify for in the far distant future, if ever.
However, obsessing about “being ready” is rarely a good strategy in life. I
think that it’s far more useful to first learn how to detect when
the opposite approach — a monolithic application — is no longer optimal. The
knowledge that helps us to recognize the need enables us to start taking action
when the time comes for us to make the change.
This and future posts on our blog will be based on our experience of scaling up Semaphore
to manage tens of thousands of private CI jobs on a daily basis.
Overweight monoliths exhibit two classes of problems: degrading system
performance and stability, and slow development cycles. So, whatever we do comes
from the desire to escape these technical and consequently social challenges.
The single point of fragility
Today’s typical large monolithic systems started off as web applications written
in an MVC framework, such as Ruby on Rails. These systems are characterized by
either being a single point of failure, or having severe bottlenecks under
heavy load.
Of course, having potential bottlenecks, or having an entire system that is a single
point of failure is not inherently a problem. When you’re in month 3 of your
MVP, this is fine. When you’re working in a team of a few developers on a client
project which serves 100 customers, this is fine. When most of your app’s
functionality consists of well-designed CRUD operations based on human input with a linear
increase of load, things are probably going to be fine for a long time.
Also, there’s nothing inherently wrong about big apps. If you have
one and you’re not experiencing any of these issues, there’s absolutely no reason
to change your approach. You shouldn’t build microservices solely in the service
of making the app smaller — it makes no sense to replace the parts
that are doing their job well.
Problems begin to arise after your single point of failure has actually started
failing under heavy load.
At that point, having a large attack surface can start keeping the team in a
perpetual state of emergency. For example:
An outage in non-critical data processing brings down your entire website.
With Semaphore, we had events where the monolith was handling callbacks
from many CI servers, and when that part of the system failed, it brought the
entire service down.
You moved all time-intensive tasks to one huge group of background workers,
and keeping them stable gradually becomes a full-time job for a small team.
Changing one part of the system unexpectedly affects some other parts even
though they’re logically unrelated, which leads to some nasty surprises.
As a consequence, your team spends more time solving technical issues than
building cool and useful stuff for your users.
Slow development cycles
The second big problem is when making any change happen begins to take too long.
There are some technical factors that are not difficult to measure. A good question
to consider is how much time it takes your team to ship a hotfix to production.
Not having a fast delivery pipeline is painfully obvious to your users in
the case of an outage.
What’s less obvious is how much the slow development cycles are affecting
your company over a longer period of time. How long does it take your team to
get from an idea to something that customers can use in production? If the
answer is weeks or months, then your company is vulnerable to being
outplayed by competition.
Nobody wants that, but that’s where the compound effects of monolithic,
complex code bases lead.
Slow deployment: this issue is typical for monoliths that have accumulated many
dependencies and assets. There are often multiple app instances, and we
need to replace each one without having downtime. Moving to container-based
deployment can make things even worse, by adding the time needed to build and copy
the container image.
High bus factor on the old guard, long onboarding for the newcomers: it takes
months for someone new to become comfortable with making a non-trivial
contribution in a large code base. And yet, all new code is just a small
percentage of the code that has already been written. The idiosyncrasies of old
code affect and constrain all new code that is layered on top of the old one. This
leaves those who have watched the app grow with an ever-expanding
responsibility. For example, having 5 developers that are waiting for a single
person to review their pull requests is an indicator of this.
Emergency-driven context switching: we may have begun working on a new
feature, but an outage has just exposed a vulnerability in our system. So,
healing it becomes a top priority, and the team needs to react and switch
to solving that issue. By the time they return to the initial project, internal
or external circumstances can change and reduce its impact, perhaps
even make it obsolete. A badly designed distributed system can make this even
worse — hence one of the requirements for making one is having solid design
skills. However, if all code is part of a single runtime hitting one database,
our options for avoiding contention and downtime are very limited.
Change of technology is difficult: our current framework and tooling
might not be the best match for the new use cases and the problems we
face. It’s also common for monoliths to depend on outdated software. For
example, GitHub upgraded to Rails 3 four years after it was
released. Such latency can either limit our design choices, or generate
additional maintenance work. For example, when the library version
that you’re using is no longer receiving security updates, you need to
find a way to patch it yourself.
Decomposition for fun and profit
While product success certainly helps, a development team that’s
experiencing all of these issues won’t have the highest morale. Nor will its
people be able to develop their true potential.
The root cause is simple. A monolithic application grows multiple applications
within itself, while facing high traffic and large volumes of data.
Big problems are best solved by breaking them up into many smaller ones that are easier to handle.
This basic engineering idea is what leads teams to start decomposing large
monoliths into smaller services, and eventually into microservices. The ultimate goal is
to go back to being creative and successful by enabling the team to develop useful products as
quickly as possible.
We’re happy to announce a new Semaphore feature that
will help you monitor and improve your CI speed over time. We’re calling it
CI Speed Insights, and it’s available to all Semaphore users as of today.
Here’s how it works. On the project page, below all of your recently active
branches, you’ll find a chart. In it, you’ll be able to see the runtime
duration of recent builds and an indicator of your average CI speed, in minutes:
The chart will also include a line indicating the 10-minute mark, in case your
build runs close to or longer than that.
Why 10 minutes?
We’re convinced that having a build slower than 10 minutes is not proper
continuous integration. When a build takes longer than 10 minutes, we waste too
much precious time and energy waiting, or context switching back and forth. We
merge rarely, making every deploy more risky. Refactoring is hard to do well.
You can read our full rationale about this in our
recent blog post — What is Proper Continuous Integration?
There’s more to CI Speed Insights if you click to view the details.
Measuring CI speed
We’ve defined CI speed as the average time that elapses from pushing new code to getting the
build status report. In many cases this is how long it takes for your
build setup and tests to run on Semaphore. However, another possibility is that your build
is blocked waiting for available resources, which are being used by
other builds. When this happens, you’ll see hints on Semaphore that point you to the
cause of the delay.
So, if your build runs for 4 minutes on average, but is waiting for other builds
to finish for 5 minutes on average, then the CI speed for that project is
reported as 9 minutes. This is what the screenshot above illustrates.
At the bottom of the screen, you’ll find an interactive chart of your recent builds
that shows both the build runtime and your waiting time. You can click on each one to
view full build details.
If your build lasts too long, Semaphore will recommend running tests in parallel. You
can set this up by manually configuring parallel
jobs. This is a great
way to cut down your build time.
Automatic test parallelisation
Still, wouldn’t it be great if Semaphore could automatically parallelise all
your tests for you, without any effort needed on your side? We’ve been hard at work
on making that possible. A recent beta customer managed to reduce build
time from 2 hours to 8 minutes(!).
The initial release of automatic parallelisation will support Ruby — RSpec and
Cucumber to be precise. If you’re using these tools and are interested in
setting up automatic parallelisation in your project, please get in touch with
us to schedule a personal demo. We’re excited to show
you how Semaphore can optimize the way your team runs tests.
Ruby and Rails are slow — this argument is often used to downplay the worth of
the language and the framework. This statement in itself is not false. Generally speaking,
Ruby is slower than its direct competitors such as Node.js and Python. Yet,
many businesses from small startups to platforms with millions of users use it as the
backbone of their operations. How can we explain this contradiction?
What makes your application slow?
While there can be many reasons behind an application’s slowness, database queries usually play the biggest role
in an application’s performance footprint. Loading too much data into memory,
N+1 queries, lack of cached values, and the lack of proper database indexes are
the biggest culprits that cause slow requests.
There are some legitimate domains where Ruby is simply too slow. However, most of
the slow responses in our applications usually boil down to unoptimized
database calls and the lack of proper caching.
Even if your application is blazing fast today, it can become much slower in only
a few months. API calls that worked just fine can suddenly start
killing your service with a dreaded HTTP 502 response. After all,
working with a database table with several hundred records is very different from working with
a table that has millions of records.
Existence checks in Rails
Existence checks are probably the most common calls that you send to your
database. Every request handler in your application probably starts with a
lookup, followed by a policy check that uses multiple dependent lookups in the
database.
However, there are multiple ways to check the existence of a database
record in Rails. We have present?, empty?, any?, exists?, and various other
counting-based approaches, and they all have vastly different performance
implications.
In general, when working on Semaphore, I always prefer to use .exists?.
I’ll use our production database to illustrate why I prefer .exists? over the
alternatives. We will try to look up if there has been a passed build in the
last 7 days.
Let’s observe the database calls produced by our calls.
Build.where(:created_at => 7.days.ago..1.day.ago).passed.present?
# SELECT "builds".* FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:22:27.133402' AND '2017-02-28 21:22:27.133529') AND
# "builds"."result" = $1  [["result", "passed"]]

Build.where(:created_at => 7.days.ago..1.day.ago).passed.any?
# SELECT COUNT(*) FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:22:16.885942' AND '2017-02-28 21:22:16.886077') AND
# "builds"."result" = $1  [["result", "passed"]]

Build.where(:created_at => 7.days.ago..1.day.ago).passed.empty?
# SELECT COUNT(*) FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:22:16.885942' AND '2017-02-28 21:22:16.886077') AND
# "builds"."result" = $1  [["result", "passed"]]

Build.where(:created_at => 7.days.ago..1.day.ago).passed.exists?
# SELECT 1 AS one FROM "builds" WHERE ("builds"."created_at" BETWEEN
# '2017-02-22 21:23:04.066301' AND '2017-02-28 21:23:04.066443') AND
# "builds"."result" = $1 LIMIT 1  [["result", "passed"]]
The first call that uses .present? is very inefficient. It loads all the
records from the database into memory, constructs the Active Record objects,
and then finds out if the array is empty or not. In a huge database table, this
can wreak havoc by potentially loading millions of records, and can even lead to
downtime in your service.
The second and third approaches, any? and empty?, are optimized in Rails and
load only COUNT(*) into the memory. COUNT(*) queries are usually
efficient, and you can use them even on semi-large tables without any dangerous
side effects.
The fourth approach, exists?, is even more optimized, and it should be your first
choice when checking the existence of a record. It uses the SELECT 1
... LIMIT 1 approach, which is very fast.
Here are some numbers from our production database for the above queries:
This small tweak can make your code up to 400 times faster in some cases.
If you take into account that 200ms is considered the upper
limit for an acceptable response time, you will realize that this tweak can spell the
difference between a good user experience and a sluggish, frustrating one.
Should I always use exists?
I consider exists? a good sane default that usually has the best performance
footprint. However, there are some exceptions.
For example, if we are checking for the existence of an association record
without any scope, any? and empty? will also produce a very optimized
query that uses the SELECT 1 FROM ... LIMIT 1 form, but any? will not hit the
database again if the records are already loaded into memory.
This makes any? faster by one whole database call when the records are
already loaded into memory:
project = Project.find_by_name("semaphore")

project.builds.load    # eager loads all the builds into the association cache

project.builds.any?    # no database hit
project.builds.exists? # hits the database

# if we bust the association cache
project.builds(true).any?    # hits the database
project.builds(true).exists? # hits the database
In conclusion, my general advice is to use exists? by default and improve the
code based on metrics.