Lightweight Docker Images in 5 Steps
Lightweight Docker Images Speed Up Deployment
Deploying your services packaged in lightweight Docker images has many practical benefits. In a container, your service usually comes with all the dependencies it needs to run, it’s isolated from the rest of the system, and deployment is as simple as running a docker run command on the target machine.
However, most of the benefits of dockerized services are negated if your Docker images are several gigabytes in size and/or they take several minutes to boot up. Caching Docker layers can help, but ideally you want small, fast containers that can be deployed and booted in a matter of minutes, or even seconds.
The first time we used Docker at Rendered Text to package one of Semaphore’s services, we made many mistakes that resulted in a huge Docker image that was painful to deploy and maintain. However, we didn’t give up, and, step by step, we improved our images.
We’ve managed to make a lot of improvements since our first encounter with Docker, and we’ve successfully reduced the footprint of our images from several gigabytes to around 20 megabytes for our latest microservices, with boot times that are always under 3 seconds.
Our First Docker Service
You might be wondering how a Docker image can possibly be larger than a gigabyte. When you take a standard Rails application — with gems, assets, background workers and cron jobs — and package it using a base image that comes with everything but the kitchen sink preinstalled, you will surely cross the 1 GB threshold.
We started our Docker journey with a service that used Capistrano for deployment. To make our transition easy, we started out with a base Docker image that resembled our old workflow. The phusion/passenger-full image was a great candidate, and we managed to package up our application very quickly.
A big downside of using passenger-full is that the image itself is around 300 MB in size. When you add all of your application’s dependency gems, which can easily take up another 300 MB, you are already starting at around 600 MB.
The deployment of that image took around 20 minutes, which is an unacceptable time frame if you want to be happy with your continuous delivery pipeline. However, this was a good first step.
We knew that we could do better.
Step 1: Use Fewer Layers
One of the first things you learn when building your Docker images is that you should squash multiple Docker layers into one big layer.
Let’s take a look at the following Dockerfile and demonstrate why it’s better to use fewer layers in a Docker image:
FROM ubuntu:14.04

RUN apt-get update -y

# Install packages
RUN apt-get install -y curl
RUN apt-get install -y postgresql
RUN apt-get install -y postgresql-client

# Remove apt cache to make the image smaller
RUN rm -rf /var/lib/apt/lists/*

CMD bash
When we build the image with docker build -t my-image ., we get an image that is 279 MB in size. With docker history my-image, we can list the layers of our image:
$ docker history my-image
IMAGE         CREATED        CREATED BY                                       SIZE
47f6bd778b89  7 minutes ago  /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "bash"     0 B
3650b449ca91  7 minutes ago  /bin/sh -c rm -rf /var/lib/apt/lists/*           0 B
0c43b2bf2d13  7 minutes ago  /bin/sh -c apt-get install -y postgresql-client  1.101 MB
ce8e5465213b  7 minutes ago  /bin/sh -c apt-get install -y postgresql         56.72 MB
b3061ed9d53a  7 minutes ago  /bin/sh -c apt-get install -y curl               11.38 MB
ee62ceeafb06  8 minutes ago  /bin/sh -c apt-get update -y                     22.16 MB
ff6011336327  3 weeks ago    /bin/sh -c #(nop) CMD ["/bin/bash"]              0 B
<missing>     3 weeks ago    /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/    1.895 kB
<missing>     3 weeks ago    /bin/sh -c rm -rf /var/lib/apt/lists/*           0 B
<missing>     3 weeks ago    /bin/sh -c set -xe && echo '#!/bin/sh' > /u      194.6 kB
<missing>     3 weeks ago    /bin/sh -c #(nop) ADD file:4f5a660d3f5141588d    187.8 MB
There are several things to note in the output above:
- Every RUN command creates a new Docker layer
- The apt-get update command increases the image size by around 22 MB
- The rm -rf /var/lib/apt/lists/* command doesn’t reduce the size of the image
When working with Docker, we need to keep in mind that any layer added to the image is never removed by a later layer. In other words, it’s smarter to update the apt cache, install some packages, and remove the cache in a single Docker RUN command.
Let’s see if we can reduce the size of our image with this technique:
FROM ubuntu:14.04

RUN apt-get update -y && \
    apt-get install -y curl postgresql postgresql-client && \
    rm -rf /var/lib/apt/lists/*

CMD bash
Hooray! After a successful build, the size of our image dropped to 250 megabytes. We’ve just shaved off 29 MB simply by joining the installation commands in our Dockerfile.
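As a further refinement, one we haven’t measured in the numbers above, apt can be told to skip optional "Recommends" packages, which often trims a few more megabytes. A sketch:

```dockerfile
FROM ubuntu:14.04

# Update, install, and clean up in a single layer. The
# --no-install-recommends flag skips optional recommended packages,
# which usually saves additional space on top of the cache cleanup.
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends curl postgresql postgresql-client && \
    rm -rf /var/lib/apt/lists/*

CMD bash
```

The exact savings depend on which packages you install, so it’s worth checking docker history after each change.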
Step 2: Make Container Boot Time Predictable
This step describes an anti-pattern that you should avoid in your deployment pipeline.
When working on a Rails-based application, the biggest portion of your Docker images will be gems and assets. To circumvent this, you can try to be clever and place your gems outside of the container.
For example, you can run the Docker image by mounting a directory from the host machine, and cache the gems between two subsequent runs of your Docker image.
FROM ruby

WORKDIR /home/app
ADD . /home/app

CMD bundle install --path vendor/bundle && bundle exec rails server
Let’s build such an image. Notice that we use the CMD keyword, which means that our gems will be installed every time we run our Docker image. The build step only pushes the source code into the container.
docker build -t rails-image .
When we start our image this time around, it will first install the gems, and then start our Rails server.
docker run -tdi rails-image
Now, let’s use a volume to cache the gems between each run of our image. We will achieve this by mounting an external directory into our Docker container with the -v option. Note that the container-side path must be absolute:

docker run -v /tmp/gems:/home/app/vendor/bundle -tdi rails-image
Hooray! Or is it?
The technique above looks promising, but in practice, it turns out to be a bad idea. Here are some reasons why:
Your Docker images are not stateless. If you run the image twice, you can experience different behaviour. This is not ideal because it makes your deployment cycle more exciting than it should be.
Your boot time can differ vastly depending on the content of your cache directory. For bigger Rails projects, the boot time can range from several seconds up to 20 minutes.
We have tried to build our images with this technique, but we ultimately had to drop this idea because of the above drawbacks. As a rule of thumb, predictable boot time and immutability of your images outweigh any speed improvement you may gain by extracting dependencies from your containers.
Step 3: Understand and Use Docker Cache Effectively
When creating your first Docker image, the most obvious choice is to use the same commands you would use in your development environment.
For example, if you’re working on a Rails project, you would probably want to use the following:
FROM ruby

WORKDIR /home/app
ADD . /home/app

RUN bundle install --path vendor/bundle
RUN bundle exec rake assets:precompile

CMD bundle exec rails server
However, by doing this, you will effectively wipe every cached layer, and start from scratch on every build.
A new Docker layer is created for every command in your Dockerfile. When you build a new image, Docker first checks if a layer with the same content and history already exists on your machine. If it does, Docker reuses it. If it doesn’t, Docker needs to create a new layer.
In the above example, ADD . /home/app creates a new layer even if you make the smallest change to your source code. Then, the next command, RUN bundle install --path vendor/bundle, always needs to do a fresh install of every gem, because the history of your cached layers has changed.
To avoid this, it’s better to add just the Gemfile first, since it changes rarely compared to the source code. Then, you should install all the gems, and add your source code on top of that.
FROM ruby

WORKDIR /tmp/gems
ADD Gemfile /tmp/gems/Gemfile
RUN bundle install --path vendor/bundle

WORKDIR /home/app
ADD . /home/app

RUN mv /tmp/gems/vendor/bundle vendor/bundle
RUN bundle exec rake assets:precompile

CMD bundle exec rails server
With the above technique, you can shorten the build time of your image and reduce the number of layers that need to be uploaded on every deploy.
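A related detail worth mentioning (our suggestion, not something covered above) is that ADD . /home/app invalidates the cache whenever any file in the build context changes, including logs, temporary files, and a previously installed bundle. A .dockerignore file keeps such files out of the build context; the entries below are hypothetical and should be adjusted to your project:

```
# .dockerignore (example entries)
.git
log/
tmp/
vendor/bundle/
```

With these files excluded, a change to a log file no longer forces a rebuild of the source-code layer.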
Step 4: Use a Small Base Image
Big base images are great when you’re starting out with Docker, but you’ll eventually want to move on to smaller images that contain only the packages that are essential for your application.
For example, if you start with phusion/passenger-full, a logical next step would be to try out phusion/baseimage-docker and enable only the packages that are necessary. We followed this path too, and we successfully reduced the size of our Docker images by 200 megabytes.
But why stop there? You can also try to run your application on the plain ubuntu image. Then, as a next step, try out debian, which is only around 80 MB in size.
You will notice that every time you reduce the image size, some of the dependencies will be missing, and you will probably need to spend some time on figuring out how to install them manually. However, this is only a one-time issue, and once you’ve resolved it, you can most likely enjoy faster deployment for several months to follow.
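To illustrate that one-time cost, here is the kind of minimal Dockerfile you can end up with on a small base image. This is a sketch with hypothetical package names, not our production file; on a slim base, every runtime dependency has to be listed explicitly:

```dockerfile
FROM debian:jessie

# Nothing comes preinstalled on a minimal base image, so every
# runtime dependency is installed explicitly, in one cache-cleaning
# layer. The package list below is illustrative only.
RUN apt-get update -y && \
    apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      libpq5 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /home/app
ADD . /home/app

CMD ./run-service
```

Each missing-dependency error you hit during this process usually translates into one more package on that list.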
We ran on ubuntu for several months too. However, as we moved away from Ruby as our primary language and started using Elixir, we knew that we could go even smaller.
Being a compiled language, Elixir has a nice property: the resulting compiled binaries are self-contained, and can run on pretty much any Linux distribution. This is where the alpine image becomes an awesome candidate.
The base image is only 5 MB, and if you compile your Elixir application, you can achieve images that are only around 25 MB in size. This is awesome compared to the 1.5 GB beast from the beginning of our Docker journey.
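A sketch of what such an image can look like. The release path and application name below are hypothetical, and it assumes the release was compiled for the same libc and Erlang runtime as the target image:

```dockerfile
FROM alpine:3.6

# A compiled Elixir release is self-contained, so the image only
# needs a few shared libraries on top of the 5 MB base.
RUN apk add --no-cache ncurses-libs openssl

# Path to a prebuilt release; adjust to your project.
ADD _build/prod/rel/my_app /opt/my_app

CMD ["/opt/my_app/bin/my_app", "foreground"]
```

The resulting image contains the runtime libraries plus the release itself, and nothing else.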
With Go, which we also use occasionally, we could go even further and build an image FROM scratch, achieving images around 5 MB in size.
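A minimal sketch of such a scratch-based image, assuming a statically linked binary built beforehand on the host (the binary name is made up):

```dockerfile
# Assumes the binary was built without cgo so it has no libc
# dependency, for example:
#   CGO_ENABLED=0 GOOS=linux go build -o my-service .
FROM scratch

# The scratch image is completely empty; the binary is the only content.
ADD my-service /my-service

CMD ["/my-service"]
```

Since scratch contains nothing at all, anything the binary needs at runtime, such as CA certificates or timezone data, must be added explicitly as well.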
Step 5: Build Your Own Base Image
Using lightweight images can be a great way to improve build and deployment performance, but bootstrapping a new microservice can be painful if you need to remember to install a bunch of packages before you can use it.
If you create new services frequently, building your own customized base Docker image can bring great improvements. By building your own custom Docker image and publishing it on DockerHub, you can have small images that are also easy to use.
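For example, a shared base image could preinstall the handful of packages that every one of your services needs. The image and organization names below are made up for illustration:

```dockerfile
FROM alpine:3.6

# Packages that every one of our (hypothetical) services needs,
# baked in once so each new service's Dockerfile stays tiny.
RUN apk add --no-cache bash ca-certificates openssl
```

After building and publishing it once, with something like docker build -t myorg/base:1.0 . followed by docker push myorg/base:1.0, every new microservice can simply start its Dockerfile with FROM myorg/base:1.0.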
Keep Learning, Docker is Great
Switching to a Docker-based development and deployment environment can be tricky at first. You can even get frustrated and think that Docker is not for you. However, if you persist and do your best to learn some good practices including how to make and keep your Docker images lightweight, the Docker ecosystem will reward you with speed, stability and reliability you’ve never experienced before.
P.S. If you’re looking for a continuous integration and deployment solution that works great with Docker — including fully-featured toolchain support, registry integrations, image caching, and fast image builds — try Semaphore for free.