    11 Mar 2022 · Software Engineering

    Lightweight Docker Images in 5 Steps

    Lightweight Docker Images Speed Up Deployment

    Deploying your services packaged in lightweight Docker images has many practical benefits. In a container, your service usually comes with all the dependencies it needs to run, it’s isolated from the rest of the system, and deployment is as simple as running a docker run command on the target system.

    However, most of the benefits of dockerized services can be negated if your Docker images are several gigabytes in size and/or they take several minutes to boot up. Caching Docker layers can help, but ideally, you want to have small and fast containers that can be deployed and booted in a matter of minutes, or even seconds.

    The first time we used Docker at Rendered Text to package one of Semaphore’s services, we made many mistakes that resulted in a huge Docker image that was painful to deploy and maintain. However, we didn’t give up, and, step by step, we improved our images.

    We’ve managed to make a lot of improvements since our first encounter with Docker, and we’ve successfully reduced the footprint of our images from several gigabytes to around 20 megabytes for our latest microservices, with boot times that are always under 3 seconds.

    Our First Docker Service

    You might be wondering how a Docker image can possibly be larger than a gigabyte. When you take a standard Rails application — with gems, assets, background workers and cron jobs — and package it using a base image that comes with everything but the kitchen sink preinstalled, you will surely cross the 1 GB threshold.

    We started our Docker journey with a service that used Capistrano for deployment. To make our transition easy, we started out with a base Docker image that resembled our old workflow. The phusion/passenger-full image was a great candidate, and we managed to package up our application very quickly.

    A big downside of using passenger-full was that it’s around 300 MB in size. When you add all of your application’s dependency gems, which can easily be around 300 MB in size, you are already starting at around 600 MB.

    The deployment of that image took around 20 minutes, which is an unacceptable time frame if you want to be happy with your continuous delivery pipeline. However, this was a good first step.

    We knew that we could do better.

    Step 1: Use Fewer Layers

    One of the first things you learn when building your Docker images is that you should squash multiple Docker layers into one big layer.

    Let’s take a look at the following Dockerfile, and demonstrate why it’s better to use fewer layers in a Docker image:

    FROM ubuntu:14.04
    RUN apt-get update -y
    # Install packages
    RUN apt-get install -y curl
    RUN apt-get install -y postgresql
    RUN apt-get install -y postgresql-client
    # Remove apt cache to make the image smaller
    RUN rm -rf /var/lib/apt/lists/*
    CMD bash

    When we build the image with docker build -t my-image ., we get an image that is 279 MB in size. With docker history my-image we can list the layers of our Docker image:

    $ docker history my-image
    IMAGE               CREATED             CREATED BY                                      SIZE
    47f6bd778b89        7 minutes ago       /bin/sh -c #(nop)  CMD ["/bin/sh" "-c" "bash"   0 B
    3650b449ca91        7 minutes ago       /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
    0c43b2bf2d13        7 minutes ago       /bin/sh -c apt-get install -y postgresql-client 1.101 MB
    ce8e5465213b        7 minutes ago       /bin/sh -c apt-get install -y postgresql        56.72 MB
    b3061ed9d53a        7 minutes ago       /bin/sh -c apt-get install -y curl              11.38 MB
    ee62ceeafb06        8 minutes ago       /bin/sh -c apt-get update -y                    22.16 MB
    ff6011336327        3 weeks ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
    <missing>           3 weeks ago         /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
    <missing>           3 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
    <missing>           3 weeks ago         /bin/sh -c set -xe   && echo '#!/bin/sh' > /u   194.6 kB
    <missing>           3 weeks ago         /bin/sh -c #(nop) ADD file:4f5a660d3f5141588d   187.8 MB

    There are several things to note in the output above:

    1. Every RUN command creates a new Docker layer
    2. The apt-get update command increases the image size by about 22 MB
    3. The rm -rf /var/lib/apt/lists/* command doesn’t reduce the size of the image

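    When working with Docker, we need to keep in mind that a layer, once added to the image, is never removed. Deleting files in a later layer only hides them; the earlier layer that contains them still counts toward the image size. In other words, it’s smarter to update the apt cache, install some packages, and remove the cache in a single Docker RUN command.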

    Let’s see if we can reduce the size of our image with this technique:

    FROM ubuntu:14.04
    RUN apt-get update -y && \
        apt-get install -y curl postgresql postgresql-client && \
        rm -rf /var/lib/apt/lists/*
    CMD bash

    Hooray! After the successful build, the size of our image dropped from 279 MB to 250 MB. We’ve reduced the size by 29 MB just by joining the installation commands in our Dockerfile.
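
    The same single-RUN pattern can be pushed a little further. The sketch below is an assumption on our part rather than something measured for this article: it adds apt-get’s --no-install-recommends flag so that only hard dependencies are installed. How much this saves depends on the packages involved, and occasionally a recommended package turns out to be needed at runtime and has to be listed explicitly.

    ```dockerfile
    FROM ubuntu:14.04
    # Update, install and clean up in ONE layer, so the apt cache
    # never ends up in the final image.
    RUN apt-get update -y && \
        apt-get install -y --no-install-recommends \
            curl postgresql postgresql-client && \
        rm -rf /var/lib/apt/lists/*
    CMD ["bash"]
    ```

    Running docker history on the resulting image is an easy way to check whether the flag made a difference for your particular package set.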

    Step 2: Make Container Boot Time Predictable

    This step describes an anti-pattern that you should avoid in your deployment pipeline.

    When working on a Rails-based application, the biggest portion of your Docker images will be gems and assets. To circumvent this, you can try to be clever and place your gems outside of the container.

    For example, you can run the Docker image with a directory on the host machine mounted into the container, and cache the gems between subsequent runs of your Docker image.

    FROM ruby
    WORKDIR /home/app
    ADD . /home/app
    CMD bundle install --path vendor/bundle && bundle exec rails server

    Let’s build such an image. Notice that we use the CMD keyword, which means that our gems will be installed every time we run our Docker image. The build step only pushes the source code into the container.

    docker build -t rails-image .

    When we start our image this time around, it will first install the gems, and then start our Rails server.

    docker run -tdi rails-image

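    Now, let’s use a volume to cache the gems between each run of our image. We will achieve this by mounting an external folder into our container with the -v /tmp/gems:/home/app/vendor/bundle option. Docker requires the container side of the mount to be an absolute path, so we spell out the directory under our WORKDIR.

    docker run -v /tmp/gems:/home/app/vendor/bundle -tdi rails-image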

    Hooray! Or is it?

    The technique above looks promising, but in practice, it turns out to be a bad idea. Here are some reasons why:

    1. Your Docker images are not stateless. If you run the image twice, you can experience different behaviour. This is not ideal because it makes your deployment cycle more exciting than it should be.
    2. Your boot time can differ vastly depending on the content of your cache directory. For bigger Rails projects, the boot time can range from several seconds up to 20 minutes.

    We have tried to build our images with this technique, but we ultimately had to drop this idea because of the above drawbacks. As a rule of thumb, predictable boot time and immutability of your images outweigh any speed improvement you may gain by extracting dependencies from your containers.

    Step 3: Understand and Use Docker Cache Effectively

    When creating your first Docker image, the most obvious choice is to use the same commands you would use in your development environment.

    For example, if you’re working on a Rails project, you would probably want to use the following:

    FROM ruby
    WORKDIR /home/app
    ADD . /home/app
    RUN bundle install --path vendor/bundle
    RUN bundle exec rake assets:precompile
    CMD bundle exec rails server

    However, by doing this, you will effectively wipe every cached layer, and start from scratch on every build.

    New Docker layers are created for every ADD, RUN and COPY command. When you build a new image, Docker first checks if a layer with the same content and history exists on your machine. If it already exists, Docker reuses it. If it doesn’t exist, Docker needs to create a new layer.

    In the above example, ADD . /home/app creates a new layer even if you make the smallest change in your source code. Then, the next command, RUN bundle install --path vendor/bundle, always needs to do a fresh install of every gem because the history of your cached layers has changed.

    To avoid this, it’s better to just add the Gemfile first, since it changes rarely compared to the source code. Then, you should install all the gems, and add your source code on top of it.

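    FROM ruby
    # Install gems in a separate directory first; this layer is reused
    # from the cache until the Gemfile changes
    WORKDIR /tmp/gems
    ADD Gemfile /tmp/gems/Gemfile
    RUN bundle install --path vendor/bundle
    # Add the source code on top of the cached gem layer
    WORKDIR /home/app
    ADD . /home/app
    # Move the gems into the app before any bundle exec command needs them
    RUN mv /tmp/gems/vendor/bundle vendor/bundle
    RUN bundle exec rake assets:precompile
    CMD bundle exec rails server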

    With the above technique, you can shorten the build time of your image and reduce the number of layers that need to be uploaded on every deploy.
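
    A variation on the same idea, sketched here under a few assumptions, installs the gems directly in the application directory and skips the move. It assumes that a Gemfile.lock is committed next to the Gemfile, that a .dockerignore file excludes vendor/bundle so host gems are not copied over the installed ones, and that the Bundler version supports bundle config set (2.1 or newer). The ruby:3.3 tag is only an example.

    ```dockerfile
    # Assumptions: Gemfile.lock exists, and .dockerignore lists vendor/bundle.
    FROM ruby:3.3
    WORKDIR /home/app
    # Only these two files decide whether the gem layer below is rebuilt.
    COPY Gemfile Gemfile.lock /home/app/
    RUN bundle config set --local path vendor/bundle && bundle install
    # A source change invalidates the cache from this line down only.
    COPY . /home/app
    RUN bundle exec rake assets:precompile
    CMD ["bundle", "exec", "rails", "server"]
    ```

    The ordering is what matters here: instructions whose inputs change rarely go first, and the instruction that copies the whole source tree comes as late as possible.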

    Step 4: Use a Small Base Image

    Big base images are great when you’re starting out with Docker, but you’ll eventually want to move on to smaller images that contain only the packages that are essential for your application.

    For example, if you start with phusion/passenger-full, a logical next step would be to try out phusion/baseimage-docker and enable only the packages that are necessary. We followed this path too, and we successfully reduced the size of our Docker images by 200 megabytes.

    But why stop there? You can also try to run your image on a base ubuntu image. Then, as a next step, try out debian, which is only around 80 MB in size.

    You will notice that every time you reduce the image size, some of the dependencies will be missing, and you will probably need to spend some time figuring out how to install them manually. However, this is only a one-time issue, and once you’ve resolved it, you can most likely enjoy faster deployment for several months to follow.

    We used ubuntu for several months too. However, as we moved away from Ruby as our primary language and started using Elixir, we knew that we could go even lighter.

    Being a compiled language, Elixir has the nice property that the resulting compiled binaries are self-contained and can run on pretty much any Linux distribution. This is when the alpine image becomes an awesome candidate.

    The base image is only 5 MB, and if you compile your Elixir application, you can achieve images that are only around 25 MB in size. This is awesome compared to the 1.5 GB beast from the beginning of our Docker journey.

    With Go, which we also use occasionally, we could go even further and build an image FROM scratch, and achieve 5 MB-sized images.
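
    One way to get such an image is a multi-stage build, sketched below. This is an illustration rather than the exact setup we use: it assumes Docker 17.05 or newer (for multi-stage builds) and a Go module whose main package sits at the repository root. The golang:1.22 tag, the build stage name, and the /out/app path are placeholders.

    ```dockerfile
    # Stage 1: compile with the full Go toolchain (large image, never shipped).
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY . .
    # CGO_ENABLED=0 produces a statically linked binary with no libc dependency.
    RUN CGO_ENABLED=0 go build -o /out/app .

    # Stage 2: start from an empty image and copy in only the binary.
    FROM scratch
    COPY --from=build /out/app /app
    ENTRYPOINT ["/app"]
    ```

    Keep in mind that scratch contains no shell, no package manager, and no CA certificates, so a service that makes outbound TLS calls would also need the certificate bundle copied in from the build stage.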

    Step 5: Build Your Own Base Image

    Using lightweight images can be a great way to improve build and deployment performance, but bootstrapping a new microservice can be painful if you need to remember to install a bunch of packages before you can use it.

    If you create new services frequently, building your own customized base Docker image can be a great improvement. By building your own custom Docker image and publishing it on Docker Hub, you can have small images that are also easy to use.
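
    As a rough sketch, a shared base image can be as small as the Dockerfile below. The package list, the alpine:3.19 tag, and the app user name are examples only; pick whatever your services actually have in common.

    ```dockerfile
    # Hypothetical shared base image; the package list is only an example.
    FROM alpine:3.19
    # --no-cache keeps the apk package index out of the layer.
    RUN apk add --no-cache bash ca-certificates curl tzdata
    # Run services as an unprivileged user (the name "app" is an assumption).
    RUN adduser -D app
    USER app
    WORKDIR /home/app
    ```

    After building and pushing it under a name of your choice (for example, a hypothetical my-org/base:1.0), each new service can start with FROM my-org/base:1.0 and add only its own binary and configuration.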

    Keep Learning, Docker is Great

    Switching to a Docker-based development and deployment environment can be tricky at first. You can even get frustrated and think that Docker is not for you. However, if you persist and do your best to learn some good practices including how to make and keep your Docker images lightweight, the Docker ecosystem will reward you with speed, stability and reliability you’ve never experienced before.

    P.S. If you’re looking for a continuous integration and deployment solution that works great with Docker — including fully-featured toolchain support, registry integrations, image caching, and fast image builds — try Semaphore for free.

    Written by:
    Chief Architect at Semaphore. A decade of experience in dev productivity, helping close to 50,000 organizations with operational excellence.