8 Dec 2021 · Software Engineering

    Change Management for Containers


    Touching a working Dockerfile can feel like playing with fire. We know that an innocent-looking change can have branching, hard-to-debug consequences. It’s easy to get burned. But change is inevitable, and while commits to a Dockerfile are easy to control, their impact on the resulting image is not. Fortunately, where there’s a need, there’s a tool. Let’s explore it in this container-diff tutorial.

    Introducing container-diff

    Available for macOS, Linux, and Windows, container-diff (as the name suggests) is diff for container images.

    The project, developed by many of the same faces behind Container Structure Tests, does a lot more than just diffing: it can analyze container images, show installed packages, and reverse-engineer the commands used to generate them.

    Testing containers

    Container-diff has the following test modes:

    • Size: shows the total filesystem size.
    • Packages: shows a list of OS-installed packages (only for Debian-based distros), as well as those installed with pip and npm.
    • Filesystem: shows all the files in the image and their size.
    • Layer history: prints the commands that generated each of the layers in the image.

    The command to analyze an image looks like this:

    container-diff analyze [--type=TEST_TYPE] <IMAGE_NAME>

    The tool pulls the image from the registry and unpacks the filesystem into $HOME/.container-diff/cache. Then, the contents are scanned, and a report is printed out.

    So, for instance, we can analyze a PostgreSQL image with:

    $ container-diff analyze postgres:14

    -----Size-----

    Analysis for postgres:14:
    IMAGE           DIGEST                                                       SIZE
    postgres       sha256:3ee027aeb3c8bc4a5870b21 ... 6e27685ac1eab6d4ada        352.9M

    The default test is size. Change it to --type=apt to find out which OS-level packages are installed.

    $ container-diff analyze --type=apt postgres:14

    -----Apt-----

    Packages found in postgres:14:
    NAME                             VERSION                             SIZE
    -adduser                         3.118                               849K
    -apt                             2.2.4                               4.2M
    -base-files                      11.1+deb11u1                        340K
    -base-passwd                     3.5.51                              243K
    -bash                            5.1-2+b3                            6.3M
    -bsdutils                        1:2.36.1-8                          394K
    -coreutils                       8.32-4+b1                           17.1M

    ...

    -util-linux                      2.36.1-8                            4.5M
    -xz-utils                        5.2.5-2                             612K
    -zlib1g                          1:1.2.11.dfsg-2                     166K
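
    The text report is easy to post-process with standard tools. As a minimal sketch, the helper below lists the biggest packages in an apt report. The function name largest_packages is hypothetical and not part of container-diff, and the parsing assumes that each package row starts with a single dash and ends with a size suffixed with K, M, or G, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: print the N largest packages (default 5) from the text
# output of `container-diff analyze --type=apt`, read from standard input.
# Assumption: package rows look like "-name  version  size" with a K/M/G suffix.
largest_packages() {
  local n=${1:-5}
  awk '
    /^ *-[^-]/ {                      # package rows only, not the -----Apt----- banner
      size  = $NF                     # the size is always the last field
      unit  = substr(size, length(size))
      value = substr(size, 1, length(size) - 1) + 0
      if      (unit == "K") kb = value
      else if (unit == "M") kb = value * 1024
      else if (unit == "G") kb = value * 1024 * 1024
      else                  kb = (size + 0) / 1024   # no suffix: treat as bytes
      name = $1
      sub(/^-/, "", name)
      printf "%.0f\t%s\t%s\n", kb, name, size
    }' | sort -rn | head -n "$n" | cut -f2,3
}

# Usage (requires container-diff and access to the registry):
#   container-diff analyze --type=apt postgres:14 | largest_packages 3
```

    The helper converts every size to kilobytes so that sort can compare them numerically, then prints the package name and the size exactly as reported.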

    Similarly, you can get a list of globally-installed packages for Node and Python with --type=node and --type=pip.

    $ container-diff analyze --type=pip python:3.10-bullseye

    -----Pip-----

    Packages found in python:3.10-bullseye:
    NAME               VERSION       SIZE         INSTALLATION
    -pip               21.2.4         5.1M         /usr/local/lib/python3.10/site-packages
    -setuptools        57.5.0         2.4M         /usr/local/lib/python3.10/site-packages
    -wheel             0.37.0         94.4K       /usr/local/lib/python3.10/site-packages

    You can see every file in the image with --type=file, along with its size.

    $ container-diff analyze --type=file postgres:14

    -----File-----

    Analysis for postgres:14:
    FILE             SIZE
    /bin              5.1M
    /bin/bash         1.2M
    /bin/cat          42.9K

    ...

    /var/spool       7B
    /var/spool/mail   7B
    /var/tmp          0

    💡 Use --order to show files ordered by size instead of alphabetically.

    Finally, the history test shows the Docker layers, which roughly reflect the Dockerfile. The output of --type=history is hard to read, so we’ll format it with sed.

    $ container-diff analyze --type=history postgres:14 | sed 's/  */ /g;s/;/\n\t/g'

    -----History-----

    Analysis for postgres:14:
    -/bin/sh -c #(nop) ADD file:16dc2c6d1932194edec28d730b004fd6deca3d0f0e1a07bc5b8b6e8a1662f7af in /
    -/bin/sh -c #(nop) CMD ["bash"]
    -/bin/sh -c set -ex
     if ! command -v gpg > /dev/null
     then apt-get update
     apt-get install -y --no-install-recommends gnupg dirmngr
     rm -rf /var/lib/apt/lists/*
     fi
    -/bin/sh -c set -eux
     groupadd -r postgres --gid=999
     useradd -r -g postgres --uid=999 --home-dir=/var/lib/postgresql --shell=/bin/bash postgres
     mkdir -p /var/lib/postgresql
     chown -R postgres:postgres /var/lib/postgresql

    ...

    Comparing containers

    We’re only scratching the surface so far. Container-diff really shines when comparing images. The command for this is:

    container-diff diff [--type=TEST_TYPE] <IMAGE1> <IMAGE2>

    Let’s see some use cases for image comparison.

    Use case 1: generating a changelog

    Container-diff works great for generating changelogs. And, as we’ll see later in this post, the output format can be customized using a template.

    We can list what changed at the OS level:

    $ container-diff diff --type=size --type=apt postgres:13 postgres:14

    -----Apt-----

    Packages found only in postgres:13:
    NAME                         VERSION                 SIZE
    -postgresql-13               13.5-1.pgdg110+1        46.9M
    -postgresql-client-13        13.5-1.pgdg110+1        6.3M

    Packages found only in postgres:14:
    NAME                         VERSION                 SIZE
    -postgresql-14               14.1-1.pgdg110+1        48.9M
    -postgresql-client-14        14.1-1.pgdg110+1        7.1M

    Version differences: None

    -----Size-----

    Image size difference between postgres:13 and postgres:14:
    SIZE1         SIZE2
    350.2M        352.9M

    In the same vein, we can compare globally-installed Node packages:

    $ container-diff diff --type=node node:16 node:17

    -----Node-----

    Packages found only in node:16: None

    Packages found only in node:17: None

    Version differences:
    PACKAGE        IMAGE1 (node:16)        IMAGE2 (node:17)
    -npm           8.1.0, 8M               8.1.2, 8M

    Or changes in Python packages:

    $ container-diff diff --type=pip python:3.6.15-buster python:3.10-bullseye

    -----Pip-----

    Packages found only in python:3.6.15-buster:
    NAME             VERSION        SIZE
    -argparse        1.2.1          87.1K
    -mercurial       4.8.2          9.5M
    -wsgiref         0.1.2          98.7K

    Packages found only in python:3.10-bullseye: None

    Use case 2: troubleshooting containers

    Debugging a failing container is easy when we have a healthy image to use as a reference. To see all the file changes, run container-diff with --type=file:

    $ container-diff diff --type=file myapp/myservice:v1 myapp/myservice:v2

    -----File-----

    These entries have been added to myapp/myservice:v1:

    FILE                                    SIZE
    /app/node_modules/fsevents              186.2K
    /app/node_modules/fsevents/LICENSE      1.1K
    /app/node_modules/fsevents/README.md    2.9K


    These entries have been deleted from myapp/myservice:v1:

    FILE                                         SIZE
    /app/.npm/_cacache/index-v5/ce/9f/58654f1    310B
    /app/.npm/_cacache/index-v5/3d/b7/10f6556    309B
    /app/.npm/_cacache/index-v5/7e/eb/c1538ff    308B

    These entries have been changed between myapp/myservice:v1 and myapp/myservice:v2:
    FILE                                      SIZE1     SIZE2
    /app/package-lock.json                    554.6K    554.6K
    /app/node_modules/.package-lock.json      297.7K    298.1K
    /app/node_modules/clean-css/History.md    77.5K     77.8K

    Once the problematic file is identified, you can compare the files in both containers to see what changed.

    $ container-diff diff <IMAGE1> <IMAGE2> --type=file --filename=PATH/TO/FILE

    Use case 3: test-driving new containers

    You can run container-diff to preview the impact of your changes on a build. For instance, you can quickly try out different base images or experiment with the Dockerfile, iterating until you’re sure you’ve got it right.

    Container-diff is not limited to images in remote registries. You can analyze any local image by prefixing its name with daemon://.

    container-diff diff --type=TEST_TYPE daemon://IMAGE_NAME:TAG daemon://IMAGE_NAME:TAG
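
    As a rough sketch of that workflow, the helpers below wrap the daemon:// prefix so that two locally built images can be compared in one call. The function names (local_ref, diff_local) and the image tags in the usage comment are hypothetical, and the choice of size and apt as test types is only an example.

```shell
#!/bin/bash
# Hypothetical helper: turn a local image name into a reference that
# container-diff reads from the local Docker daemon instead of a registry.
local_ref() {
  printf 'daemon://%s\n' "$1"
}

# Hypothetical helper: compare two local images by size and OS packages.
diff_local() {
  container-diff diff --type=size --type=apt \
    "$(local_ref "$1")" "$(local_ref "$2")"
}

# Usage (requires Docker and container-diff; the tags are made up):
#   docker build -t myapp:before .
#   ... edit the Dockerfile ...
#   docker build -t myapp:after .
#   diff_local myapp:before myapp:after
```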

    Imagine that you’re building a container for a Ruby app and want to try upgrading from Ruby 2.7 to 3.0. As a Ruby developer, you know what to expect from the language side, but can you say the same about the container?

    To answer the question, let’s compare the respective Ruby images:

    $ container-diff diff --type=size --type=apt --type=history ruby:2.7.4-bullseye ruby:3.0.2-bullseye

    -----Apt-----

    Packages found only in ruby:2.7.4-bullseye: None

    Packages found only in ruby:3.0.2-bullseye: None

    Version differences: None

    -----History-----

    Docker history lines found only in ruby:2.7.4-bullseye:
    -/bin/sh -c #(nop) ENV RUBY_MAJOR=2.7
    -/bin/sh -c #(nop) ENV RUBY_VERSION=2.7.4
    -/bin/sh -c #(nop) ENV RUBY_DOWNLOAD_SHA256=2a80824e0ad6100826b69b9890bf55cfc4cf2b61a1e1330fccbcb30c46cef8d7


    Docker history lines found only in ruby:3.0.2-bullseye:
    -/bin/sh -c #(nop) ENV RUBY_MAJOR=3.0
    -/bin/sh -c #(nop) ENV RUBY_VERSION=3.0.2
    -/bin/sh -c #(nop) ENV RUBY_DOWNLOAD_SHA256=570e7773100f625599575f363831166d91d49a1ab97d3ab6495af44774155c40

    -----Size-----

    Image size difference between ruby:2.7.4-bullseye and ruby:3.0.2-bullseye:
    SIZE1         SIZE2
    819.2M        835.8M

    Compare that with changing the OS flavor in the Node image. What happens if you want to swap out Bullseye for Bullseye Slim?

    $ container-diff diff --type=size --type=apt --type=node node:17-bullseye node:17-bullseye-slim

    -----Apt-----

    Packages found only in node:17-bullseye:
    NAME                 VERSION                SIZE
    -autoconf            2.69-14                1.8M
    -automake            1:1.16.3-2             1.8M
    -autotools-dev       20180224.1+nmu1        157K
    ...

    -----Node-----

    Packages found only in node:17-bullseye: None

    Packages found only in node:17-bullseye-slim: None

    Version differences: None

    -----Size-----

    Image size difference between node:17-bullseye and node:17-bullseye-slim:
    SIZE1         SIZE2
    942.9M        230.7M

    Comparing regular Bullseye vs. Slim shows that:

    • Node stays the same.
    • The Slim image is about 712 MB smaller (230.7 MB versus 942.9 MB).
    • The smaller image has a long list of missing packages.

    This information will help you decide which is the best version for you. It makes sense to pick Slim in order to reduce the attack surface if you don’t need the extra packages.
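
    To put the size difference in relative terms, here is a small sketch that computes the percentage saved from the two sizes reported above. The function name size_reduction_pct is hypothetical, and both arguments must use the same unit.

```shell
#!/bin/bash
# Hypothetical helper: percentage by which the second size is smaller than
# the first. Both sizes must be expressed in the same unit.
size_reduction_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Sizes in MB taken from the node:17-bullseye vs. node:17-bullseye-slim report
size_reduction_pct 942.9 230.7   # prints 75.5
```

    In other words, the Slim variant is roughly a quarter of the size of the regular image.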

    Extending and customizing container-diff

    When the default text output is not enough, we can write an output template. You can see the examples in the built-in template file.

    The --format option lets us customize how information is printed out, giving us a way to export the data to other formats, such as CSV:

    $ container-diff diff python:3.9-bullseye python:3.10-bullseye --type=pip --format='
    package,{{.Image1}},{{.Image2}}
    {{range .Diff.InfoDiff}}{{.Package}},{{range .Info1}}{{.Version}}{{end}},{{range .Info2}}{{.Version}}{{"\n"}}{{end}}{{end}}
    '

    package,python:3.9-bullseye,python:3.10-bullseye
    pip,21.2.4,21.2.4
    setuptools,57.5.0,57.5.0
    wheel,0.37.0,0.37.0
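
    Once the data is in CSV, ordinary text tools can take over. As an illustration, the sketch below keeps the header and only those rows whose two version columns differ. The function name changed_rows is hypothetical, and it assumes the three-column layout produced by the template above.

```shell
#!/bin/bash
# Hypothetical helper: read "package,version1,version2" CSV from standard
# input and print the header plus the rows where the versions differ.
changed_rows() {
  awk -F, '
    NF < 3 { next }                                  # skip blank or incomplete lines
    !header_done { print; header_done = 1; next }    # first complete line is the header
    ($2 "") != ($3 "") { print }                     # compare as strings, so 1.10 != 1.1
  '
}

# Usage: pipe the output of the --format command above into the filter, e.g.
#   container-diff diff IMAGE1 IMAGE2 --type=pip --format='...' | changed_rows
```

    With the sample output shown above, only the header would be printed, because all three packages have the same version in both images.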

    When custom formats are not enough, container-diff can be extended by writing your own differ. You’ll need solid knowledge of Go for that, though.

    Automated container testing with CI/CD

    How does container-diff help us deploy safely? Well, if you’re doing continuous integration, you’re probably deploying several times a day, which means each new container is only a little bit different from the previous one.

    Following that logic, we can assume that if too many things change at once, it may be a signal that further analysis is needed before deployment. Maybe some unexpected file snuck into the build and the image size doubled, or the base image was updated in the registry and unexpectedly shipped with different libraries.

    container-diff comparing two images

    We have to strike the right balance between stability and mutability. Every team will have different thresholds but, as a starting point, let’s say that we’ll reject images that:

    • Grow more than 10% in size.
    • Have different OS libraries.
    • Have different globally-installed Node packages.
    • Were built from a different Dockerfile.
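
    One way to express these starting thresholds is as environment variables with defaults. The sketch below uses the variable names that the comparison script later in this post expects. The specific values simply mirror the list above and should be adjusted to each team's tolerance.

```shell
#!/bin/bash
# Starting-point thresholds from the list above.
# ${VAR:=default} keeps any value that the CI environment has already set.
: "${MAX_GROWTH_RATIO:=10}"          # reject images that grow more than 10%
: "${ALLOWED_APT_CHANGES:=0}"        # reject any change in OS packages
: "${ALLOWED_NPM_CHANGES:=0}"        # reject any change in global Node packages
: "${ALLOWED_HISTORY_CHANGES:=0}"    # reject any change in Dockerfile commands

# Make the values visible to scripts started from this shell
export MAX_GROWTH_RATIO ALLOWED_APT_CHANGES ALLOWED_NPM_CHANGES ALLOWED_HISTORY_CHANGES
```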

    Gauging change rate between images

    We can evaluate the changes by running container-diff with --json and processing the output. The format is:

    {
    "Image1": "foo",
    "Image2": "bar",
    "DiffType": "Test_Type",
    "Diff": {
    // Differences Object
    }
    }

    We can process the report with a combination of shell scripts and jq, the JSON Query CLI tool. First, run all the tests at once and save the output in a file:

    $ container-diff diff --type=size --type=apt --type=node --type=history --json <IMAGE1> <IMAGE2> > diff.json

    Then, pipe the output to jq. You can filter the results per test by selecting DiffType. Use the following command to see the APT changes:

    $ jq '.[] | select(.DiffType=="Apt")' diff.json

    You can get the total number of changed packages by appending .Diff.Packages1 + .Diff.Packages2 | length to the query.

    $ jq '.[] | select(.DiffType=="Apt") | .Diff.Packages1 + .Diff.Packages2 | length' diff.json
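
    Note that this query counts packages that exist in only one of the two images. If you also want to count packages whose version changed, a sketch along the following lines may help. It assumes that the JSON report exposes version differences under .Diff.InfoDiff, the same field used in the --format template earlier. The function name count_package_changes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count added, removed, and version-changed packages
# for one diff type ("Apt", "Node", "Pip") in a container-diff JSON report.
# Assumption: version differences are listed under .Diff.InfoDiff.
count_package_changes() {
  local diff_type=$1 report=$2
  jq --arg t "$diff_type" '
    [ .[]
      | select(.DiffType == $t)
      | (.Diff.Packages1 // []) + (.Diff.Packages2 // []) + (.Diff.InfoDiff // [])
      | length
    ] | add // 0' "$report"
}

# Usage:
#   count_package_changes Apt diff.json
```

    The "// []" fallbacks treat missing or null arrays as empty, and the final "// 0" returns zero when the report has no entry for the requested type.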

    💡 You can try jq online at jq play.

    Once we have all the jq queries ready, we can write a script that runs the differ, filters the results, and fails if the changes exceed certain thresholds.

    #!/bin/bash
    # Compare two container images and stop the pipeline when changes exceed the control parameters
    # Parameters expected:
    # $ALLOWED_APT_CHANGES - max number of APT packages changed
    # $ALLOWED_HISTORY_CHANGES - max number of Dockerfile commands changed
    # $ALLOWED_NPM_CHANGES - max number of NPM packages changed
    # $MAX_GROWTH_RATIO - max size growth allowed, in percent (0 is no growth, 100 is double the size)

    set -ex

    image1=$1
    image2=$2

    diffile=$(mktemp XXXXXX.json)

    container-diff diff \
    --type=history --type=node --type=size --type=apt --json \
    "$image1" \
    "$image2" \
    > ${diffile}

    changes_apt=$(jq '.[] | select(.DiffType=="Apt") | .Diff.Packages1 + .Diff.Packages2 | length' ${diffile})

    changes_history=$(jq '.[] | select(.DiffType=="History") | .Diff.Adds + .Diff.Dels | length' ${diffile})

    changes_npm=$(jq '.[] | select(.DiffType=="Node") | .Diff.Packages1 + .Diff.Packages2 | length' ${diffile})

    # When sizes are equal, jq prints the string "null"
    size1=$(jq '.[] | select(.DiffType=="Size") | .Diff[0].Size1' "${diffile}")
    if [ "$size1" = "null" ]
    then
        size_ratio=0
    else
        size_ratio=$(jq '.[] | select(.DiffType=="Size") | 100 * .Diff[0].Size2 / .Diff[0].Size1 - 100 | floor' "${diffile}")
    fi

    # Evaluate thresholds (variables are quoted so that an unset one fails with a clear error)
    if [ "$changes_apt" -gt "$ALLOWED_APT_CHANGES" ] \
        || [ "$changes_history" -gt "$ALLOWED_HISTORY_CHANGES" ] \
        || [ "$changes_npm" -gt "$ALLOWED_NPM_CHANGES" ] \
        || [ "$size_ratio" -gt "$MAX_GROWTH_RATIO" ]
    then
        exit 1
    else
        echo OK
    fi
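
    The size check is the least obvious part of the script, so here is the same arithmetic as a standalone sketch in plain shell. The function names growth_pct and exceeds_growth are hypothetical, and the sizes are assumed to be whole numbers in the same unit, such as bytes.

```shell
#!/bin/bash
# Hypothetical helper: integer growth of size2 relative to size1, in percent.
# Same formula as the jq expression in the script: 100 * Size2 / Size1 - 100.
growth_pct() {
  local size1=$1 size2=$2
  echo $(( 100 * size2 / size1 - 100 ))
}

# Hypothetical helper: succeed when the growth exceeds the allowed percentage.
exceeds_growth() {
  [ "$(growth_pct "$1" "$2")" -gt "$3" ]
}

growth_pct 1000 1100   # prints 10
growth_pct 1000 2000   # prints 100, the image doubled in size
```

    With a limit of 10, an image that grows from 1000 to 1100 units passes, while one that grows to 1200 is rejected.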

    Adding a change-control job to CI/CD

    Where were we? Let’s see, we have two images and a script to compare them. What we need now is a CI/CD pipeline that builds the image. Semaphore has the capabilities that we want for this task. If you’ve never used Semaphore before, I recommend checking out the getting started guide.

    Open the workflow editor and add a block after the container image build step. Then, add the following commands in the job:

    curl -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64
    sudo install container-diff-linux-amd64 /usr/local/bin/container-diff
    echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin
    checkout
    chmod a+x container-diff-test.sh && ./container-diff-test.sh "${DOCKER_USERNAME}"/mycontainer:latest "${DOCKER_USERNAME}"/mycontainer:$SEMAPHORE_WORKFLOW_ID

    This job installs container-diff on the CI machine, logs in to the Docker Hub registry (you’ll need to activate a secret), clones the repository, and runs the comparison script. Change the parameters in container-diff-test.sh as needed. In this case, we’re comparing the latest image against the one tagged with the unique ID $SEMAPHORE_WORKFLOW_ID.

    Setting up a container control job

    That’s it! You can complete the pipeline with the deployment method of your choice.

    An example pipeline.


    Wrapping up

    Container-diff is yet another quality tool to keep containers in check. Remember, when using containers, you’re responsible for the whole mini OS that comes with them, not just the code.


    Thank you for reading!

    Written by:
    I picked up most of my skills during the years I worked at IBM. Was a DBA, developer, and cloud engineer for a time. After that, I went into freelancing, where I found the passion for writing. Now, I'm a full-time writer at Semaphore.