In 2013, Solomon Hykes showed a demo of the first version of Docker during the PyCon conference in Santa Clara. Since then, the benefits of Docker containers have spread to seemingly every corner of the software industry. While Docker (the project and the company) made containers popular, it was not the first project to leverage containers, and it will certainly not be the last.
Five years later, we can hopefully see beyond the hype, as some powerful, efficient patterns have emerged to leverage containers to develop and ship better software, faster.
I spent seven years working for Docker, and I was running containers in production back when Docker was still dotCloud. I had the privilege to help many organizations and teams to get started with containers, and I would like to share a few things here.
First, the kind of benefits that you can expect from implementing Docker containers.
Then, a realistic roadmap that any organization can follow to attain these benefits.
What benefits of Docker can we expect?
Containers will not instantly turn our monolithic, legacy applications into distributed, scalable microservices.
Containers will not turn all our software engineers into “DevOps engineers” overnight, notably because DevOps is not defined by our tools or skills, but rather by a set of practices and cultural changes.
So what can containers do for us?
Set up development environments in minutes
One of my favorite demos with Docker and its companion tool Compose is to show how to run a complex app locally, on any machine, in less than five minutes.
It boils down to:
$ git clone https://github.com/jpetazzo/dockercoins
$ cd dockercoins
$ docker-compose up
You can run these three lines on any machine where Docker is installed (Linux, macOS, Windows), and in a few minutes, you will get the DockerCoins demo app up and running. I wrote DockerCoins in 2015; it has multiple components written in Python, Ruby, and Node.js, as well as a Redis store. Years later, without changing anything in the code, we can still bring it up with the same three commands.
This means that onboarding a new team member, or switching from one project to another, can now be quick and reliable. It doesn’t matter if DockerCoins is using Python 2.7 and Node.js 8 while your other apps are using Python 3 and Node.js 10, or if your system has yet other versions of these languages installed; each container carries its own dependencies, isolated from the others and from the host system.
We will see how to get there.
Deploy easily in the cloud or on premises
After we build container images, we can run them consistently on any server environment. Automating server installation would usually require steps (and domain knowledge) specific to our infrastructure. For instance, if we are using AWS EC2, we may use AMI (Amazon Machine Images), but these images are different (and built differently) from the ones used on Azure, Google Cloud, or a private OpenStack cluster.
Configuration management systems (like Ansible, Chef, Puppet, or Salt) help us by describing our servers and their configuration as manifests that live in version-controlled source repositories. This helps, but writing these manifests is no easy task, and they don’t guarantee reproducible execution. These manifests have to be adapted when switching distros, distro versions, and sometimes even from one cloud provider to another, because providers may use different network interface or disk naming, for instance.
Once we have installed the Docker Engine (the most popular option), it can run any container image and effectively abstract these environment discrepancies.
The ability to spin up new environments easily and reliably gives us exactly what we need to set up continuous integration and continuous deployment. We will see how to get there. Ultimately, it means that advanced techniques, such as blue/green deployments or immutable infrastructure, become accessible to us, instead of being the privilege of larger organizations able to spend a lot of time building their perfect custom tooling. I’m (enviously) looking at you, Netflix!
A roadmap to adopting Docker
I’m now going to share with you a roadmap that works for organizations and teams of all sizes, regardless of their existing knowledge of containers. Even better, this roadmap will give you tangible benefits at each step, so that the gains realized give you more confidence in the whole process.
Sounds too good to be true?
Here is the quick overview, before I dive into the details:
- Write one Dockerfile. (We will pick the one service where this will have the most impact.)
- Write more Dockerfiles. (The goal is to get a whole application in containers.)
- Write a Compose file. (Now anyone can get this app running on their machine in minutes.)
- Make sure that all developers are on board. (Do they all have a Docker setup in good condition?)
- Use this to facilitate QA and end-to-end testing.
- Automate this process: congratulations, you are now doing continuous deployment to staging.
- The last logical step is continuous deployment to production.
Each step is a self-contained iteration. Some steps are easy, others are more work; but each of them will improve your workflow.
Choosing the first project to Dockerize
A good candidate for our first Dockerfile is a service that is a pain in the neck to build, and moves quickly. For instance, that new Rails app that we’re building, and where we’re adding or updating dependencies every few days as we’re adding features. Pure Ruby dependencies are fine, but as soon as we rely on a system library, we will hit the infamous “works on my machine (not on yours)” problem, between the developers who are on macOS, and those who are on Linux, for instance. Docker will help with that.
Another good candidate is an application that we are refactoring or updating, and where we want to make sure that we are using the latest version of the language or framework; without breaking the environment for everything else.
If we have a component that is tricky enough to require a tool like Vagrant to run on our developers’ machines, it’s also a good hint that Docker can help there. While Vagrant is an amazing product, there are many scenarios where maintaining a Dockerfile is easier than maintaining a Vagrantfile; and running Docker is also easier and lighter than running Vagrant boxes.
Writing our first Dockerfile
There are various ways to write our first Dockerfile, and none of them is inherently right or wrong. Some people prefer to follow the existing environment as closely as possible. We’re currently using PHP 7.2 with Apache 2.4, and have some very specific Apache configuration and .htaccess files? Sure, we can put that in containers. But if we prefer to start anew from our .php files, serve them with PHP FPM, and host the static assets from a separate NGINX container (an incredibly powerful and scalable combination!), that’s fine too. Either way, the official PHP images have us covered.
During this phase, we’ll want to make sure that the team working on that service has Docker installed on their machine, but only a few people will have to meddle with Docker at this point. They will be leveling the field for everyone else.
Once we have a working Dockerfile for that app, we can start using this container image as the official development environment for this specific service or component. If we picked a fast-moving one, we will see the benefits very quickly, since Docker makes library and other dependency upgrades completely seamless. Rebuilding the entire environment with a different language version now becomes effortless. And if we realize after a difficult upgrade that the new version doesn’t work as well, rolling back is just as easy and instantaneous, because Docker keeps a cache of previous image builds around.
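As a rough illustration of the “follow the existing environment” approach, here is a minimal sketch that writes a Dockerfile based on the official php:7.2-apache image. The scratch directory, the image name myapp-dev, and the port mapping are assumptions made for this example; a real application will likely need extra steps (PHP extensions, Apache modules, and so on).

```shell
#!/bin/sh
# Sketch: a minimal Dockerfile for a PHP 7.2 + Apache application.
# Assumption: the app's .php and .htaccess files sit next to the Dockerfile.
workdir=$(mktemp -d)   # scratch directory, so nothing real gets overwritten

cat > "$workdir/Dockerfile" <<'EOF'
# Official PHP image bundled with Apache
FROM php:7.2-apache
# Copy the application (including .htaccess files) into Apache's document root
COPY . /var/www/html/
EOF

echo "Dockerfile written to $workdir/Dockerfile"
# To build and run it (requires Docker; "myapp-dev" is a hypothetical name):
#   docker build -t myapp-dev "$workdir"
#   docker run --rm -p 8080:80 myapp-dev
```

Trying another PHP version then amounts to changing the tag on the FROM line and rebuilding.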
Writing more Dockerfiles
The next step is to get an entire application in containers.
Don’t get me wrong: we are not talking about production (yet), and even if your first experiments go so well that you want to roll out some containers to production, you can do so selectively, only for some components. In particular, it is advised to keep databases and other stateful services outside of containers until you gain more operational experience.
But in development, we want everything in containers, including the precious databases (because the ones sitting on our developers’ machines don’t, or shouldn’t, contain any precious data anyway).
We will probably have to write a few more Dockerfiles, but for standard services like Redis, MySQL, PostgreSQL, MongoDB, and many more, we will be able to use standard images from the Docker Hub. These images often come with special provisions to make them easy to extend and customize; for instance the official PostgreSQL image will automatically run .sql files placed in the suitable directory (to pre-load our database with table structure or sample data).
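To make the PostgreSQL example concrete: the official postgres image runs the .sql files found in /docker-entrypoint-initdb.d when it initializes an empty data directory. The sketch below prepares such a file and prints a matching docker run command; the table, the container name dev-db, and the password are placeholders invented for this example.

```shell
#!/bin/sh
# Sketch: pre-load a development database through the official postgres image.
# The image's entrypoint runs *.sql files found in /docker-entrypoint-initdb.d
# the first time the (empty) data directory is initialized.
initdir=$(mktemp -d)   # scratch directory for the example

cat > "$initdir/01-schema.sql" <<'EOF'
-- Hypothetical table and rows, for illustration only
CREATE TABLE users (id serial PRIMARY KEY, name text NOT NULL);
INSERT INTO users (name) VALUES ('alice'), ('bob');
EOF

# mktemp creates a private directory; the container's postgres user
# must be able to read the scripts.
chmod -R a+rX "$initdir"

# Print the command rather than running it, so the sketch works without Docker.
# POSTGRES_PASSWORD is required by the image; "devonly" is a throwaway value.
echo docker run -d --name dev-db \
  -e POSTGRES_PASSWORD=devonly \
  -v "$initdir:/docker-entrypoint-initdb.d:ro" \
  postgres
```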
Once we have Dockerfiles (or images) for all the components of a given application, we’re ready for the next step.
Writing a Compose file
A Dockerfile makes it easy to build and run a single container; a Compose file makes it easy to build and run a stack of multiple containers.
So once each component runs correctly in a container, we can describe the whole application with a Compose file.
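Here is a minimal sketch of what such a Compose file could look like for a two-service stack. The service names (web, redis), the port mapping, and the scratch directory are assumptions for illustration; this is not the actual DockerCoins file.

```shell
#!/bin/sh
# Sketch: describe a two-service stack in a Compose file.
# Service names ("web", "redis") and the port mapping are illustrative.
appdir=$(mktemp -d)   # scratch directory for the example

cat > "$appdir/docker-compose.yml" <<'EOF'
version: "3"
services:
  web:
    build: .            # built from the Dockerfile in this directory
    ports:
      - "8000:80"       # host port 8000 -> container port 80
  redis:
    image: redis        # standard image pulled from the Docker Hub
EOF

echo "Compose file written to $appdir/docker-compose.yml"
# With a Dockerfile in the same directory, the stack would be started with:
#   (cd "$appdir" && docker-compose up)
```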
This gives us the very simple workflow that we mentioned earlier:
$ git clone https://github.com/jpetazzo/dockercoins
$ cd dockercoins
$ docker-compose up
Compose will analyze the file docker-compose.yml, pull the required images, and build the ones that need it. Then it will create a private bridge network for the application, and start all the containers in that network.
Why use a private network for the application? Isn’t that a bit overkill?
Since Compose will create a new network for each app that it starts, this lets us run multiple apps next to each other (or multiple versions of the same app) without any risk of interference.
This pairs with Docker’s service discovery mechanism, which relies on DNS. When an application needs to connect to, say, a Redis server, it doesn’t need to specify the IP address of the Redis server, or its FQDN. Instead, it can just use redis as the server host name. For instance, in PHP:
$redis = new Redis();
$redis->connect('redis', 6379);
Docker will make sure that the name redis resolves to the IP address of the Redis container in the current network. So multiple applications can each have a redis service, and the name redis will resolve to the right one in each network.
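As a small sketch of how this looks in practice, we can start the same Compose file twice under different project names. Compose gives each project its own default network, named after the project, so each copy gets its own redis. The project names blue and green are made up for the example, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: run two copies of the same Compose app side by side.
# Compose prefixes resources with the project name (-p), and each project
# gets its own default network, named "<project>_default".
default_network() {
  echo "${1}_default"
}

for project in blue green; do   # hypothetical project names
  # Printed rather than executed, so the sketch runs without Docker.
  echo "docker-compose -p $project up -d   # network: $(default_network "$project")"
done
```

Inside each of these networks, the name redis resolves to that project’s own Redis container.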
Once we have that Compose file, it’s a good time to make sure that everyone is on board; i.e. that all our developers have a working installation of Docker. Windows and Mac users will find this particularly easy thanks to Docker Desktop.
Our team will need to know a few Docker and Compose commands; but in many scenarios, they will be fine if they only know docker-compose up --build. This command will make sure that all images are up-to-date, and run the whole application, showing its log in the terminal. If we want to stop the app, all we have to do is hit Ctrl-C.
At this point, we are already benefiting immensely from Docker and containers: everyone gets a consistent development environment, up and running in minutes, independently of the host system.
For simple applications that don’t need to span multiple servers, this would almost be good enough for production; but we don’t have to go there yet, as there are other fields where we can take advantage of Docker without the high stakes associated with production.
End-to-end testing and QA
When we want to automate a task, it’s a good idea to start by having it done by a human, and write down the necessary steps. In other words: do things manually first, but document them. Then, these instructions can be given to another person, who will execute them. That person will probably ask us some clarifying questions, which will allow us to refine our manual instructions.
Once these manual instructions are perfectly accurate, we can turn them into a program (a simple script will often suffice) that we can then execute automatically.
My suggestion is to follow these principles to deploy test environments, and execute CI (Continuous Integration) or end-to-end testing (depending on the kind of tests that you use in your organization). Even if you don’t have automated testing, I guess that you have some kind of testing happening before you ship a feature (even if it’s just someone messing around with the app in staging before your users see it).
In practice, this means that we will document (and then automate) the deployment of our application, so that anyone can get it up and running by running a script.
The example that we gave above involved 3 lines, but in a real application, we might have other steps. On the first run, we probably want to populate the database with initial objects; on subsequent runs, we might have to run database migrations (when a release changes the database schema).
Our final deployment scripts will certainly have more than 3 lines, but they will also be way simpler (to write and to run) than full-blown configuration management manifests, VM images, and so on.
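As a sketch of what such a script might look like, the example below builds the images, starts the stack, then runs an initialization or migration step. The service name web and the ./migrate.sh command are hypothetical placeholders; the DRY_RUN switch is just a convenience of this example, so that it can be tried on a machine without Docker.

```shell
#!/bin/sh
# Sketch of a deployment script: build, start, then initialize or migrate.
# With DRY_RUN=1 (the default here), commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  run docker-compose build || return 1
  run docker-compose up -d || return 1
  # "web" and "./migrate.sh" are hypothetical: use your own service name and
  # whatever command seeds or migrates your database.
  run docker-compose run --rm web ./migrate.sh || return 1
}

deploy
```

Setting DRY_RUN=0 makes the same script execute the commands for real.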
If we have a QA team, they are now empowered to test new releases without relying on someone else to deploy the code for them!
Don’t get me wrong: I’m fully aware that many QA teams are perfectly capable of deploying code themselves; but as projects grow in complexity, we tend to get more specialized in our respective roles, and it’s not realistic to expect our whole QA team to have both solid testing skills and “5 years of experience with Capistrano, Puppet, Terraform”.
If you’re doing any kind of unit testing or end-to-end testing, you can now automate these tasks as well, by following the same principle as we did to automate the deployment process.
We now have a whole sequence of actions: building images, starting containers, executing initialization or migration hooks, running tests … From now on, we will call this the pipeline, because all these actions have to happen in a specific order, and if one of them fails, we don’t execute the subsequent stages.
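The “specific order, stop at the first failure” behavior can be sketched in a few lines of shell. The stage functions below are empty placeholders; in a real pipeline they would call docker-compose, the migration hooks, and the test suite.

```shell
#!/bin/sh
# Sketch: run pipeline stages in order and stop at the first failure.
run_pipeline() {
  for stage in "$@"; do
    echo "stage: $stage"
    if ! "$stage"; then
      echo "failed: $stage (skipping remaining stages)"
      return 1
    fi
  done
  echo "pipeline succeeded"
}

# Placeholder stages; real ones would build images, start containers,
# execute initialization or migration hooks, and run tests.
build_images()     { true; }
start_containers() { true; }
run_migrations()   { true; }
run_tests()        { true; }

run_pipeline build_images start_containers run_migrations run_tests
```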
Continuous Deployment to staging
The next step is to hook our pipeline to our source repository, to run it automatically on our code when we push changes to the repository.
If we’re using a system like GitHub or GitLab, we can set it up to notify us (through a webhook) each time someone opens (or updates) a pull request. We could also monitor a specific branch, or a specific set of branches.
Each time there are relevant changes, our pipeline will automatically:
- build new images,
- run unit tests on these images (if applicable),
- deploy them in a temporary environment,
- run end-to-end tests on the application,
- make the application available for human testing.
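One possible way to get a temporary environment per pull request is to use the pull request number as the Compose project name, so that each environment has its own containers and network. This is only a sketch: it assumes that the CI system hands us the pull request number, the pr prefix is arbitrary, and commands are printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: one throwaway environment per pull request.
# Assumption: the CI system provides the pull request number.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

project_for_pr() {
  echo "pr$1"
}

deploy_pr() {
  project=$(project_for_pr "$1")
  run docker-compose -p "$project" build || return 1
  run docker-compose -p "$project" up -d || return 1
}

teardown_pr() {
  # "down -v" stops the environment and also removes its volumes
  run docker-compose -p "$(project_for_pr "$1")" down -v
}

deploy_pr 42     # when the pull request is opened or updated
teardown_pr 42   # when it is merged or closed
```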
If we had to build this from scratch, this would certainly be a lot of work; but with the roadmap that I described, we can get there one step at a time, while enjoying concrete benefits at each step.
Note that we still don’t require container orchestration for all of this to work. If our application (in a staging environment) can fit on a single machine, we don’t need to worry about setting up a cluster (yet). In fact, thanks to Docker’s layer system, running side-by-side images that share a common ancestry (which will be the case for images corresponding to successive versions of the same component) is very disk- and memory-efficient; so there is a good chance that we will be able to run many copies of our app on a single Docker Engine.
But this is also the right time to start looking into orchestration, and platforms like Docker Swarm or Kubernetes. Again, I’m not suggesting that we roll that out straight to production; but that we use one of these orchestrators to deploy the staging versions of our application.
This will give us a low-risk environment where we can ramp up our skills on container orchestration and scheduling, while having the same level of complexity (minus the volume of requests and data) as our production environment.
Continuous Deployment to production
It might be a while before we go from the previous stage to the next, because we need to build confidence and operational experience.
However, at this point, we already have a continuous deployment pipeline that takes every pull request (or every change in a specific branch or set of branches) and deploys the code on a staging cluster, in a fully automated way.
Of course, we need to learn how to collect logs, and metrics, and how to face minor incidents and major outages; but eventually, we will be ready to extend our pipeline all the way to the production environment.
An additional benefit of more Docker containers: fewer risks
Independently of our CI/CD pipeline, we may want to use containers in production for other reasons. One of the benefits of using Docker is that containers can help us to reduce the risks associated with a new release.
When we start a new version of our app (by running the corresponding image), if something goes wrong, rolling back is very easy. All we have to do is stop the container, and restart the previous version. The image for the previous version will still be around and will start immediately.
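On a single Docker Engine, this could look like the sketch below. The container name web and the image tags myapp:v1 and myapp:v2 are hypothetical, and commands are printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: release or roll back by (re)starting a container from a given image.
# Container name and image tags are hypothetical.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

switch_version() {
  name=$1
  image=$2
  run docker stop "$name"     # stop the version that is currently running
  run docker rm "$name"       # free the container name
  run docker run -d --name "$name" "$image"
}

switch_version web myapp:v2   # release the new version
switch_version web myapp:v1   # something went wrong: go back to v1
```

Since the v1 image is still present on the machine, the second call does not need to download or rebuild anything.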
This is way safer than attempting a code rollback, especially if the new version implied some dependency upgrades. Are we sure that we can downgrade to the previous version? Is it still available on the package repositories? If we are using containers, we don’t have to worry about that, since our container image is available and ready.
This pattern is sometimes called immutable infrastructure, because instead of changing our services, we deploy new ones. Initially, immutable infrastructure happened with virtual machines: each new release would happen by starting a new fleet of virtual machines. Containers make this even easier to use.
As a result, we can deploy with more confidence, because we know that if something goes wrong, we can easily go back to the previous version.
Making Docker benefits a reality
For a practical example of implementing a CI/CD pipeline for a Docker-based project, see a recent article on the Semaphore blog about implementing CI/CD for microservices.