Design an Effective Build Stage for Continuous Integration

The first move towards releasing software is the build stage. This practical guide will show you the general principles of build automation and reference implementations using Semaphore. Optimizing your build stage will accelerate your development workflow and cut down costs.

What happens in a build stage

In continuous integration (CI), this is where we build the application for the first time. The build stage is the first stretch of a CI/CD pipeline, and it automates steps like downloading dependencies, installing tools, and compiling.

Besides building code, build automation includes using tools to check that the code is safe and follows best practices. The build stage usually ends in the artifact generation step, where we create a production-ready package. Once this is done, the testing stage can begin.

We’ll be covering testing in-depth in future articles (subscribe to the newsletter so you don’t miss them). Today, we’ll focus on build automation.

Build automation verifies that the application, at a given code commit, can qualify for further testing. We can divide it into three parts:

Compilation: the first step builds the application.
Linting: checks the code for programmatic and stylistic errors.
Code analysis: using automated source-checking tools, we control the code’s quality.
Artifact generation: the last step packages the application for release or deployment.

Step 1: Compilation

Step one of the build stage compiles the application. On compiled languages, it means that we can generate a working binary, whereas on interpreted languages, we confirm that we have the required dependencies and tools to successfully build the application.

The result can be a binary file, a code tarball, an installable package, a website, a container image — the main thing is that we have something that we can run and test.

Step 2: Analyzing Your Code

Programming requires discipline. We must solve hard problems while following good practices and abiding by a common coding style. Automated analysis tools help you keep out gnarly code at bay. In general terms, there are three classes of code analysis tools:

Linting: linters improve code quality by pointing problematic and hard-to-maintain bits. Linters use a set of rules to enforce a unified standard that improves readability for the team.
Quality: this class of tools uses metrics to find places where code can be improved. The collected metrics include the number of lines, documentation coverage, and complexity level.
Security: security analysis tools scan the code, flag parts that may cause vulnerabilities, and check that dependencies don’t have known security issues.

Since no one wants to ship unsafe or buggy code, we must design a CI pipeline that stops when errors are found. When the CI pipeline runs the right code analysis tools, developers who review code can focus on more valuable questions such as “Is this code solving the right problem?” or “Does this code increase our technical debt?“.

Step 3: Preparing your Application for Release

The released package will include everything needed to run or install the application, including:

Application code.
Installation scripts.
Dependencies and libraries.
Application metadata.
Documentation.
License information.

The final package can be published on a website, in a package database such as npmjs.com, in a container registry, or can be directly deployed into a QA or production environment.

Useful Semaphore commands for build automation

A CI/CD pipeline consists of blocks and jobs. In Semaphore, jobs are where your commands and code run, so it’s here where we define the commands for each step in the build stage. While setting up build automation, you’ll find the following Semaphore commands useful:

checkout: clones and CDs into the Git repository. It usually belongs near the top of the job.
sem-version: activates a specific version of a language. Run it to ensure the CI environment uses the same version you’re coding with.
cache: stores your downloaded libraries, intermediate, and compiled files. The cache is a short-lived space used to pass files between jobs and workflows.
artifact: the artifact command lets store files on a more permanent basis. You can access these directly from the website. Semaphore provides three levels for artifacts:
- job: used to store logs and other debugging details.
- workflow: used to persist files between jobs.
- project: used to store the final project output.

How to implement a build stage with Semaphore

Shall we see a few practical examples of build automation in a CI pipeline? Let’s roll up our sleeves and write some code.

But first, a note about how I’m writing the commands:

Commands starting with $ are meant to run in your local development machine.
Commands without the dollar sign are for you to copy into your CI jobs.

👉 Jump to the example that interests you the most:

A build stage for Java Applications
A build stage for JavaScript applications
A build stage for Python Applications
Creating Docker images in the build stage

Example 1: A build stage for Java applications

Java is a compiled language, so the build stage must compile the source into JVM bytecode. Here’s a starter pipeline:

java build stage — A Java build stage. Linting and code analysis run in parallel in the same block.

Compilation block

The leading popular build tools in Java are Apache Maven and Gradle, both are included in Semaphore’s machines. Let’s go with Maven; it handles dependencies and compiles the source with a single command: mvn compile. Use these commands in your CI job to build the application:

sem-version java 11
checkout
cache restore
mvn compile
cache store

The cache command will detect the compiled output in the target directory and store it for future use. Ideally, you should set the environment variable MAVEN_OPTS to -Dmaven.repo.local=.m2, so downloaded dependencies are also cached.

Linting with style

Java developers use powerful IDEs that lint code in realtime. Powerful as they are, they are not enough. Quality checks must also be part of the CI pipeline. We can use checkstyle to add a linting job.

These commands download checkstyle and run it on a file called MyFile.java:

sem-version java 11
wget https://github.com/checkstyle/checkstyle/releases/download/checkstyle-8.41/checkstyle-8.41-all.jar
java -jar checkstyle-8.41-all.jar -c /sun_checks.xml MyFile.java

Or, if you are using Maven, there’s a plugin for automatic linting. To enable it, add the maven.checkstyle.plugin to properties in your pom.xml.

<properties>
    <maven.checkstyle.plugin.version>3.1.2</maven.checkstyle.plugin.version>
</properties>

Then configure the plugin to fail on error. Add the following snippet to the build section in your pom.xml:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-checkstyle-plugin</artifactId>
            <version>${maven.checkstyle.plugin.version}</version>
            <configuration>
                <failsOnError>true</failsOnError>
                <failOnViolation>true</failOnViolation>
            </configuration>
        </plugin>
    </plugins>
</build>

Now you can run checkstyle with:

sem-version java 11
checkout
mvn checkstyle:checkstyle

Finding Mistakes with PMD

Another nice checker to use in your CI environment is PMD. PMD works with Java, JavaScript, and Oracle’s SQL. You can scan code with the default rules for Java with the following:

sem-version java 11
checkout
wget https://github.com/pmd/pmd/releases/download/pmd_releases%2F6.32.0/pmd-bin-6.32.0.zip
unzip pmd-bin-6.32.0.zip
./pmd-bin-6.32.0/bin/run.sh pmd -d . -R rulesets/java/quickstart.xml -f text

Security analysis with Infer

Infer was created by Facebook to find code that can lead to runtime errors, things such as race conditions and resource leaks. Infer works with Java and Android. Facebook has open-sourced this tool and uses it to fix their mobile client and the main Facebook app.

To run Infer on your CI, download the binary and build the application. Infer will output a report on the infer-out folder, which you can store for later analysis. The following command will make the pipeline stop when issues are found:

checkout
cache restore
mvn clean
curl -sSL "https://github.com/facebook/infer/releases/download/v1.0.0/infer-linux64-v1.0.0.tar.xz" | tar -xJ
./infer-linux64-v1.0.0/bin/infer run -- mvn package
artifact push job --expire-in 1w infer-out
./infer-linux64-v1.0.0/bin/infer analyze --fail-on-issue

Finding bugs with Find Security Bugs

Find Security Bugs uses a security database to detect almost 140 different vulnerability types in Java web applications.

To use this tool, add this snippet to the plugin section in your pom.xml:

<plugin>
     <groupId>com.github.spotbugs</groupId>
     <artifactId>spotbugs-maven-plugin</artifactId>
     <version>4.0.4</version>
     <configuration>
         <effort>Max</effort>
         <threshold>Low</threshold>
         <failOnError>true</failOnError>
         <includeFilterFile>${session.executionRootDirectory}/spotbugs-security-include.xml</includeFilterFile>
         <excludeFilterFile>${session.executionRootDirectory}/spotbugs-security-exclude.xml</excludeFilterFile>
         <plugins>
             <plugin>
                 <groupId>com.h3xstream.findsecbugs</groupId>
                 <artifactId>findsecbugs-plugin</artifactId>
                 <version>1.10.1</version>
             </plugin>
         </plugins>
     </configuration>
</plugin>

Then, create two files: spotbugs-include.xml and spotbugs-exclude.xml. The contents of the first one are:

<FindBugsFilter>
     <Match>
         <Bug category="SECURITY"/>
     </Match>
</FindBugsFilter>

You can exclude rules in the exclude file:

<FindBugsFilter></FindBugsFilter>

Commit the changes into your repository. Then, create a job that generates the report and pushes it to the artifact store:

checkout
cache restore 
mvn spotbugs:spotbugs 
artifact push job target/spotbugsXml.xml  --expire-in 1w 
mvn spotbugs:check

Building a JAR

We can generate a JAR file with: mvn package. Since we already tested the application, we can skip tests.

sem-version java 11
checkout 
cache restore 
mvn package 
cache store

The output JAR is uploaded to the artifact store, where it can be downloaded or used in continuous delivery and deployment pipelines.

👉 For a complete example of setting up a CI/CD pipeline for Java applications, see CI/CD for Spring Boot Microservices.

Example 2: A build stage for JavaScript applications

JavaScript code doesn’t need compiling. The build stage will only download dependencies, then lint the code, and finally, build a package.

Filling up node_modules

We can install JavaScript modules with npm:

checkout 
npm ci

The npm ci command creates a clean copy of the node_modules folder. Unfortunately, as a result we can’t use the cache at this stage. npm ci is the cleanest and safest way of installing modules.

But if you want or need to use the cache, you’ll need to run npm install instead of npm ci:

sem-version node 14.16 
checkout 
cache restore 
npm install 
cache store

If the project includes a .nvmrc config file, you can replace sem-version with nvm use:

checkout 
nvm use 
cache restore 
npm install 
cache store

The CI machine includes yarn as well:

sem-version node 14.16 
checkout 
cache restore 
yarn install 
cache store

Linting with ESLint and JSHint

JavaScript has come a long way since the Netscape years. It grew from a language whipped up in less than ten days into the closest thing we have to a lingua franca — almost all devices have JavaScript support.

As you can imagine, the language changed a lot over time, and not all its parts are good. Using a linter will help us stay away from the bad parts of JavaScript. In my experience, the ESlint and JSHint linters integrate very well into the CI environment. Any of these can be installed with npm install --save-dev.

For ESLint, you’ll need to create a configuration with:

$ npx eslintrc --init

Next, add it to the script section of package.json:

"scripts": {
     "lint": "eslint '**/*.js' --ignore-pattern node_modules",
     ...
 }

After committing the changes, use the following to run the linter:

sem-version node 14.16 
checkout 
npm run lint

Security audits: npm and AuditJS

Security audits are built-in npm. You can check your dependencies with npm audit. Note, however, that you must set a threshold level in order for the command to fail on error:

sem-version node 14.16 
checkout 
cache restore 
npm audit --audit-level=high

If you find security issues, try running npm audit fix to attempt automatic resolution.

An alternative to npm audit is AuditJS. This tool uses the OSS index of vulnerabilities, which feeds from many security databases, so results are more accurate.

You can add AuditJS to your project with:

$ npm install --save-dev auditjs

Then use the following command to run the test:

sem-version node 14.16
checkout
npx auditjs ossi

One nice feature AuditJS has is that it allows us to whitelist security issues. This is great, for instance, when we want to ignore a vulnerability because we know it doesn’t affect us.

For that, create a JSON file detailing the vulnerability ID and a reason for ignoring it:

{
  "ignore": [
    {
        "id": "78a61524-80c5-4371-b6d1-6b32af349043",
        "reason": "Vulnerability doesn't affect our code"
    }
  ]
}

Then run AuditJS with the whitelist enabled:

sem-version node 14.16
checkout
auditjs ossi --whitelist my-whitelist.json

Creating Node packages

Use npm pack to create a tarball you can distribute, or use npm publish to upload the module to npmjs.com.

checkout
tarfile=$(npm pack)
artifact push project --force "$tarfile"

👉 For a complete example of setting up a CI/CD pipeline for a JavaScript application, see How To Build and Deploy a Node.js Application To Kubernetes or Continuous Integration with Deno.

Example 3: A build stage for Python applications

We don’t need to run any compile commands in Python. As a result, the build job, like JavaScript’s, will only download and cache dependencies.

Installing modules

In this example, we run pip to install the project dependencies in the .pip_cache folder:

sem-version python 3.9
checkout
cache restore
pip download --cache-dir .pip-cache -r requirements.txt
cache store

Linting with Pylint

Python developers are used to following strict coding standards. One good way of checking for good style is Pylint. Pylint ruleset includes checks for unused modules, variable name formats, and line length. It strictly adheres to Python’s PEP 8 style guide.

Like most linters, Pylint can be integrated into the IDE. But, in this case, we’ll use it in the pipeline to check commit quality. To show lint errors in a single package or file use:

sem-version python 3.9
checkout
pylint -E [MODULE|PYTHON_FILE]

You can scan every Python file in the repository with the following command:

sem-version python 3.9
checkout
git ls-files | grep -E '.py$' | xargs pylint -E

Complexity analysis with Xenon

One thing PEP 8 style guide cannot capture is cyclomatic complexity, a measure of how many independent paths there are in the codebase. To calculate it, you can use a tool like Xenon.

To get started with Xenon, add it to your Python project with:

$ pip install xenon
$ pip freeze > requirements.txt

Xenon uses a ranking system that goes from A to F, where A is the lowest complexity level. In the job, you can supply the minimum level permitted before failing.

sem-version python 3.9
checkout
cache restore
xenon --max-absolute C --max-modules B --max-average A

The example will make the job fail if the average complexity is lower than A, the complexity in the modules goes lower than B, or the complexity of a block is lower than C.

Packing for distribution

Python’s package format allows you to upload modules to the official PyPI index or distribute it on your website.

To create a package, use the following commands:

sem-version python 3.9
python -m pip install --upgrade build
checkout
python -m build
artifact push project --force dist

👉 For a complete Python pipeline see Python Continuous Integration and Deployment From Scratch.

Example 4: Creating Docker images in the build stage

Docker is a container platform that standardizes build automation by letting us package any software regardless of the language. By building a Docker image, we get a portable application we can run on any machine, as long as it has a container runtime installed. It also allows us to run the application at scale on orchestration platforms like Docker Swarm or Kubernetes.

Semaphore includes everything you need to build a Docker image. You only need to supply a Dockerfile with the build instructions.

The easiest way of writing a Dockerfile is by starting from a base image. Semaphore provides images for popular languages in the Semaphore Container Registry. For example, this Dockerfile is for packaging a Python application:

# use image in Semaphore container registry
FROM registry.semaphoreci.com/python:3.9

# copy source files
ADD . ./opt/
WORKDIR /opt/

# install dependencies
RUN pip install --upgrade pip
RUN pip install -r requirements.txt

# start application
CMD ["python","app.py"]

Use these commands in the CI job to build the image:

checkout
docker build -t my-image .

It works, but the image is lost as soon as the job completes. We must upload it into a registry in order to keep it. And for that, you have to supply login information for an image registry with a secret.

To create a secret, click on your account badge on the top right corner and then Settings.

Semaphore Settings menu — Where to find Settings menu.

Then click on Secrets > New Secret.

Create a secret called dockerhub with your Docker Hub username and password:

create a secret on Semaphore — Create a secret to store Docker Hub credentials

Now you can access the username and password in the job:

echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin

Next, you can pull the last image to speed up the build process:

docker pull $DOCKER_USERNAME/awesome-application:latest || true

Finally, build and push the image:

docker build -t $DOCKER_USERNAME/awesome-application:latest --cache-from=$DOCKER_USERNAME/awesome-application:latest .
docker push $DOCKER_USERNAME/awesome-application:latest

The complete job commands are:

echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin
docker pull $DOCKER_USERNAME/awesome-application:latest || true
docker build -t $DOCKER_USERNAME/awesome-application:latest --cache-from=$DOCKER_USERNAME/awesome-application:latest .
docker push $DOCKER_USERNAME/awesome-application:latest

Don’t forget to activate the secret in your job:

What we have seen are the basic building blocks for a Docker image. The process is mostly the same for any language. The only thing that changes are the contents of the Dockerfile. We have a few in-depth tutorials and example Dockerfiles for you to peruse below:

What’s Next

We learned what build automation is and saw a few examples. The principles always remain the same, regardless of the language or framework used.

The principle used in all examples is that the build stage runs in a pipeline. By running fast code checks right after compilation— and before the automated test suite—you optimize your continuous integration process for fast feedback.

I hope these recommendations help you optimize your build process. Happy building!

2 thoughts on “Design an Effective Build Stage for Continuous Integration”

Rachit Zambre says:

December 20, 2022 at 9:57 am

These stages are the most important in terms of continuous testing

Sanjay Kumar says:

May 25, 2023 at 12:49 pm

Thanks, Thomas! Your explanations were thorough and easily understandable. What stood out to me was the way you seamlessly integrated code examples and practical demonstrations throughout the article. This is one of the best web development tutorials.