14 Jun 2022 · Software Engineering

    Design an Effective Build Stage for Continuous Integration

    16 min read
    Contents

    The first move towards releasing software is the build stage. This practical guide will show you the general principles of build automation and reference implementations using Semaphore. Optimizing your build stage will accelerate your development workflow and cut down costs.

    What happens in a build stage

    In continuous integration (CI), this is where we build the application for the first time. The build stage is the first stretch of a CI/CD pipeline, and it automates steps like downloading dependencies, installing tools, and compiling.

    Besides building code, build automation includes using tools to check that the code is safe and follows best practices. The build stage usually ends in the artifact generation step, where we create a production-ready package. Once this is done, the testing stage can begin.

    build stage
    The build stage starts from code commit and runs from the beginning up to the test stage

    We’ll be covering testing in-depth in future articles (subscribe to the newsletter so you don’t miss them). Today, we’ll focus on build automation.

    Build automation verifies that the application, at a given code commit, can qualify for further testing. We can divide it into three parts:

    1. Compilation: the first step builds the application.
    2. Linting: checks the code for programmatic and stylistic errors.
    3. Code analysis: using automated source-checking tools, we control the code’s quality.
    4. Artifact generation: the last step packages the application for release or deployment.

    Step 1: Compilation

    Step one of the build stage compiles the application. On compiled languages, it means that we can generate a working binary, whereas on interpreted languages, we confirm that we have the required dependencies and tools to successfully build the application.

    The result can be a binary file, a code tarball, an installable package, a website, a container image — the main thing is that we have something that we can run and test.

    Step 2: Analyzing Your Code

    Programming requires discipline. We must solve hard problems while following good practices and abiding by a common coding style. Automated analysis tools help you keep out gnarly code at bay. In general terms, there are three classes of code analysis tools:

    • Linting: linters improve code quality by pointing problematic and hard-to-maintain bits. Linters use a set of rules to enforce a unified standard that improves readability for the team.
    • Quality: this class of tools uses metrics to find places where code can be improved. The collected metrics include the number of lines, documentation coverage, and complexity level.
    • Security: security analysis tools scan the code, flag parts that may cause vulnerabilities, and check that dependencies don’t have known security issues.

    Since no one wants to ship unsafe or buggy code, we must design a CI pipeline that stops when errors are found. When the CI pipeline runs the right code analysis tools, developers who review code can focus on more valuable questions such as “Is this code solving the right problem?” or “Does this code increase our technical debt?“.

    Step 3: Preparing your Application for Release

    The released package will include everything needed to run or install the application, including:

    • Application code.
    • Installation scripts.
    • Dependencies and libraries.
    • Application metadata.
    • Documentation.
    • License information.

    The final package can be published on a website, in a package database such as npmjs.com, in a container registry, or can be directly deployed into a QA or production environment.

    Useful Semaphore commands for build automation

    A CI/CD pipeline consists of blocks and jobs. In Semaphore, jobs are where your commands and code run, so it’s here where we define the commands for each step in the build stage. While setting up build automation, you’ll find the following Semaphore commands useful:

    • checkout: clones and CDs into the Git repository. It usually belongs near the top of the job.
    • sem-version: activates a specific version of a language. Run it to ensure the CI environment uses the same version you’re coding with.
    • cache: stores your downloaded libraries, intermediate, and compiled files. The cache is a short-lived space used to pass files between jobs and workflows.
    • artifact: the artifact command lets store files on a more permanent basis. You can access these directly from the website. Semaphore provides three levels for artifacts:
      • job: used to store logs and other debugging details.
      • workflow: used to persist files between jobs.
      • project: used to store the final project output.

    How to implement a build stage with Semaphore

    Shall we see a few practical examples of build automation in a CI pipeline? Let’s roll up our sleeves and write some code.

    But first, a note about how I’m writing the commands:

    • Commands starting with $ are meant to run in your local development machine.
    • Commands without the dollar sign are for you to copy into your CI jobs.

    👉 Jump to the example that interests you the most:

    1. A build stage for Java Applications
    2. A build stage for JavaScript applications
    3. A build stage for Python Applications
    4. Creating Docker images in the build stage

    Example 1: A build stage for Java applications

    Java is a compiled language, so the build stage must compile the source into JVM bytecode. Here’s a starter pipeline:

    java build stage
    A Java build stage. Linting and code analysis run in parallel in the same block.

    Compilation block

    The leading popular build tools in Java are Apache Maven and Gradle, both are included in Semaphore’s machines. Let’s go with Maven; it handles dependencies and compiles the source with a single command: mvn compile. Use these commands in your CI job to build the application:

    sem-version java 11
    checkout
    cache restore
    mvn compile
    cache store

    The cache command will detect the compiled output in the target directory and store it for future use. Ideally, you should set the environment variable MAVEN_OPTS to -Dmaven.repo.local=.m2, so downloaded dependencies are also cached.

    build stage
    Setting up environment variables in the build block.

    Linting with style

    Java developers use powerful IDEs that lint code in realtime. Powerful as they are, they are not enough. Quality checks must also be part of the CI pipeline. We can use checkstyle to add a linting job.

    These commands download checkstyle and run it on a file called MyFile.java:

    sem-version java 11
    wget https://github.com/checkstyle/checkstyle/releases/download/checkstyle-8.41/checkstyle-8.41-all.jar
    java -jar checkstyle-8.41-all.jar -c /sun_checks.xml MyFile.java

    Or, if you are using Maven, there’s a plugin for automatic linting. To enable it, add the maven.checkstyle.plugin to properties in your pom.xml.

    <properties>
        <maven.checkstyle.plugin.version>3.1.2</maven.checkstyle.plugin.version>
    </properties>

    Then configure the plugin to fail on error. Add the following snippet to the build section in your pom.xml:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-checkstyle-plugin</artifactId>
                <version>${maven.checkstyle.plugin.version}</version>
                <configuration>
                    <failsOnError>true</failsOnError>
                    <failOnViolation>true</failOnViolation>
                </configuration>
            </plugin>
        </plugins>
    </build>

    Now you can run checkstyle with:

    sem-version java 11
    checkout
    mvn checkstyle:checkstyle

    Finding Mistakes with PMD

    Another nice checker to use in your CI environment is PMD. PMD works with Java, JavaScript, and Oracle’s SQL. You can scan code with the default rules for Java with the following:

    sem-version java 11
    checkout
    wget https://github.com/pmd/pmd/releases/download/pmd_releases%2F6.32.0/pmd-bin-6.32.0.zip
    unzip pmd-bin-6.32.0.zip
    ./pmd-bin-6.32.0/bin/run.sh pmd -d . -R rulesets/java/quickstart.xml -f text

    Security analysis with Infer

    Infer was created by Facebook to find code that can lead to runtime errors, things such as race conditions and resource leaks. Infer works with Java and Android. Facebook has open-sourced this tool and uses it to fix their mobile client and the main Facebook app.

    To run Infer on your CI, download the binary and build the application. Infer will output a report on the infer-out folder, which you can store for later analysis. The following command will make the pipeline stop when issues are found:

    checkout
    cache restore
    mvn clean
    curl -sSL "https://github.com/facebook/infer/releases/download/v1.0.0/infer-linux64-v1.0.0.tar.xz" | tar -xJ
    ./infer-linux64-v1.0.0/bin/infer run -- mvn package
    artifact push job --expire-in 1w infer-out
    ./infer-linux64-v1.0.0/bin/infer analyze --fail-on-issue

    Finding bugs with Find Security Bugs

    Find Security Bugs uses a security database to detect almost 140 different vulnerability types in Java web applications.

    To use this tool, add this snippet to the plugin section in your pom.xml:

    <plugin>
         <groupId>com.github.spotbugs</groupId>
         <artifactId>spotbugs-maven-plugin</artifactId>
         <version>4.0.4</version>
         <configuration>
             <effort>Max</effort>
             <threshold>Low</threshold>
             <failOnError>true</failOnError>
             <includeFilterFile>${session.executionRootDirectory}/spotbugs-security-include.xml</includeFilterFile>
             <excludeFilterFile>${session.executionRootDirectory}/spotbugs-security-exclude.xml</excludeFilterFile>
             <plugins>
                 <plugin>
                     <groupId>com.h3xstream.findsecbugs</groupId>
                     <artifactId>findsecbugs-plugin</artifactId>
                     <version>1.10.1</version>
                 </plugin>
             </plugins>
         </configuration>
    </plugin>

    Then, create two files: spotbugs-include.xml and spotbugs-exclude.xml. The contents of the first one are:

    <FindBugsFilter>
         <Match>
             <Bug category="SECURITY"/>
         </Match>
    </FindBugsFilter>

    You can exclude rules in the exclude file:

    <FindBugsFilter></FindBugsFilter>

    Commit the changes into your repository. Then, create a job that generates the report and pushes it to the artifact store:

    checkout
    cache restore 
    mvn spotbugs:spotbugs 
    artifact push job target/spotbugsXml.xml  --expire-in 1w 
    mvn spotbugs:check

    Building a JAR

    We can generate a JAR file with: mvn package. Since we already tested the application, we can skip tests.

    sem-version java 11
    checkout 
    cache restore 
    mvn package 
    cache store

    The output JAR is uploaded to the artifact store, where it can be downloaded or used in continuous delivery and deployment pipelines.

    👉 For a complete example of setting up a CI/CD pipeline for Java applications, see CI/CD for Spring Boot Microservices.

    Example 2: A build stage for JavaScript applications

    JavaScript code doesn’t need compiling. The build stage will only download dependencies, then lint the code, and finally, build a package.

    JavaScript build stage
    JavaScript build stage

    Filling up node_modules

    We can install JavaScript modules with npm:

    checkout 
    npm ci

    The npm ci command creates a clean copy of the node_modules folder. Unfortunately, as a result we can’t use the cache at this stage. npm ci is the cleanest and safest way of installing modules.

    But if you want or need to use the cache, you’ll need to run npm install instead of npm ci:

    sem-version node 14.16 
    checkout 
    cache restore 
    npm install 
    cache store

    If the project includes a .nvmrc config file, you can replace sem-version with nvm use:

    checkout 
    nvm use 
    cache restore 
    npm install 
    cache store

    The CI machine includes yarn as well:

    sem-version node 14.16 
    checkout 
    cache restore 
    yarn install 
    cache store

    Linting with ESLint and JSHint

    JavaScript has come a long way since the Netscape years. It grew from a language whipped up in less than ten days into the closest thing we have to a lingua franca — almost all devices have JavaScript support.

    As you can imagine, the language changed a lot over time, and not all its parts are good. Using a linter will help us stay away from the bad parts of JavaScript. In my experience, the ESlint and JSHint linters integrate very well into the CI environment. Any of these can be installed with npm install --save-dev.

    For ESLint, you’ll need to create a configuration with:

    $ npx eslintrc --init

    Next, add it to the script section of package.json:

    "scripts": {
         "lint": "eslint '**/*.js' --ignore-pattern node_modules",
         ...
     }

    After committing the changes, use the following to run the linter:

    sem-version node 14.16 
    checkout 
    npm run lint

    Security audits: npm and AuditJS

    Security audits are built-in npm. You can check your dependencies with npm audit. Note, however, that you must set a threshold level in order for the command to fail on error:

    sem-version node 14.16 
    checkout 
    cache restore 
    npm audit --audit-level=high

    If you find security issues, try running npm audit fix to attempt automatic resolution.

    An alternative to npm audit is AuditJS. This tool uses the OSS index of vulnerabilities, which feeds from many security databases, so results are more accurate.

    You can add AuditJS to your project with:

    $ npm install --save-dev auditjs

    Then use the following command to run the test:

    sem-version node 14.16
    checkout
    npx auditjs ossi

    One nice feature AuditJS has is that it allows us to whitelist security issues. This is great, for instance, when we want to ignore a vulnerability because we know it doesn’t affect us.

    For that, create a JSON file detailing the vulnerability ID and a reason for ignoring it:

    {
      "ignore": [
        {
            "id": "78a61524-80c5-4371-b6d1-6b32af349043",
            "reason": "Vulnerability doesn't affect our code"
        }
      ]
    }

    Then run AuditJS with the whitelist enabled:

    sem-version node 14.16
    checkout
    auditjs ossi --whitelist my-whitelist.json

    Creating Node packages

    Use npm pack to create a tarball you can distribute, or use npm publish to upload the module to npmjs.com.

    checkout
    tarfile=$(npm pack)
    artifact push project --force "$tarfile"

    👉 For a complete example of setting up a CI/CD pipeline for a JavaScript application, see How To Build and Deploy a Node.js Application To Kubernetes or Continuous Integration with Deno.

    Example 3: A build stage for Python applications

    We don’t need to run any compile commands in Python. As a result, the build job, like JavaScript’s, will only download and cache dependencies.

    python build stage
    Python build stage

    Installing modules

    In this example, we run pip to install the project dependencies in the .pip_cache folder:

    sem-version python 3.9
    checkout
    cache restore
    pip download --cache-dir .pip-cache -r requirements.txt
    cache store

    Linting with Pylint

    Python developers are used to following strict coding standards. One good way of checking for good style is Pylint. Pylint ruleset includes checks for unused modules, variable name formats, and line length. It strictly adheres to Python’s PEP 8 style guide.

    Like most linters, Pylint can be integrated into the IDE. But, in this case, we’ll use it in the pipeline to check commit quality. To show lint errors in a single package or file use:

    sem-version python 3.9
    checkout
    pylint -E [MODULE|PYTHON_FILE]

    You can scan every Python file in the repository with the following command:

    sem-version python 3.9
    checkout
    git ls-files | grep -E '.py$' | xargs pylint -E

    Complexity analysis with Xenon

    One thing PEP 8 style guide cannot capture is cyclomatic complexity, a measure of how many independent paths there are in the codebase. To calculate it, you can use a tool like Xenon.

    To get started with Xenon, add it to your Python project with:

    $ pip install xenon
    $ pip freeze > requirements.txt

    Xenon uses a ranking system that goes from A to F, where A is the lowest complexity level. In the job, you can supply the minimum level permitted before failing.

    sem-version python 3.9
    checkout
    cache restore
    xenon --max-absolute C --max-modules B --max-average A

    The example will make the job fail if the average complexity is lower than A, the complexity in the modules goes lower than B, or the complexity of a block is lower than C.

    Packing for distribution

    Python’s package format allows you to upload modules to the official PyPI index or distribute it on your website.

    To create a package, use the following commands:

    sem-version python 3.9
    python -m pip install --upgrade build
    checkout
    python -m build
    artifact push project --force dist

    👉 For a complete Python pipeline see Python Continuous Integration and Deployment From Scratch.

    Example 4: Creating Docker images in the build stage

    Docker is a container platform that standardizes build automation by letting us package any software regardless of the language. By building a Docker image, we get a portable application we can run on any machine, as long as it has a container runtime installed. It also allows us to run the application at scale on orchestration platforms like Docker Swarm or Kubernetes.

    Semaphore includes everything you need to build a Docker image. You only need to supply a Dockerfile with the build instructions.

    The easiest way of writing a Dockerfile is by starting from a base image. Semaphore provides images for popular languages in the Semaphore Container Registry. For example, this Dockerfile is for packaging a Python application:

    # use image in Semaphore container registry
    FROM registry.semaphoreci.com/python:3.9
    
    # copy source files
    ADD . ./opt/
    WORKDIR /opt/
    
    # install dependencies
    RUN pip install --upgrade pip
    RUN pip install -r requirements.txt
    
    # start application
    CMD ["python","app.py"]

    Use these commands in the CI job to build the image:

    checkout
    docker build -t my-image .

    It works, but the image is lost as soon as the job completes. We must upload it into a registry in order to keep it. And for that, you have to supply login information for an image registry with a secret.

    To create a secret, click on your account badge on the top right corner and then Settings.

    Semaphore Settings menu
    Where to find Settings menu.

    Then click on Secrets > New Secret.

    Create a secret called dockerhub with your Docker Hub username and password:

    create a secret on Semaphore
    Create a secret to store Docker Hub credentials

    Now you can access the username and password in the job:

    echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin

    Next, you can pull the last image to speed up the build process:

    docker pull $DOCKER_USERNAME/awesome-application:latest || true

    Finally, build and push the image:

    docker build -t $DOCKER_USERNAME/awesome-application:latest --cache-from=$DOCKER_USERNAME/awesome-application:latest .
    docker push $DOCKER_USERNAME/awesome-application:latest

    The complete job commands are:

    echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin
    docker pull $DOCKER_USERNAME/awesome-application:latest || true
    docker build -t $DOCKER_USERNAME/awesome-application:latest --cache-from=$DOCKER_USERNAME/awesome-application:latest .
    docker push $DOCKER_USERNAME/awesome-application:latest

    Don’t forget to activate the secret in your job:

    activate the secret
    Activate the secret in the block

    What we have seen are the basic building blocks for a Docker image. The process is mostly the same for any language. The only thing that changes are the contents of the Dockerfile. We have a few in-depth tutorials and example Dockerfiles for you to peruse below:

    What’s Next

    We learned what build automation is and saw a few examples. The principles always remain the same, regardless of the language or framework used.

    The principle used in all examples is that the build stage runs in a pipeline. By running fast code checks right after compilation— and before the automated test suite—you optimize your continuous integration process for fast feedback.

    I hope these recommendations help you optimize your build process. Happy building!

    2 thoughts on “Design an Effective Build Stage for Continuous Integration

    1. Thanks, Thomas! Your explanations were thorough and easily understandable. What stood out to me was the way you seamlessly integrated code examples and practical demonstrations throughout the article. This is one of the best web development tutorials.

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Avatar
    Writen by:
    I picked up most of my skills during the years I worked at IBM. Was a DBA, developer, and cloud engineer for a time. After that, I went into freelancing, where I found the passion for writing. Now, I'm a full-time writer at Semaphore.