Feature Flags: The Hidden Switch Behind Continuous Deployment

Imagine a world where you don’t need a separate testing environment, where you can test everything in production and capture valuable data that helps you improve along the way. The secret ingredient: feature flags.

If you can’t decide whether testing in production is a foolish idea or a genius one, this tutorial will help.

What are feature flags?

Feature flags are a software engineering technique that lets developers integrate code constantly into the main trunk. It involves shipping incomplete features into production, where they remain dormant until ready. Feature flags also play a part in software delivery: when a feature is complete, the code can be activated with the flick of a switch.

Feature flags control which code paths are active at any given time. Also known as feature toggles, switchers, or flippers, these flags can be switched on and off — either at build time or at runtime — allowing teams to change the behavior of an application without having to update the code.

Benefits of feature flags

Despite adding a layer of complexity in the codebase, feature flags are powerful when it comes to software delivery:

Short development cycle: without feature flags, you have to hold off deployment of a feature until it’s thoroughly tested — a process that can take weeks. With them, we can deploy several times per day, try partially-developed features, and get instant feedback.
Simplified version control: we can do away with long-lived topic branches. Feature flags encourage using trunk-based development. We can merge every day, integrate continuously, minimize merge conflicts, and iterate much more quickly.
Test in production: new features can initially be enabled only for developers and beta users. No separate testing environment is needed.
Decouple business and technical decisions: sometimes a feature is ready, but we’re not quite ready to publish it. Feature flags allow us to switch it on when it makes the most sense.
Fine-grained releases: feature flags permit a high level of control to conduct canary or blue-green releases.

Use cases for feature flags

First and foremost, feature flags are used to release new features. In addition to canary launches, you can do cool things like activating features for a seasonal event (think Black Friday) and installing per-user or per-region toggles. No other technique offers such a degree of control.

The roadmap for using feature flags is:

Code: deploy the new feature, which is initially disabled for everyone.
Test: when the feature is complete-ish, toggle it on for internal testers and developers.
Canary/Beta: after sufficient testing and enough iterations, toggle it on for beta users or a percentage of the general population.
Iterate: collect metrics and usage analytics. Gather feedback. Continue iterating.
Release: finally, toggle the feature on for everyone.
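
The roadmap above can be sketched as a single flag that moves through stages. This is only a minimal illustration under stated assumptions: the stage names, the `bucketOf` and `isEnabledFor` functions, and the user fields (`id`, `isInternal`, `isBeta`) are all hypothetical and not part of any particular library.

```javascript
// Map a user id to a stable bucket from 0 to 99,
// so the same user always gets the same answer during a percentage rollout.
function bucketOf(userId) {
    let hash = 0;
    for (const ch of String(userId)) {
        hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
    }
    return hash % 100;
}

// Decide whether a flag is on for a given user.
// flag.stage is one of "off", "internal", "beta", "everyone" (hypothetical names);
// flag.percentage is the share (0..100) of general users included during "beta".
function isEnabledFor(flag, user) {
    switch (flag.stage) {
        case "everyone":
            return true;
        case "beta":
            return Boolean(user.isInternal || user.isBeta)
                || bucketOf(user.id) < (flag.percentage ?? 0);
        case "internal":
            return Boolean(user.isInternal);
        default:
            return false; // "off" or unknown stage: the feature stays dormant
    }
}
```

Moving from one step of the roadmap to the next then means changing `stage` (or raising `percentage`) rather than deploying new code.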

Deployment and release must be separate. You should decide when to make a feature available based on business parameters, not on technical merits. Feature flags are one way to achieve this.

Running experiments with feature flags

Since feature flags allow us to change behavior with such a fine degree of control, they’re the go-to method for conducting experiments. We can use feature flags to compare alternative versions of a feature.

Say you want to add a Call to Action and you have two alternatives. One is a short form and a button. The other, a single big round button. Want to find out which one gets more clicks? Feature flags make it simple to set up A/B experiments.

So you run your test: one half of your users sees option A and the other half sees option B. After collecting usage data for a period, you’ll see which one performs better. One last switch will toggle the winning option for all users.
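
One possible way to split users for such an experiment is to derive the variant from the user id, so a returning user always sees the same option. The sketch below assumes hypothetical names (`variantFor`, `recordClick`, the `"call-to-action"` experiment) and a simple string hash; a real setup would more likely rely on an analytics or experimentation service.

```javascript
// Assign a user to variant "A" or "B" deterministically.
// Including the experiment name keeps assignments independent across experiments.
function variantFor(userId, experimentName) {
    const key = `${experimentName}:${userId}`;
    let hash = 0;
    for (const ch of key) {
        hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
    }
    return hash % 2 === 0 ? "A" : "B";
}

// Tally clicks per variant so the two options can be compared later.
const clicks = { A: 0, B: 0 };

function recordClick(userId) {
    clicks[variantFor(userId, "call-to-action")] += 1;
}
```

Because the assignment is a pure function of the user id, no per-user state needs to be stored to keep the experience consistent.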

Feature toggles as operational switches

Not all toggles are temporary. Some of them can be permanent. For example, a flag that switches the application to a limited or “lite” version can be a lifesaver in high-demand periods.

What’s more, a flag can act as a kill switch to disable code that is causing a crash. Thus, feature flags give the ops team a quick way of reacting to problems.
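
A kill switch can be as small as a wrapper around the risky code path. The following is a sketch under assumptions: `opsFlags`, `withKillSwitch`, and the flag name `recommendations-enabled` are hypothetical, and the in-memory object stands in for whatever configuration source the ops team actually controls.

```javascript
// Hypothetical in-memory switchboard; a real system would load this
// from configuration or a flag service.
const opsFlags = { "recommendations-enabled": true };

// Wrap a function so that it can be disabled at any time.
// When the switch is off, the fallback runs instead of the risky code.
function withKillSwitch(flagName, riskyFn, fallbackFn) {
    return (...args) => {
        if (!opsFlags[flagName]) {
            return fallbackFn(...args); // switch is off: skip the risky path entirely
        }
        return riskyFn(...args);
    };
}

const recommend = withKillSwitch(
    "recommendations-enabled",
    (product) => [`related-to-${product}`], // stand-in for the real engine
    () => []                                 // "lite" behavior: no recommendations
);
```

The flag is checked on every call, so flipping `opsFlags["recommendations-enabled"]` to `false` takes effect immediately without a redeploy.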

Flags as an alternative to branching

Let’s say that you’ve got a great idea for a new feature. What’s your first instinct as a developer? To quickly pull the latest revision, create a new branch, and get to work. After about 30 minutes, you have a working prototype. All tests are passing and things are looking good.

Yet, you hesitate to integrate the branch into the main trunk because the feature is incomplete. There’s still work to do. So you keep the branch isolated for a few days. What could go wrong, right?

By not merging the change right away, you’ve missed a vital moment. When the feature is finally ready, the hidden cost of branching emerges — teammates have merged their code in the meantime, the trunk has moved on, and you now have the work of fixing the conflicts ahead of you. You will need to redo some of your work in the best-case scenario. At the very worst, you’ll have to discard some of your changes.

The moral of the story is that you can’t wait until a feature is complete to commit it to the main branch, because the longer a branch exists, the higher the chance of a conflict down the road. It’s like playing a game; the further you advance without hitting a checkpoint, the more time you’ll waste when you inevitably die and have to reload.

“If you merge every day, suddenly you never get to the point where you have huge merge conflicts that are hard to resolve.” — Linus Torvalds

Hopefully by now, it’s clear why long-lived branches are bad. We must shrink them to the absolute bare minimum: if code is constantly being merged into the main trunk, there are few to no integration conflicts. Ideally, we should be merging about three or four times per hour. At the very least, once per day. That way, you know you’re safe.

For this to work, we must be comfortable with committing partial features into the main trunk. Here is where feature flags shine. Feature flags let us share new code with the team while preventing users from viewing or using incomplete features.

Implementing feature flags

To visualize how feature flags work, let’s imagine we are building an e-commerce site. We have a recommendation engine that picks suggestions based on the products users are browsing.

We think we can make a better engine by using machine learning. The hypothesis is that getting better suggestions will result in more sales. But getting the model right takes time, so we’re not prepared to release it overnight. We want to have room for experimentation.

So, we deploy both versions of the engine:

// returns the active recommendation engine
function engineFactory(){
    let useML = false;
    //let useML = true;  // UNCOMMENT TO ENABLE NEW ENGINE

    if(!useML){ // SINGLE TOGGLE POINT
        return classicRecomendationEngine();
    }
    else{
        return MLRecomendationEngine();
    }
}

let recommended_products = engineFactory()(viewed_product)

This pattern lets us have the toggle point in a single place. Uncoupling the decision point from the rest of the code is essential; otherwise, we’ll end up with if-then-else popping up all over the place.

Commenting lines in and out may work for quick experiments, but it’s cumbersome. Can we do better? We can have a router that knows the state of each feature flag and returns the correct engine. As long as the interface to the engines is standardized, they are interchangeable. This pattern is called branch by abstraction.

import toggleRouter from "toggleRouter"

// toggle router for engine
function engineFactory(){
    if(toggleRouter.isFeatureEnabled("use-machine-learning-engine")) { // TOGGLE ROUTER
        return MLRecomendationEngine();
    }
    else{
        return classicRecomendationEngine();
    }
}

// instantiate the correct engine and use it
let recommended_products = engineFactory()(request.viewed_product);

Feature flags are not limited to true or false. We can have multivalue flags. For example, we can write a low-quality naive engine that uses fewer resources to handle traffic spikes.
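
A multivalue flag could be handled with a lookup table instead of a chain of if-else statements. The sketch below is an assumption-laden illustration: the stand-in engines, `naiveRecomendationEngine`, the `ENGINES` map, and the choice to fall back to the classic engine are all hypothetical.

```javascript
// Stand-in engines. Each constructor returns a function with the same
// interface: product -> list of suggestions.
const classicRecomendationEngine = () => (product) => [`classic:${product}`];
const MLRecomendationEngine = () => (product) => [`ml:${product}`];
const naiveRecomendationEngine = () => (product) => [`naive:${product}`];

// Lookup table from flag value to engine constructor.
const ENGINES = new Map([
    ["classic", classicRecomendationEngine],
    ["machine-learning", MLRecomendationEngine],
    ["naive", naiveRecomendationEngine],
]);

// Return the engine selected by a multivalue flag.
// Unknown values fall back to the classic engine rather than failing.
function engineFactory(engineName = "classic") {
    const makeEngine = ENGINES.get(engineName) ?? classicRecomendationEngine;
    return makeEngine();
}
```

Adding a fourth engine then means adding one entry to the map; the toggle point itself stays unchanged.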

The more code paths we have, the more tests we’ll need to write. A unit test for the recommendation engine should cover all the supported alternatives.

import toggleRouter from "toggleRouter"

// Here engineFactory is assumed to accept the engine name,
// so each test can select a code path explicitly.
describe("Test recommendation engines", function() {
    it("works with classic engine", () => {
        const recommended_products = engineFactory('classic')(current_product);
        // check results
    });

    it("works with ML engine", () => {
        const recommended_products = engineFactory('machine-learning')(current_product);
        // check results
    });

    it("works with naive engine", () => {
        const recommended_products = engineFactory('naive')(current_product);
        // check results
    });
});

Turning on feature flags

Now that we have our first feature flags coded, it’s time to decide how to turn them on. The question boils down to two alternatives: at startup or during runtime.

Configuring flags at startup is the simplest and preferred solution unless you need dynamic toggling. You can store the status of all flags in a config file in the same repository as the project, or set them with environment variables or command-line switches.
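
As a rough sketch of startup configuration, defaults can live in the codebase and be overridden by environment variables when the process starts. The `FLAG_` prefix, the `loadFlags` function, and the `new-checkout` flag are assumptions made for this example, not an established convention.

```javascript
// Defaults live in the repository; every flag starts in its safe state.
const DEFAULT_FLAGS = {
    "use-machine-learning-engine": false,
    "new-checkout": false, // hypothetical second flag
};

// Build the flag table once at startup.
// Assumed convention: FLAG_USE_MACHINE_LEARNING_ENGINE=true overrides the default.
function loadFlags(env = process.env) {
    const flags = { ...DEFAULT_FLAGS };
    for (const name of Object.keys(flags)) {
        const envName = "FLAG_" + name.toUpperCase().replace(/-/g, "_");
        if (env[envName] !== undefined) {
            flags[name] = env[envName] === "true";
        }
    }
    return Object.freeze(flags); // flags cannot change until the next restart
}

const flags = loadFlags();
```

Starting the application with `FLAG_USE_MACHINE_LEARNING_ENGINE=true` in the environment would then enable the new engine for that process only.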

Runtime toggles

Are static flags not flexible enough for you? If you need to level up the complexity, you can use a feature flag database. This lets you change settings on the fly, audit changes, and log feature utilization. You will, however, have to carefully manage flag status changes.
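
To make the idea concrete, here is a small in-memory stand-in for a flag database that records changes and counts lookups. The `FlagStore` class and its method names are hypothetical; a real store would persist its data and share it across servers.

```javascript
// In-memory stand-in for a feature flag database.
class FlagStore {
    constructor(initial = {}) {
        this.flags = new Map(Object.entries(initial));
        this.auditLog = [];     // who changed what, and when
        this.usage = new Map(); // how many times each flag was checked
    }

    // Check a flag and record that it was used. Unknown flags count as off.
    isEnabled(name) {
        this.usage.set(name, (this.usage.get(name) ?? 0) + 1);
        return this.flags.get(name) === true;
    }

    // Change a flag on the fly and keep an audit trail of the change.
    set(name, value, changedBy) {
        this.auditLog.push({
            name,
            from: this.flags.get(name),
            to: value,
            changedBy,
            at: new Date(),
        });
        this.flags.set(name, value);
    }
}
```

The audit log and usage counters are what make it possible to answer questions such as who enabled a feature, and whether a flag is still being read at all.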

A somewhat less complex alternative is to control flags by request, that is, to activate a feature when a given user logs in, on a special day of the year, or when a request includes a special cookie or header. You can build the flag-selecting logic into the application.
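
The flag-selecting logic for per-request toggles might look like the sketch below. The shape of the `request` object, the `x-enable-new-feature` header, the `BETA_USERS` set, and the choice of 1 January as the special day are all assumptions made for illustration.

```javascript
// Hypothetical list of users who get the feature early.
const BETA_USERS = new Set(["alice", "bob"]);

// Decide per request whether the feature is on.
// request: { user?: { id }, headers?: { ... }, date?: Date }
function isEnabledForRequest(request) {
    const headers = request.headers ?? {};
    if (headers["x-enable-new-feature"] === "1") {
        return true; // request carries the special header
    }
    if (request.user && BETA_USERS.has(request.user.id)) {
        return true; // a given user is logged in
    }
    const date = request.date ?? new Date();
    // Special day of the year: 1 January (getMonth() is zero-based).
    return date.getMonth() === 0 && date.getDate() === 1;
}
```

Passing the date in with the request, instead of always reading the clock, keeps the date rule easy to test.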

Feature flags and CI/CD

Let’s look at feature flags from the perspective of CI/CD. When you practice continuous integration, you have to decide how you will test feature flags. There are two ways to go about this:

Build stage flags
Runtime flags

With build-stage flags, the state of all flags is known at build time. Since you define every flag at the beginning of the CI pipeline, you’re testing the same build that will be deployed.

The flip side is that we have to rebuild and redeploy the application for a feature flag change to take effect.

📙 Stacks such as Docker and Kubernetes let you roll out application updates without interruption. You can learn more about that in our free ebook: CI/CD for Docker and Kubernetes.

Runtime flags and CI/CD

You must set flags either while deploying or at runtime if you don’t do it in the build stage. In both cases, you can’t know which features are enabled during the build, so your pipeline may not be testing the same artifact that ships to production.

And because you can’t test all possible toggle combinations, it’s imperative to have sane defaults (the application should run well without any flags defined) and be smart about testing. You must plan carefully, anticipate which toggle permutations could clash, and write adequate tests.
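
For a handful of flags that are likely to interact, it can still be practical to enumerate every combination and run a smoke test against each one. The sketch below assumes hypothetical names (`allCombinations`, `renderPage`, the `new-layout` flag); note that n flags produce 2^n combinations, so this only scales to small groups.

```javascript
// Generate every on/off combination for a small set of flags.
function allCombinations(flagNames) {
    return flagNames.reduce(
        (combos, name) =>
            combos.flatMap((c) => [{ ...c, [name]: false }, { ...c, [name]: true }]),
        [{}]
    );
}

// Hypothetical application entry point. It must tolerate missing flags,
// which is what "sane defaults" means in practice.
function renderPage(flags = {}) {
    const engine = flags["use-machine-learning-engine"] ? "ml" : "classic";
    const layout = flags["new-layout"] ? "new" : "old";
    return `${layout}/${engine}`;
}

// Smoke-test the no-flags default plus each combination.
const combos = allCombinations(["use-machine-learning-engine", "new-layout"]);
for (const combo of [{}, ...combos]) {
    if (typeof renderPage(combo) !== "string") {
        throw new Error(`renderPage failed for ${JSON.stringify(combo)}`);
    }
}
```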

Recommendations for using feature flags

Use them in moderation. Feature flags can get out of control.
Don’t ever, ever, ever repurpose a flag. Always create a new one for every change, or risk a half-billion-dollar mistake like the one Knight Capital made in 2012, when a reused flag activated obsolete code.
Minimize flag debt. Delete old and unused flags. Don’t let dead code linger.
Unless you have a good reason not to, keep the feature flag lifespan short (weeks).
Choose descriptive names for your flags. New-Feature-2 is NOT a good name.
Adopt trunk-based development and continuous delivery.
Consider having a way to view the state of all your flags. You can write an admin dashboard or an API for this.
Track and audit flag usage.
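
Several of these recommendations (short lifespans, deleting old flags, viewing flag state) become easier if every flag is registered with some metadata. The registry below is a hypothetical sketch: the field names, the `staleFlags` and `describeFlags` helpers, and the example dates are assumptions rather than a standard format.

```javascript
// Hypothetical flag registry: each flag records an owner and an expiry date.
const registry = [
    { name: "use-machine-learning-engine", enabled: false, owner: "search-team", expires: "2030-01-31" },
    { name: "lite-mode", enabled: true, owner: "ops", expires: null }, // permanent operational switch
];

// Return the names of flags whose expiry date has passed and that should be cleaned up.
function staleFlags(flags, today = new Date()) {
    return flags
        .filter((f) => f.expires !== null && new Date(f.expires) < today)
        .map((f) => f.name);
}

// One line per flag, suitable for a status endpoint or a simple dashboard.
function describeFlags(flags) {
    return flags.map((f) => `${f.name}: ${f.enabled ? "on" : "off"} (owner: ${f.owner})`);
}
```

Running `staleFlags` as part of the CI pipeline is one way to turn flag debt into a visible warning instead of a forgotten chore.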

Final thoughts

A feature flag adds to the system’s complexity but brings more flexibility. You can ship code more frequently, test in production, and wow your users by revealing a feature at the right moment. Mastering feature flags has become a practical requirement for trunk-based development and continuous delivery.
