14 Feb 2023 · Software Engineering

    A First Look at Neon: A Postgres Database That Branches

    12 min read
    Contents

    Relational databases have a long, long history. The first appeared in the 1970s, and while the technology has certainly evolved, the relational model has proved the most popular over the last 5 decades.

    Is there room for innovation after 50 years of history? The folks at Neon are showing us that one can, in fact, teach an old dog new tricks.

    What is Neon?

    Neon is an open-source (Apache 2.0) alternative to AWS Aurora or Google’s Cloud SQL for Postgres. Neon is a serverless, scalable implementation of PostgreSQL that you can run on-premise or try through its managed service.

    Neon decomposes the PostgreSQL architecture into two layers: compute and storage. The compute layer consists of stateless PostgreSQL running on Kubernetes, allowing pods to be scaled on demand — even to zero.

    Persistence is achieved with the Neon storage engine, a custom-made layer that handles both transactions and data. The transaction log is processed through a set of redundant safekeeper services, while data pages are managed on disk by the pageserver.

    An achitecture diagram. Many compute nodes are running as pods in a Kubernetes cluster. The nodes are part of the compute plane. The plane connects with the storage plane in two ways: first, compute nodes connect to a pageserver. The page server then stores the data in the storage backend. Second, compute nodes communicate with safekeeper nodes, which deal with the transaction log (WAL) stream.
    Neon architecture splits the database into scalable compute and storage planes. Compute nodes can be started and stopped depending on demand.

    Currently, the managed service is running a free tech preview with some limitations that we’ll discuss later.

    Neon’s killer feature: branches

    Neon was launched in June 2021. Being a new project, the managed service may have fewer features than the competition. But Neon has one feature that, to my knowledge, no one else has: branches.

    Every developer is familiar with branches. In Neon, branches work pretty much the same as in Git, except they cannot be merged (although there are plans to add schema-based merging in the future). You can, at any point, branch off the main trunk, effectively creating an “alternate timeline”.

    Since branches in Neon are writable, this feature allows us to do things no other database engine can do. For instance:

    • Freely experiment without impacting the main branch.
    • Instantly back up the database. So, if data is lost by mistake, we can switch to the last good branch.
    • Simplify integration testing. Developers can run tests in disposable test-specific branches.
    • Safely try out automated database migrations on production.
    • Run analytics or machine learning workloads in isolation.

    Instantly duplicate all the databases that serve to a specific cluster of microservices.

    You can’t do any of these things on traditional database engines. Not easily at least. Some database engines like SQL Server have snapshots, which indeed can create instant copies of a database. But snapshots are read-only and this limits their utility. On most database engines, we have to resort to clunkier mechanisms like backup and restore or replication.

    The diagram shows 4 branches. Three branches split from the main one. They are called 'test A', 'test B', and 'test C'. The branches have the contents of the database at the branching point. From that moment on, they follow their own timelines and can differ from each other and the main branch.
    A few use cases for Neon branches.

    Branches are per-project. And a project can have multiple databases. That means that creating a branch duplicates all the databases in that project. We can take advantage of this project > database hierarchy to clone a group of related databases in one operation.

    Getting started with Neon

    Let’s try out Neon’s managed service. To create a tech preview free account, just follow these steps:

    1. Sign up at neon.tech/sign_in.
    2. Click on Create a project.
    3. Click on Download env.txt. This file contains everything you need to connect to the database instance.
    4. Click on Settings and copy the project id.

    We also need to generate an API Key, as shown below:

    1. Click on your avatar and select Account > Developer Settings > Create new API key
    2. Edit env.txt and add the following lines:
      • export NEON_API_KEY=Your-API-Key
      • export PROJECT_ID=Your-Project-ID
    3. Add the keyword export before every variable.

    The final env.txt file should look like this example:

    # Connection details
    export PGHOST=ep-random-name.us-east-2.aws.neon.tech
    export PGDATABASE=neondb
    export PGUSER=Tommy
    export PGPASSWORD=sekret1
    
    # Connection string
    export DATABASE_URL=postgres://Tommy:sekret1@ep-random-name.us-east-2.aws.neon.tech/neondb
    
    # Neon config
    export NEON_API_KEY=MyApiKey
    export NEON_PROJECT_ID=random-name-140532

    We’ll need this file to connect to the Neon database and API.

    The Neon UI

    The Neon dashboard includes an SQL editor to run commands and controls for creating branches or endpoints.

    Screenshot of the Neon dashboard managed service at console.neon.tech.
    The managed service dashboard.

    On the Branches page, we’ll find options for creating a new branch. Here, you can select the what and when. You must choose the parent branch and how much data to include:

    • Head: the new branch is a copy of the current database.
    • Time: the branch has the parent’s data up to a specified date and time.
    • LSN: the branch has the parent’s data up to a specified log sequence number.

    Screenshot of the create branch UI. It shows a parent branch selector, three options to create the branch: head, time, and LSN, and the option to create an endpoint for the new branch.
    The create branch UI.

    Endpoints for the branches can be created on the same screen or on the Endpoints page.

    Connecting to Neon

    Neon is a PostgreSQL database, so we’ll need to install the client tools. Check which version yours is running with:

    $ psql --version
    psql (PostgreSQL) 15.1 (Ubuntu 15.1-1.pgdg20.04+1)

    Neon works best with versions 14 and 15 of the client tools. So, if needed, head to postgresql.org/download to get the latest release.

    Now, let’s source the env.txt and try connecting:

    $ source env.txt
    $ psql
    
    psql (15.1 (Ubuntu 15.1-1.pgdg20.04+1), server 14.6)
    
    neondb=> SELECT version();
                                                  version
    ---------------------------------------------------------------------------------------------------
     PostgreSQL 14.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
    (1 row)

    You can also do browser-based authentication by running the command shown below. This will open a browser window and let you select the project you want to connect to.

    $ psql -h pg.neon.tech

    We may also test the Neon API key with curl:

    $ curl -s 'https://console.neon.tech/api/v2/projects' \
      -H 'accept: application/json' \
      -H "Authorization: Bearer $NEON_API_KEY"
    
    {
      "projects": [
        {
          "id": "calm-guy-140532",
          "platform_id": "aws",
          "region_id": "aws-us-east-2",
          "name": "test-project",
          "provisioner": "k8s-pod",
          "pg_version": 14,
          "locked": false,
          "created_at": "2022-12-22T18:16:13Z",
          "updated_at": "2022-12-22T18:16:13Z",
          "proxy_host": "us-east-2.aws.neon.tech"
        }
      ]
    }

    Automated branching with CI/CD

    One of the most delicate parts of deployment is the database migration step because there is always the possibility of data loss. There are several techniques to make this process safer. But with Neon, we can leverage branches to make the process infallible.

    Let’s say we have the following continuous deployment pipeline. Here, we run the database migration job on production before deployment. The problem with this setup is that we could be stuck with a corrupt database if the job fails.

    A Semaphore pipeline screenshot. There are two pipelines: CI and CD. The first one has build and test jobs. The second, a DB migration job followed by a deploy job.
    Example initial pipeline. We have not added branch management yet.

    How can we make the process safer? Well, we can run the database migration script on a disposable branch. This will give us two critical pieces of information: that the migration works and how long it takes.

    Once we’re done setting up the migration test job, the pipeline should look like the following picture. You should be able to adapt the following instructions to any CI/CD project.

    A Semaphore pipeline diagram. Same as the previous one. The only difference is that a migration test job was added before the actual database migration in the CD pipeline.
    The pipeline once we add a database migration test job on a Neon branch.

    Creating secrets

    Before creating the job, we should upload env.txt to Semaphore as secret, so the Semaphore CI machine can connect with the database.

    You can find your secrets by clicking on your organization icon on the top right of the Semaphore dashboard and selecting Settings. Click on Secret > New Secret.

    Upload env.txt and name the secret “neon”:

    A Semaphore secret with env.txt uploaded as a file to the path /home/semaphore/env.txt.
    The environment file has all the parameters needed to interact with the database.

    Writing helper scripts

    Next, we’ll need to create a few helper scripts to manage branches through the API. Neon docs describe how it works.

    The first script shows all existing branches in our project. Before trying it on your machine, ensure you have sourced env.txt. You’ll also need to have Bash and jq installed.

    #!/usr/bin/env bash
    # List the IDs of all existing branches IDs
    
    set -e
    
    curl -s "https://console.neon.tech/api/v2/projects/$NEON_PROJECT_ID/branches" \
      -H 'accept: application/json' \
      -H "Authorization: Bearer $NEON_API_KEY" | 
      jq -r ".branches[] | .id"

    Execute the script to check that everything works (the main branch is never listed):

    $ ./get-branches.sh
    br-blue-feather-415684

    The second script creates branches. We’ll call it create-branch.sh:

    #!/usr/bin/env bash
    # Create a new Neon branch. Writes updated environment file.
    #
    # Usage: create-branch.sh PARENT_BRANCH_NAME OUTPUT_FILE
    # Example: create-branch.sh main env-branch
    
    set -e
    set -u
    
    parent_name=$1
    outfile=$2
    
    parent_id=$(curl -s "https://console.neon.tech/api/v2/projects/$NEON_PROJECT_ID/branches" \
      -H 'accept: application/json' \
      -H "Authorization: Bearer $NEON_API_KEY" | 
      jq -r ".branches[] | select(.name==\"$parent_name\") | .id")
    
    if [ -z "$parent_id" ]; then
        echo "Unable to find id of parent branch '$parent_name'" >&2
        exit 1
    fi
    
    branch_data=$(curl -s -X POST "https://console.neon.tech/api/v2/projects/$NEON_PROJECT_ID/branches" \
      -H "Accept: application/json" \
      -H "Authorization: Bearer $NEON_API_KEY" \
      -H "Content-Type: application/json" \
      -d "{
      \"endpoints\": [
        {
          \"type\": \"read_write\"
        }
      ],
      \"branch\": {
        \"parent_id\": \"$parent_id\"
      }
    }")
    
    branch_endpoint=$(echo "$branch_data" | jq ' .endpoints[] | .host' | sed 's/"//g')
    branch_id=$(echo "$branch_data" | jq ' .branch | .id' | sed 's/"//g')
    
    if [ -z "$branch_endpoint" ] || [ -z "$branch_id" ]; then
        echo "Unable to create new branch" >&2
        exit 1
    fi
    
    url=$(echo "$DATABASE_URL" | sed "s/$PGHOST/$branch_endpoint/")
    
    echo "Created branch: $branch_id"
    
    echo "# Generated with create-branch.sh" > $outfile
    echo "export BRANCH_ID=$branch_id" >> $outfile
    echo "export PGHOST=$branch_endpoint" >> $outfile
    echo "export DATABASE_URL=$url" >> $outfile
    
    echo "Wrote environment file: $outfile"

    We can now create a branch with create-branch.sh <PARENT_BRANCH_NAME> <ENV_FILE>. The first argument is the name of the parent branch (typically, main). The second argument is an environment file to write with the connection parameters for the next branch.

    For instance, we can create a new branch and connect to it with the following:

    $ ./create-branch.sh main env-branch
    $ source env-branch
    $ psql

    Finally, we need to be able to delete the branch we just created. For that, we need to provide the branch_id, which is randomly defined on branch creation (and stored in the environment file generated with create-branch.sh).

    #!/usr/bin/env bash
    # Delete Neon branch
    #
    # Usage: delete-branch.sh BRANCH_ID
    # Example: delete-branch.sh br-dawn-shape-137746
    
    set -e 
    set -u
    
    branch_id=$1
    
    id=$(curl -s -X 'DELETE' \
      "https://console.neon.tech/api/v2/projects/$NEON_PROJECT_ID/branches/$branch_id" \
      -H "accept: application/json" \
      -H "Authorization: Bearer $NEON_API_KEY"  |
      jq ' .branch | .id ' | sed 's/"//g')
    
    echo "Deleted branch: $id"

    When finished, commit all the new scripts into your repository. We’ll keep them in a folder called db-scripts at the project’s root.

    Add job to the pipeline

    Next, we’ll add a test job in the continuous deployment pipeline. We’ll open our workflow in the Editor and click on Add Block in the continuous deployment pipeline.

    screenshot

    We’ll use the prologue to update the Postgres client library:

    sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
    sudo apt-get update
    sudo apt-get -y install postgresql-client-15
    sem-version ruby 3.0
    checkout
    cache restore
    bundler install
    cache store

    Now we can run the scripts that we created earlier:

    1. Create a branch and switch to it.
    2. Run the migration script. In my case, I’m upgrading a Rails application, so I run bin/rails db:migrate. Change this command to suit your project. For example, if you’re working with Django use python manage makemigrations. The same thing on Laravel is achieved with php artisan migrate.
    3. Optionally run any tests.
    4. Delete the branch.

    The job should look like this:

    source $HOME/env.txt
    ./db-scripts/create-branch.sh main env-branch
    ./db-scripts/get-branches.sh
    source env-branch
    bin/rails db:migrate
    ./db-scripts/delete-branch.sh $BRANCH_ID

    To finish the setup, enable the neon secret so the job can authenticate with the database.

    A screenshot of the Semaphore pipeline. The new job runs first in the continuous deployment pipeline. The neon secret is enabled, and the job command and prologue are filled with the discussed commands.
    Database migration test job added to the pipeline.

    Press the Run the workflow button to save your changes and test the workflow.

    A Semaphore pipeline with a migration test job added before the actual database migration in the CD pipeline
    The final pipeline.

    Technical preview limitations

    At the time of writing Neon is in a technical preview stage. While the managed service is entirely free, it comes with some limitations:

    • You can only have one project per user. But a project can have multiple databases.
    • A project can have up to nine branches in addition to the main branch.
    • You can have up to three endpoints. One is always reserved for the main database. That leaves only two endpoints accessible for two other branches.
    • The size limit is 3GB per branch on the free tier.
    • Point-in-time branches can only go up to seven days into the past.
    • There are no backup or restore options on the UI. The only alternative seems to run pg_dump neondb and take a remote backup.
    • There are a few observations around importing data from another PostgreSQL instance or a backup.
    • There is a 100 concurrent connections limit. You can enable connection pooling to raise the limit to 1,000 connections.

    The good news is that if you like the database, you can always run it on-premise or in your cloud of choice to remove these limitations.

    Conclusion

    Neon’s branching feature presents new options for development and database management. Even in its current technical preview stage, I can see great potential for this engine. Of course, we’ll have to see how the project evolves, especially how the final pricing model will turn out.

    I hope you found this project interesting, and if so, you might want to consider contributing to Neon.

    Thanks for reading!

    One thought on “A First Look at Neon: A Postgres Database That Branches

    1. Thanks Tomas

      Insightful. I am in a process of deciding between Neon and Supabase – so many good DB options out there now!

      Have you done any work/writing on supabase and how you compared the two? They seem to offer similar things but it feels like supabase has more of a passionate community which can be a big deciding factor

      Your thoughts appreciated

      Thanks

    Leave a Reply

    Your email address will not be published. Required fields are marked *

    Avatar
    Writen by:
    I picked up most of my skills during the years I worked at IBM. Was a DBA, developer, and cloud engineer for a time. After that, I went into freelancing, where I found the passion for writing. Now, I'm a full-time writer at Semaphore.