21 Jun 2023 · Software Engineering

    Managing Stateful Applications on Kubernetes: Challenges and Best Practices


    Stateful applications are computer programs that store information, or state, about their previous interactions with users or other components of a system. The state can include things like user preferences, session data, and application settings. Stateful applications are commonly used for tasks such as e-commerce, banking, and gaming, where it is necessary to maintain a consistent state across multiple interactions with users. But how do you manage such applications in Kubernetes? What are the common challenges and how do you solve them? If you have these questions running through your mind, then this article is for you.

    What are stateful applications in Kubernetes?

    In Kubernetes, stateful applications are those that require persistent storage to keep their state and data across pod restarts and rescheduling onto different nodes. Examples of stateful applications include databases (MongoDB, MySQL, PostgreSQL, etc.), messaging systems, big data platforms, and other applications that require a consistent and reliable data storage layer. These systems are often complex and require careful design and implementation to be reliable, scalable, and secure.

    Stateful applications in Kubernetes need to be designed around the following characteristics:

    • Persistence: stateful applications store state information persistently so that it can be accessed across multiple user sessions or interactions with other components of the system.
    • Consistency: stateful applications ensure that the state is consistent across different interactions. This means that the state should not change unexpectedly and that changes made to the state in one interaction should be reflected in subsequent interactions.
    • Scalability: stateful applications can be more challenging to scale than stateless applications because the state information needs to be replicated or shared across multiple instances of the application.
    • Reliability: stateful applications need to be designed to handle failures and errors gracefully because an unexpected failure could result in data loss or corruption.
    • Security: stateful applications may need to handle sensitive user data, so they need to be designed with appropriate security measures to protect this data from unauthorized access or modification.

    Challenges of running stateful applications on Kubernetes

    Running stateful applications on Kubernetes can present several challenges, particularly in the areas of data storage, networking, security, and monitoring. One of the primary challenges of running stateful applications on Kubernetes is managing persistent data storage. Traditional stateless applications can simply be replicated across multiple nodes, but stateful applications also require persistent data storage, which can be difficult to manage in a containerized environment. Kubernetes provides several options for data storage, including local storage, network-attached storage, and cloud storage, but choosing the right storage solution can be challenging.

    Networking can also be a challenge for stateful applications on Kubernetes. Instances of a stateful application often need to talk to each other directly (for example, a database replica connecting to its primary). Each instance therefore needs a stable network identity rather than an address that changes whenever its pod is recreated. Kubernetes offers several networking building blocks, including pod networking, Services, and cluster DNS, but configuring them correctly can be complex.

    Security is another key challenge for stateful applications on Kubernetes. Because stateful applications often store sensitive data, it's important to ensure that the container environment is secure. Kubernetes provides several security features, including RBAC, Pod Security Admission (which replaced PodSecurityPolicy after it was removed in Kubernetes 1.25), and network policies, but properly configuring these features can be difficult.

    Finally, monitoring stateful applications on Kubernetes can be a challenge. Because stateful applications depend on persistent data storage, you need to monitor the health and performance of the storage layer as well as the application itself. Commonly used monitoring tools include Prometheus, Grafana, and the Kubernetes Dashboard, but configuring them to monitor stateful applications can be complex.

    Best practices for managing stateful applications on Kubernetes

    Designing and managing stateful applications on Kubernetes requires careful consideration of data storage, networking, security, and monitoring. If you are considering deploying a stateful application in Kubernetes, here are some of the best practices:

    Data storage

    First, use higher-level abstractions built for stateful workloads, such as Kubernetes StatefulSets or operators, instead of managing pods and volumes by hand. Second, choose the storage solution that fits your specific use case, whether it's local storage, network-attached storage, or cloud storage. In addition, use Kubernetes persistent volumes and persistent volume claims so that data outlives individual pods. If several pods must share the same volume, that volume needs to support the ReadWriteMany access mode. Finally, implement backup and disaster recovery so that you can restore data after accidental deletion, corruption, or the loss of a volume.
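    The sketch below shows the StatefulSet approach to storage. It uses volumeClaimTemplates, which makes the StatefulSet create a separate PVC for each replica instead of having all replicas share one claim. The names are hypothetical placeholders: example-db, the example/db:1.0 image, and the /data mount path. The sketch also assumes the cluster has a StorageClass named standard with a dynamic provisioner, plus a headless Service named example-db.

    ```yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: example-db                 # hypothetical name
    spec:
      serviceName: example-db          # assumes a headless Service with this name exists
      replicas: 3
      selector:
        matchLabels:
          app: example-db
      template:
        metadata:
          labels:
            app: example-db
        spec:
          containers:
            - name: db
              image: example/db:1.0    # hypothetical image
              volumeMounts:
                - name: data           # must match the claim template name below
                  mountPath: /data     # hypothetical data directory
      volumeClaimTemplates:            # one PVC is created per replica from this template
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: standard # assumption: a dynamically provisioned StorageClass
            resources:
              requests:
                storage: 1Gi
    ```

    With three replicas, Kubernetes creates claims named data-example-db-0, data-example-db-1, and data-example-db-2. Each pod keeps its own claim when it is recreated. By default, these claims are also left in place when the StatefulSet is scaled down or deleted.
    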

    Networking

    When it comes to networking, use Kubernetes Services to manage traffic to stateful applications, including microservices that require data persistence and consistency. Use headless Services to give each pod in a StatefulSet a stable DNS name that clients and peers can address directly. Implement network policies to control traffic between pods and restrict access to sensitive data. Consider a service mesh, such as Istio or Linkerd, for advanced features such as traffic shaping, retries and timeouts, and mutual TLS encryption between services.
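    As an example of restricting traffic, the sketch below is a network policy that only lets selected application pods reach the database port. It assumes the database pods carry the label app: postgres, as in the tutorial later in this article. The allowed clients carry a hypothetical role: backend label. Network policies are only enforced if your cluster's network plugin supports them; Calico and Cilium do, for instance.

    ```yaml
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: postgres-allow-backend     # hypothetical name
    spec:
      podSelector:
        matchLabels:
          app: postgres                # the database pods this policy protects
      policyTypes:
        - Ingress                      # once selected, all other inbound traffic is denied
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  role: backend        # hypothetical label on the only pods allowed in
          ports:
            - protocol: TCP
              port: 5432               # PostgreSQL port
    ```
    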

    Security

    For security, use Kubernetes RBAC to restrict who and what can access sensitive data and resources. Use Pod Security Admission, the successor to pod security policies (which were removed in Kubernetes 1.25), to enforce pod-level security standards. Consider the External Secrets Operator (ESO) to sync credentials from an external secret manager instead of maintaining them by hand in the cluster. Also, implement network policies to restrict access to sensitive data. Store passwords, API keys, and other sensitive values in Secrets, and keep only non-sensitive configuration in ConfigMaps. Finally, use container scanning tools, such as Anchore or Clair, to detect known vulnerabilities in container images.
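    Here is a minimal RBAC sketch: a Role that may only read one Secret, bound to a single ServiceAccount. The secret name postgres matches the tutorial below. The ServiceAccount backend-app and the default namespace are assumptions. Rules like this govern access through the Kubernetes API, for example kubectl get secret or an application that reads the Secret via the API. Pods that only consume the Secret through secretKeyRef environment variables do not need them.

    ```yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: read-postgres-secret
      namespace: default               # assumption: everything runs in the default namespace
    rules:
      - apiGroups: [""]                # "" is the core API group, where Secrets live
        resources: ["secrets"]
        resourceNames: ["postgres"]    # only this one Secret
        verbs: ["get"]                 # read-only; no list, watch, or update
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-postgres-secret
      namespace: default
    subjects:
      - kind: ServiceAccount
        name: backend-app              # hypothetical ServiceAccount used by the application
        namespace: default
    roleRef:
      kind: Role
      name: read-postgres-secret
      apiGroup: rbac.authorization.k8s.io
    ```
    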

    Monitoring

    To keep your stateful application healthy and prevent downtime, configure Kubernetes liveness and readiness probes. These let Kubernetes restart unhealthy containers and stop routing traffic to pods that are not ready. Use monitoring tools such as Prometheus and Grafana to track the health and performance of your stateful applications. Also, use logging and tracing tools, such as Fluentd and Jaeger, to track and diagnose issues. Where the application supports it, implement auto-scaling based on performance metrics such as CPU and memory usage. Keep in mind that adding instances of a stateful application usually also requires data replication, so autoscaling stateful workloads needs more care than autoscaling stateless ones. Last, define service-level objectives (SLOs) and service-level agreements (SLAs) so that you can measure whether your stateful applications meet their performance and availability requirements.
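    Here is a sketch of liveness and readiness probes for a PostgreSQL container like the one deployed later in this article. It assumes the official postgres image, which ships the pg_isready utility, and a POSTGRES_USER environment variable set on the container. The timing values are arbitrary starting points rather than recommendations.

    ```yaml
    # Fragment of a StatefulSet pod template (spec.template.spec)
    containers:
      - name: postgres
        image: postgres:15
        # Readiness: only mark the pod ready once the server accepts TCP connections
        readinessProbe:
          exec:
            command:
              - sh
              - -c
              - pg_isready -U "$POSTGRES_USER" -h 127.0.0.1 -p 5432
          initialDelaySeconds: 5
          periodSeconds: 10
        # Liveness: restart the container after 6 consecutive failures (about one minute)
        livenessProbe:
          exec:
            command:
              - sh
              - -c
              - pg_isready -U "$POSTGRES_USER" -h 127.0.0.1 -p 5432
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 6
    ```

    Checking over TCP on 127.0.0.1 rather than the Unix socket means the probe reflects whether the server accepts network connections, which is what other pods depend on.
    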

    By following these best practices, you can make your stateful applications more reliable, scalable, and secure in a Kubernetes environment.

    Managing stateful applications on Kubernetes in practice

    Enough theory; let's get to work. In this section, you will deploy a stateful application (a PostgreSQL database) and apply some of the best practices above in a simplified form.

    The first concern is data persistence, which you will handle with Kubernetes persistent volumes and persistent volume claims. This tutorial uses a cloud-based Kubernetes cluster with two nodes.

    Execute the following command to verify that our nodes are ready:

    kubectl get nodes

    Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) are Kubernetes objects used to provide persistent storage for containerized applications running in a cluster. A PV represents a piece of storage in the cluster, such as a cloud disk, an NFS share, or a directory on a node. It is either provisioned statically by an administrator or dynamically through a StorageClass. A PVC is a request for storage by a user, which specifies the amount of storage, the access mode, and other properties. When a PVC is created, Kubernetes binds it to an existing PV that satisfies the request. If no such PV exists and the claim's StorageClass supports dynamic provisioning, Kubernetes creates a matching PV automatically. Pods then reference the PVC to store and retrieve data. Because the volume's lifecycle is independent of any pod, the data persists even when pods are deleted or recreated.
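    For comparison, a claim that relies on dynamic provisioning needs no hand-written PV. The sketch below assumes the cluster has a StorageClass named standard backed by a dynamic provisioner; run kubectl get storageclass to see which classes your cluster offers. The claim name dynamic-claim is hypothetical.

    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: dynamic-claim              # hypothetical name
    spec:
      storageClassName: standard       # assumption: a StorageClass with a dynamic provisioner
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi                 # the provisioner creates a PV of (at least) this size
    ```
    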

    In Kubernetes, persistent volumes are not namespaced resources, meaning that they exist globally in the Kubernetes cluster and can be used from any namespace. On the other hand, Persistent Volume Claims (PVCs) are namespaced resources, meaning that they belong to a specific namespace and can only be used by Pods in the same namespace. 

    To create a persistent volume and a persistent volume claim, create a file (e.g. pv-pvc.yaml) and paste in the following configuration settings:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: postgres-volume # Name of the persistent volume
      labels:
        type: local
    spec:
      storageClassName: hostpath # Must match the storageClassName of the claim
      capacity:
        storage: 5Gi # Amount of storage this volume provides
      accessModes:
        - ReadWriteOnce # Can be mounted read-write by a single node at a time
      hostPath: # Volume type: a directory on the node's own filesystem
        path: '/mnt/data' # Directory on the node that backs this volume

    ---

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: postgres-volume-claim # Name of the persistent volume claim
    spec:
      storageClassName: hostpath # Name of the storage class
      accessModes:
        - ReadWriteOnce # Volume can be mounted read-write by a single node in the cluster
      resources:
        requests:
          storage: 500Mi # This claim requests 500Mi of storage from a PV

    The code above defines two Kubernetes objects: a PersistentVolume and a PersistentVolumeClaim.

    The PersistentVolume defines a storage volume that Kubernetes pods can use. It is named postgres-volume and has a storageClassName of hostpath, which the claim must match in order to bind to it. The capacity field sets the size of the volume to 5Gi. The accessModes field is set to ReadWriteOnce, meaning the volume can be mounted read-write by a single node at a time. Pods on other nodes cannot use the volume while it is mounted, although several pods on that same node still can. If you need to restrict access to a single pod, Kubernetes also offers the ReadWriteOncePod access mode. Finally, the hostPath field makes the volume a directory (/mnt/data) on the filesystem of whichever node runs the pod. Because that data stays on one node, hostPath volumes are suited to single-node testing rather than production. On a multi-node cluster, a pod rescheduled onto another node will not see the same data.

    The PersistentVolumeClaim is a request for storage that pods can then use. It is named postgres-volume-claim and asks for 500Mi of storage with the hostpath storage class and the ReadWriteOnce access mode. The postgres-volume PV has a matching storage class and access mode and at least 500Mi of capacity, so Kubernetes binds the claim to it. Because a PV is bound to a single claim as a whole, the claim then has the full 5Gi of the volume available.

    Now apply the configuration and check the resulting objects with the following commands:

    kubectl apply -f pv-pvc.yaml # Applies the configuration to the cluster
    kubectl get pv # Lists persistent volumes
    kubectl get pvc # Lists persistent volume claims

    In the output, both the persistent volume and the persistent volume claim should report a Bound status, which means the claim has been bound to the volume.

    In simple terms, a newly created persistent volume is not yet bound to any claim, so its status is Available. When a persistent volume claim is created and Kubernetes finds a matching volume, it binds the two together and sets the status of both the volume and the claim to Bound. Pods that reference the claim can then mount the volume.
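    What happens to the data when the claim is later deleted depends on the PV's reclaim policy. For a manually created PV like postgres-volume, the default is Retain. The volume moves to the Released state and its data is kept until an administrator cleans it up. The following sketch is the same PV as above with that policy spelled out explicitly.

    ```yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: postgres-volume
    spec:
      storageClassName: hostpath
      persistentVolumeReclaimPolicy: Retain # keep the data after the bound claim is deleted
      capacity:
        storage: 5Gi
      accessModes:
        - ReadWriteOnce
      hostPath:
        path: '/mnt/data'
    ```

    Dynamically provisioned volumes usually inherit their reclaim policy from the StorageClass, where Delete is common. Check that setting before relying on data surviving a deleted claim.
    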

    Next, create a secret object to store the PostgreSQL credentials: a username john and password 12345678:

    kubectl create secret generic postgres --from-literal=username=john --from-literal=password=12345678

    The credentials above are for demo purposes only. In a production environment, use a strong username and password, and avoid typing them on the command line, where they can end up in your shell history.
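    If you prefer to keep the secret declarative, a roughly equivalent manifest can use the stringData field, which accepts plain-text values. This is a sketch for the demo credentials only. Secret values are merely base64-encoded, not encrypted, so a file like this should not be committed to version control.

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: postgres               # same name the StatefulSet's secretKeyRef entries use
    type: Opaque
    stringData:                    # plain-text input; stored base64-encoded under data
      username: john
      password: "12345678"         # quoted so YAML parses it as a string, not a number
    ```
    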

    Next, deploy the PostgreSQL database with a StatefulSet that reads its credentials from the secret you just created. Create a file (e.g. statefulset.yaml) and paste in the following configuration settings:

    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: postgres
    spec:
      serviceName: postgres # Governing Service that provides the pods' network identity
      selector:
        matchLabels:
          app: postgres
      replicas: 1
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
            - name: postgres
              image: postgres:15 # Pin a major version; the data path below matches this image
              imagePullPolicy: IfNotPresent
              env:
                - name: POSTGRES_USER
                  valueFrom:
                    secretKeyRef:
                      name: postgres
                      key: username
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres
                      key: password
              ports:
                - containerPort: 5432
              volumeMounts:
                - name: data
                  mountPath: /var/lib/postgresql/data # Default data directory (PGDATA) of the postgres image
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: postgres-volume-claim

    The configuration above creates a StatefulSet that runs a single PostgreSQL container. It sets the POSTGRES_USER and POSTGRES_PASSWORD environment variables from the postgres secret created earlier. It also mounts the postgres-volume-claim persistent volume claim into the container through the volume named data.

    Create the StatefulSet and verify that its pod is running with the following commands:

    kubectl apply -f statefulset.yaml
    kubectl get statefulset
    kubectl get pods

    Next, create a Service to expose the PostgreSQL database within the Kubernetes cluster. Create a file (e.g. service.yaml) and paste in the following configuration settings:

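    apiVersion: v1
    kind: Service
    metadata:
      name: postgres # Must match serviceName in the StatefulSet
      labels:
        app: postgres
    spec:
      clusterIP: None # Headless: DNS resolves directly to the pod, as StatefulSets require
      selector:
        app: postgres
      ports:
        - protocol: TCP
          name: postgres # PostgreSQL uses its own wire protocol, not HTTP
          port: 5432
          targetPort: 5432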

    Execute the following command to create this service object:

    kubectl apply -f service.yaml
    kubectl get service # Lists services
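    To check that other workloads can reach the database through the Service's DNS name, you could run a short-lived client pod. In this sketch, the pod name postgres-client is hypothetical, and the pod is assumed to run in the same namespace as the Service. It reuses the postgres image only because that image ships the pg_isready utility.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: postgres-client            # hypothetical name
    spec:
      restartPolicy: Never             # run the check once and stop
      containers:
        - name: client
          image: postgres:15           # used only for its pg_isready binary
          # "postgres" resolves through cluster DNS to the Service defined above
          command: ["pg_isready", "-h", "postgres", "-p", "5432"]
    ```

    After applying it, kubectl logs postgres-client shows whether the server accepted the connection. You can then remove the pod with kubectl delete pod postgres-client.
    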

    Now, open a psql session inside the database pod with the following commands:

    kubectl exec -it postgres-0 -- bash # Opens a shell inside the postgres-0 pod
    psql --username=john postgres # Connects to the postgres database as the user configured in the secret

    If the login succeeds, you will see a psql prompt for the postgres database.

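    Any data you create in the database now persists even if this pod is deleted and recreated. The database files are stored on the persistent volume bound to the claim rather than inside the container. The StatefulSet also recreates the pod with the same name (postgres-0) and the same claim, which is why StatefulSets are recommended for stateful applications. Keep in mind that with a hostPath volume on a multi-node cluster, this only holds if the pod is scheduled back onto the same node.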

    Conclusion

    As you have seen, managing stateful applications in Kubernetes requires you to take data persistence, consistency, scalability, reliability, and security into consideration. As a next step, you can configure Prometheus and Grafana to monitor the PostgreSQL database. Once the database is running reliably, you can also look into scaling it horizontally to handle changes in traffic. Simply increasing replicas in this StatefulSet would not achieve that. It would start additional PostgreSQL instances pointed at the same claim rather than real replicas. Horizontal scaling requires replication configured between instances (for example, through a PostgreSQL operator), with each instance on its own volume. If you are considering scaling the database, see the guide on scaling a StatefulSet in the official Kubernetes documentation.

    For additional security, consider configuring role-based access control (RBAC) so that only the right people and workloads can access the database's Kubernetes resources, such as its Secret. With the knowledge you have gained in this article, you have a foundation for deploying stateful applications on Kubernetes and for working toward high availability.

    Written by:
    Mercy Bassey is a JavaScript programmer with a passion for technical writing. Her areas of expertise are full-stack web development and DevOps/IT.
    Reviewed by:
    I picked up most of my skills during the years I worked at IBM. Was a DBA, developer, and cloud engineer for a time. After that, I went into freelancing, where I found the passion for writing. Now, I'm a full-time writer at Semaphore.