Managing Stateful Applications on Kubernetes: Challenges and Best Practices

Stateful applications are computer programs that store information, or state, about their previous interactions with users or other components of a system. The state can include things like user preferences, session data, and application settings. Stateful applications are commonly used for tasks such as e-commerce, banking, and gaming, where it is necessary to maintain a consistent state across multiple interactions with users. But how do you manage such applications in Kubernetes? What are the common challenges and how do you solve them? If you have these questions running through your mind, then this article is for you.

What are stateful applications in Kubernetes?

In Kubernetes, stateful applications are those that require persistent storage to maintain their state and data across different instances or nodes. Examples of stateful applications include databases (MongoDB, MySQL, PostgreSQL, etc.), messaging systems, big data platforms, and other applications that require a consistent and reliable data storage layer. These systems are often complex and require careful design and implementation to ensure that they are reliable, scalable, and secure.

Stateful applications in Kubernetes typically share the following characteristics and requirements:

Persistence: stateful applications store state information persistently so that it can be accessed across multiple user sessions or interactions with other components of the system.
Consistency: stateful applications ensure that the state is consistent across different interactions. This means that the state should not change unexpectedly and that changes made to the state in one interaction should be reflected in subsequent interactions.
Scalability: stateful applications can be more challenging to scale than stateless applications because the state information needs to be replicated or shared across multiple instances of the application.
Reliability: stateful applications need to be designed to handle failures and errors gracefully because an unexpected failure could result in data loss or corruption.
Security: stateful applications may need to handle sensitive user data, so they need to be designed with appropriate security measures to protect this data from unauthorized access or modification.

Challenges of running stateful applications on Kubernetes

Running stateful applications on Kubernetes can present several challenges, particularly in the areas of data storage, networking, security, and monitoring. One of the primary challenges of running stateful applications on Kubernetes is managing persistent data storage. Traditional stateless applications can simply be replicated across multiple nodes, but stateful applications also require persistent data storage, which can be difficult to manage in a containerized environment. Kubernetes provides several options for data storage, including local storage, network-attached storage, and cloud storage, but choosing the right storage solution can be challenging.

Networking can also be a challenge for stateful applications on Kubernetes. Because stateful applications typically require communication between nodes, it’s important to ensure that the networking infrastructure is designed to support this. Kubernetes provides several networking options, including container networking, pod networking, and service networking, but configuring these options correctly can be complex.

Security is another key challenge for stateful applications on Kubernetes. Because stateful applications often store sensitive data, it’s important to ensure that the container environment is secure. Kubernetes provides several security features, including RBAC, Pod Security Admission (the replacement for pod security policies, which were removed in Kubernetes 1.25), and network policies, but properly configuring these features can be difficult.

Finally, monitoring stateful applications on Kubernetes can be a challenge. Because stateful applications require persistent data storage, it’s important to monitor the health and performance of the data storage system. The Kubernetes ecosystem offers several monitoring tools, including Prometheus, Grafana, and the Kubernetes Dashboard, but configuring these tools to monitor stateful applications can be complex.

Best practices for managing stateful applications on Kubernetes

Designing and managing stateful applications on Kubernetes requires careful consideration of data storage, networking, security, and monitoring. If you are considering deploying a stateful application in Kubernetes, here are some of the best practices:

Data storage

First, use a storage abstraction such as Kubernetes StatefulSets or operators rather than wiring volumes to pods by hand. Second, choose the storage solution that fits your use case, whether it’s local storage, network-attached storage, or cloud storage. In addition, use Kubernetes persistent volumes and claims so that data outlives individual pods and can be shared by multiple pods if necessary. Last, implement backup and disaster recovery solutions so that you can restore data after loss or corruption.
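As a rough sketch of how a StatefulSet can act as a storage abstraction, the manifest below uses volumeClaimTemplates so that Kubernetes creates one claim per replica instead of you writing each claim by hand. Every name here (example-db, the db container, the 1Gi size, the postgres:16 tag) is a placeholder chosen for illustration, and the sketch assumes the cluster has a default StorageClass that can provision volumes on demand.

```yaml
# Hypothetical example: names, image tag, and sizes are placeholders.
# Assumes a default StorageClass exists and a headless Service named
# example-db is created separately (omitted here for brevity).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db
spec:
  serviceName: example-db          # name of the governing headless Service
  replicas: 2
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
        - name: db
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              value: change-me     # demo only; reference a Secret in practice
            - name: PGDATA         # keep data in a subdirectory of the mount
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # one PVC is created per replica
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

With two replicas, this should yield claims named data-example-db-0 and data-example-db-1, each bound to its own volume. Note that the two pods are independent database instances; replication between them would still need to be configured in the database itself or by an operator.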

Networking

When it comes to networking, use Kubernetes services to manage networking between stateful applications (these can also be microservices that require data persistence and consistency), and use headless services to provide direct access to individual pods in a StatefulSet. Implement network policies to control traffic between pods and restrict access to sensitive data. For advanced networking features such as traffic shaping, fault tolerance, and encryption, consider a service mesh such as Istio or Linkerd.
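To make the network policy advice concrete, here is a minimal sketch that only lets selected pods reach a database on its PostgreSQL port. The labels are assumptions: app: postgres stands for the database pods and role: backend is a hypothetical label for the clients that are allowed to connect.

```yaml
# Hypothetical example: the pod labels are placeholders for your own.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-postgres
spec:
  podSelector:
    matchLabels:
      app: postgres              # the policy applies to the database pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: backend      # only pods with this label may connect
      ports:
        - protocol: TCP
          port: 5432
```

Once a pod is selected by an ingress policy, all other inbound traffic to it is denied. Keep in mind that network policies are only enforced when the cluster's network plugin supports them.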

Security

To secure stateful applications, use Kubernetes RBAC and the Pod Security Standards (the successor to pod security policies, which were removed in Kubernetes 1.25) to restrict access to sensitive data and resources, and consider the External Secrets Operator (ESO) to sync credentials from an external secret manager. Also, implement network policies to restrict access to sensitive data, use Secrets for sensitive configuration data, such as passwords and API keys, and ConfigMaps for non-sensitive settings, and use container scanning tools, such as Anchore or Clair, to check container images for known vulnerabilities.
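As an illustration of RBAC applied to sensitive data, the sketch below grants read access to a single Secret and nothing else. The Secret name postgres, the default namespace, and the db-admin ServiceAccount are assumptions made for the example.

```yaml
# Hypothetical example: namespace, Secret name, and ServiceAccount are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: postgres-secret-reader
  namespace: default
rules:
  - apiGroups: [""]                 # "" is the core API group
    resources: ["secrets"]
    resourceNames: ["postgres"]     # limit access to this one Secret
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: postgres-secret-reader-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: db-admin
    namespace: default
roleRef:
  kind: Role
  name: postgres-secret-reader
  apiGroup: rbac.authorization.k8s.io
```

You can check the result with kubectl auth can-i get secret/postgres --as=system:serviceaccount:default:db-admin, which should answer yes for this Secret and no for others.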

Monitoring

To keep your stateful application healthy and prevent downtime, use monitoring tools such as Prometheus and Grafana, together with Kubernetes probes, to monitor the health and performance of stateful applications. Also, use logging and tracing tools, such as Fluentd and Jaeger, to track and diagnose issues. Implement auto-scaling based on application performance metrics, such as CPU and memory usage, to keep stateful applications available and responsive. Last, define service-level objectives (SLOs) and service-level agreements (SLAs) so that you can verify that stateful applications meet performance and availability requirements.
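For the probes mentioned above, one possible setup for a PostgreSQL container is sketched below as a patch file. It assumes a StatefulSet named postgres with a container named postgres that has POSTGRES_USER set, like the one deployed later in this article; the file name probes-patch.yaml and the timing values are illustrative choices, not recommendations.

```yaml
# probes-patch.yaml (hypothetical file name)
# Strategic merge patch; containers are matched by name.
spec:
  template:
    spec:
      containers:
        - name: postgres
          readinessProbe:          # pod receives traffic only while this passes
            exec:
              command: ["sh", "-c", "pg_isready -U \"$POSTGRES_USER\""]
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:           # container is restarted if this keeps failing
            exec:
              command: ["sh", "-c", "pg_isready -U \"$POSTGRES_USER\""]
            initialDelaySeconds: 30
            periodSeconds: 20
            failureThreshold: 3
```

The patch could be applied with kubectl patch statefulset postgres --patch-file probes-patch.yaml. The command is wrapped in sh -c because environment variables are not expanded in a probe's exec command otherwise.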

By following these best practices, you can ensure that your stateful applications are reliable, scalable, and secure in a Kubernetes environment.

Managing stateful applications on Kubernetes in real-time

Enough theory; let’s get to work. We will deploy a stateful application and apply the aforementioned best practices in a simplified form.

First, you need to provide data persistence and consistency. We will make use of Kubernetes persistent volumes and persistent volume claims. This tutorial will make use of a cloud-based Kubernetes cluster with two nodes.

Execute the following command to verify that our nodes are ready:

kubectl get nodes

Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) are Kubernetes objects used to provide persistent storage for containerized applications running in a cluster. A PV represents a piece of storage in the cluster that is either provisioned ahead of time by an administrator or created on demand through a storage class. A PVC is a request for storage by a user, which can specify an amount of storage, an access mode, and other properties. When a PVC is created, Kubernetes binds it to a matching PV if one exists, or dynamically provisions one if the storage class supports it. Applications can then use the PV to store and retrieve data, even as the pods that use the PV come and go. This allows data to persist even when pods are deleted or recreated.

In Kubernetes, persistent volumes are not namespaced resources, meaning that they exist globally in the Kubernetes cluster and can be used from any namespace. On the other hand, Persistent Volume Claims (PVCs) are namespaced resources, meaning that they belong to a specific namespace and can only be used by Pods in the same namespace.
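For comparison with a manually created volume, the following is a minimal sketch of a claim that relies on dynamic provisioning. It assumes the cluster has a default StorageClass with a provisioner (common on managed cloud clusters, but not guaranteed); the claim name and size are placeholders.

```yaml
# Hypothetical example: relies on the cluster's default StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-dynamic-claim
  namespace: default             # PVCs are namespaced; PVs are not
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName is omitted on purpose so the default class is used
```

No PersistentVolume manifest is needed in this case, because the provisioner creates one to satisfy the claim. If the storage class uses the WaitForFirstConsumer binding mode, the claim stays Pending until a pod that uses it is scheduled.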

To create a persistent volume and a persistent volume claim, create a file (e.g. pv-pvc.yaml) and paste in the following configuration settings:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-volume # Name of the persistent volume
  labels:
    type: local
spec:
  storageClassName: hostpath # Name of the storage class
  capacity:
    storage: 5Gi # Amount of storage this volume should hold
  accessModes:
    - ReadWriteOnce # Volume can be mounted as read-write by a single node
  hostPath: # Volume type: a directory on the node's filesystem
    path: '/mnt/data' # Directory on the node that backs the volume

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-volume-claim # Name of the persistent volume claim
spec:
  storageClassName: hostpath # Name of the storage class
  accessModes:
    - ReadWriteOnce # volume can be mounted as read-write by a single node in the cluster.
  resources:
    requests:
      storage: 500Mi # This claim requests 500Mi of storage from a PV

The code above defines two Kubernetes objects: a PersistentVolume and a PersistentVolumeClaim.

The PersistentVolume defines a storage volume that can be used by a Kubernetes pod. It is named postgres-volume in this case and has a storageClassName of hostpath; the claim uses the same class name, which is how the two are matched. The capacity field sets the size of the volume to 5Gi, while accessModes specifies ReadWriteOnce, meaning that the volume can be mounted as read-write by a single node at a time. Several pods scheduled on that same node can still share it; if you need to guarantee that only one pod has access to the data, Kubernetes offers the stricter ReadWriteOncePod access mode for CSI-backed volumes. Finally, the hostPath field indicates the directory on the host machine that backs the volume. Because a hostPath volume lives on one node, it is suitable for demos and testing rather than production.

The PersistentVolumeClaim is a request for storage made on behalf of a pod. It is named postgres-volume-claim in this case and requests a volume with the hostpath storage class. The accessModes field is set to ReadWriteOnce, matching the volume, and the resources field sets the storage request to 500Mi. Any volume with at least that capacity can satisfy the claim; because a claim binds to a whole volume, this one ends up with the full 5Gi volume.

To create the persistent volume and claim, execute the following commands:

kubectl apply -f pv-pvc.yaml #applies the configuration settings to the cluster
kubectl get pv #gets persistent volume
kubectl get pvc #gets persistent volume claim

In the output of these commands, the persistent volume claim should show the Bound status, which means it is bound to the persistent volume. The persistent volume itself starts with the Available status and also changes to Bound once a claim is bound to it.

In simple terms, when a persistent volume is created, it is allocated but not yet claimed, and its status is set to Available. When a persistent volume claim is created and matched to a persistent volume, the storage requested by the claim is reserved on that volume, and both objects report the Bound status.

Next, create a secret object to store the PostgreSQL credentials: a username john and password 12345678:

kubectl create secret generic postgres --from-literal=username=john --from-literal=password=12345678

The credentials used above are just meant for demo purposes. In a production environment it’s very important to use a more secure username and password.
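If you prefer to manage the secret declaratively, the command above is roughly equivalent to the manifest sketched below. The stringData field lets you write the values in plain text, and Kubernetes stores them base64-encoded. The file name secret.yaml is an assumption for this example.

```yaml
# secret.yaml (hypothetical file name); same demo credentials as above.
apiVersion: v1
kind: Secret
metadata:
  name: postgres
type: Opaque
stringData:
  username: john
  password: "12345678"   # quoted so YAML treats it as a string, not a number
```

Apply it with kubectl apply -f secret.yaml instead of running kubectl create secret. Since the file contains the password in plain text, avoid committing it to version control as is.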

To deploy the PostgreSQL database as a StatefulSet that uses the secret you just created, create a file (e.g. statefulset.yaml) and paste in the following configuration settings:

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
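          image: postgres:16 # Pinned major version; the data path below applies to the official image up to version 17
          imagePullPolicy: IfNotPresent
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres
                  key: password
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data # Default data directory (PGDATA) of the official image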
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-volume-claim

The previous code will create a StatefulSet object in Kubernetes that runs a PostgreSQL container with a postgres image and environment variables, referencing the postgres secret created earlier to get the username and password. It also sets up a persistent volume for the container’s data using a persistent volume claim: postgres-volume-claim.

Create the StatefulSet and verify it with the following commands:

kubectl apply -f statefulset.yaml
kubectl get statefulset
kubectl get pods

To expose the PostgreSQL database within the Kubernetes cluster, create a file (e.g. service.yaml) and paste in the following configuration settings:

apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  selector:
    app: postgres
  ports:
    - protocol: TCP
      name: postgres # Port name; the traffic is PostgreSQL, not HTTP
      port: 5432
      targetPort: 5432

Execute the following commands to create this service object:

kubectl apply -f service.yaml
kubectl get service #outputs the service
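The service above gives clients a single cluster IP for the database. If you also want the stable per-pod DNS names that StatefulSets are known for, a headless service is one way to get them. The sketch below is an optional variant; the name postgres-headless is an assumption, and the per-pod names only resolve if the StatefulSet's serviceName is set to that same name when the StatefulSet is created (the field cannot be changed afterwards).

```yaml
# Hypothetical variant: a headless Service for per-pod DNS records.
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  labels:
    app: postgres
spec:
  clusterIP: None          # headless: DNS returns pod IPs instead of one virtual IP
  selector:
    app: postgres
  ports:
    - name: postgres
      protocol: TCP
      port: 5432
      targetPort: 5432
```

With that pairing in place, the first replica would be reachable inside its namespace as postgres-0.postgres-headless, which is useful when clients must address one specific database instance.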

Now, log in to the database using the following commands:

kubectl exec -it postgres-0 -- bash # Opens a shell in the postgres-0 pod
psql --username=john postgres # Connects to the postgres database with the username configured in the secret object

If the login succeeds, you are dropped into the interactive psql prompt.

Any data you create and store in the database remains, even if this pod is deleted and recreated. This is because you configured a persistent volume and a persistent volume claim, and deployed the database with a StatefulSet, which is the recommended workload type for stateful applications.

Conclusion

As you have seen, managing stateful applications in Kubernetes means taking data persistence, consistency, scalability, reliability, and security into consideration. As a next step, you can configure Prometheus and Grafana to monitor the PostgreSQL database. Once the database is healthy, you can consider scaling it horizontally so that it scales out or in as traffic rises or falls. If you are considering scaling the database, see the guide in the official Kubernetes documentation.

For additional security, you can consider configuring Role-based Access Control to make sure only the right people have access to the database. With all the knowledge you have gained in this article, you can now go ahead to deploy high-availability stateful applications in Kubernetes.
