Persistent Storage and PVCs in Kubernetes



In Kubernetes, storage is a critical component of managing applications, especially stateful ones that require data persistence. By default, pods are ephemeral: data written to a container's filesystem is lost when the container restarts, and pod-level scratch volumes are removed when the pod is deleted or rescheduled to another node. To solve this problem, Kubernetes provides Persistent Volumes (PVs) and Persistent Volume Claims (PVCs), which allow applications to retain data beyond the lifecycle of a pod.

In this chapter, we will explore how Kubernetes handles persistent storage, the key components involved, and how to use Persistent Volumes and Persistent Volume Claims to ensure data is retained across pod restarts.

Understanding Kubernetes Storage

Kubernetes offers two primary types of storage −

  • Ephemeral Storage − Temporary storage that is lost when the pod is deleted.
  • Persistent Storage − Storage that remains available even if the pod using it is removed or rescheduled.
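To make the distinction concrete, here is a minimal sketch of ephemeral storage: a pod with an emptyDir volume. The pod name scratch-pod and the /cache path are hypothetical. Files under /cache survive a container restart, but the volume is deleted when the pod is removed from its node.

```yaml
# Hypothetical example of ephemeral storage using an emptyDir volume
apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: /cache    # hypothetical scratch directory
          name: scratch
  volumes:
    - name: scratch
      emptyDir: {}             # created empty when the pod starts; deleted together with the pod
```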

Why Persistent Storage Matters

Many applications, such as databases, need to store data that survives beyond the pod's lifecycle. Kubernetes dynamically schedules pods across different nodes, making local storage unreliable. Persistent storage provides a way to decouple storage from compute resources, enabling flexible scaling.

What is a Persistent Volume?

A Persistent Volume (PV) is a piece of storage in the Kubernetes cluster that has been provisioned by an administrator or created dynamically through a Storage Class. It is independent of any single pod and provides an abstraction over storage systems like NFS, AWS EBS, Azure Disks, or Google Persistent Disks.

What is a Persistent Volume Claim?

A Persistent Volume Claim (PVC) is a request for storage made by a user. A pod then references the claim by name to use the volume bound to it. Claims let users ask for a size and access mode without needing to manage the storage infrastructure themselves.

Key Components of a PV

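The following are the key components of a Persistent Volume −

  • Capacity − Defines the storage size (e.g., 10Gi).
  • Access Modes
    • ReadWriteOnce (RWO) − Read/write by a single node (several pods on that node can share it).
    • ReadOnlyMany (ROX) − Read-only by many nodes.
    • ReadWriteMany (RWX) − Read/write by many nodes.
    • ReadWriteOncePod (RWOP) − Read/write by a single pod (supported only for CSI volumes).
  • Reclaim Policy
    • Retain − Keeps the PV and its data after the claim is deleted; an administrator must reclaim it manually.
    • Delete − Deletes the PV and the underlying storage asset after the claim is deleted.
    • Recycle − Performs a basic scrub of the volume before making it available again (deprecated; dynamic provisioning is recommended instead).
  • Storage Class − The storageClassName the PV belongs to; a PVC binds only to PVs of the class it requests.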
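These components map directly onto fields of a PV manifest. The following is a minimal sketch, assuming an NFS server reachable at the hypothetical address nfs.example.com that exports /exports/data; the PV name example-pv, the label tier: fast, and the class name manual are also illustrative.

```yaml
# Hypothetical PV showing each key component as a manifest field
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                        # hypothetical name
  labels:
    tier: fast                            # hypothetical label that PVC selectors can match
spec:
  capacity:
    storage: 10Gi                         # Capacity
  accessModes:
    - ReadWriteMany                       # Access mode; NFS supports shared read/write
  persistentVolumeReclaimPolicy: Retain   # Reclaim policy
  storageClassName: manual                # Storage class (hypothetical name)
  nfs:
    server: nfs.example.com               # hypothetical NFS server
    path: /exports/data                   # hypothetical export path
```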

How PVCs Work

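When a PVC is created, the Kubernetes control plane looks for an existing PV that satisfies the claim: the same storage class, the requested access modes, at least the requested capacity, and any label selector the claim specifies.

If a suitable PV is found, it is bound to the PVC. The binding is one-to-one: a bound PV serves only that claim.

If no PV matches, the PVC remains Pending until a suitable PV is created manually or, when the claim names a Storage Class that supports dynamic provisioning, until one is provisioned automatically.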
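A claim can also narrow the match. The sketch below, with hypothetical names and labels throughout, only binds to a PV that has the storage class manual, offers ReadWriteMany, provides at least 5Gi, and carries the label tier: fast. A claim with a non-empty selector cannot be satisfied by dynamic provisioning, so a matching PV must already exist.

```yaml
# Hypothetical PVC that restricts which PVs it may bind to
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim        # hypothetical name
spec:
  storageClassName: manual   # must equal the PV's storageClassName
  accessModes:
    - ReadWriteMany          # the PV must offer this mode
  resources:
    requests:
      storage: 5Gi           # the PV's capacity must be at least this
  selector:
    matchLabels:
      tier: fast             # only PVs with this label are considered
```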

Setting Up Persistent Storage in Kubernetes

To understand how Kubernetes handles persistent storage, let's go through a hands-on example in which we create a Persistent Volume (PV) and a Persistent Volume Claim (PVC), and then mount the claim in a pod.

Step 1. Creating a Persistent Volume

A PV is defined using a YAML manifest that specifies storage capacity, access modes, and the underlying storage provider. We will first create a PV that offers 1Gi of storage backed by a directory on the node (a hostPath volume).

Using an editor, create the following YAML file -

$ sudo nano pv.yaml

Paste the following content −

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/data"

Apply the configuration -

$ kubectl apply -f pv.yaml

Output

persistentvolume/my-pv created

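This PV advertises 1Gi of storage backed by the /mnt/data directory on the node. Kubernetes does not enforce this size for hostPath volumes, and hostPath ties the data to a single node, so this setup is suitable only for single-node test clusters. The Retain reclaim policy ensures that the data is not deleted when the PVC bound to this PV is deleted.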

Step 2. Creating a Persistent Volume Claim

A PVC is a request for storage that a pod can then reference. Kubernetes binds it to an available PV that satisfies the request.

Using an editor, create the following YAML file -

$ sudo nano pvc.yaml

Paste the following content -

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Apply the configuration -

$ kubectl apply -f pvc.yaml

Output

It will produce the following output -

persistentvolumeclaim/my-pvc created

The PVC requests 1Gi of storage with ReadWriteOnce access mode. If a matching PV is available, Kubernetes binds the PVC to it.
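One caveat: if the cluster defines a default StorageClass, a PVC that omits storageClassName is assigned that default class and will not bind to my-pv, which has no class. A minimal sketch of a variant of pvc.yaml that avoids this, assuming you want the claim to bind only to pre-created PVs without a class, sets storageClassName to an empty string.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: ""   # empty string: no default class is assigned and no dynamic provisioning
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```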

Verify the PVC status -

$ kubectl get pvc

Output

It will produce the following output -

NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
data-mymysql-0   Pending                                                     <unset>                 6d1h
my-pvc           Bound     my-pv    1Gi        RWO                           <unset>                 55s

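The my-pvc claim is in the Bound state and its VOLUME column shows my-pv, meaning Kubernetes matched the claim to the PV created in Step 1. (The data-mymysql-0 claim belongs to an unrelated workload on the cluster used for this example and can be ignored.)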

Step 3. Mounting a PVC to a Pod

To use the PVC, we need to attach it to a pod. Using an editor, create the following YAML file -

$ sudo nano pod.yaml

Paste the following content -

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app-container
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: storage
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: my-pvc

Explanation

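When the Pod is scheduled, Kubernetes mounts the Persistent Volume (PV) bound to the my-pvc claim into the container at /usr/share/nginx/html. The volume replaces the image's contents at that path, so nginx's default index page is hidden until you write files into the volume. Any data placed in /usr/share/nginx/html remains available after the container restarts or the Pod is deleted and recreated (with hostPath, as long as the new Pod runs on the same node).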

Apply the configuration -

$ kubectl apply -f pod.yaml

Output

It will produce the following output -

pod/my-pod created

Now, the pod will use the PVC to persist its data.
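Since the claim starts out empty, one way to give nginx a page to serve is an init container that writes a file into the volume only if none exists, so later restarts keep whatever is already stored. This is a sketch of a variant of pod.yaml; the pod name my-pod-seeded, the init container name seed-content, the busybox image, and the page text are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-seeded              # hypothetical name
spec:
  initContainers:
    - name: seed-content           # hypothetical init container
      image: busybox
      # Write index.html only if it is missing, so existing data on the PV is preserved
      command: ["sh", "-c", "[ -f /data/index.html ] || echo 'Hello from a PVC' > /data/index.html"]
      volumeMounts:
        - mountPath: /data
          name: storage
  containers:
    - name: app-container
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: storage
  volumes:
    - name: storage
      persistentVolumeClaim:
        claimName: my-pvc
```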

Dynamic Provisioning with Storage Classes

Creating every PV by hand does not scale to large deployments. Storage Classes enable dynamic provisioning: Kubernetes creates a PV automatically when a PVC requests a class whose provisioner supports it.

Creating a Storage Class

Using an editor, create the following YAML file −

$ nano storage-class.yaml

Paste the following content −

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Delete

Apply the configuration −

$ kubectl apply -f storage-class.yaml

Output

It will produce the following output -

storageclass.storage.k8s.io/fast-storage created

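This confirms that a new StorageClass named fast-storage has been created. It enables dynamic provisioning of AWS EBS volumes of type gp2, and its reclaimPolicy of Delete means the volume is deleted automatically when the PVC is deleted. Note that kubernetes.io/aws-ebs is the legacy in-tree provisioner name; on current Kubernetes versions, requests for it are handled by the AWS EBS CSI driver (ebs.csi.aws.com), which must be installed in the cluster.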
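On clusters that run the AWS EBS CSI driver, a StorageClass can name the driver directly. The following is a sketch, not a drop-in replacement: the name fast-storage-csi is hypothetical, the gp3 volume type is an assumption, and the default-class annotation is optional. With volumeBindingMode set to WaitForFirstConsumer, a claim stays Pending until a pod that uses it is scheduled, so the volume is created in that pod's availability zone.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage-csi                   # hypothetical name
  annotations:
    # Optional: use this class for PVCs that omit storageClassName
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com               # AWS EBS CSI driver (must be installed)
parameters:
  type: gp3                                # assumed EBS volume type
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer    # provision only after a consuming pod is scheduled
allowVolumeExpansion: true                 # allow growing claims by raising their storage request
```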

Using a Storage Class in a PVC

Using an editor, create the following YAML file -

$ sudo nano dynamic-pvc.yaml

Paste the following content -

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
spec:
  storageClassName: fast-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Apply the configuration −

$ kubectl apply -f dynamic-pvc.yaml

Output

It will produce the following output -

persistentvolumeclaim/dynamic-pvc created

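Since this PVC references the fast-storage StorageClass, Kubernetes asks that class's provisioner to create a new 5Gi AWS EBS volume and a matching PV. Unlike manually created PVs, this storage is created and managed by Kubernetes. Provisioning succeeds only on AWS with the EBS driver available; on other clusters the claim stays Pending, which is why dynamic-pvc appears as Pending in the kubectl get pvc output in the next section.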

Managing and Deleting PVCs and PVs

Checking Persistent Volumes

To list existing PVs -

$ kubectl get pv

Output

It will produce the following output -

NAME    CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
my-pv   1Gi        RWO            Retain           Bound    default/my-pvc                  <unset>                          11m

Checking Persistent Volume Claims

To list all PVCs -

$ kubectl get pvc

Output

It will produce the following output -

NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
data-mymysql-0   Pending                                                     <unset>                 6d1h
dynamic-pvc      Pending                                      fast-storage   <unset>                 2m30s
my-pvc           Bound     my-pv    1Gi        RWO                           <unset>                 11m

Deleting a PVC

To remove a PVC −

$ kubectl delete pvc my-pvc

Output

It will produce the following output -

persistentvolumeclaim "my-pvc" deleted

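Note − What happens to the PV depends on its reclaim policy. With Retain (as for my-pv), the PV and its data remain and the PV moves to the Released state; it is not offered to new claims until an administrator cleans it up. With Delete, Kubernetes removes both the PV and the underlying storage. In addition, if a running pod such as my-pod still uses the claim, the PVC stays in the Terminating state until that pod is deleted.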

Real-World Use Cases for Persistent Storage

Databases − Persistent storage is critical for databases such as MySQL, PostgreSQL, or MongoDB in Kubernetes. Without it, database data would be lost whenever the pod is deleted or rescheduled.

Logging and Monitoring − Logging and monitoring systems also need durable storage. Elasticsearch (the storage layer of the ELK Stack) and Prometheus commonly use PVCs to store log and metric data.

File Storage for Web Applications − Web applications, such as a CMS (WordPress, Drupal), require persistent storage to save uploaded images, documents, and media files.

Troubleshooting Persistent Storage Issues

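PVC Stuck in Pending State − Run kubectl describe pvc <pvc-name> and check the events. For static provisioning, confirm with kubectl get pv that a PV exists with the same storage class, a compatible access mode, and enough capacity. For dynamic provisioning, confirm that the StorageClass exists and its provisioner is installed.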

Pod Stuck in ContainerCreating State − Verify that the PVC is bound using kubectl get pvc, then check the pod's events for mount or attach errors using kubectl describe pod <pod-name>.

Best Practices for Managing Persistent Storage

  • Use Storage Classes − Instead of manually defining PVs, use StorageClass to provision storage dynamically.
  • Monitor Storage Usage − kubectl get pvc shows only the requested capacity, so track actual utilization with metrics such as kubelet_volume_stats_used_bytes collected by Prometheus.
  • Backup Strategies − Implement backup solutions like Velero or cloud-based snapshots.
  • Use ReadWriteMany (RWX) when needed − Some workloads need shared read/write access from several nodes; this requires a storage backend that supports RWX, such as NFS.
  • Automate Cleanup − Implement policies to reclaim unused storage to optimize resource utilization.
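For the cloud-snapshot option, clusters whose CSI driver supports snapshots can take a point-in-time copy of a claim with a VolumeSnapshot object. This is a sketch assuming the snapshot CRDs and snapshot controller are installed; the snapshot name and the VolumeSnapshotClass csi-snapclass are hypothetical.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: dynamic-pvc-snapshot               # hypothetical name
spec:
  volumeSnapshotClassName: csi-snapclass   # hypothetical VolumeSnapshotClass
  source:
    persistentVolumeClaimName: dynamic-pvc # the claim to snapshot
```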

Conclusion

Persistent storage in Kubernetes ensures that applications requiring data retention can function reliably. By using Persistent Volumes (PVs) and Persistent Volume Claims (PVCs), Kubernetes enables decoupled and scalable storage solutions. Dynamic provisioning with Storage Classes further automates storage management, making deployments more efficient.

Understanding these concepts is essential for running stateful applications, such as databases and content management systems, in a Kubernetes environment. By properly configuring persistent storage, we can ensure that critical data remains intact, even when pods are rescheduled or restarted.
