# Move a volume from one node to another in Kubernetes

So you have a cluster and you have some persistent volumes on a few of the nodes. Now let's say you want to decommission one of the nodes, but it has volumes on it that your pods make use of. You'd think this would be simple enough with a popular platform like k8s, but you'd be wrong.

⚠️ Disclaimer: At the time I solved this problem I hadn't done thorough research and didn't know about pv-migrate. That tool (which has rave reviews, btw) is great, but automates a bit less than I needed for my use case. Still, rather use pv-migrate than roll your own like I did.

If you're using cloud-backed block storage like AWS EBS then this isn't such a big problem: with a few AWS commands you can detach a volume and re-attach it to another EC2 node. But if you don't trust AWS, or don't like what they stand for, or both (like myself), then you're running nodes at a cheaper cloud provider that doesn't grant such luxuries. In fact, in my case I had nodes running in both Germany and South Africa, communicating over the open internet (bad security posture, I know). I wanted to decom my SA nodes and move their volumes to my German nodes.

How do volumes work in k8s? Here's the summary:

```mermaid
flowchart TD
    pvc["Persistent Volume Claim"] -..->|I want 10GB disk space| controller[Storage Controller]
    controller -..->|k, creating...| pv["Persistent Volume: 10GB"]
    pvc -->|I'm bound to you now| pv
```

Big oversimplification, but as long as you get the idea: Usually you create a PVC along with a Deployment (or whatever pod controller you like), and then indicate on the pod that it should mount a PVC (.spec.volumes.*.persistentVolumeClaim). Once the pod is created for the first time, whichever storage controller you have set up will provision a persistent volume on one of the nodes, and the pod will start running on that node and mount that volume. Again, this is only one way of doing persistent volumes on k8s, and it is a simplification.
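To make that concrete, here's a minimal sketch of the usual setup: a PVC plus a Deployment that mounts it. All names here (`my-data`, `my-app`) are made up for illustration, not taken from my actual cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels: { app: my-app }
  template:
    metadata:
      labels: { app: my-app }
    spec:
      containers:
        - name: app
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          # This is the .spec.volumes.*.persistentVolumeClaim field
          # mentioned above
          persistentVolumeClaim:
            claimName: my-data
```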

What are we trying to do here? Ideally this:

  1. Duplicate a given volume from one node onto another.
  2. Make minimal configuration changes. I want to scale down the deployment, do the volume copy, then scale the deployment back up and have it run on the new node with the newly-copied volume.

## Challenge 1: Copy the volume contents from one node to another

My strategy (hilariously) assumes pod-to-pod network encryption, and is basically a dead-simple tar copy.

```mermaid
sequenceDiagram
    participant srcpv as Source volume
    participant reader as Reader pod
    participant writer as Writer pod
    participant destpv as Destination volume
    reader ->> srcpv: mount volume
    writer ->> destpv: mount volume
    reader ->> reader: sleep
    writer ->> reader: kubectl exec -- tar -czf -
    reader ->> srcpv: read files
    srcpv -->> reader: compress with gz
    reader -->> writer: compressed stream
    writer ->> destpv: decompressed files
```

The reader pod will naturally run on the node where the source volume is, but to ensure the destination volume is provisioned on the correct node, be sure to set the nodeSelector on the writer pod.
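For example, a writer pod pinned to the destination node might look something like this (the image, node label value, and pod name are placeholders of my own, not the author's actual manifests):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: writer
spec:
  # Pin the pod (and thus the provisioned volume) to the destination node
  nodeSelector:
    kubernetes.io/hostname: my-german-node   # placeholder node name
  containers:
    - name: writer
      image: alpine
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: dest
          mountPath: /data
  volumes:
    - name: dest
      persistentVolumeClaim:
        claimName: tmp-copy
```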

This approach works quite well for volumes up to 10GB or so. I imagine for larger volumes you'd need a more reliable file transfer solution that can handle retries and so on. I'll leave that as an exercise for the reader.
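At its core, the transfer is just a tar pipe between two `kubectl exec` invocations, something like `kubectl exec reader -- tar -czf - -C /data . | kubectl exec -i writer -- tar -xzf - -C /data` (my guess at the invocation; the linked source has the real one). Stripped of kubectl, the same pipe can be demoed locally:

```shell
# Local demo of the tar-over-a-pipe copy. In-cluster, the same pipe runs
# through `kubectl exec` on the reader and writer pods.
set -euo pipefail
src=$(mktemp -d)
dest=$(mktemp -d)
echo "precious data" > "$src/file.txt"
mkdir -p "$src/nested"
echo "more" > "$src/nested/deep.txt"

# Reader side: tar + gzip the volume to stdout; writer side: untar from stdin
tar -czf - -C "$src" . | tar -xzf - -C "$dest"

cat "$dest/file.txt"          # prints: precious data
cat "$dest/nested/deep.txt"   # nested paths survive too
```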

## Challenge 2: Modifying a PVC in-place

Certain PVC fields (like .spec.volumeName, which binds the claim to a specific PV) you actually can't modify in-place because of api-server-level restrictions. But that just means we have to get creative 🙂

So to start off with let's assume our little tar-copy job above copied the data into a new PVC and PV on the destination node. I call the destination pvc tmp-copy, as we'll delete it later, and replace it with the source PVC's config.

```mermaid
flowchart LR
    srcpv[Source PV] --> |Bound to| srcpvc[Source PVC]
    destpv[Destination PV] --> |Bound to| destpvc[Destination PVC]
```
1. Save the full source PVC config (we'll recreate it later):

```shell
kubectl get pvc "$SRC_PVC_NAME" -o yaml > /tmp/src_pvc.yaml
```

2. "Detach" the source PV from its PVC, then delete the PVC. Setting the reclaim policy to Retain first ensures the PV (and its data) survives the PVC's deletion.

```shell
kubectl patch pv "$SRC_PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pvc "$SRC_PVC_NAME" -p '{"metadata":{"finalizers":null}}'
kubectl delete pvc "$SRC_PVC_NAME"
```

3. Do the same with the destination PV and its PVC:

```shell
kubectl patch pv "$DEST_PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pvc tmp-copy -p '{"metadata":{"finalizers":null}}'
kubectl delete pvc tmp-copy
```

4. Now that we only have two loose PVs (and, crucially, the YAML config of the original PVC), we just need to re-create the PVC, attaching it onto the destination PV. One gotcha: a retained PV keeps a claimRef pointing at its old (now deleted) PVC, which blocks it from binding to a new claim, so clear that first.

```shell
# Clear the stale claimRef so the destination PV can bind to the new PVC
kubectl patch pv "$DEST_PV" -p '{"spec":{"claimRef":null}}'
# Re-create the source PVC, pointed at the destination PV
cat /tmp/src_pvc.yaml | yq -Y ".spec.volumeName = \"$DEST_PV\"" | kubectl apply -f -
# Update the PVC's node annotation
kubectl annotate --overwrite pvc "$SRC_PVC_NAME" "volume.kubernetes.io/selected-node=$DEST_NODE"
```

5. Cleanup:

```shell
# Delete last-applied-configuration so that we don't confuse argo
kubectl annotate --overwrite pvc "$SRC_PVC_NAME" kubectl.kubernetes.io/last-applied-configuration-
# Optional: yeet the original pv
kubectl delete pv "$SRC_PV"
```

And that's it.

## Source please

Here you go. I split the code into 3 parts:

Part 0: Create an orchestrator job with the needed permissions to manage this whole operation. I intentionally wanted this to run fully in-cluster to make it a bit more reliable.

Part 1: Create the reader pod

Part 2: Create and run the writer pod

After the file copy is complete the orchestrator cleans up the reader and writer pods before doing the PVC re-attachment magic.
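For a sense of what part 0 involves, here's a rough sketch (my own guess, not the author's actual manifests) of the RBAC such an in-cluster orchestrator would need: it creates and deletes pods, execs into them, and patches PVCs and PVs.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pv-mover
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pv-mover
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "delete"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "create", "patch", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pv-mover
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pv-mover
subjects:
  - kind: ServiceAccount
    name: pv-mover
    namespace: default
```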

And there you have it. A fully automated pv-copy strategy that took me weeks to figure out, nicely condensed for you into a 5-min read.
