Etcd Recovery
Overview
Etcd pods for hosted clusters run as part of a statefulset (etcd). The statefulset relies on persistent storage to store etcd data per member. In the case of a HighlyAvailable control plane, the size of the statefulset is 3 and each member (etcd-N) has its own PersistentVolumeClaim (etcd-data-N).
Checking cluster health
Execute into a running etcd pod:
$ oc rsh -n ${CONTROL_PLANE_NAMESPACE} -c etcd etcd-0
Setup the etcdctl environment:
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/etcd/tls/etcd-ca/ca.crt
export ETCDCTL_CERT=/etc/etcd/tls/client/etcd-client.crt
export ETCDCTL_KEY=/etc/etcd/tls/client/etcd-client.key
export ETCDCTL_ENDPOINTS=https://etcd-client:2379
Print out endpoint health for each cluster member:
etcdctl endpoint health --cluster -w table
Single Node Recovery
If a single etcd member of a 3-node cluster has corrupted data, it will most likely start crash looping, as in:
$ oc get pods -l app=etcd -n ${CONTROL_PLANE_NAMESPACE}
NAME READY STATUS RESTARTS AGE
etcd-0 2/2 Running 0 64m
etcd-1 2/2 Running 0 45m
etcd-2 1/2 CrashLoopBackOff 1 (5s ago) 64m
To recover the etcd member, delete its persistent volume claim (data-etcd-N) as well as the pod (etcd-N):
oc delete pvc/data-etcd-2 pod/etcd-2 --wait=false
When the pod restarts, the member should get re-added to the etcd cluster and become healthy again:
$ oc get pods -l app=etcd -n $CONTROL_PLANE_NAMESPACE
NAME READY STATUS RESTARTS AGE
etcd-0 2/2 Running 0 67m
etcd-1 2/2 Running 0 48m
etcd-2 2/2 Running 0 2m2s
Recovery from Quorum Loss
If multiple members of the etcd cluster have lost data or are in a crashloop state, then etcd must be restored from a snapshot. The following procedure requires down time for the control plane as the etcd database is restored.
NOTE: The following instructions require the oc
and jq
binaries.
- Setup environment variables that point to your hosted cluster:
CLUSTER_NAME=my-cluster
CLUSTER_NAMESPACE=clusters
CONTROL_PLANE_NAMESPACE="${CLUSTER_NAMESPACE}-${CLUSTER_NAME}"
- Pause reconciliation on the HostedCluster (setting CLUSTER_NAME and CLUSTER_NAMESPACE to values that correspond to your hosted cluster):
oc patch -n ${CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":"true"}}' --type=merge
- Scale down API servers:
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=0
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=0
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=0
-
Take a snapshot of etcd data using one of the following methods:
a. Use a previously backed up snapshot
b. Take a snapshot from a running etcd pod (PREFERRED but requires available etcd pod):
``` # List etcd pods oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd # If a pod is available: # 1. take a snapshot of its database and save it locally # Set ETCD_POD to the name of the pod that is available ETCD_POD=etcd-0 oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl \ --cacert /etc/etcd/tls/etcd-ca/ca.crt \ --cert /etc/etcd/tls/client/etcd-client.crt \ --key /etc/etcd/tls/client/etcd-client.key \ --endpoints=https://localhost:2379 \ snapshot save /var/lib/snapshot.db # 2. Verify that the snapshot is good oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/snapshot.db # 3. Make a local copy of the snapshot oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/snapshot.db /tmp/etcd.snapshot.db ```
c. Make a copy of the snapshot db from etcd persistent storage:
# List etcd pods oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd # Find a pod that is running and set its name as the value of ETCD_POD ETCD_POD=etcd-0 # Copy the snapshot db from it oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/data/member/snap/db /tmp/etcd.snapshot.db
-
Scale down the etcd statefulset:
oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=0
-
Delete volumes for 2nd and 3rd members:
oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pvc/data-etcd-2
-
Create pod to access the first etcd member's data:
# Save etcd image
ETCD_IMAGE=$(oc get -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd -o jsonpath='{ .spec.template.spec.containers[0].image }')
# Create pod that will allow access to etcd data:
cat << EOF | oc apply -n ${CONTROL_PLANE_NAMESPACE} -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: etcd-data
spec:
replicas: 1
selector:
matchLabels:
app: etcd-data
template:
metadata:
labels:
app: etcd-data
spec:
containers:
- name: access
image: $ETCD_IMAGE
volumeMounts:
- name: data
mountPath: /var/lib
command:
- /usr/bin/bash
args:
- -c
- |-
while true; do
sleep 1000
done
volumes:
- name: data
persistentVolumeClaim:
claimName: data-etcd-0
EOF
- Clear previous data and restore snapshot
# Wait for the etcd-data pod to start running
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd-data
# Get the name of the etcd-data pod
DATA_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods --no-headers -l app=etcd-data -o name | cut -d/ -f2)
# Copy local snapshot into the pod
oc cp /tmp/etcd.snapshot.db ${CONTROL_PLANE_NAMESPACE}/${DATA_POD}:/var/lib/restored.snap.db
# Remove old data
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm -rf /var/lib/data
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- mkdir -p /var/lib/data
# Restore snapshot
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- etcdutl snapshot restore /var/lib/restored.snap.db \
--data-dir=/var/lib/data --skip-hash-check \
--name etcd-0 \
--initial-cluster-token=etcd-cluster \
--initial-cluster etcd-0=https://etcd-0.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380,etcd-1=https://etcd-1.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380,etcd-2=https://etcd-2.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380 \
--initial-advertise-peer-urls https://etcd-0.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380
# Remove snapshot from etcd-0 data directory
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm /var/lib/restored.snap.db
- Delete data access deployment:
oc delete -n ${CONTROL_PLANE_NAMESPACE} deployment/etcd-data
- Scale up etcd cluster:
oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=3
Wait for the all etcd member pods to come up and report available:
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd -w
- Scale apiservers back up:
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=3
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=3
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=3
- Remove hosted cluster pause:
oc patch -n ${CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":""}}' --type=merge