K8s cluster maintenance means managing a K8s cluster and keeping it running smoothly by performing tasks such as:
Upgrading Kubernetes components
Scaling cluster resources
Managing node failures
Monitoring cluster health
Upgrading K8s Components
Workflow
Upgrade the primary control plane node
Upgrade the other control plane nodes
Upgrade the worker nodes
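Each pass through this workflow should move the cluster by at most one minor version (for example 1.26.x to 1.27.x), because kubeadm does not support skipping minor versions. So going from 1.25 to 1.27 means running the workflow twice. The helper below is a small sketch that only compares two version strings you pass in (for example the VERSION column of kubectl get nodes). The name can_upgrade is hypothetical, and the helper does not contact the cluster.

```shell
# can_upgrade CURRENT TARGET  (hypothetical helper)
# Prints "ok" and returns 0 if TARGET is in the same minor release as CURRENT
# or in the next one; otherwise prints the reason and returns 1.
can_upgrade() {
  cur="${1#v}"; tgt="${2#v}"                      # strip a leading "v"
  cur_major="${cur%%.*}"; tgt_major="${tgt%%.*}"
  cur_rest="${cur#*.}"; tgt_rest="${tgt#*.}"
  cur_minor="${cur_rest%%.*}"; tgt_minor="${tgt_rest%%.*}"
  if [ "$cur_major" -ne "$tgt_major" ]; then
    echo "no: major version change"
    return 1
  fi
  diff=$((tgt_minor - cur_minor))
  if [ "$diff" -eq 0 ] || [ "$diff" -eq 1 ]; then
    echo "ok"
    return 0
  fi
  echo "no: upgrade one minor version at a time (and never downgrade)"
  return 1
}

# Example: can_upgrade v1.26.5 v1.27.3   prints "ok"
```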
Before doing an upgrade, we need to know how to back up and restore the cluster. This can be done with a few tools.
Backup and Restore the Cluster
One such tool is K10 (from Kasten).
What is special about these tools? A backup taken with etcd and kubeadm captures only the cluster state, that is, the Kubernetes objects that describe the workloads, and it can be used to restore those workloads.
It does not capture the application data stored in volumes, so it is not a complete backup of the cluster. This is where data management in K8s comes in.
The data (in PersistentVolumes bound through PVCs, or in public cloud volumes) is kept on physical storage, which could be cloud storage or an on-prem server. Tools such as K10 back up this data together with the application resources.
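For comparison, this is roughly what an etcd-level backup looks like on a kubeadm control plane node. It is a sketch, assuming etcdctl is installed on the node and etcd uses kubeadm's default endpoint and certificate paths. BACKUP_DIR and the run wrapper are hypothetical, and commands are only printed unless DRY_RUN=0 is set. The snapshot holds the cluster's objects, not the data inside persistent volumes, which is exactly the gap described above.

```shell
# Hypothetical backup location; override with BACKUP_DIR=/some/path.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/etcd}"

run() {
  # Print the command unless DRY_RUN=0, so the sketch is safe to try.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Save a point-in-time snapshot of etcd (Kubernetes objects only, no PV data).
etcd_snapshot() (
  set -e
  file="$BACKUP_DIR/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"
  run sudo mkdir -p "$BACKUP_DIR"
  # Endpoint and certificate paths are kubeadm's defaults (assumed here).
  run sudo env ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$file"
)

# Usage: etcd_snapshot             (prints the commands)
#        DRY_RUN=0 etcd_snapshot   (runs them)
```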
Check this blog for backing up and restoring clusters using K10.
Reference: https://www.youtube.com/watch?v=01qcYSck1c4
Upgrade the primary control plane node
Upgrade kubeadm:
# replace x in 1.27.x-00 with the latest patch version
apt-mark unhold kubeadm && \
apt-get update && apt-get install -y kubeadm=1.27.x-00 && \
apt-mark hold kubeadm
Verify that the download works:
kubeadm version
Verify the upgrade plan:
sudo kubeadm upgrade plan
Choose a version to upgrade to, and run the appropriate command. For example:
sudo kubeadm upgrade apply v1.27.x
Manually upgrade your CNI (Container Network Interface) provider plugin; the provider may have its own upgrade instructions to follow.
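The primary control plane steps above can be scripted. This is a sketch, assuming the apt package naming used here (1.27.x-00); latest_patch, upgrade_primary and run are hypothetical helpers, and nothing is executed unless DRY_RUN=0 is set. latest_patch filters the output of apt-cache madison kubeadm, which the Kubernetes docs suggest for listing the available versions, so you do not have to fill in the "x" by hand. The CNI step is left out because it depends on your provider.

```shell
run() {
  # Print the command unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# latest_patch MINOR  (hypothetical helper)
# Reads `apt-cache madison kubeadm` output on stdin and prints the newest
# package version in that minor release, e.g. latest_patch 1.27 -> 1.27.N-00.
latest_patch() {
  awk -F'|' -v m="$1" '{
    v = $2; gsub(/[ \t]/, "", v)           # version column, spaces removed
    if (index(v, m ".") == 1) print v      # keep versions starting with MINOR.
  }' | sort -t. -k3,3n | tail -n 1         # highest patch number ends up last
}

# upgrade_primary PKG_VERSION  (hypothetical helper), e.g. 1.27.3-00
upgrade_primary() (
  set -e
  pkg="$1"
  target="v${pkg%%-*}"                      # 1.27.3-00 -> v1.27.3
  run sudo apt-mark unhold kubeadm
  run sudo apt-get update
  run sudo apt-get install -y "kubeadm=$pkg"
  run sudo apt-mark hold kubeadm
  run kubeadm version
  run sudo kubeadm upgrade plan
  run sudo kubeadm upgrade apply "$target"
  # Upgrade the CNI plugin afterwards, following your provider's instructions.
)

# Usage:
#   sudo apt-get update
#   pkg=$(apt-cache madison kubeadm | latest_patch 1.27)
#   upgrade_primary "$pkg"              # review the printed commands first
#   DRY_RUN=0 upgrade_primary "$pkg"    # then run them for real
```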
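For the other control plane nodes
Upgrade kubeadm the same way as on the primary node, then run this command instead of kubeadm upgrade apply:
sudo kubeadm upgrade node
Drain each control plane node, including the primary, to mark it unschedulable and evict its workloads: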
kubectl drain <node-to-drain> --ignore-daemonsets
Upgrade kubelet and kubectl
apt-mark unhold kubelet kubectl && \
apt-get update && apt-get install -y kubelet=1.27.x-00 kubectl=1.27.x-00 && \
apt-mark hold kubelet kubectl
Restart the kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Uncordon the node
kubectl uncordon <node-to-uncordon>
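The drain, upgrade, restart and uncordon steps can be wrapped into one function per node. This sketch assumes you run it from a machine where kubectl has admin access, and that each node is reachable over SSH by its node name with passwordless sudo. It also adds a kubectl wait for the Ready condition before uncordoning, an extra safety check that is not part of the steps above. upgrade_node_kubelet and run are hypothetical names, and commands are printed unless DRY_RUN=0.

```shell
run() {
  # Print the command unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# upgrade_node_kubelet NODE PKG_VERSION  (hypothetical helper)
# e.g. upgrade_node_kubelet cp-2 1.27.3-00
upgrade_node_kubelet() (
  set -e
  node="$1"; pkg="$2"
  # Commands executed on the node itself over SSH.
  remote="sudo apt-mark unhold kubelet kubectl"
  remote="$remote && sudo apt-get update"
  remote="$remote && sudo apt-get install -y kubelet=$pkg kubectl=$pkg"
  remote="$remote && sudo apt-mark hold kubelet kubectl"
  remote="$remote && sudo systemctl daemon-reload && sudo systemctl restart kubelet"

  run kubectl drain "$node" --ignore-daemonsets   # evict workloads, keep DaemonSets
  run ssh -n "$node" "$remote"                    # -n: ssh does not read our stdin
  run kubectl wait --for=condition=Ready "node/$node" --timeout=5m
  run kubectl uncordon "$node"                    # allow scheduling again
)
```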
Upgrade worker nodes
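The upgrade on worker nodes should be done one node at a time, or a few nodes at a time, without dropping below the minimum capacity your workloads need. On each worker, upgrade kubeadm, run sudo kubeadm upgrade node, then drain the node, upgrade kubelet and kubectl, restart the kubelet and uncordon the node, just as for the control plane nodes.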
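To upgrade workers a few at a time, the node list can be split into batches, and each batch is drained, upgraded and uncordoned before the next one starts. This is a sketch: print_batches, upgrade_workers and run are hypothetical, the batch size is yours to choose based on spare capacity, and SSH access with passwordless sudo on each worker is assumed. It drains each node before running kubeadm upgrade node, which is slightly more conservative than the order shown for the control plane nodes. Commands are printed unless DRY_RUN=0.

```shell
run() {
  # Print the command unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# print_batches SIZE NODE...  (hypothetical helper)
# Prints one batch per line, node names separated by spaces.
print_batches() {
  size="$1"; shift
  batch=""; count=0
  for node in "$@"; do
    batch="${batch:+$batch }$node"
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      echo "$batch"; batch=""; count=0
    fi
  done
  if [ -n "$batch" ]; then echo "$batch"; fi   # last, possibly smaller, batch
}

# upgrade_workers PKG_VERSION SIZE NODE...  (hypothetical helper)
upgrade_workers() (
  set -e
  pkg="$1"; size="$2"; shift 2
  # Commands executed on each worker over SSH.
  remote="sudo apt-mark unhold kubeadm kubelet kubectl"
  remote="$remote && sudo apt-get update"
  remote="$remote && sudo apt-get install -y kubeadm=$pkg"
  remote="$remote && sudo kubeadm upgrade node"
  remote="$remote && sudo apt-get install -y kubelet=$pkg kubectl=$pkg"
  remote="$remote && sudo apt-mark hold kubeadm kubelet kubectl"
  remote="$remote && sudo systemctl daemon-reload && sudo systemctl restart kubelet"

  print_batches "$size" "$@" | while read -r batch; do
    for n in $batch; do run kubectl drain "$n" --ignore-daemonsets; done
    for n in $batch; do run ssh -n "$n" "$remote"; done
    for n in $batch; do
      run kubectl wait --for=condition=Ready "node/$n" --timeout=5m
      run kubectl uncordon "$n"
    done
  done
)

# Usage: upgrade_workers 1.27.3-00 2 worker-1 worker-2 worker-3
```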
Scaling Cluster Resources
Scale a Deployment by using the following command:
kubectl scale deployment/nginx-deployment --replicas=10
The output is similar to this:
deployment.apps/nginx-deployment scaled
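Instead of fixing the replica count by hand, a HorizontalPodAutoscaler can adjust it based on CPU usage. The kubectl autoscale command below comes from the Kubernetes Deployments documentation; it assumes a metrics source such as metrics-server is installed and that the pods declare CPU requests. desired_replicas is a hypothetical, simplified illustration of the HPA rule (ceil of current replicas times current CPU over target CPU, clamped to min and max); it ignores the tolerance and stabilization behaviour of the real controller.

```shell
run() {
  # Print the command unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Keep between 10 and 15 replicas, targeting 80% average CPU utilization.
enable_autoscaling() {
  run kubectl autoscale deployment/nginx-deployment --min=10 --max=15 --cpu-percent=80
}

# desired_replicas CURRENT CURRENT_CPU_PCT TARGET_CPU_PCT MIN MAX  (hypothetical)
# Simplified HPA rule: ceil(CURRENT * CURRENT_CPU / TARGET_CPU), clamped to [MIN, MAX].
desired_replicas() {
  want=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling division
  if [ "$want" -lt "$4" ]; then want=$4; fi
  if [ "$want" -gt "$5" ]; then want=$5; fi
  echo "$want"
}

# Example: 10 pods at 90% CPU with an 80% target:
#   desired_replicas 10 90 80 10 15   prints 12
```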
Managing Node Failure
The impact of a node failure can be reduced by scaling the nodes and the ReplicaSet replicas, so that workloads keep running when one node is lost.
Node failures happen for several reasons, for example:
Server meltdown (hardware failure)
Network failure
Installing or upgrading software on the node, which should be done without downtime
For each of these, we need a plan to bring the node back and keep the applications running.
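For planned work such as upgrades, a PodDisruptionBudget keeps a minimum number of pods running while nodes are drained. It only limits voluntary disruptions like kubectl drain; it cannot prevent a node crash, which is why running replicas on several nodes still matters. This sketch assumes the app=nginx label from the nginx-deployment example; the name nginx-pdb, --min-available=2, and the helpers max_evictable and run are hypothetical. Commands are printed unless DRY_RUN=0.

```shell
run() {
  # Print the command unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Allow voluntary evictions only while at least 2 app=nginx pods stay available.
create_pdb() {
  run kubectl create poddisruptionbudget nginx-pdb --selector=app=nginx --min-available=2
}

# max_evictable REPLICAS MIN_AVAILABLE  (hypothetical helper)
# How many healthy pods can be evicted at once before the budget blocks further evictions.
max_evictable() {
  n=$(( $1 - $2 ))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

# Example: with 3 replicas and --min-available=2, max_evictable 3 2 prints 1,
# so kubectl drain evicts one nginx pod and keeps retrying the next eviction
# until a replacement pod is Ready elsewhere.
```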
Monitoring Cluster health
There are several plugins for monitoring cluster health; here we use the K8s Dashboard.
The dashboard is a web-based Kubernetes user interface. You can use Dashboard to deploy containerized applications to a Kubernetes cluster, troubleshoot your containerized application, and manage the cluster resources.
The dashboard also provides information on the state of Kubernetes resources in your cluster and on any errors that may have occurred.
Deploying the Dashboard UI
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
Accessing the Dashboard UI
To protect your cluster data, Dashboard deploys with a minimal RBAC configuration by default. Currently, Dashboard only supports logging in with a Bearer Token.
You can enable access to the Dashboard by running kubectl proxy, which makes the UI reachable from your local machine.
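Logging in needs a Bearer Token. The sketch below follows the sample-user approach from the Dashboard project: wait for the deployment, create a ServiceAccount, bind it to a role, print a short-lived token, then open the UI through kubectl proxy. The admin-user name and the cluster-admin binding are assumptions suitable only for a test cluster; use a narrower role in production. kubectl create token needs kubectl 1.24 or newer. dashboard_login and run are hypothetical helpers, and commands are printed unless DRY_RUN=0.

```shell
run() {
  # Print the command unless DRY_RUN=0 is set.
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Proxy URL for Dashboard v2.7.0 installed from recommended.yaml.
DASHBOARD_URL="http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/"

dashboard_login() {
  # Wait until the Dashboard deployment is rolled out.
  run kubectl -n kubernetes-dashboard rollout status deployment/kubernetes-dashboard
  # A ServiceAccount bound to cluster-admin: acceptable for a test cluster only.
  run kubectl -n kubernetes-dashboard create serviceaccount admin-user
  run kubectl create clusterrolebinding admin-user \
    --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:admin-user
  # Print a short-lived Bearer Token to paste into the login page.
  run kubectl -n kubernetes-dashboard create token admin-user
  echo "Start 'kubectl proxy' in another terminal, then open: $DASHBOARD_URL"
}
```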
Logs viewer:
Pod lists and detail pages link to a logs viewer that is built into Dashboard. The viewer allows for drilling down logs from containers belonging to a single Pod.
Monitoring node health is also important; see the Kubernetes page on monitoring node health listed in the references below.
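A quick command-line complement to the Dashboard is to check node status with kubectl. This sketch assumes the default kubectl get nodes column layout (NAME, STATUS, ROLES, AGE, VERSION); not_ready_nodes is a hypothetical helper name. A cordoned node shows a status such as Ready,SchedulingDisabled and is still treated as Ready here.

```shell
# not_ready_nodes  (hypothetical helper)
# Reads `kubectl get nodes --no-headers` output on stdin, prints the name of
# every node whose STATUS does not start with "Ready", and exits 1 if any
# such node was found (0 otherwise).
not_ready_nodes() {
  awk 'BEGIN { bad = 0 }
       NF > 0 && $2 !~ /^Ready/ { print $1; bad = 1 }
       END { exit bad }'
}

# Example, e.g. from a cron job:
#   kubectl get nodes --no-headers | not_ready_nodes || echo "some nodes are not Ready"
# then inspect a node's conditions with: kubectl describe node <node-name>
```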
Reference:
Upgrading Kubernetes components: Upgrading Kubeadm clusters | Kubernetes
Scaling Cluster resources: Deployments | Kubernetes
Managing Node Failure: Upgrading Kubeadm clusters | Kubernetes
Monitoring Cluster Health:
https://kubernetes.io/docs/tasks/debug/debug-cluster/monitor-node-health/
https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/
Summary:
Maintaining a cluster involves many more activities than I have covered in this blog, such as upgrading nodes, application maintenance and monitoring, disaster recovery, handling network failures and automatic failure recovery of workloads. So we need to analyze these activities and plan how the cluster will keep applications available. That plan can then guide upgrades, maintenance, monitoring, backup, recovery, server load, network load and emergency failovers.
Thanks for reading my blog. I hope it helps you understand a few topics in K8s cluster maintenance.
Suggestions are always welcome.
See you in the next blog :)
~~Saraa