K8s Cluster maintenance

Kubernetes (K8s) cluster maintenance means keeping a cluster running smoothly by performing tasks such as:

Upgrading Kubernetes components
Scaling cluster resources
Managing node failures
Monitoring cluster health

Upgrading K8s Components

Workflow

Upgrade the primary control plane node
Upgrade the other control plane nodes
Upgrade the worker nodes
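kubeadm does not support skipping minor versions during an upgrade (for example, going from 1.25 straight to 1.27), so it is worth checking the target version before starting this workflow. A minimal sketch, assuming versions written as MAJOR.MINOR.PATCH with an optional leading "v"; `check_upgrade_path` is a hypothetical helper that only compares the major and minor numbers.

```shell
# Hypothetical helper: succeed only if TARGET is in the same minor release as
# CURRENT or exactly one minor release above it, because kubeadm does not
# support skipping minor versions. Patch numbers are not compared.
# Usage: check_upgrade_path <current-version> <target-version>
check_upgrade_path() {
  local current="${1#v}" target="${2#v}"
  local cur_minor tgt_minor
  if [ "${current%%.*}" != "${target%%.*}" ]; then
    echo "major version change is not supported: $1 -> $2" >&2
    return 1
  fi
  # The second dot-separated field is the minor version, e.g. 27 in 1.27.4.
  cur_minor=$(echo "$current" | cut -d. -f2)
  tgt_minor=$(echo "$target" | cut -d. -f2)
  case $((tgt_minor - cur_minor)) in
    0|1) return 0 ;;
  esac
  echo "unsupported upgrade path: $1 -> $2 (upgrade one minor version at a time)" >&2
  return 1
}
```

For example, `check_upgrade_path "$(kubeadm version -o short)" v1.27.4 && echo "upgrade path OK"` compares the installed kubeadm version with the planned target.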

Before doing an upgrade, we need a way to back up and restore the cluster in case something goes wrong.

Several tools can do this.

Backup and restore Cluster

One such tool is Kasten K10.

What is special about these tools? A typical cluster backup, such as an etcd snapshot taken on a kubeadm cluster, saves only the cluster state (the Kubernetes objects that describe the workloads), so it can only be used to restore the workload definitions.

Such a backup does not include the application data, so it does not cover the complete cluster. This is where data management in K8s comes in.

The data (in PersistentVolumes claimed through PVCs, or in public cloud volumes) is stored on physical storage, which could be cloud storage or an on-prem server.
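To make the difference concrete, here is a minimal sketch of the etcd-level backup mentioned above. It assumes a kubeadm control plane node with `etcdctl` installed, kubeadm's default certificate paths under `/etc/kubernetes/pki/etcd`, and root privileges. `etcd_backup`, `run`, `DRY_RUN`, `ETCD_ENDPOINT` and `ETCD_PKI` are hypothetical names for this example.

```shell
# Minimal sketch of an etcd snapshot on a kubeadm control plane node.
# Assumptions: etcdctl is installed, kubeadm's default certificate paths,
# run as root. Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical defaults matching a stock kubeadm setup; override if needed.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_PKI="${ETCD_PKI:-/etc/kubernetes/pki/etcd}"

# Usage: etcd_backup <backup-dir> [timestamp]
etcd_backup() {
  local dir="$1" stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
  run mkdir -p "$dir" || return 1
  # Saves the cluster's objects (etcd keyspace), not the data inside PVs.
  run env ETCDCTL_API=3 etcdctl snapshot save "$dir/etcd-snapshot-$stamp.db" \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_PKI/ca.crt" \
    --cert="$ETCD_PKI/server.crt" \
    --key="$ETCD_PKI/server.key"
}
```

Running `DRY_RUN=1 etcd_backup /var/backups/etcd` first shows the exact commands. The snapshot only captures the cluster state; the volume data described above still needs a data management tool.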

Check this blog for backing up and restoring a cluster using K10.

Reference: https://www.youtube.com/watch?v=01qcYSck1c4

Upgrade primary control node

Upgrade kubeadm:

# replace x in 1.27.x-00 with the latest patch version
apt-mark unhold kubeadm && \
apt-get update && apt-get install -y kubeadm=1.27.x-00 && \
apt-mark hold kubeadm

Verify that the download works: kubeadm version
Verify the upgrade plan: sudo kubeadm upgrade plan
Choose a version to upgrade to, and run the appropriate command. For example: sudo kubeadm upgrade apply v1.27.x
Manually upgrade your CNI provider plugin. Your CNI provider may have its own upgrade instructions to follow.

For the other control plane nodes

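Upgrade kubeadm as on the first control plane node, then run this command instead of kubeadm upgrade apply (kubeadm upgrade plan and the CNI plugin upgrade are not needed here): sudo kubeadm upgrade node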
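The kubeadm part of the control plane upgrade can be wrapped in one function that handles both the first and the other control plane nodes. A minimal sketch under stated assumptions: a Debian/Ubuntu host, run as root, and the legacy apt packages with the `-00` revision suffix used in the commands above (packages from the newer pkgs.k8s.io repositories use a different suffix, so adjust it if needed). `upgrade_control_plane_node`, `run` and `DRY_RUN` are hypothetical names.

```shell
# Minimal sketch of the kubeadm part of a control plane upgrade.
# Assumptions (not from the original post): Debian/Ubuntu host, run as root,
# legacy apt packages with the "-00" revision suffix.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: upgrade_control_plane_node primary|secondary <version, e.g. 1.27.4>
upgrade_control_plane_node() {
  local role="$1" version="$2"
  case "$role" in
    primary|secondary) ;;
    *) echo "unknown role: $role (expected primary or secondary)" >&2; return 1 ;;
  esac
  # Upgrade the kubeadm package and keep it pinned afterwards.
  run apt-mark unhold kubeadm || return 1
  run apt-get update || return 1
  run apt-get install -y "kubeadm=${version}-00" || return 1
  run apt-mark hold kubeadm || return 1
  run kubeadm version || return 1
  if [ "$role" = "primary" ]; then
    # First control plane node: review the plan, then apply it without prompting.
    run kubeadm upgrade plan || return 1
    run kubeadm upgrade apply -y "v${version}"
  else
    # Other control plane nodes: upgrade the local components only.
    run kubeadm upgrade node
  fi
}
```

Preview with `DRY_RUN=1 upgrade_control_plane_node primary 1.27.4`, then run it for real on the first node, and with `secondary` on each of the others. The CNI plugin upgrade and the kubelet/kubectl steps are deliberately left out.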

Drain the node (mark it unschedulable and evict its workloads)

kubectl drain <node-to-drain> --ignore-daemonsets

Upgrade kubelet and kubectl

apt-mark unhold kubelet kubectl && \
apt-get update && apt-get install -y kubelet=1.27.x-00 kubectl=1.27.x-00 && \
apt-mark hold kubelet kubectl

Restart the kubelet:

sudo systemctl daemon-reload
sudo systemctl restart kubelet

Uncordon the node

kubectl uncordon <node-to-uncordon>
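The drain, kubelet/kubectl upgrade, restart and uncordon steps above always run in the same order, so they can be combined. A minimal sketch, assuming it runs as root on the node being upgraded (Debian/Ubuntu, legacy `-00` packages) and that `kubectl` on that node has admin access to the cluster, for example through `/etc/kubernetes/admin.conf` on a kubeadm control plane node. `upgrade_kubelet_on_node`, `run` and `DRY_RUN` are hypothetical names.

```shell
# Minimal sketch: drain, upgrade kubelet/kubectl, restart kubelet, uncordon.
# Assumptions: run as root on the node being upgraded (Debian/Ubuntu, legacy
# "-00" packages) with a kubectl that has admin access to the cluster.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: upgrade_kubelet_on_node <node-name> <version, e.g. 1.27.4>
upgrade_kubelet_on_node() {
  local node="$1" version="$2"
  # Mark the node unschedulable and evict its Pods (DaemonSet Pods stay).
  run kubectl drain "$node" --ignore-daemonsets || return 1
  run apt-mark unhold kubelet kubectl || return 1
  run apt-get update || return 1
  run apt-get install -y "kubelet=${version}-00" "kubectl=${version}-00" || return 1
  run apt-mark hold kubelet kubectl || return 1
  # Reload systemd units and restart the kubelet on the new binary.
  run systemctl daemon-reload || return 1
  run systemctl restart kubelet || return 1
  # Make the node schedulable again.
  run kubectl uncordon "$node"
}
```

Because every step returns early on failure, a failed package install leaves the node drained rather than half-upgraded and schedulable.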

Upgrade worker nodes

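The upgrade procedure on worker nodes should be executed one node at a time or a few nodes at a time, without compromising the minimum required capacity for running your workloads. On each worker node, upgrade kubeadm and run sudo kubeadm upgrade node, then drain the node, upgrade kubelet and kubectl, restart the kubelet and uncordon the node, just like on the control plane nodes.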
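The one-node-at-a-time rule can be enforced with a simple loop run from an admin machine: upgrade kubeadm and run kubeadm upgrade node on the worker, drain it, upgrade kubelet/kubectl and restart the kubelet, then uncordon it before moving on. A minimal sketch under hypothetical assumptions: every node name is also a hostname reachable with passwordless SSH as root, and the nodes run Debian/Ubuntu with the legacy `-00` packages. `upgrade_worker`, `upgrade_workers`, `run` and `DRY_RUN` are hypothetical names.

```shell
# Minimal sketch of a one-node-at-a-time worker upgrade, run from an admin
# machine. Hypothetical assumptions: node names are SSH-reachable hostnames
# with passwordless root login, Debian/Ubuntu with legacy "-00" packages.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upgrade a single worker node.
upgrade_worker() {
  local node="$1" version="$2"
  # On the worker: upgrade kubeadm, then the node's local kubelet configuration.
  run ssh "root@$node" "apt-mark unhold kubeadm && apt-get update && apt-get install -y kubeadm=${version}-00 && apt-mark hold kubeadm && kubeadm upgrade node" || return 1
  # From the admin machine: evict the workloads.
  run kubectl drain "$node" --ignore-daemonsets || return 1
  # On the worker: upgrade kubelet and kubectl, then restart the kubelet.
  run ssh "root@$node" "apt-mark unhold kubelet kubectl && apt-get install -y kubelet=${version}-00 kubectl=${version}-00 && apt-mark hold kubelet kubectl && systemctl daemon-reload && systemctl restart kubelet" || return 1
  # From the admin machine: allow scheduling again.
  run kubectl uncordon "$node"
}

# Usage: upgrade_workers <version> <node>...
# Nodes are processed strictly one after another and the loop stops at the
# first failure, so this loop never has more than one node drained at a time.
upgrade_workers() {
  local version="$1" node
  shift
  for node in "$@"; do
    echo "==> upgrading $node" >&2
    upgrade_worker "$node" "$version" || {
      echo "upgrade of $node failed; stopping" >&2
      return 1
    }
  done
}
```

For example, `upgrade_workers 1.27.4 worker-1 worker-2 worker-3`. Upgrading "a few nodes at a time" would mean running several `upgrade_worker` calls per batch instead, which this sketch does not attempt.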

Scaling Cluster Resources

Scale a Deployment by using the following command:

kubectl scale deployment/nginx-deployment --replicas=10

The output is similar to this:

deployment.apps/nginx-deployment scaled
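The "scaled" message only means the new replica count was accepted; the extra Pods may still be starting. A minimal sketch that scales and then waits with `kubectl rollout status`, which blocks until the Deployment's replicas are updated and available. `scale_and_wait`, `run`, `DRY_RUN` and the default 120s timeout are hypothetical choices for this example.

```shell
# Minimal sketch: scale a Deployment and wait until its replicas are available.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: scale_and_wait <deployment> <replicas> [timeout]
scale_and_wait() {
  local deploy="$1" replicas="$2" timeout="${3:-120s}"
  run kubectl scale "deployment/$deploy" --replicas="$replicas" || return 1
  # Fails (non-zero exit) if the rollout does not finish within the timeout.
  run kubectl rollout status "deployment/$deploy" --timeout="$timeout"
}
```

For the example above: `scale_and_wait nginx-deployment 10`.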

Managing Node Failure

The impact of a node failure can be reduced by scaling: keep spare node capacity and run multiple Pod replicas (through a ReplicaSet, usually managed by a Deployment), so the workloads keep running when one node goes down.

Node failures happen for several reasons, for example:

A server (hardware) failure
A network failure
Installing updates or upgrading the node, which has to be done without application downtime

Each of these cases needs a plan for bringing the node back and keeping the applications available.
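A first step in such a plan is spotting failed nodes and, if they will not come back soon, moving their workloads elsewhere. A minimal sketch: the first function filters `kubectl get nodes --no-headers` output for nodes whose STATUS is not Ready, the second cordons and drains a node so that Pods managed by controllers such as Deployments are recreated on healthy nodes. `not_ready_nodes`, `evacuate_node`, `run`, `DRY_RUN` and the 300s timeout are hypothetical choices; on an unreachable node, drain may not finish cleanly, which is why it is bounded by a timeout.

```shell
# Minimal sketch for detecting failed nodes and evacuating one of them.
# Set DRY_RUN=1 to print the kubectl commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Read `kubectl get nodes --no-headers` output on stdin and print the names of
# nodes whose STATUS is not Ready (e.g. NotReady). Cordoned nodes report
# "Ready,SchedulingDisabled" and are not listed.
not_ready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Cordon a failed node and drain it. --delete-emptydir-data discards emptyDir
# contents; the timeout stops drain from waiting forever on a dead node.
evacuate_node() {
  local node="$1"
  run kubectl cordon "$node" || return 1
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}
```

Typical use: `kubectl get nodes --no-headers | not_ready_nodes` to list candidates, then `evacuate_node <name>` for each node you decide to take out of service.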

Monitoring Cluster health

There are several tools for monitoring cluster health. One of them is the Kubernetes Dashboard.

The dashboard is a web-based Kubernetes user interface. You can use Dashboard to deploy containerized applications to a Kubernetes cluster, troubleshoot your containerized application, and manage the cluster resources.

The dashboard also provides information on the state of Kubernetes resources in your cluster and on any errors that may have occurred.

Deploying the Dashboard UI

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

Accessing the Dashboard UI

To protect your cluster data, Dashboard deploys with a minimal RBAC configuration by default. Currently, Dashboard only supports logging in with a Bearer Token.

You can enable access to the Dashboard by running kubectl proxy.
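Logging in requires a bearer token, so some identity has to exist first. A minimal sketch following the Dashboard project's sample-user approach: create a ServiceAccount, bind it to a role, print a token, and compute the proxy URL. Binding `cluster-admin` gives full control of the cluster, so this is only sensible on a test cluster; `kubectl create token` needs kubectl 1.24 or newer. `dashboard_login_setup`, `dashboard_url`, `run`, `DRY_RUN` and the `admin-user` name are assumptions for this example.

```shell
# Minimal sketch: create a Dashboard login identity and print a bearer token.
# WARNING: binds cluster-admin, which is only sensible on a test cluster.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dashboard_login_setup() {
  run kubectl -n kubernetes-dashboard create serviceaccount admin-user || return 1
  run kubectl create clusterrolebinding admin-user \
    --clusterrole=cluster-admin \
    --serviceaccount=kubernetes-dashboard:admin-user || return 1
  # Short-lived token to paste into the Dashboard login screen (kubectl 1.24+).
  run kubectl -n kubernetes-dashboard create token admin-user
}

# URL of the Dashboard while `kubectl proxy` is running (default port 8001).
dashboard_url() {
  local port="${1:-8001}"
  echo "http://localhost:${port}/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/"
}
```

Run `dashboard_login_setup` once, start `kubectl proxy` in another terminal, open the address printed by `dashboard_url`, and paste the token.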

Logs viewer:

Pod lists and detail pages link to a logs viewer that is built into Dashboard. The viewer lets you drill down into the logs of the containers belonging to a single Pod.
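When the Dashboard is not available, the same per-Pod, per-container drill-down works from the command line with `kubectl logs`. A minimal sketch; `pod_logs`, `run`, `DRY_RUN` and the 100-line tail are hypothetical choices for this example.

```shell
# Minimal sketch: CLI equivalent of the Dashboard logs viewer.
# Set DRY_RUN=1 to print the command instead of running it.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: pod_logs <namespace> <pod> [container]
# Shows the last 100 log lines of one container, or of every container in the
# Pod when no container name is given.
pod_logs() {
  local ns="$1" pod="$2" container="${3:-}"
  if [ -n "$container" ]; then
    run kubectl logs -n "$ns" "$pod" -c "$container" --tail=100
  else
    run kubectl logs -n "$ns" "$pod" --all-containers=true --tail=100
  fi
}
```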

Monitoring node health is also important; see the Kubernetes documentation on monitoring node health listed in the references below.
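Node health is reported through node conditions such as Ready, MemoryPressure, DiskPressure and PIDPressure. A minimal sketch that prints each node's conditions with a kubectl JSONPath query and flags nodes where Ready is not True or any other condition is True. `node_conditions` and `unhealthy_nodes` are hypothetical helpers, and treating "any other condition True" as a problem is a heuristic assumption (tools such as node-problem-detector can add their own conditions).

```shell
# Minimal sketch for spotting unhealthy nodes from their conditions.

# Print one line per node: "<name> <Type>=<Status> <Type>=<Status> ...".
node_conditions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{range .status.conditions[*]}{" "}{.type}{"="}{.status}{end}{"\n"}{end}'
}

# Read node_conditions output on stdin and print nodes that look unhealthy:
# Ready is not True, or any other condition (e.g. DiskPressure) is True.
unhealthy_nodes() {
  awk '{
    bad = 0
    for (i = 2; i <= NF; i++) {
      split($i, kv, "=")
      if (kv[1] == "Ready" && kv[2] != "True") bad = 1
      if (kv[1] != "Ready" && kv[2] == "True") bad = 1
    }
    if (bad) print $1
  }'
}
```

Usage: `node_conditions | unhealthy_nodes`, which could also be run periodically from a cron job or CI check.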

Reference:

Upgrading Kubernetes components: Upgrading Kubeadm clusters | Kubernetes
Scaling Cluster resources: Deployments | Kubernetes
Managing Node Failure: Upgrading Kubeadm clusters | Kubernetes
Monitoring Cluster Health:

https://kubernetes.io/docs/tasks/debug/debug-cluster/monitor-node-health/

https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/

https://www.tigera.io/learn/guides/kubernetes-monitoring/

Summary:

Maintaining a cluster involves many more activities than those covered in this blog, such as upgrading nodes, maintaining and monitoring applications, disaster recovery, handling network failures and automatic failure recovery of workloads. So we need to analyze the cluster and plan how to keep applications available. That plan can then guide upgrades, maintenance, monitoring, backup, recovery, server loads, network loads and emergency failovers.

Thanks for reading my blog. I hope it helps you understand a few topics in K8s cluster maintenance.

Suggestions are always welcome.

See you in the next blog :)

~~Saraa