K8s Troubleshooting

Troubleshooting in K8s is a big topic, so in this post let's discuss only troubleshooting a K8s cluster. As we have mainly been learning about clusters, we will now learn about monitoring, troubleshooting and debugging them.

Sections of troubleshooting

  1. Debugging your application

  2. Debugging your cluster

Debugging your application

Here we will cover issues related to applications deployed in a K8s cluster, looking at workloads and container messages and how to debug them.

Diagnosing the problem

The first step is to find where the issue lies in the deployed application: in the pods, the replication controller, or the service.

Debugging pods, services and replication controllers (RC)

  • Check the current state of a pod and its recent events - kubectl describe pods ${POD_NAME}

  • Validate the YAML file while applying it - kubectl apply --validate -f mypod.yaml

  • Check the current state of the replication controller - kubectl describe rc ${CONTROLLER_NAME}

  • Check that the service has endpoints - kubectl get endpoints ${SERVICE_NAME}
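The checks above can be chained into one pass. The following is a minimal sketch, not a definitive tool: the function name triage and its argument order are assumptions made for illustration, while the kubectl commands are the ones listed above. Manifest validation is left out because kubectl apply also applies the file.

```shell
# Hypothetical helper: run the pod, RC and service checks in sequence.
# Usage: triage <pod-name> <rc-name> <service-name>
triage() {
  pod="$1"; rc="$2"; svc="$3"

  echo "== pod: $pod =="
  kubectl describe pods "$pod"

  echo "== replication controller: $rc =="
  kubectl describe rc "$rc"

  echo "== service endpoints: $svc =="
  kubectl get endpoints "$svc"
}
```

For example, triage web web-rc web-svc prints the three reports one after another, so the pod events, the controller state and the endpoints can be read together.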

Debugging Running Pods

To list the pods and their status:

kubectl get pods

To see more detail on a single pod, including its recent events:

kubectl describe pod ${POD_NAME}

To get events for the current namespace:

kubectl get events

To get events for a specific namespace:

kubectl get events --namespace=my-namespace
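Event lists can get long, so it often helps to sort them by time and optionally keep only warnings. This is a small sketch under assumptions: the function name recent_events and the "warnings" keyword are made up for this post, while --sort-by and --field-selector are standard kubectl get flags.

```shell
# Hypothetical helper: list events for a namespace, oldest first.
# Usage: recent_events [namespace] [warnings]
recent_events() {
  ns="${1:-default}"
  if [ "${2:-}" = "warnings" ]; then
    # Keep only events whose type is Warning
    kubectl get events --namespace="$ns" \
      --sort-by=.metadata.creationTimestamp \
      --field-selector type=Warning
  else
    kubectl get events --namespace="$ns" \
      --sort-by=.metadata.creationTimestamp
  fi
}
```

For example, recent_events my-namespace warnings shows only the warning events of my-namespace, with the most recent at the bottom.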

Examining pod logs

First, look at the logs of the affected container:

kubectl logs ${POD_NAME} ${CONTAINER_NAME}

If the container has crashed and restarted, access the previous container's log with:

kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
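The two log commands above can be combined so that both the current and the previous logs are shown in one go. This is only a sketch: the function name pod_logs and the 50-line limit are assumptions, and the fallback message is printed by the script, not by kubectl.

```shell
# Hypothetical helper: show current logs, then logs of the previous
# container instance if there is one.
# Usage: pod_logs <pod-name> <container-name>
pod_logs() {
  pod="$1"; container="$2"

  echo "--- current logs ---"
  kubectl logs "$pod" -c "$container" --tail=50

  echo "--- previous logs ---"
  # kubectl exits non-zero when no previous instance exists
  kubectl logs --previous "$pod" -c "$container" --tail=50 \
    || echo "no previous container instance found"
}
```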

Debugging service

Once the service is exposed, confirm that it exists (hostnames is the example service name here):

kubectl get svc hostnames

Check if the service endpoints exist:

kubectl get endpoints hostnames

Check if kube-proxy is running, by running this on a node:

ps auxw | grep kube-proxy

Refer to the Kubernetes documentation on debugging Services to know more.
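A service with no endpoints is a common cause of failed connections, and the endpoint check can be scripted. The following is a minimal sketch, assuming the classic Endpoints object layout (subsets, addresses, ip); the function name check_endpoints and its messages are invented for illustration.

```shell
# Hypothetical helper: report whether a Service has any endpoint IPs.
# Usage: check_endpoints <service-name> [namespace]
check_endpoints() {
  svc="$1"; ns="${2:-default}"

  # Collect the IPs of all ready endpoint addresses, space separated
  addrs=$(kubectl get endpoints "$svc" --namespace="$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')

  if [ -z "$addrs" ]; then
    echo "service $svc has no ready endpoints; check its selector and pod readiness"
    return 1
  fi
  echo "service $svc endpoints: $addrs"
}
```

For example, check_endpoints hostnames returns a non-zero status when the list is empty, so it can be used in a script before testing the service itself.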

Debugging StatefulSet

Before you begin, make sure the kubectl command-line tool is installed and configured to communicate with your cluster.

List all the pods which belong to a StatefulSet

kubectl get pods -l app.kubernetes.io/name=MyApp

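If you find that any Pods listed are in Unknown or Terminating state for an extended period of time, the stuck Pod itself can be force deleted. Force deletion can break the at-most-one-Pod-per-identity guarantee of a StatefulSet, so use it with care:

kubectl delete pods <pod-name> --grace-period=0 --force

If the whole StatefulSet is no longer needed, it can be deleted with

kubectl delete statefulsets <statefulset-name>

Deleting the StatefulSet does not remove its associated headless service, which has to be deleted separately using

kubectl delete service <service-name>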

Debug InitContainers

Checking the status of Init Containers

  • Check the status of your pod: kubectl get pod <pod-name>

  • Get details of init container: kubectl describe pod <pod-name>

Accessing logs from Init Containers

  • Pass the init container name along with the pod name: kubectl logs <pod-name> -c <init-container-2>
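When a pod has several init containers, their logs can be printed in order without looking up each name by hand. This is a sketch only: the function name init_logs is an assumption, and it relies on the init container names being listed under .spec.initContainers in the pod spec.

```shell
# Hypothetical helper: print the logs of every init container of a pod.
# Usage: init_logs <pod-name>
init_logs() {
  pod="$1"

  # Names of all init containers, space separated, in execution order
  names=$(kubectl get pod "$pod" -o jsonpath='{.spec.initContainers[*].name}')

  for name in $names; do
    echo "--- init container: $name ---"
    kubectl logs "$pod" -c "$name"
  done
}
```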

Get a shell to a running container

  • Open a shell in the running container (shell-demo is the example pod name): kubectl exec --stdin --tty shell-demo -- /bin/bash

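Debugging your cluster

Listing your cluster

To check that all nodes are registered and in the Ready state: kubectl get nodes

To get detailed information about the overall health of your cluster, you can run: kubectl cluster-info dump

To get detailed information about a single node (kube-worker-1 is an example node name): kubectl describe node kube-worker-1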
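On a cluster with many nodes, it can be quicker to filter the node list down to the ones that need attention. The following is a rough sketch: the function name not_ready_nodes is made up, and it assumes the default kubectl get nodes column layout, where the second column is STATUS.

```shell
# Hypothetical helper: print nodes whose STATUS is not exactly "Ready".
# Cordoned nodes (for example "Ready,SchedulingDisabled") are printed too.
# Usage: not_ready_nodes
not_ready_nodes() {
  kubectl get nodes --no-headers \
    | awk '$2 != "Ready" { print $1 " (" $2 ")" }'
}
```

Each node name printed here is a candidate for kubectl describe node.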

Looking at Logs

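For a deeper look into the cluster, log into the relevant machines and check the log files listed below. On systemd-based systems, you may need to use journalctl instead of examining log files.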

Control Plane nodes

  • /var/log/kube-apiserver.log - API Server, responsible for serving the API

  • /var/log/kube-scheduler.log - Scheduler, responsible for making scheduling decisions

  • /var/log/kube-controller-manager.log - a component that runs most Kubernetes built-in controllers, with the notable exception of scheduling (the kube-scheduler handles scheduling).

Worker Nodes

  • /var/log/kubelet.log - logs from the kubelet, responsible for running containers on the node

  • /var/log/kube-proxy.log - logs from kube-proxy, which is responsible for directing traffic to Service endpoints
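On systemd-based nodes, the same logs are read through journalctl. This is a minimal sketch under assumptions: it presumes the component runs as a systemd unit (kubelet usually does; the unit name can differ on your distribution), and the function name unit_logs and the one-hour default are chosen only for illustration.

```shell
# Hypothetical helper: show recent journal entries for a systemd unit.
# Usage: unit_logs [unit-name] [since]
unit_logs() {
  unit="${1:-kubelet}"
  since="${2:-1 hour ago}"
  journalctl -u "$unit" --since "$since" --no-pager
}
```

For example, unit_logs kubelet "10 minutes ago" run on a worker node prints the kubelet entries from the last ten minutes.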


Thanks for reading my blog. I hope it helps you understand a few topics in K8s cluster troubleshooting.

Suggestions are always welcome.
