As DevOps teams embrace containerization to support the speed and flexibility needed for CI/CD (continuous integration/continuous delivery) implementations, Kubernetes has quickly become the most popular container orchestration platform. In Kubernetes, individual containers are packaged together as pods, which you can automatically deploy, scale, and manage across entire clusters.
When we hear the term “automatic,” we tend to think “easy,” but Kubernetes disproves this notion. There are numerous errors you can run into while deploying a Kubernetes pod, any of which can cause it to fail. Let’s walk through five of the most common deployment errors and how to troubleshoot them.
So, you’ve deployed a Kubernetes pod, but how do you know if it’s up and running? The first step for troubleshooting Kubernetes deployments is to proactively verify deployed pods are in a ‘ready’ state. To do this, run the command:
kubectl get pods
This command will show you all pods in the current namespace. The output will tell you if each pod is ready, its current status, how many times it has restarted, and its age (i.e., how long ago the pod was created). If your pod isn’t running, then the STATUS column will likely list an error message.
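For example, the output might look like this (the pod names, restart counts, and ages here are illustrative):

NAME                      READY   STATUS             RESTARTS   AGE
web-7f9d4c6b5d-x2klq      1/1     Running            0          2d
worker-6c8f5b9f7b-qm4zt   0/1     CrashLoopBackOff   12         45m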
If you need to troubleshoot a specific pod, you should run:
kubectl get pod [podname]
However, keep in mind that if your pod is part of a Kubernetes deployment, its name will have the replica set and pod IDs appended. You can learn this information from the “get pods” command or, if you only want a list of the replica sets, run:
kubectl get rs
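To illustrate the naming pattern (the hashes here are hypothetical): a Deployment named web creates a ReplicaSet such as web-7f9d4c6b5d, which in turn creates pods such as web-7f9d4c6b5d-x2klq. The “get rs” output would look something like:

NAME             DESIRED   CURRENT   READY   AGE
web-7f9d4c6b5d   3         3         3       2d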
Here are five of the most common Kubernetes pod deployment error messages, along with tips for troubleshooting and fixing the underlying issues that cause them.
If the status of your pod is CreateContainerConfigError, that usually means Kubernetes couldn’t find a Secret or ConfigMap the pod references. A Secret is a Kubernetes object that stores sensitive information the pod needs, like database credentials. A ConfigMap is pretty much what it sounds like: a map of key-value pairs holding the pod’s configuration information. If your new pod can’t find the Secrets or ConfigMaps it specifies, it won’t be able to deploy.
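To see what that reference looks like in practice, here is a minimal sketch of a pod spec that consumes both kinds of object (the pod name, image, and keys are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: secret-1       # Secret must exist in the pod's namespace
          key: password
    envFrom:
    - configMapRef:
        name: configmap-1      # a missing ConfigMap here triggers CreateContainerConfigError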
To find out what’s missing, run the command:
kubectl describe pod [name]
Where [name] is the name of your pod without the brackets.
If a Secret or ConfigMap is missing, you’ll receive an error that looks like this:
Error: configmap "configmap-1" not found

or

Error: secret "secret-1" not found
This tells you which ConfigMap or Secret you need to find or replace. To see if the missing Secret or ConfigMap exists in your cluster, you can run a version of this command (substituting the correct info as needed):
kubectl get configmap configmap-1
If the ConfigMap or Secret is missing, the command will return an error along the lines of Error from server (NotFound): configmaps "configmap-1" not found. In that case, you either need to create it or fix the name of the ConfigMap that your pod requests during deployment.
Bonus tip: make sure you double-check the name and spelling of the missing ConfigMap or Secret in your error message. Sometimes the CreateContainerConfigError is caused by a simple typo in the name!
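If the object genuinely doesn’t exist yet, one way to create it is with kubectl (the keys and values below are placeholders for your real configuration):

# Create the missing ConfigMap from literal key-value pairs
kubectl create configmap configmap-1 --from-literal=LOG_LEVEL=info

# Create the missing Secret (kubectl base64-encodes the value for you)
kubectl create secret generic secret-1 --from-literal=password=changeme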
Another common Kubernetes deployment error is CrashLoopBackOff. This status means a container in your pod is starting, crashing, and being restarted by Kubernetes over and over, with a growing “back-off” delay between attempts. Unfortunately, the underlying cause could be quite a few things, including an issue with mounting a volume or insufficient resources on the node. (By contrast, a pod that can’t be scheduled at all will stay in a Pending state until the cluster has the resources available to schedule it.) If you get the CrashLoopBackOff error, you’ll need to troubleshoot your Kubernetes deployment by running through a list of common causes and eliminating them one by one.
To narrow down the possible causes of your CrashLoopBackOff error, you can look in the pod details for clues. Run the command:
kubectl describe pod [name]
If you get a “Liveness probe failed” and “Back-off restarting failed container” message, that can mean there aren’t enough resources available. While you may be tempted to fix this problem by adjusting “periodSeconds” or “timeoutSeconds” to give your application a longer window of time to respond before shutting down, doing so could mask legitimate performance issues. You should check the configurations of the liveness probe to see what triggered the failure and whether the probe is properly configured for your use case.
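For reference, here is what a liveness probe with those fields looks like in a container spec (a generic sketch; the endpoint, port, and timings are placeholders, not recommendations):

livenessProbe:
  httpGet:
    path: /healthz          # endpoint the kubelet polls
    port: 8080
  initialDelaySeconds: 10   # wait before the first probe
  periodSeconds: 10         # how often to probe
  timeoutSeconds: 1         # how long to wait for a response
  failureThreshold: 3       # consecutive failures before the container is restarted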
If you don’t get the “Liveness probe failed” output from your pod details, then you’ll need to keep digging. The pod details may contain more clues—under the “Last State: Terminated” section, check the Reason, Message, and Exit Code to see if there’s any helpful information. For example, if the Reason is “OOMKilled,” then the container was trying to use more memory than it was allowed. To resolve this, you would have to adjust or specify the resource requests and limits for this pod.
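A minimal sketch of specifying those resources on a container (the values are placeholders to size for your workload):

resources:
  requests:
    memory: "256Mi"   # the scheduler reserves this much for the container
    cpu: "250m"
  limits:
    memory: "512Mi"   # exceeding this memory limit gets the container OOMKilled
    cpu: "500m"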
The next step is to check the logs from your previous container instance to see if there are any clues in there. After finding the pod’s name with kubectl get pods, you can use this command to see the last ten lines from the crashed container’s log:
kubectl logs [podname] --previous --tail=10
You can also check the deployment logs by running:
kubectl logs -f deploy/[deployment-name] -n [namespace]
The deployment logs will tell you about issues at the application level, such as volume mounting errors.
If you see the ImagePullBackOff or ErrImagePull status, that means your pod couldn’t pull its container image from the registry. The underlying cause could be an incorrect image name or tag, or an authentication issue due to a problem with the Secret. To determine which it is, run the command:
kubectl describe pod [name]
In the Events section of the output, you can expect to see a message pointing to one cause or the other: errors like “manifest unknown” or “not found” indicate a wrong image name or tag, while errors like “unauthorized” or “pull access denied” indicate a registry authentication problem.
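If authentication is the problem, one common fix is to create a docker-registry Secret and reference it from the pod spec (the registry URL and credentials below are placeholders):

# Store registry credentials in a Secret
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=myuser \
  --docker-password=mypassword

Then reference it in the pod spec so the kubelet can authenticate when pulling:

spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: app
    image: registry.example.com/myteam/app:1.0   # double-check the name and tag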
If you see the status Unknown, that usually means the worker node has shut down or crashed, so all of the pods that reside on that node are unavailable. After five minutes (the default eviction timeout), Kubernetes will change the status of the pods scheduled on that node to Unknown and will attempt to schedule replacement pods on a different node.
If this is your issue, running kubectl get pods -o wide will show the affected pod listed twice. One entry will have the status Unknown. The other will have the status ContainerCreating, indicating that Kubernetes is attempting to schedule a replacement on a working node.
The -o wide output includes a NODE column, which shows you the name of both the crashed node and the working node. Using the name of the crashed node, run the command:
kubectl get node [name]
This output will likely indicate that the node is in NotReady status. If that’s the case, then the issue will probably resolve itself when the failed node is able to recover and rejoin the cluster. The original pod with the Unknown status will be deleted. The second pod that was automatically scheduled on a different node will finish deploying.
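Illustrative output for a failed node (the name, age, and version are placeholders):

NAME     STATUS     ROLES    AGE   VERSION
node-2   NotReady   <none>   12d   v1.27.3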
If that process is taking too long, or if the node doesn’t automatically recover, then you can manually delete the failed node. This will allow Kubernetes to finish deploying the new pod on the healthy node. Simply run the command:
kubectl delete node [name]
When you run your kubectl get pod [name] command, you might get an output that just says:
No resources found.
That means the pod doesn’t exist in the namespace you’re querying. One common cause is that your new pod deployment exceeded the CPU or memory limits set by your cluster administrator (via a LimitRange), so the pod was rejected and never created. You can verify those limits using the command:
kubectl describe limitrange
If this is the issue affecting your Kubernetes pod deployment, you have two options: ask your cluster administrator to increase the limit or reduce the requested resources to something below the existing limit.
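For example, a LimitRange like this one (hypothetical values) caps each container’s memory, so your pod’s requests and limits must fit under the max:

apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - type: Container
    max:
      memory: "512Mi"    # containers requesting more than this are rejected
    default:
      memory: "256Mi"    # applied to containers that set no limit

In this case, a pod whose container asks for more than 512Mi of memory would be rejected at creation time.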
Bonus tip: Make sure you have specified the correct namespace with the -n [namespace] flag. If you don’t, kubectl get pod [name] will only search the default namespace, and a pod that lives elsewhere will also come back as “No resources found.”
These are the issues you’re most likely to run into when you deploy your Kubernetes pods, but it’s far from an exhaustive list. If you can’t find the cause of your problem or if you need more extensive Kubernetes troubleshooting, then you shouldn’t be afraid to ask the experts for help. Copado’s team of DevOps experts is here to help you achieve digital transformation through containerization, Kubernetes orchestration, cloud-native architecture development, and more.
Level up your Salesforce DevOps skills with our resource library.