Posts

Showing posts from April, 2023

OCP4 - useful debugging commands

I've been running OpenShift (OKD) in my homelab for about a month now for all my container needs and so far, I really like what I see. Along the way I had to debug some issues, so here are a few commands that should help you get started in finding the root cause of a problem.

Gathering information:

$ oc get nodes -o wide
$ oc get pods -A -o wide
$ oc adm top node
$ oc -n ${namespace} get events [-o wide]
$ oc -n ${namespace} describe ${resourcetype}/${resourcename}

Check why pods are failing:

$ oc -n ${namespace} logs [-f | --tail=${linecount}] deployments/${deployment}
$ oc -n ${namespace} logs [-f | --tail=${linecount}] pods/${podname}
$ oc -n ${namespace} get pods/${podname} -o jsonpath='{.status.containerStatuses}' | jq
$ oc -n ${namespace} get pods/${podname} -o yaml | oc adm policy scc-subject-review -f -

Checking which pods are behind a given service:

$ oc -n ${namespace} get endpoints ${service}

Some useful commands for debugging pods / node
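Building on the containerStatuses filter above, here's a small sketch (not from the original post, and assuming jq is installed on your workstation) that lists every pod across the cluster that is not in the Running or Succeeded phase, which is a quick way to spot what deserves a closer look:

# sketch: list pods not in Running/Succeeded (requires jq)
$ oc get pods -A -o json | jq -r '
    .items[]
    | select(.status.phase != "Running" and .status.phase != "Succeeded")
    | "\(.metadata.namespace)/\(.metadata.name)\t\(.status.phase)"'

From there, the describe and logs commands above will usually tell you why a given pod is stuck.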

OCP4 - cluster is shown as 'stale'

OpenShift and OKD send telemetry back to Red Hat by default, which allows insights and recommendations to be generated. Depending on the stability of your connection, the telemeter-client might not be able to phone home, and the cluster will then be marked as 'stale'. Here's one way to fix it on openshift-4.12 / okd-4.12:

Check if all nodes are available:

$ oc get nodes -o wide
$ oc adm top node

Check the openshift-monitoring namespace for any failed pods:

$ oc -n openshift-monitoring get pods

Next, check the telemeter-client logs:

$ oc -n openshift-monitoring logs --tail=4 pods/telemeter-client-7fcf756fb9-5hvzv

Output:

level=error caller=forwarder.go:276 ts=2023-04-13T12:54:24.193722974Z component=forwarder/worker msg="unable to forward results" err="Post \"https://infogw.api.openshift.com/upload\": dial tcp: lookup infogw.api.openshift.com on 172.30.0.10:53: read udp 10.131.0.27:32903->172.30.0.10:53: i/o timeout" level=error caller=for
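The excerpt cuts off at the error, but as a hedged sketch of a possible next step (not shown in the original post): the error points at the in-cluster DNS service (172.30.0.10), so checking the CoreDNS pods and retrying the lookup from a throwaway pod can help confirm whether cluster DNS is the culprit. The pod name dns-check and the ubi9 image are only examples; any image that ships getent or nslookup will do:

# dns-check and the ubi9 image are placeholders, not from the original post
$ oc -n openshift-dns get pods -o wide
$ oc run dns-check --rm -it --restart=Never \
    --image=registry.access.redhat.com/ubi9/ubi -- \
    getent hosts infogw.api.openshift.com

If the lookup times out here as well, the problem is with DNS inside the cluster rather than with the telemeter-client itself.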