I recently ran into an issue where the GPU Operator prevented the Machine Config Operator from applying cluster updates because the NVIDIA driver could not be unloaded. In my case, the node was named 'cl1gpu08.cluster.prod.example.com'. I'm mentioning it because it appears in the commands below.

The fix was actually simple. First, disable the GPU Operator operands on the node:

$ oc label node/cl1gpu08.cluster.prod.example.com nvidia.com/gpu.deploy.operands=false

Next, make sure there are no NVIDIA GPU Operator workloads left running on that node:

$ oc -n nvidia-gpu-operator get pods -o wide --field-selector spec.nodeName=cl1gpu08.cluster.prod.example.com

If you're impatient, you can delete the remaining pods and restart the machine-config-daemon on that node instead of waiting.

Once the node is back, remove the label again (or set it to 'true') so that the GPU Operator can schedule its operands on that node again:

$ oc label node/cl1gpu08.cluster.prod.example.com nvidia.com/gpu.deploy.operands- --overwrite

Sources used:
- ...
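The GPU Operator workaround above can be wrapped in a few shell functions. This is a minimal sketch, not NVIDIA or Red Hat tooling: the function names are hypothetical, and it assumes `oc` is logged in with cluster-admin rights, that the GPU Operator runs in `nvidia-gpu-operator` as in my cluster, and that the machine-config-daemon pods in `openshift-machine-config-operator` carry the label `k8s-app=machine-config-daemon` (verify this on your OpenShift version).

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the manual steps from this post.
# Assumptions: `oc` is logged in with cluster-admin rights, the GPU Operator
# runs in nvidia-gpu-operator, and the MCD pods carry k8s-app=machine-config-daemon.

GPU_NS="nvidia-gpu-operator"
MCO_NS="openshift-machine-config-operator"
MCD_LABEL="k8s-app=machine-config-daemon"   # assumption: check with `oc get pods --show-labels`

# Poll until no GPU Operator pods are scheduled on the node.
# Returns non-zero if `oc get` fails or pods remain after the timeout (seconds).
wait_for_no_gpu_pods() {
  local node="$1" timeout="${2:-300}" waited=0 pods
  while :; do
    pods="$(oc -n "$GPU_NS" get pods -o name --field-selector "spec.nodeName=${node}")" || return 1
    [ -z "$pods" ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 10
    waited=$((waited + 10))
  done
}

# Disable the operands on the node and wait for their pods to disappear.
# If they linger past the timeout, delete them (the "impatient" path).
gpu_operands_disable() {
  local node="$1"
  oc label "node/${node}" nvidia.com/gpu.deploy.operands=false --overwrite || return 1
  if ! wait_for_no_gpu_pods "$node" 300; then
    oc -n "$GPU_NS" delete pod --field-selector "spec.nodeName=${node}"
  fi
}

# Optional: delete the machine-config-daemon pod on the node; its DaemonSet
# recreates it, so the daemon retries the update right away.
mcd_restart() {
  local node="$1"
  oc -n "$MCO_NS" delete pod -l "$MCD_LABEL" --field-selector "spec.nodeName=${node}"
}

# Once the node is back: remove the label so the operands are scheduled again.
gpu_operands_enable() {
  local node="$1"
  oc label "node/${node}" nvidia.com/gpu.deploy.operands-
}
```

Typical use would be `gpu_operands_disable cl1gpu08.cluster.prod.example.com`, optionally `mcd_restart cl1gpu08.cluster.prod.example.com`, then, after the Machine Config Operator has finished with the node, `gpu_operands_enable cl1gpu08.cluster.prod.example.com`. The wait is a plain polling loop over `oc get pods` so it only relies on the same field selector used in the manual command.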
After upgrading to Foreman 3.14 / Katello 4.16, I had issues syncing Ansible collections, and the sync jobs failed with the following error:

Error message: the server returns an error
HTTP status code: 502
Response headers: {"date"=>"Sun, 20 Apr 2025 14:30:42 GMT", "server"=>"Apache", "content-length"=>"341", "content-type"=>"text/html; charset=iso-8859-1"}
Response body: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid response from an upstream server.<br />
The proxy server could not handle the request<p>Reason: <strong>Error reading from remote server</strong></p></p>
</body></html>

Also, there's a Foreman Discourse Threa...