Troubleshooting

Troubleshooting the MaaS Code Assistant AI Quickstart pattern

Use this page to diagnose and resolve common issues when deploying or operating this pattern.

Prerequisite and tooling issues

Podman version not supported

The pattern.sh script requires Podman 4.3.0 or later. Earlier versions do not support the --userns=keep-id flag required for correct UID/GID mapping inside the container.

Symptom

The script exits with an error referencing the Podman version or keep-id.

Resolution

Check your Podman version:
```
$ podman --version
```
If the version is earlier than 4.3.0, upgrade Podman. For instructions, see the Podman installation documentation.
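
The version comparison can also be scripted. The following is a minimal sketch, not part of the pattern: the helper name `version_at_least` is hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD `sort`) and `podman --version` output of the form `podman version X.Y.Z`.

```shell
# Succeeds when dotted version $1 is greater than or equal to version $2.
version_at_least() {
  # sort -V orders version strings; the required minimum must sort first.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes the third field of the output is the version number):
#   installed="$(podman --version | awk '{print $3}')"
#   version_at_least "$installed" 4.3.0 || echo "Podman $installed is too old" >&2
```

A version-aware sort avoids the pitfall of plain string comparison, which would rank 4.10.x below 4.3.0.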

KUBECONFIG path is outside the HOME directory

The pattern.sh script runs inside a container and mounts your $HOME directory. If your KUBECONFIG file is located outside $HOME, the container cannot access it.

Symptom

The script fails to connect to the cluster or reports that the kubeconfig file cannot be found.

Resolution

Copy your kubeconfig file to a path inside your home directory and export the updated path:

$ cp <current-kubeconfig-path> ~/kubeconfig
$ export KUBECONFIG=~/kubeconfig
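
To check up front whether the current kubeconfig path is visible to the container, a small helper along these lines might be useful. This is a sketch under assumptions: `path_is_under` is a hypothetical name, `KUBECONFIG` is assumed to hold a single path rather than a colon-separated list, and `realpath` is assumed to be available.

```shell
# Succeeds when path $1 is located somewhere below directory $2.
path_is_under() {
  case "$1" in
    "$2"/*) return 0 ;;   # path starts with "<dir>/"
    *)      return 1 ;;
  esac
}

# Usage (resolve symlinks first so the comparison uses the real location):
#   path_is_under "$(realpath "$KUBECONFIG")" "$HOME" \
#     || echo "KUBECONFIG is outside \$HOME; copy it into your home directory" >&2
```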

Deployment issues

ArgoCD applications are not syncing or are unhealthy

After running ./pattern.sh make install, ArgoCD applications can take 15–30 minutes to reach a healthy state. Model downloads and GPU operator initialization take additional time.

Symptom

Running ./pattern.sh make argo-healthcheck reports applications in Progressing or Degraded state.

Resolution

Check which applications are not healthy:

$ oc get applications -n openshift-gitops

Inspect the failing application for error details:

$ oc describe application <application-name> -n openshift-gitops

Check the logs of the ArgoCD application controller:

$ oc logs -n openshift-gitops statefulset/openshift-gitops-application-controller

If applications are stuck in Progressing, wait an additional 10 minutes and re-run the health check. Model downloads from OCI registries can take significant time depending on network conditions.
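
When many applications are listed, it can help to show only the ones that still need attention. The sketch below assumes the standard Argo CD status fields `.status.sync.status` and `.status.health.status`; the helper name `unhealthy_apps` is hypothetical.

```shell
# Reads "NAME SYNC HEALTH" rows on stdin and prints only the rows that are
# not both Synced and Healthy. Blank lines are ignored.
unhealthy_apps() {
  awk 'NF && ($2 != "Synced" || $3 != "Healthy")'
}

# Usage:
#   oc get applications -n openshift-gitops --no-headers \
#     -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#     | unhealthy_apps
```

Empty output means every application in the namespace reports Synced and Healthy.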

Values file schema validation fails

The pattern validates values-*.yaml files against a schema before deployment.

Symptom

Running ./pattern.sh make install fails with a schema validation error.

Resolution

Run the validation step independently to see the full error output:
```
$ ./pattern.sh make validate-schema
```
Review the error message to identify the malformed field and correct the value in your values-secret.yaml or overrides/maas-quickstart.yaml file.

GPU and inference issues

GPU nodes are not ready

The NVIDIA GPU Operator must successfully initialize on each GPU node before model serving can start.

Symptom

Inference service pods remain in Pending state, or oc get inferenceservice -A shows services not ready.

Resolution

Check the status of GPU nodes:

$ oc get nodes -l nvidia.com/gpu.present=true

Check the NVIDIA GPU Operator pods:
```
$ oc get pods -n nvidia-gpu-operator
```

Check for driver initialization errors:

$ oc logs -n nvidia-gpu-operator -l app=nvidia-driver-daemonset

If you are using a provider other than AWS, confirm that GPU nodes were present in the cluster before you deployed the pattern. The pattern does not provision GPU nodes on providers other than AWS.
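
A node can carry the GPU label before the driver and device plugin have finished initializing. One way to spot that state is to list labeled nodes that do not yet advertise an allocatable `nvidia.com/gpu` resource. This is a sketch only; the helper name `nodes_without_gpus` is hypothetical.

```shell
# Reads "NAME GPUS" rows on stdin and prints the names of nodes that report
# no allocatable GPUs ("<none>" is what custom-columns prints for a missing field).
nodes_without_gpus() {
  awk 'NF && ($2 == "<none>" || $2 == "0") { print $1 }'
}

# Usage:
#   oc get nodes -l nvidia.com/gpu.present=true --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu' \
#     | nodes_without_gpus
```

Any node name printed here is a reasonable starting point for the driver log check above.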

Inference endpoints are not serving

Symptom

oc get inferenceservice -A shows inference services in a non-ready state, or the Continue AI extension in DevSpaces returns connection errors.

Resolution

Check the status of inference services:
```
$ oc get inferenceservice -A
```

Check the vLLM model server pod logs for a specific model:

$ oc logs -n redhat-ods-applications -l serving.kserve.io/inferenceservice=<model-name>

Confirm that the GPU nodes have sufficient available VRAM. Each model requires a GPU with at least 48 GB of VRAM. To run both models on the same node, that node needs at least 96 GB of VRAM in total; otherwise, schedule the models on two separate GPU nodes.
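
The sizing rule is simple arithmetic: the number of models on a node multiplied by 48 GB. A minimal sketch of that check follows; the function names are hypothetical and the available VRAM figure is something you supply yourself.

```shell
# Prints the VRAM in GB needed for $1 models, at $2 GB per model (default 48).
required_vram_gb() {
  echo $(( $1 * ${2:-48} ))
}

# Succeeds when a node with $1 GB of VRAM can host $2 models.
vram_is_sufficient() {
  [ "$1" -ge "$(required_vram_gb "$2")" ]
}

# Usage:
#   vram_is_sufficient 80 2 || echo "Need $(required_vram_gb 2) GB for both models" >&2
```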

Rate limiting and authentication issues

Rate limiting is not enforced

Symptom

Requests from all users succeed regardless of the configured rate limits, or requests are blocked for all users.

Resolution

Check the status of the Kuadrant operator and Limitador pod:
```
$ oc get pods -n kuadrant-system
```

Check the Limitador logs for policy errors:

$ oc logs -n kuadrant-system deployment/limitador

Confirm that rate limit policies are applied correctly:
```
$ oc get ratelimitpolicy -A
```
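
To observe whether limits are actually enforced, you can send a burst of requests as one user and count how many are rejected with HTTP 429 (Too Many Requests). The sketch below is illustrative: `count_status` is a hypothetical helper, and the `ENDPOINT` and `TOKEN` variables, along with the bearer-token header, are assumptions that you need to adapt to your deployment.

```shell
# Reads one HTTP status code per line on stdin and prints how many equal $1.
count_status() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c -x -- "$1" || true
}

# Usage (ENDPOINT and TOKEN are placeholders, not values defined by the pattern):
#   for i in $(seq 1 20); do
#     curl -s -o /dev/null -w '%{http_code}\n' \
#       -H "Authorization: Bearer $TOKEN" "$ENDPOINT"
#   done | count_status 429
```

A count of 0 for a burst that clearly exceeds the user's limit suggests the policy is not attached; a count equal to the number of requests suggests everything is being blocked.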

Users cannot authenticate

Symptom

Users receive authentication errors when accessing the inference API or DevSpaces.

Resolution

Confirm that the htpasswd secret was correctly provisioned by the External Secrets Operator:
```
$ oc get externalsecret -A
$ oc get secret htpasswd-secret -n openshift-config
```
If the secret is missing or incorrect, verify that your values-secret.yaml file contains the correct passwords for all four users (admin, free-user, premium-user, enterprise-user) and redeploy the pattern.
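
If the secret exists, you can also check that it contains an entry for each of the four users. This sketch assumes the secret stores its data under a key named `htpasswd` in the usual `user:hash` format; the helper name `missing_users` is hypothetical.

```shell
# Reads htpasswd content ("user:hash" lines) on stdin and prints each
# expected user that has no entry. Runs in a subshell so no variables leak.
missing_users() (
  present="$(cut -d: -f1)"
  for user in admin free-user premium-user enterprise-user; do
    printf '%s\n' "$present" | grep -q -x -F -- "$user" || printf '%s\n' "$user"
  done
)

# Usage (the "htpasswd" data key is an assumption):
#   oc get secret htpasswd-secret -n openshift-config \
#     -o jsonpath='{.data.htpasswd}' | base64 -d | missing_users
```

Empty output means all four users have an entry; it does not confirm that the passwords themselves are correct.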

OpenShift DevSpaces issues

Continue AI extension cannot connect to inference endpoints

Symptom

Code suggestions are not returned in DevSpaces, or the Continue extension reports a connection error.

Resolution

Confirm that the inference services are healthy:
```
$ oc get inferenceservice -A
```
Navigate to Networking → Routes in the namespace where the inference services are running and confirm the routes are accessible.
In DevSpaces, open the Continue extension settings and verify that the endpoint URL matches the route URL for the vLLM service.
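
The route can also be checked from a terminal. vLLM exposes an OpenAI-compatible API that includes a `/v1/models` listing, so requesting that path is a quick reachability test. The sketch below is an illustration: `models_url` is a hypothetical helper, and the route name and namespace are placeholders.

```shell
# Builds the /v1/models URL from a base URL, tolerating a trailing slash
# or a base that already ends in /v1. Runs in a subshell so "base" does not leak.
models_url() (
  base="${1%/}"        # drop one trailing slash
  base="${base%/v1}"   # drop a trailing /v1 if present
  printf '%s/v1/models\n' "$base"
)

# Usage (replace the placeholders with your route name and namespace):
#   host="$(oc get route <route-name> -n <namespace> -o jsonpath='{.spec.host}')"
#   curl -s -o /dev/null -w '%{http_code}\n' "$(models_url "https://$host")"
```

Any HTTP status code in the output, including an authentication error, shows that the route is reachable; a connection failure points to a routing or service problem instead.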

Getting help

If you cannot resolve an issue using this guide:

Check the GitHub issues for known problems and workarounds.
Open a new issue with the output of the following command to help diagnose the problem:
```
$ oc get pods -A | grep -v Running | grep -v Completed
```
