Validated Patterns

Troubleshooting the Lemonade Stand AI Quickstart pattern

Prerequisite and tooling issues

Podman version not supported

The pattern.sh script requires Podman 4.3.0 or later. Earlier versions do not support the --userns=keep-id flag required for correct UID/GID mapping inside the container.

Symptom

The script exits with an error referencing the Podman version or keep-id.

Resolution
  1. Check your Podman version:

    $ podman --version
  2. If the version is earlier than 4.3.0, upgrade Podman. For instructions, see the Podman installation documentation.
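
The version comparison in the steps above can be scripted. The following is a minimal sketch and not part of pattern.sh: the helper names version_ge and check_podman_version are hypothetical, and the sketch assumes that your sort command supports the -V (version sort) option, as GNU sort does, and that podman --version prints the version as the third word of its output.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort) so that 4.10.0 ranks above 4.3.0.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether a Podman version meets the 4.3.0 minimum.
check_podman_version() {
  if version_ge "$1" "4.3.0"; then
    echo "Podman $1 is supported"
  else
    echo "Podman $1 is too old; upgrade to 4.3.0 or later"
    return 1
  fi
}

# Usage (assumes the version is the third word of the output):
#   check_podman_version "$(podman --version | awk '{print $3}')"
```

A plain string comparison would rank 4.10.0 below 4.3.0, which is why the sketch uses version sort instead.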

KUBECONFIG path is outside the HOME directory

The pattern.sh script runs inside a container and mounts your $HOME directory. If your KUBECONFIG file is located outside $HOME, the container cannot access it.

Symptom

The script fails to connect to the cluster or reports that the kubeconfig file cannot be found.

Resolution

Copy your kubeconfig file to a path inside your home directory and export the updated path:

$ cp <current-kubeconfig-path> ~/kubeconfig
$ export KUBECONFIG=~/kubeconfig
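
To confirm whether this issue applies before you copy anything, you can compare the kubeconfig path with your home directory. The following is a minimal sketch: the helper names is_under_dir and check_kubeconfig_path are hypothetical, the check is a plain string comparison that does not resolve symbolic links, and it assumes that KUBECONFIG holds a single absolute path rather than a colon-separated list.

```shell
# Hypothetical helper: succeed if path $1 is located under directory $2.
# This is a plain string comparison; symbolic links are not resolved.
is_under_dir() {
  case "$1" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: warn when a kubeconfig path is outside the home directory.
check_kubeconfig_path() {
  if is_under_dir "$1" "$2"; then
    echo "OK: $1 is inside $2"
  else
    echo "WARNING: $1 is outside $2; copy it into your home directory"
    return 1
  fi
}

# Usage (assumes KUBECONFIG holds a single absolute path):
#   check_kubeconfig_path "$KUBECONFIG" "$HOME"
```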

Deployment issues

ArgoCD applications are not syncing or are unhealthy

After running ./pattern.sh make install, ArgoCD applications can take 15–30 minutes to reach a healthy state. Model downloads and GPU operator initialization take additional time.

Symptom

Running ./pattern.sh make argo-healthcheck reports applications in Progressing or Degraded state.

Resolution
  1. Check which applications are not healthy:

    $ oc get applications -n openshift-gitops
  2. Inspect the failing application for error details:

    $ oc describe application <application-name> -n openshift-gitops
  3. Check the logs of the ArgoCD application controller:

    $ oc logs -n openshift-gitops statefulset/openshift-gitops-application-controller
  4. If applications are stuck in Progressing, wait an additional 10 minutes and re-run the health check. Detector model downloads from Hugging Face through MinIO and GPU operator initialization can take significant time.
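
When many applications are listed, it can help to show only the ones that still need attention. The following is a minimal sketch: the helper name list_unhealthy_apps is hypothetical, and the custom-columns paths in the usage comment assume the standard Argo CD Application status fields (.status.sync.status and .status.health.status).

```shell
# Hypothetical helper: read "NAME SYNC HEALTH" lines on standard input and
# print every application that is not both Synced and Healthy.
list_unhealthy_apps() {
  awk 'NF >= 3 && ($2 != "Synced" || $3 != "Healthy") {
    print $1 ": sync=" $2 ", health=" $3
  }'
}

# Usage (field paths are an assumption about the Application resource):
#   oc get applications -n openshift-gitops --no-headers \
#     -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status \
#     | list_unhealthy_apps
```

Empty output means that every listed application is both Synced and Healthy.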

GPU and inference issues

GPU nodes are not ready

The NVIDIA GPU Operator must successfully initialize on the GPU node before model serving can start.

Symptom

The vLLM inference service pod remains in Pending state, or oc get inferenceservice -A shows the service not ready.

Resolution
  1. Check the status of GPU nodes:

    $ oc get nodes -l nvidia.com/gpu.present=true
  2. Check the NVIDIA GPU Operator pods:

    $ oc get pods -n nvidia-gpu-operator
  3. Check for driver initialization errors:

    $ oc logs -n nvidia-gpu-operator -l app=nvidia-driver-daemonset
  4. If you are using a provider other than AWS, confirm that a GPU node was present in the cluster before you deployed the pattern. The pattern does not provision GPU nodes on providers other than AWS.
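
A node can carry the GPU label before it advertises a schedulable GPU, which is one reason a pod can stay in Pending. The following is a minimal sketch for spotting that case: the helper name list_nodes_without_gpu is hypothetical, and the usage comment assumes that GPUs are exposed through the nvidia.com/gpu allocatable resource and that a missing value is printed as <none>.

```shell
# Hypothetical helper: read "NAME GPUS" lines on standard input and print the
# nodes that advertise no allocatable GPU (value "<none>" or "0").
list_nodes_without_gpu() {
  awk 'NF >= 2 && ($2 == "<none>" || $2 == "0") { print $1 }'
}

# Usage (assumes the nvidia.com/gpu resource name):
#   oc get nodes -l nvidia.com/gpu.present=true --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu' \
#     | list_nodes_without_gpu
```

Any node printed by this sketch is labeled as a GPU node but is not yet offering a GPU to the scheduler, so the driver and operator logs for that node are the next place to look.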

Inference endpoint is not serving

Symptom

Running oc get inferenceservice -A shows the inference service in a non-ready state, or the chatbot returns connection errors.

Resolution
  1. Check the status of the inference service:

    $ oc get inferenceservice -A
  2. Check the vLLM model server pod logs:

    $ oc logs -n lemonade-stand -l serving.kserve.io/inferenceservice=llm-service
  3. Confirm that the GPU node has sufficient available VRAM. The Llama 3.2 3B Instruct model requires a GPU with at least 24 GB of VRAM.

Guardrails orchestrator issues

Guardrails Orchestrator pod is not ready

All detector models must be available and healthy before the Guardrails Orchestrator can serve requests.

Symptom

The orchestrator pod is in CrashLoopBackOff or Error state, or the chatbot returns 503 errors.

Resolution
  1. Check the status of all pods in the lemonade-stand namespace:

    $ oc get pods -n lemonade-stand
  2. Check the orchestrator pod logs for detector connection errors:

    $ oc logs -n lemonade-stand -l app=guardrails-orchestrator
  3. Verify that all detector services are running:

    $ oc get inferenceservice -n lemonade-stand
  4. If detector models are not ready, check that MinIO has successfully downloaded the model artifacts from Hugging Face:

    $ oc logs -n lemonade-stand -l app=minio
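
Because the orchestrator depends on every detector, it is useful to list only the inference services that are not ready. The following is a minimal sketch: the helper name list_not_ready_services is hypothetical, and the usage comment assumes that each InferenceService reports a condition of type Ready in its status.

```shell
# Hypothetical helper: read "NAME READY" lines on standard input and print
# every service whose Ready condition is anything other than "True".
list_not_ready_services() {
  awk 'NF >= 2 && $2 != "True" { print $1 " (Ready=" $2 ")" }'
}

# Usage (assumes a Ready condition exists in .status.conditions):
#   oc get inferenceservice -n lemonade-stand --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status' \
#     | list_not_ready_services
```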

Guardrails are blocking all requests

Symptom

Every user query is blocked by the guardrails, even when the content appears safe and in English.

Resolution
  1. Check the R Shiny dashboard to identify which detector is triggering. In the OpenShift web console, navigate to Networking → Routes in the lemonade-stand namespace and open the dashboard route.

  2. If the Lingua detector is blocking English text, the language confidence threshold may be too high. Review the Lingua threshold in the fms-orchestr8-config-nlp ConfigMap.

  3. If the HAP detector or the prompt injection detector is triggering on safe content, its detection threshold may be too aggressive. See Configuring detector thresholds.
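
To review the configured thresholds from the command line, you can print only the matching lines of the ConfigMap. The following is a minimal sketch: the helper name show_thresholds is hypothetical, the usage comment assumes that the fms-orchestr8-config-nlp ConfigMap is in the lemonade-stand namespace, and because the exact key names inside the ConfigMap are not known here, the sketch matches any line that contains the word "threshold".

```shell
# Hypothetical helper: print, with line numbers, every line of a YAML document
# on standard input that mentions a threshold (case-insensitive match).
show_thresholds() {
  grep -n -i 'threshold'
}

# Usage (the namespace is an assumption):
#   oc get configmap fms-orchestr8-config-nlp -n lemonade-stand -o yaml | show_thresholds
```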

Application issues

Lemonade Stand chatbot UI is not accessible

Symptom

The chatbot UI route returns a 503 or connection error.

Resolution
  1. Check that the lemonade-stand pod is running:

    $ oc get pods -n lemonade-stand -l app=lemonade-stand
  2. Check the application logs for startup errors:

    $ oc logs -n lemonade-stand -l app=lemonade-stand
  3. Verify the route is correctly configured:

    $ oc get routes -n lemonade-stand

R Shiny dashboard shows no data

Symptom

The dashboard loads but shows zero values for all metrics, or displays errors.

Resolution
  1. Confirm that the lemonade-stand application is running and the /metrics endpoint is accessible:

    $ oc exec -n lemonade-stand deployment/shiny-dashboard -- curl -s http://lemonade-stand:8080/metrics
  2. Check the Shiny dashboard pod logs:

    $ oc logs -n lemonade-stand -l app=shiny-dashboard
  3. Verify that the shinyDashboard.metrics.url in the Helm chart values points to the correct metrics endpoint.
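
If the endpoint responds but the dashboard still shows zeros, counting the samples in the response helps separate an empty endpoint from a dashboard problem. The following is a minimal sketch: the helper name count_metric_samples is hypothetical, and it assumes that the /metrics endpoint returns the Prometheus text format, in which lines that start with # are comments. This guide does not state the format.

```shell
# Hypothetical helper: count metric samples in Prometheus text-format output
# read from standard input, skipping blank lines and "#" comment lines.
count_metric_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# Usage (assumes the endpoint returns the Prometheus text format):
#   oc exec -n lemonade-stand deployment/shiny-dashboard -- \
#     curl -s http://lemonade-stand:8080/metrics | count_metric_samples
```

A count of 0 suggests that the application is not publishing metrics. A nonzero count points toward the dashboard configuration instead.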

Getting help

If you cannot resolve an issue using this guide:

  • Check the GitHub issues for known problems and workarounds.

  • Open a new issue and include the output of the following command to help diagnose the problem:

    $ oc get pods -A | grep -v Running | grep -v Completed
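
The grep filter above matches the word Running anywhere on a line and does not show pods that are Running but not ready. The following is a minimal sketch of a stricter filter: the helper name list_problem_pods is hypothetical, and it assumes the default column order of oc get pods -A (NAMESPACE, NAME, READY, STATUS).

```shell
# Hypothetical helper: read `oc get pods -A --no-headers` output on standard
# input and print pods that need attention: any pod whose STATUS is neither
# Running nor Completed, plus Running pods whose READY column (for example
# 0/1) shows that not all containers are ready.
list_problem_pods() {
  awk 'NF < 4 { next }
  {
    split($3, ready, "/")
    if ($4 == "Completed") next
    if ($4 != "Running" || ready[1] != ready[2]) print
  }'
}

# Usage (assumes the default column order):
#   oc get pods -A --no-headers | list_problem_pods
```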