Customizing the MaaS Code Assistant AI Quickstart pattern

This pattern deploys an AI code assistant with tiered user access, rate limiting, and NVIDIA Nemotron model serving. You can customize the models, rate limit policies, user tiers, and IDE configuration.

Changing models

The pattern serves two models by default:

nemotron-3-nano-30b-a3b-fp8 — Available to premium and enterprise tier users.
gpt-oss-20b — Available to all user tiers.

To change or add models, edit the models list in overrides/maas-quickstart.yaml. The pattern pulls models from OCI registries, so it does not require a Hugging Face API token.

Each model definition specifies the model URI, resource requests and limits, GPU tolerations, and extra vLLM arguments. For example:

models:
  - name: gpt-oss-20b
    displayName: OpenAI gpt-oss-20b
    uri: oci://registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5
    resources:
      limits:
        cpu: "4"
        memory: 24Gi
        nvidia.com/gpu: "1"
      requests:
        cpu: "2"
        memory: 16Gi
        nvidia.com/gpu: "1"
    extraArgs:
      - --enable-force-include-usage
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists

Each model requires a GPU with at least 48 GB of VRAM. Adding models beyond the default two requires additional GPU nodes.
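
As a rough sketch of adding a model, the entry below follows the same schema as the gpt-oss-20b definition above. The name, display name, and URI are hypothetical placeholders, not values shipped with the pattern; replace them with a real model image and size the resources for that model. The resource values are copied from the gpt-oss-20b example and may not suit a different model.

```yaml
models:
  # ... keep the existing default entries above this one ...
  - name: my-extra-model                  # hypothetical model name
    displayName: My Extra Model           # hypothetical display name
    uri: oci://registry.example.com/models/my-extra-model:1.0   # placeholder URI
    resources:
      limits:
        cpu: "4"
        memory: 24Gi
        nvidia.com/gpu: "1"               # one GPU per model
      requests:
        cpu: "2"
        memory: 16Gi
        nvidia.com/gpu: "1"
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
```

Because every model entry requests its own GPU, a third entry like this one needs a third GPU node with at least 48 GB of VRAM.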

Adjusting rate limits and user tiers

The pattern uses Kuadrant (Red Hat Connectivity Link) to enforce per-tier rate limits on inference requests. The default tiers and limits are:

Tier	Rate limit	Description
Free	5 requests per 2 minutes	Basic access for evaluation
Premium	20 requests per 2 minutes	Standard production usage
Enterprise	50 requests per 2 minutes	High-throughput workloads

To adjust rate limits, modify the tiers section in overrides/maas-quickstart.yaml. The following example raises the premium tier request limit to 40 requests per 2 minutes and sets the token limit to 20000 tokens per minute:

tiers:
  premium:
    users:
      - premium-user
    requestRates:
      - limit: 40
        window: 2m
    tokenRates:
      - limit: 20000
        window: 1m

Push your changes to your forked repository so the GitOps framework applies the updated configuration.
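
The same keys should apply to the other tiers. As a further sketch, the fragment below doubles the enterprise tier request limit from the default 50 to 100 requests per 2 minutes. The token limit shown is an illustrative value only, because the default token limits are not listed in this section.

```yaml
tiers:
  enterprise:
    users:
      - admin
      - enterprise-user
    requestRates:
      - limit: 100        # illustrative: doubles the default of 50 requests
        window: 2m
    tokenRates:
      - limit: 50000      # illustrative value, not a documented default
        window: 1m
```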

Managing users

User authentication is handled by an htpasswd identity provider configured through OpenShift OAuth. The default users are:

admin — Full administrative access (enterprise tier)
free-user — Free tier access
premium-user — Premium tier access
enterprise-user — Enterprise tier access

User passwords are defined in the values-secret.yaml file, and HashiCorp Vault and the External Secrets Operator store and manage them. To change a user password after initial deployment, update the secret value in your values-secret.yaml file and redeploy the pattern.

To assign users to different tiers, modify the tiers section in overrides/maas-quickstart.yaml:

tiers:
  free:
    users:
      - free-user
  premium:
    users:
      - premium-user
      - user1
  enterprise:
    users:
      - admin
      - enterprise-user

Configuring OpenShift DevSpaces

The pattern integrates the Continue AI extension in OpenShift DevSpaces to provide code assistance directly in the IDE. DevSpaces is preconfigured to clone the AI Quickstart repository and connect to the vLLM inference endpoints.

To customize the DevSpaces configuration, you can adjust:

Default IDE settings and extensions
Resource limits for developer workspaces
The inference endpoint URL used by the Continue extension
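
For orientation, a Continue model entry that points at an OpenAI-compatible endpoint might look like the sketch below. It assumes the workspace uses Continue's YAML configuration format (config.yaml); the assistant name, endpoint URL, and token are hypothetical placeholders, and the file the pattern actually generates may differ.

```yaml
name: MaaS Code Assistant          # hypothetical assistant name
version: 1.0.0
schema: v1
models:
  - name: gpt-oss-20b
    provider: openai               # vLLM serves an OpenAI-compatible API
    model: gpt-oss-20b
    apiBase: https://maas.apps.example.com/v1   # placeholder endpoint URL
    apiKey: <your-api-token>       # placeholder credential
    roles:
      - chat
      - edit
```

Changing apiBase is the part that corresponds to the inference endpoint URL listed above.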

Provisioning GPU nodes

This pattern requires at least 2 NVIDIA GPU nodes with 48 GB or more of VRAM each. On AWS, the pattern automatically provisions g6e.2xlarge GPU machine sets with NVIDIA L40S GPUs.

On other platforms, if your cluster does not have GPU nodes, you must add them before you deploy the pattern. The pattern installs all required operators, including the NVIDIA GPU Operator, automatically during deployment.
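
For reference, a GPU machine set on AWS has roughly the shape shown in this excerpt. It is a sketch, not the manifest the pattern generates: the machine set name is hypothetical, the selector, labels, and cluster-specific provider fields are omitted, and the taint is an assumption inferred from the nvidia.com/gpu toleration in the model definitions.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: gpu-worker-us-east-1a            # hypothetical name
  namespace: openshift-machine-api
spec:
  replicas: 2                            # the pattern needs at least 2 GPU nodes
  # ... selector and template labels omitted ...
  template:
    spec:
      taints:
        - key: nvidia.com/gpu            # assumed taint, tolerated by the model pods
          effect: NoSchedule
      providerSpec:
        value:
          instanceType: g6e.2xlarge      # NVIDIA L40S GPU
          # ... AMI, subnet, security groups, and other cluster-specific fields omitted ...
```

If you use a different instance type, check that each node still provides a GPU with at least 48 GB of VRAM.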
