global:
db:
type: "AZURESQL" # Options: AZURESQL, REDIS, EDB, ELASTIC
Deploying the RAG-LLM GitOps Pattern on Azure
Prerequisites
Before you start, ensure the following:
You are logged into an existing ARO cluster.
Your Azure subscription has sufficient quota for GPU instances (default:
Standard_NC8as_T4_v3
, requiring at least 8 CPUs).You’ve created a token on HuggingFace and accepted the terms of the model you’ll deploy. By default, the pattern uses the Mistral-7B-Instruct-v0.3-AWQ model.
Model and database defaults are defined in overrides/values-Azure.yaml . You can override them by editing this file. |
Database Options
The pattern defaults to using Azure SQL Server. Alternatively, you may deploy a local Redis, PostgreSQL, or Elasticsearch instance within your cluster.
To select your database type, edit overrides/values-Azure.yaml
:
Choosing Redis, PostgreSQL (EDB), or Elasticsearch (ELASTIC) will deploy local database instances. Ensure your cluster has sufficient resources available. |
Deploying Azure SQL Server (Optional)
Follow these steps if you plan to use Azure SQL Server:
Navigate to the Azure portal and create a new SQL Database server.
Select
Use SQL authentication
.Record your
Server name
,Server admin login
, andPassword
(these will be needed later).On the Networking tab, set
Allow Azure services and resources to access this server
toYes
.Click Review + create, and then Create.
Wait until the server status shows as active before proceeding.
Creating Required Secrets
Before installation, create a secrets YAML file at ~/values-secret-rag-llm-gitops.yaml
. Populate it as follows:
version: "2.0"
secrets:
- name: hfmodel
fields:
- name: hftoken
value: hf_your_huggingface_token
- name: azuresql
fields:
- name: user
value: adminuser
- name: password
value: your_password
- name: server
value: yourservername.database.windows.net
Replace these placeholders with your actual credentials:
hftoken
: Your HuggingFace token (you must accept the model’s terms).user
: Azure SQL server admin username.password
: Azure SQL admin password.server
: Fully qualified Azure SQL server name.
If you’re not using Azure SQL Server, omit the entire azuresql section. |
Creating GPU Nodes (MachineSet)
Your cluster requires GPU nodes with a specific taint to host the vLLM inference service:
- key: odh-notebook
value: "true"
effect: NoSchedule
Creating GPU Nodes Automatically
If no GPU nodes exist, run this command to provision one default GPU node:
./pattern.sh make create-gpu-machineset-azure
This creates a single Standard_NC8as_T4_v3
GPU node.
Customizing GPU Node Creation
To control GPU node specifics, provide additional parameters:
./pattern.sh make create-gpu-machineset-azure GPU_REPLICAS=3 OVERRIDE_ZONE=2 GPU_VM_SIZE=Standard_NC16as_T4_v3
Parameters available:
GPU_REPLICAS
: Number of GPU nodes to provision.OVERRIDE_ZONE
: Availability zone (optional).GPU_VM_SIZE
: Azure VM SKU for GPU nodes.
The script automatically applies the required taint. The Nvidia Operator installed by the pattern will handle CUDA driver installation on GPU nodes.
Installing the Pattern
Ensure you’ve completed the following steps:
Logged into your ARO cluster.
Created your database (Azure SQL Server) if applicable.
Prepared the secrets YAML file (
~/values-secret-rag-llm-gitops.yaml
).Provisioned GPU nodes with the required taint.
Finally, install the pattern by running:
./pattern.sh make install
Your RAG-LLM GitOps Validated Pattern will now deploy to your Azure Red Hat OpenShift cluster.