Configuring this pattern
This guide covers common customizations, such as changing the default large language model (LLM), adding new models, and configuring retrieval-augmented generation (RAG) data sources. It assumes that you have already completed the Getting started guide.
Configuration overview
ArgoCD manages this pattern by using GitOps. All application configurations are defined in the values-prod.yaml file. To customize a component, complete the following steps:
- Enable an override: In the values-prod.yaml file, locate the application that you want to change, such as llm-inference-service, and add an extraValueFiles: entry that points to a new override file, such as $patternref/overrides/llm-inference-service.yaml.
- Create the override file: Create the new .yaml file in the /overrides directory.
- Add settings: Add the specific values that you want to change to the new file.
- Commit and synchronize: Commit your changes and allow ArgoCD to synchronize the application, as shown in the sketch after this list.
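For example, assuming an override for the llm-inference-service application and that ArgoCD tracks the main branch of your pattern fork (the branch name is an assumption), the final step might look like this:

  # Stage the application change and the new override file
  git add values-prod.yaml overrides/llm-inference-service.yaml
  git commit -m "Override llm-inference-service settings"
  # Push to the branch that ArgoCD watches; "main" is an assumption
  git push origin main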
Task: Changing the default LLM
By default, the pattern deploys the mistral-7b-instruct-v0.2.Q5_0.gguf model. You can switch to a different model or quantization, or adjust the resource usage. To change the default LLM, create an override file for the existing llm-inference-service application.
- Enable the override: In the values-prod.yaml file, update the llm-inference-service application to use an override file:

  clusterGroup:
    # ...
    applications:
      # ...
      llm-inference-service:
        name: llm-inference-service
        namespace: rag-llm-cpu
        chart: llm-inference-service
        chartVersion: 0.3.*
        extraValueFiles: # <-- ADD THIS BLOCK
          - $patternref/overrides/llm-inference-service.yaml

- Create the override file: Create a new file named overrides/llm-inference-service.yaml. The following example switches to a different model file (Q8_0) and increases the CPU and memory requests:

  inferenceService:
    resources: # <-- Increased allocated resources
      requests:
        cpu: "8"
        memory: 12Gi
      limits:
        cpu: "12"
        memory: 24Gi
  servingRuntime:
    args:
      - --model
      - /models/mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed model file
  model:
    repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
    files:
      - mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed file to download
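After ArgoCD redeploys the service, you can smoke-test the updated model. The following sketch assumes the default service name (cpu-inference-service, which matches the LLM_URLS examples later in this guide) and the OpenAI-compatible API that llama.cpp exposes; run it from a pod in the rag-llm-cpu namespace, because the URL is cluster-internal:

  # Ask the redeployed model for a completion; the service name is an assumption
  curl -s http://cpu-inference-service-predictor/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Say hello"}]}'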
Task: Adding a second LLM
You can deploy an additional LLM and add it to the demonstration user interface (UI). The following example deploys the HuggingFace TGI runtime instead of llama.cpp. This process requires two steps: deploying the new LLM and configuring the frontend UI.
Step 1: Deploying the new LLM service
- Define the new application: In the values-prod.yaml file, add a new application named another-llm-inference-service to the applications list:

  clusterGroup:
    # ...
    applications:
      # ...
      another-llm-inference-service: # <-- ADD THIS NEW APPLICATION
        name: another-llm-inference-service
        namespace: rag-llm-cpu
        chart: llm-inference-service
        chartVersion: 0.3.*
        extraValueFiles:
          - $patternref/overrides/another-llm-inference-service.yaml

- Create the override file: Create a new file named overrides/another-llm-inference-service.yaml. This file defines the new model and disables the creation of resources, such as secrets, that the first LLM already created:

  dsc:
    initialize: false
  externalSecret:
    create: false

  # Define the new InferenceService
  inferenceService:
    name: hf-inference-service # <-- New service name
    minReplicas: 1
    maxReplicas: 1
    resources:
      requests:
        cpu: "8"
        memory: 32Gi
      limits:
        cpu: "12"
        memory: 32Gi

  # Define the new runtime (HuggingFace TGI)
  servingRuntime:
    name: hf-runtime
    port: 8080
    image: docker.io/kserve/huggingfaceserver:latest
    modelFormat: huggingface
    args:
      - --model_dir
      - /models
      - --model_name
      - /models/Mistral-7B-Instruct-v0.3
      - --http_port
      - "8080"

  # Define the new model to download
  model:
    repository: mistralai/Mistral-7B-Instruct-v0.3
    files:
      - generation_config.json
      - config.json
      - model.safetensors.index.json
      - model-00001-of-00003.safetensors
      - model-00002-of-00003.safetensors
      - model-00003-of-00003.safetensors
      - tokenizer.model
      - tokenizer.json
      - tokenizer_config.json

IMPORTANT: A known issue in the model-downloading container requires that you explicitly list all files that you want to download from the HuggingFace repository. Ensure that you list every file required for the model to run.
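Because of this issue, one way to enumerate the files in a repository is the public HuggingFace Hub API (a hedged sketch; the jq tool and, for gated repositories, an access token are assumptions about your environment):

  # List every file in the repository so you can copy the names into files:
  curl -s https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.3 \
    -H "Authorization: Bearer $HF_TOKEN" | jq -r '.siblings[].rfilename'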
Step 2: Adding the new LLM to the demonstration UI
Configure the frontend to recognize the new LLM.
- Edit the frontend overrides: Open the overrides/rag-llm-frontend-values.yaml file.
- Update LLM_URLS: Add the URL of the new service to the LLM_URLS environment variable. The URL uses the http://<service-name>-predictor/v1 format, or http://<service-name>-predictor/openai/v1 for the HuggingFace runtime. In the overrides/rag-llm-frontend-values.yaml file:

  env:
    # ...
    - name: LLM_URLS
      value: '["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
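As with the first service, you can verify the new endpoint from inside the cluster. This sketch assumes the hf-inference-service name defined earlier and the /openai/v1 path style shown above; the model value mirrors the --model_name argument:

  # Smoke-test the HuggingFace runtime through its OpenAI-compatible path
  curl -s http://hf-inference-service-predictor/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"/models/Mistral-7B-Instruct-v0.3","messages":[{"role":"user","content":"Say hello"}]}'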
Task: Customizing RAG data sources
By default, the pattern ingests data from the Validated Patterns documentation. You can change this to point to public Git repositories or web pages.
- Edit the vector database overrides: Open the overrides/vector-db-values.yaml file.
- Update sources: Modify the repoSources and webSources keys. You can add any publicly available Git repository or public web URL. The job also processes PDF files from webSources. In the overrides/vector-db-values.yaml file:

  providers:
    qdrant:
      enabled: true
    mssql:
      enabled: true
  vectorEmbedJob:
    repoSources:
      - repo: https://github.com/your-org/your-docs.git # <-- Your repo
        globs:
          - "**/*.md"
    webSources:
      - https://your-company.com/product-manual.pdf # <-- Your PDF
    chunking:
      size: 4096
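After the embedding job reruns, you can confirm that content was ingested. The following is a hedged check against Qdrant's REST API, using the qdrant-service address and docs collection name that appear in the UI configuration later in this guide; run it from inside the cluster:

  # List collections, then inspect the docs collection (names are assumptions)
  curl -s http://qdrant-service:6333/collections
  curl -s http://qdrant-service:6333/collections/docs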
Task: Adding a new RAG database provider
By default, the pattern enables qdrant and mssql. You can also enable redis, pgvector, or elastic. This process requires three steps: adding secrets, enabling the database, and configuring the UI.
Step 1: Updating the secrets file
If the new database requires credentials, add them to the main secrets file:
vim ~/values-secret-rag-llm-cpu.yaml

Add the necessary credentials. For example:

secrets:
  # ...
  - name: pgvector
    fields:
      - name: user
        value: user # <-- Update the user
      - name: password
        value: password # <-- Update the password
      - name: db
        value: db # <-- Update the db
NOTE: For information about the expected values, see the values-secret.yaml.template file.
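After you edit the secrets file, reload it so that the cluster's secret store picks up the new values. With the Validated Patterns tooling, this is typically the load-secrets target (a hedged example; your checkout might use a different entry point):

  # Reload the local secrets file into the cluster's secret store
  ./pattern.sh make load-secrets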
Step 2: Enabling the provider in the vector database chart
Edit the overrides/vector-db-values.yaml file and set enabled: true for the providers that you want to add.
In the overrides/vector-db-values.yaml file:
providers:
qdrant:
enabled: true
mssql:
enabled: true
pgvector: # <-- ADD THIS
enabled: true
elastic: # <-- OR THIS
enabled: true
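After ArgoCD synchronizes the change, you can confirm that the new provider's workload started. A hedged check, assuming the providers deploy into the rag-llm-cpu namespace used elsewhere in this guide:

  # Pod and label names vary by chart; adjust the filter as needed
  oc get pods -n rag-llm-cpu | grep -E 'pgvector|elastic'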
Step 3: Adding the provider to the demonstration UI
Edit the overrides/rag-llm-frontend-values.yaml file to configure the UI:
- Add the secrets for the new provider to the dbProvidersSecret.vault list.
- Add the connection details for the new provider to the dbProvidersSecret.providers list.
The following example shows the configuration for non-default RAG database providers:
In the overrides/rag-llm-frontend-values.yaml file:
dbProvidersSecret:
vault:
- key: mssql
field: sapassword
- key: pgvector # <-- Add this block
field: user
- key: pgvector
field: password
- key: pgvector
field: db
- key: elastic # <-- Add this block
field: user
- key: elastic
field: password
providers:
- type: qdrant # <-- Example for Qdrant
collection: docs
url: http://qdrant-service:6333
embedding_model: sentence-transformers/all-mpnet-base-v2
- type: mssql # <-- Example for MSSQL
table: docs
connection_string: >-
Driver={ODBC Driver 18 for SQL Server};
Server=mssql-service,1433;
Database=embeddings;
UID=sa;
PWD={{ .mssql_sapassword }};
TrustServerCertificate=yes;
Encrypt=no;
embedding_model: sentence-transformers/all-mpnet-base-v2
- type: redis # <-- Example for Redis
index: docs
url: redis://redis-service:6379
embedding_model: sentence-transformers/all-mpnet-base-v2
- type: elastic # <-- Example for Elastic
index: docs
url: http://elastic-service:9200
user: "{{ .elastic_user }}"
password: "{{ .elastic_password }}"
embedding_model: sentence-transformers/all-mpnet-base-v2
- type: pgvector # <-- Example for PGVector
collection: docs
url: >-
postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}@pgvector-service:5432/{{ .pgvector_db }}
embedding_model: sentence-transformers/all-mpnet-base-v2
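To confirm that the UI can reach a newly added provider, you can probe it with the same connection details. For example, a hedged check against the Elasticsearch endpoint defined above, run from inside the cluster with the credentials from your secrets file:

  # $ELASTIC_USER and $ELASTIC_PASSWORD are placeholders for your own values
  curl -s -u "$ELASTIC_USER:$ELASTIC_PASSWORD" http://elastic-service:9200/_cat/indices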
