Configuring this pattern

This guide covers common customizations, such as changing the default large language model (LLM), adding new models, and configuring retrieval-augmented generation (RAG) data sources. This guide assumes that you have already completed the Getting started guide.

Configuration overview

ArgoCD manages this pattern by using GitOps. All application configurations are defined in the values-prod.yaml file. To customize a component, complete the following steps:

  1. Enable an override: In the values-prod.yaml file, locate the application that you want to change, such as llm-inference-service, and add an extraValueFiles: entry that points to a new override file, such as $patternref/overrides/llm-inference-service.yaml.
  2. Create the override file: Create the new .yaml file in the /overrides directory.
  3. Add settings: Add the specific values that you want to change to the new file.
  4. Commit and synchronize: Commit your changes and allow ArgoCD to synchronize the application, as shown in the sketch after this list.
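
For example, a typical change cycle looks like the following. This is a minimal sketch; it assumes that your local clone tracks the Git remote that ArgoCD watches:

    git add values-prod.yaml overrides/
    git commit -m "Enable override for llm-inference-service"
    git push

ArgoCD detects the new commit and synchronizes the application automatically, or you can trigger a manual synchronization from the ArgoCD console.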

Task: Changing the default LLM

By default, the pattern deploys the mistral-7b-instruct-v0.2.Q5_0.gguf model. You can switch to a different model or quantization, or adjust the resource allocation. To change the default LLM, create an override file for the existing llm-inference-service application.

  1. Enable the override: In the values-prod.yaml file, update the llm-inference-service application to use an override file:

    clusterGroup:
      # ...
      applications:
        # ...
        llm-inference-service:
          name: llm-inference-service
          namespace: rag-llm-cpu
          chart: llm-inference-service
          chartVersion: 0.3.*
          extraValueFiles: # <-- ADD THIS BLOCK
            - $patternref/overrides/llm-inference-service.yaml
    
  2. Create the override file: Create a new file named overrides/llm-inference-service.yaml. The following example switches to a different model file (Q8_0) and increases the CPU and memory requests:

    inferenceService:
      resources: # <-- Increased allocated resources
        requests:
          cpu: "8"
          memory: 12Gi
        limits:
          cpu: "12"
          memory: 24Gi
    
    servingRuntime:
      args:
        - --model
        - /models/mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed model file
    
    model:
      repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
      files:
        - mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed file to download
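
After ArgoCD synchronizes, confirm that the new model is rolling out. A quick check, assuming that the oc CLI is logged in to the cluster:

    # Watch the InferenceService report the updated revision as Ready
    oc get inferenceservice -n rag-llm-cpu -w
    # Confirm that the predictor pod restarted with the new model
    oc get pods -n rag-llm-cpu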
    

Task: Adding a second LLM

You can deploy an additional LLM and add it to the demonstration user interface (UI). The following example deploys the HuggingFace TGI runtime instead of llama.cpp. This process requires two steps: deploying the new LLM and configuring the frontend UI.

Step 1: Deploying the new LLM service

  1. Define the new application: In the values-prod.yaml file, add a new application named another-llm-inference-service to the applications list.

    clusterGroup:
      # ...
      applications:
        # ...
        another-llm-inference-service: # <-- ADD THIS NEW APPLICATION
          name: another-llm-inference-service
          namespace: rag-llm-cpu
          chart: llm-inference-service
          chartVersion: 0.3.*
          extraValueFiles:
            - $patternref/overrides/another-llm-inference-service.yaml
    
  2. Create the override file: Create a new file named overrides/another-llm-inference-service.yaml. This file defines the new model and disables the creation of resources, such as secrets, that the first LLM already created.

    dsc:
      initialize: false
    externalSecret:
      create: false
    
    # Define the new InferenceService
    inferenceService:
      name: hf-inference-service # <-- New service name
      minReplicas: 1
      maxReplicas: 1
      resources:
        requests:
          cpu: "8"
          memory: 32Gi
        limits:
          cpu: "12"
          memory: 32Gi
    
    # Define the new runtime (HuggingFace TGI)
    servingRuntime:
      name: hf-runtime
      port: 8080
      image: docker.io/kserve/huggingfaceserver:latest
      modelFormat: huggingface
      args:
        - --model_dir
        - /models
        - --model_name
        - /models/Mistral-7B-Instruct-v0.3
        - --http_port
        - "8080"
    
    # Define the new model to download
    model:
      repository: mistralai/Mistral-7B-Instruct-v0.3
      files:
        - generation_config.json
        - config.json
        - model.safetensors.index.json
        - model-00001-of-00003.safetensors
        - model-00002-of-00003.safetensors
        - model-00003-of-00003.safetensors
        - tokenizer.model
        - tokenizer.json
        - tokenizer_config.json
    

    IMPORTANT: A known issue in the model-downloading container requires that you explicitly list all files that you want to download from the HuggingFace repository. Ensure that you list every file required for the model to run.
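
To build that file list without cloning the repository, you can query the public HuggingFace Hub API, which returns every file in a repository under siblings[].rfilename. A sketch, assuming that curl and jq are available:

    curl -s https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.3 \
      | jq -r '.siblings[].rfilename'

For gated repositories, you might also need to pass a HuggingFace access token with -H "Authorization: Bearer <token>".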

Step 2: Adding the new LLM to the demonstration UI

Configure the frontend to recognize the new LLM.

  1. Edit the frontend overrides: Open the overrides/rag-llm-frontend-values.yaml file.

  2. Update LLM_URLS: Add the URL of the new service to the LLM_URLS environment variable. The URL follows the http://<service-name>-predictor/v1 format; for the HuggingFace runtime, use http://<service-name>-predictor/openai/v1. In the overrides/rag-llm-frontend-values.yaml file:

    env:
      # ...
      - name: LLM_URLS
        value: '["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
    

Task: Customizing RAG data sources

By default, the pattern ingests data from the Validated Patterns documentation. You can change this to point to public Git repositories or web pages.

  1. Edit the vector database overrides: Open the overrides/vector-db-values.yaml file.

  2. Update sources: Modify the repoSources and webSources keys. You can add any publicly available Git repository or public web URL. The embedding job also processes PDF files listed in webSources. In the overrides/vector-db-values.yaml file:

    providers:
      qdrant:
        enabled: true
      mssql:
        enabled: true
    
    vectorEmbedJob:
      repoSources:
        - repo: https://github.com/your-org/your-docs.git # <-- Your repo
          globs:
            - "**/*.md"
      webSources:
        - https://your-company.com/product-manual.pdf # <-- Your PDF
      chunking:
        size: 4096
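
After the change synchronizes, the embedding job runs again against the new sources. To watch its progress, assuming that the oc CLI is available (the exact job name comes from the chart, so list the jobs first):

    # Find the embedding job that the chart created
    oc get jobs -n rag-llm-cpu
    # Follow its logs, substituting the name from the previous command
    oc logs -n rag-llm-cpu -f job/<job-name>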
    

Task: Adding a new RAG database provider

By default, the pattern enables qdrant and mssql. You can also enable redis, pgvector, or elastic. This process requires three steps: adding secrets, enabling the database, and configuring the UI.

Step 1: Updating the secrets file

  1. If the new database requires credentials, add them to the main secrets file:

    vim ~/values-secret-rag-llm-cpu.yaml
    
  2. Add the necessary credentials. For example:

    secrets:
      # ...
      - name: pgvector
        fields:
          - name: user
            value: user # <-- Update the user
          - name: password
            value: password # <-- Update the password
          - name: db
            value: db # <-- Update the db
    

NOTE: For information about the expected values, see the values-secret.yaml.template file.
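
The values-secret file stays on your workstation and is never committed to Git. After you edit it, push the secrets into the cluster secret store. A minimal sketch, assuming the standard Validated Patterns workflow, run from the root of your pattern clone:

    ./pattern.sh make load-secrets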

Step 2: Enabling the provider in the vector database chart

Edit the overrides/vector-db-values.yaml file and set enabled: true for the providers that you want to add.

In the overrides/vector-db-values.yaml file:

providers:
  qdrant:
    enabled: true
  mssql:
    enabled: true
  pgvector: # <-- ADD THIS
    enabled: true
  elastic: # <-- OR THIS
    enabled: true

Step 3: Adding the provider to the demonstration UI

Edit the overrides/rag-llm-frontend-values.yaml file to configure the UI:

  1. Add the secrets for the new provider to the dbProvidersSecret.vault list.
  2. Add the connection details for the new provider to the dbProvidersSecret.providers list.

The following example shows the configuration for non-default RAG database providers:

In the overrides/rag-llm-frontend-values.yaml file:

dbProvidersSecret:
  vault:
    - key: mssql
      field: sapassword
    - key: pgvector # <-- Add this block
      field: user
    - key: pgvector
      field: password
    - key: pgvector
      field: db
    - key: elastic # <-- Add this block
      field: user
    - key: elastic
      field: password
  providers:
    - type: qdrant # <-- Example for Qdrant
      collection: docs
      url: http://qdrant-service:6333
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: mssql # <-- Example for MSSQL
      table: docs
      connection_string: >-
        Driver={ODBC Driver 18 for SQL Server};
        Server=mssql-service,1433;
        Database=embeddings;
        UID=sa;
        PWD={{ .mssql_sapassword }};
        TrustServerCertificate=yes;
        Encrypt=no;        
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: redis # <-- Example for Redis
      index: docs
      url: redis://redis-service:6379
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: elastic # <-- Example for Elastic
      index: docs
      url: http://elastic-service:9200
      user: "{{ .elastic_user }}"
      password: "{{ .elastic_password }}"
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: pgvector # <-- Example for PGVector
      collection: docs
      url: >-
        postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}@pgvector-service:5432/{{ .pgvector_db }}        
      embedding_model: sentence-transformers/all-mpnet-base-v2
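
After you commit all three changes and ArgoCD synchronizes, confirm that the new provider is running and that its secret was rendered. A quick check, assuming the oc CLI:

    # The new database pod, for example pgvector, should reach Running
    oc get pods -n rag-llm-cpu
    # Confirm that the provider secrets exist in the namespace
    oc get secrets -n rag-llm-cpu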