Configuring this pattern

This guide covers common customizations, such as changing the default large language model (LLM), adding new models, and configuring retrieval-augmented generation (RAG) data sources. This guide assumes that you have already completed the Getting started guide.

Configuration overview

ArgoCD manages this pattern by using GitOps. All application configurations are defined in the values-prod.yaml file. To customize a component, complete the following steps:

  1. Enable an override: In the values-prod.yaml file, locate the application that you want to change, such as llm-inference-service, and add an extraValueFiles: entry that points to a new override file, such as $patternref/overrides/llm-inference-service.yaml.
  2. Create the override file: Create the new .yaml file in the /overrides directory.
  3. Add settings: Add the specific values that you want to change to the new file.
  4. Commit and synchronize: Commit your changes and allow ArgoCD to synchronize the application, as shown in the sketch after this list.
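
For example, a typical change cycle looks like the following. This is a minimal sketch; it assumes that your local clone tracks the Git remote that ArgoCD watches:

    git add values-prod.yaml overrides/
    git commit -m "Enable override for llm-inference-service"
    git push

ArgoCD detects the new commit and synchronizes the application automatically, or you can trigger a manual synchronization from the ArgoCD console.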

Task: Changing the default LLM

By default, the pattern deploys the mistral-7b-instruct-v0.2.Q5_0.gguf model. You can switch to a different model or quantization, or adjust the resource allocation. To change the default LLM, create an override file for the existing llm-inference-service application.

  1. Enable the override: In the values-prod.yaml file, update the llm-inference-service application to use an override file:

    clusterGroup:
      # ...
      applications:
        # ...
        llm-inference-service:
          name: llm-inference-service
          namespace: rag-llm-cpu
          chart: llm-inference-service
          chartVersion: 0.3.*
          extraValueFiles: # <-- ADD THIS BLOCK
            - $patternref/overrides/llm-inference-service.yaml
    
  2. Create the override file: Create a new file named overrides/llm-inference-service.yaml. The following example switches to a different model file (Q8_0) and increases the CPU and memory requests:

    inferenceService:
      resources: # <-- Increased allocated resources
        requests:
          cpu: "8"
          memory: 12Gi
        limits:
          cpu: "12"
          memory: 24Gi
    
    servingRuntime:
      args:
        - --model
        - /models/mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed model file
    
    model:
      repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
      files:
        - mistral-7b-instruct-v0.2.Q8_0.gguf # <-- Changed file to download
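
After ArgoCD synchronizes, confirm that the new model is rolling out. A quick check, assuming that the oc CLI is logged in to the cluster:

    # Watch the InferenceService report the updated revision as Ready
    oc get inferenceservice -n rag-llm-cpu -w
    # Confirm that the predictor pod restarted with the new model
    oc get pods -n rag-llm-cpu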
    

Task: Adding a second LLM

You can deploy an additional LLM and add it to the demonstration user interface (UI). The following example deploys the HuggingFace TGI runtime instead of llama.cpp. This process requires two steps: deploying the new LLM and configuring the frontend UI.

Step 1: Deploying the new LLM service

  1. Define the new application: In the values-prod.yaml file, add a new application named another-llm-inference-service to the applications list.

    clusterGroup:
      # ...
      applications:
        # ...
        another-llm-inference-service: # <-- ADD THIS NEW APPLICATION
          name: another-llm-inference-service
          namespace: rag-llm-cpu
          chart: llm-inference-service
          chartVersion: 0.3.*
          extraValueFiles:
            - $patternref/overrides/another-llm-inference-service.yaml
    
  2. Create the override file: Create a new file named overrides/another-llm-inference-service.yaml. This file defines the new model and disables the creation of resources, such as secrets, that the first LLM already created.

    dsc:
      initialize: false
    externalSecret:
      create: false
    
    # Define the new InferenceService
    inferenceService:
      name: hf-inference-service # <-- New service name
      minReplicas: 1
      maxReplicas: 1
      resources:
        requests:
          cpu: "8"
          memory: 32Gi
        limits:
          cpu: "12"
          memory: 32Gi
    
    # Define the new runtime (HuggingFace TGI)
    servingRuntime:
      name: hf-runtime
      port: 8080
      image: docker.io/kserve/huggingfaceserver:latest
      modelFormat: huggingface
      args:
        - --model_dir
        - /models
        - --model_name
        - /models/Mistral-7B-Instruct-v0.3
        - --http_port
        - "8080"
    
    # Define the new model to download
    model:
      repository: mistralai/Mistral-7B-Instruct-v0.3
      files:
        - generation_config.json
        - config.json
        - model.safetensors.index.json
        - model-00001-of-00003.safetensors
        - model-00002-of-00003.safetensors
        - model-00003-of-00003.safetensors
        - tokenizer.model
        - tokenizer.json
        - tokenizer_config.json
    

    IMPORTANT: A known issue in the model-downloading container requires that you explicitly list all files that you want to download from the HuggingFace repository. Ensure that you list every file required for the model to run.
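
To build that file list without cloning the repository, you can query the public HuggingFace Hub API, which returns every file in a repository under siblings[].rfilename. A sketch, assuming that curl and jq are available:

    curl -s https://huggingface.co/api/models/mistralai/Mistral-7B-Instruct-v0.3 \
      | jq -r '.siblings[].rfilename'

For gated repositories, you might also need to pass a HuggingFace access token with -H "Authorization: Bearer <token>".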

Step 2: Adding the new LLM to the demonstration UI

Configure the frontend to recognize the new LLM.

  1. Edit the frontend overrides: Open the overrides/rag-llm-frontend-values.yaml file.

  2. Update LLM_URLS: Add the URL of the new service to the LLM_URLS environment variable. The URL follows the http://<service-name>-predictor/v1 format; for the HuggingFace runtime, use http://<service-name>-predictor/openai/v1. In the overrides/rag-llm-frontend-values.yaml file:

    env:
      # ...
      - name: LLM_URLS
        value: '["http://cpu-inference-service-predictor/v1","http://hf-inference-service-predictor/openai/v1"]'
    

Task: Customizing RAG data sources

By default, the pattern ingests data from the Validated Patterns documentation. You can change this to point to public Git repositories or web pages.

  1. Edit the vector database overrides: Open the overrides/vector-db-values.yaml file.

  2. Update sources: Modify the repoSources and webSources keys. You can add any publicly available Git repository or public web URL. The embedding job also processes PDF files listed in webSources. In the overrides/vector-db-values.yaml file:

    providers:
      qdrant:
        enabled: true
      mssql:
        enabled: true
    
    vectorEmbedJob:
      repoSources:
        - repo: https://github.com/your-org/your-docs.git # <-- Your repo
          globs:
            - "**/*.md"
      webSources:
        - https://your-company.com/product-manual.pdf # <-- Your PDF
      chunking:
        size: 4096
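
After the change synchronizes, the embedding job runs again against the new sources. To watch its progress, assuming that the oc CLI is available (the exact job name comes from the chart, so list the jobs first):

    # Find the embedding job that the chart created
    oc get jobs -n rag-llm-cpu
    # Follow its logs, substituting the name from the previous command
    oc logs -n rag-llm-cpu -f job/<job-name>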
    

Task: Adding a new RAG database provider

By default, the pattern enables qdrant and mssql. You can also enable redis, pgvector, or elastic. This process requires three steps: adding secrets, enabling the database, and configuring the UI.

Step 1: Updating the secrets file

  1. If the new database requires credentials, add them to the main secrets file:

    vim ~/values-secret-rag-llm-cpu.yaml
    
  2. Add the necessary credentials. For example:

    secrets:
      # ...
      - name: pgvector
        fields:
          - name: user
            value: user # <-- Update the user
          - name: password
            value: password # <-- Update the password
          - name: db
            value: db # <-- Update the db
    

NOTE: For information about the expected values, see the values-secret.yaml.template file.
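
The values-secret file stays on your workstation and is never committed to Git. After you edit it, push the secrets into the cluster secret store. A minimal sketch, assuming the standard Validated Patterns workflow, run from the root of your pattern clone:

    ./pattern.sh make load-secrets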

Step 2: Enabling the provider in the vector database chart

Edit the overrides/vector-db-values.yaml file and set enabled: true for the providers that you want to add.

In the overrides/vector-db-values.yaml file:

providers:
  qdrant:
    enabled: true
  mssql:
    enabled: true
  pgvector: # <-- ADD THIS
    enabled: true
  elastic: # <-- OR THIS
    enabled: true

Step 3: Adding the provider to the demonstration UI

Edit the overrides/rag-llm-frontend-values.yaml file to configure the UI:

  1. Add the secrets for the new provider to the dbProvidersSecret.vault list.
  2. Add the connection details for the new provider to the dbProvidersSecret.providers list.

The following example shows the configuration for non-default RAG database providers:

In the overrides/rag-llm-frontend-values.yaml file:

dbProvidersSecret:
  vault:
    - key: mssql
      field: sapassword
    - key: pgvector # <-- Add this block
      field: user
    - key: pgvector
      field: password
    - key: pgvector
      field: db
    - key: elastic # <-- Add this block
      field: user
    - key: elastic
      field: password
  providers:
    - type: qdrant # <-- Example for Qdrant
      collection: docs
      url: http://qdrant-service:6333
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: mssql # <-- Example for MSSQL
      table: docs
      connection_string: >-
        Driver={ODBC Driver 18 for SQL Server};
        Server=mssql-service,1433;
        Database=embeddings;
        UID=sa;
        PWD={{ .mssql_sapassword }};
        TrustServerCertificate=yes;
        Encrypt=no;        
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: redis # <-- Example for Redis
      index: docs
      url: redis://redis-service:6379
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: elastic # <-- Example for Elastic
      index: docs
      url: http://elastic-service:9200
      user: "{{ .elastic_user }}"
      password: "{{ .elastic_password }}"
      embedding_model: sentence-transformers/all-mpnet-base-v2
    - type: pgvector # <-- Example for PGVector
      collection: docs
      url: >-
        postgresql+psycopg://{{ .pgvector_user }}:{{ .pgvector_password }}@pgvector-service:5432/{{ .pgvector_db }}        
      embedding_model: sentence-transformers/all-mpnet-base-v2
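
After you commit all three changes and ArgoCD synchronizes, confirm that the new provider is running and that its secret was rendered. A quick check, assuming the oc CLI:

    # The new database pod, for example pgvector, should reach Running
    oc get pods -n rag-llm-cpu
    # Confirm that the provider secrets exist in the namespace
    oc get secrets -n rag-llm-cpu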