Validated Patterns

AI Generation with LLM and RAG

Validation status:
Tested

Document generation demo with LLM and RAG

Introduction

This deployment is based on the validated pattern framework, using GitOps for seamless provisioning of all operators and applications. It deploys a Chatbot application that harnesses the power of Large Language Models (LLMs) combined with the Retrieval-Augmented Generation (RAG) framework.

The pattern uses Red Hat OpenShift AI to deploy and serve LLM models at scale.

The application, which runs on Red Hat OpenShift Container Platform, uses either the EDB Postgres for Kubernetes operator (the default) or Redis to store embeddings of Red Hat product documentation, and uses those embeddings to generate project proposals for specific Red Hat products.
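
As a rough illustration of how the two supported vector stores line up in application code, the following sketch uses LangChain's community integrations. The connection details, embedding model, and collection name are assumptions for illustration, not values taken from the pattern.

    # Hedged sketch: the two vector-store options (connection details,
    # model name, and collection name are illustrative assumptions).
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import PGVector, Redis

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )

    # Default: EDB Postgres for Kubernetes with the pgvector extension.
    store = PGVector(
        connection_string="postgresql+psycopg2://user:pass@edb-server:5432/vectordb",
        embedding_function=embeddings,
        collection_name="rh-product-docs",
    )

    # Alternative: Redis as the vector store.
    # store = Redis(
    #     redis_url="redis://redis-server:6379",
    #     index_name="rh-product-docs",
    #     embedding=embeddings,
    # )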

Demo Description & Architecture

The goal of this demo is to showcase a Chatbot LLM application, running on Red Hat OpenShift AI, that is augmented with data from Red Hat product documentation. It deploys an LLM application that can connect to multiple LLM providers, such as OpenAI, Hugging Face, and NVIDIA NIM, and generates a project proposal for a Red Hat product.
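
To make the multi-provider setup concrete, here is a hedged sketch of how such a provider selection could look with publicly available LangChain integrations. The model identifiers and endpoint URL are assumptions, not the pattern's actual wiring.

    # Hedged sketch: choosing among LLM providers (model names and the
    # TGI endpoint URL are illustrative assumptions).
    from langchain_openai import ChatOpenAI
    from langchain_nvidia_ai_endpoints import ChatNVIDIA
    from langchain_community.llms import HuggingFaceTextGenInference

    def make_llm(provider: str):
        """Return an LLM client for the chosen provider."""
        if provider == "openai":
            return ChatOpenAI(model="gpt-4o-mini")
        if provider == "nvidia":
            return ChatNVIDIA(model="mistralai/mistral-7b-instruct-v0.2")
        # Default: a Hugging Face TGI endpoint serving Mistral-7B.
        return HuggingFaceTextGenInference(
            inference_server_url="http://tgi-server:3000"
        )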

Key Features

  • Leverages Red Hat OpenShift AI to deploy and serve LLM models powered by NVIDIA GPU accelerators.
  • LLM Application augmented with content from Red Hat product documentation.
  • Multiple LLM providers (OpenAI, Hugging Face, NVIDIA).
  • A vector database, such as EDB Postgres for Kubernetes or Redis, to store embeddings of Red Hat product documentation.
  • Monitoring dashboard to provide key metrics such as ratings.
  • GitOps setup to deploy the end-to-end demo (frontend, vector database, served models).

RAG Demo Workflow

Figure 3. Schematic diagram for workflow of RAG demo with Red Hat OpenShift.

RAG Data Ingestion

Figure 4. Schematic diagram for Ingestion of data for RAG.
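
A minimal ingestion sketch in the spirit of Figure 4, assuming LangChain with the pypdf document loader and the pgvector-backed store shown earlier; the file name, chunk sizes, and connection string are illustrative assumptions.

    # Hedged ingestion sketch: load documents, chunk, embed, and store
    # (file name, chunk sizes, and connection string are assumptions).
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import PGVector

    # Load a product document and split it into overlapping chunks.
    docs = PyPDFLoader("red-hat-product-doc.pdf").load()
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1024, chunk_overlap=40
    ).split_documents(docs)

    # Embed the chunks and persist them in the vector database.
    PGVector.from_documents(
        chunks,
        HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"),
        collection_name="rh-product-docs",
        connection_string="postgresql+psycopg2://user:pass@edb-server:5432/vectordb",
    )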

RAG Augmented Query

Figure 5. Schematic diagram for RAG demo augmented query.

Figure 5 shows the RAG augmented query flow. The Mistral-7B model performs the language processing, while LangChain ties the different tools of the LLM-based application together and processes the PDF files and web pages. A vector database provider such as EDB Postgres for Kubernetes (or Redis) stores the vectors, and Hugging Face Text Generation Inference (TGI) serves the Mistral-7B model. Gradio provides the user interface, and object storage holds the language model and other datasets. The solution components are deployed as microservices in the Red Hat OpenShift Container Platform cluster.
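
Putting those pieces together, a hedged sketch of the augmented query path might look like the following; the endpoint URL, connection string, and prompt are illustrative assumptions rather than the pattern's exact code.

    # Hedged query sketch: retrieve relevant chunks and let the TGI-served
    # Mistral-7B model draft the proposal (URLs and names are assumptions).
    import gradio as gr
    from langchain.chains import RetrievalQA
    from langchain_community.llms import HuggingFaceTextGenInference
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import PGVector

    store = PGVector(
        connection_string="postgresql+psycopg2://user:pass@edb-server:5432/vectordb",
        embedding_function=HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-mpnet-base-v2"
        ),
        collection_name="rh-product-docs",
    )
    llm = HuggingFaceTextGenInference(
        inference_server_url="http://tgi-server:3000",
        max_new_tokens=512,
    )
    qa = RetrievalQA.from_chain_type(
        llm=llm, retriever=store.as_retriever(search_kwargs={"k": 4})
    )

    def propose(product: str) -> str:
        """Answer with a proposal grounded in the retrieved documentation."""
        return qa.invoke({"query": f"Generate a project proposal for {product}"})["result"]

    # Gradio provides the chat-style user interface.
    gr.Interface(fn=propose, inputs="text", outputs="text").launch()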

Download diagrams

View and download all of the diagrams above on our open source tooling site.

Diagram

Figure 6. Proposed demo architecture with OpenShift AI.

Components deployed

  • Hugging Face Text Generation Inference server: The pattern deploys a Hugging Face Text Generation Inference (TGI) server, which serves the mistral-community/Mistral-7B-v0.2 model and requires a GPU node.
  • EDB Postgres for Kubernetes / Redis server: A vector database server is deployed to store the vector embeddings created from Red Hat product documentation.
  • Populate VectorDb Job: The job creates the embeddings and populates the vector database.
  • LLM Application: A Chatbot application that generates a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector database.
  • Prometheus: Deploys a Prometheus instance to store the various metrics from the LLM application and the TGI server (see the sketch after this list).
  • Grafana: Deploys a Grafana instance to visualize the metrics.
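
As a hypothetical illustration of the metrics side, the following sketch shows how an application could expose a rating counter for Prometheus to scrape using the prometheus_client library; the metric and label names are invented for the example and may differ from the pattern's own.

    # Hypothetical metrics sketch (metric and label names are invented
    # for illustration; the pattern's real metric names may differ).
    from prometheus_client import Counter, start_http_server

    RATINGS = Counter(
        "llm_app_ratings_total", "User ratings submitted", ["stars"]
    )

    start_http_server(8000)          # expose /metrics on port 8000
    RATINGS.labels(stars="5").inc()  # record a 5-star rating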

Overview

Figure 1. Overview of the validated pattern for RAG Demo with Red Hat OpenShift

Logical

Figure 2. Logical diagram of the RAG Demo with Red Hat OpenShift.