RAG-LLM pattern on Microsoft Azure

Validation status:

Tested

About the RAG-LLM pattern on Microsoft Azure

The RAG-LLM GitOps pattern provides a scalable way to deploy LLM-based applications with integrated retrieval capabilities on Microsoft Azure. Because it follows GitOps principles, deployments are automated, consistent, and auditable. The pattern streamlines the setup of complex LLM architectures, so users can focus on application development rather than infrastructure provisioning.

Solution elements and technologies

The RAG-LLM pattern on Microsoft Azure leverages the following key technologies and components:

Red Hat OpenShift Container Platform on Microsoft Azure: The foundation for container orchestration and application deployment.
Microsoft SQL Server: The default relational database backend for storing vector embeddings.
Hugging Face Models: Used for both embedding generation and large language model inference.
Red Hat OpenShift GitOps: The primary driver for automated deployment and continuous synchronization of the pattern’s components.
Red Hat OpenShift AI: Provides an optimized inference engine for large language models, deployed on GPU-enabled nodes.
Node Feature Discovery (NFD) Operator: A Kubernetes add-on for detecting hardware features and system configuration.
NVIDIA GPU Operator: Uses the Operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs.
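
To illustrate how the retrieval and generation components listed above fit together, the following is a minimal sketch of a RAG query flow, not the pattern's actual implementation. It assumes an in-memory list in place of the SQL Server vector store and a toy bag-of-words function in place of a Hugging Face embedding model. All names (embed, cosine, retrieve, build_prompt, DOCS, STORE) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase term counts.

    Assumption: the real pattern uses a Hugging Face embedding model.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, store, k=2):
    """Return the k stored passages most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query, passages):
    """Combine retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus; in the pattern, embeddings would live in the database.
DOCS = [
    "OpenShift GitOps synchronizes cluster state from a Git repository",
    "The NVIDIA GPU Operator manages GPU drivers on cluster nodes",
    "Node Feature Discovery labels nodes with detected hardware features",
]
STORE = [(doc, embed(doc)) for doc in DOCS]

if __name__ == "__main__":
    question = "which operator manages GPU drivers"
    passages = retrieve(question, STORE, k=1)
    print(build_prompt(question, passages))
```

In a real deployment, the prompt produced by build_prompt would be sent to the LLM inference endpoint, and the similarity search would run against the stored embeddings rather than a Python list.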