The following set of playbooks introduces NIM (NVIDIA Inference Microservices) and other NVIDIA microservices.
conda create -n nim_env python=3.10 pip
conda activate nim_env
pip install -r requirements.txt
Before diving into the playbooks, it's essential to ensure your environment is properly configured. Most users will already have a basic implementation built on one of the open-source LLM (Large Language Model) orchestration frameworks such as Langchain or Haystack.
In this folder, you'll find an example implementation that uses Langchain, OpenAI embeddings, and an OpenAI LLM. Specifically, we have implemented a basic Retrieval-Augmented Generation (RAG) pipeline using Langchain, backed by OpenAI embeddings and the OpenAI LLM. We also use a web scraper to gather data that serves as the corpus for the RAG pipeline, and ChromaDB acts as the vector database in this setup.
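For orientation, the sketch below shows the shape of such a pipeline. It is a minimal illustration, not the notebook's exact code: the source URL, chunking parameters, and question are placeholders, and the import paths assume a recent Langchain split-package layout.

```python
# Minimal sketch of the baseline RAG pipeline (URL and parameters are
# placeholders, not the exact values used in the notebook).
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import RetrievalQA

# Scrape a web page to build the corpus.
docs = WebBaseLoader("https://example.com/some-page").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed the chunks with OpenAI embeddings and store them in ChromaDB.
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# Wire the retriever and the OpenAI LLM into a question-answering chain.
qa = RetrievalQA.from_chain_type(llm=ChatOpenAI(), retriever=vectorstore.as_retriever())
print(qa.invoke({"query": "What does the page say about X?"})["result"])
```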
To get started, navigate to:
{root folder of git codebase}/Step0/RAG_OpenAI.ipynb
This Jupyter notebook provides a hands-on guide to setting up and understanding the RAG model within the described environment. Follow along with the instructions provided to ensure a smooth setup process.
As a first step, we will use the NVIDIA AI Foundation endpoints hosted at this location: Foundation models. The API reference for using these models with Langchain wrappers can be found here: API Reference. The notebook
{root folder of git codebase}/Step1/RAG_NVEndpoints.ipynb
contains the code changes from the baseline needed to use the NVIDIA AI Foundation models.
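The essence of the change is swapping the OpenAI components for their NVIDIA counterparts from the langchain-nvidia-ai-endpoints package. A minimal sketch, assuming NVIDIA_API_KEY is set in the environment; the model names are illustrative, pick any models from the catalog:

```python
# Swap the OpenAI components for NVIDIA AI Foundation endpoint wrappers.
# Model names below are illustrative; choose any model from the catalog.
# Assumes NVIDIA_API_KEY is set in the environment.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings(model="NV-Embed-QA")   # replaces OpenAIEmbeddings()
llm = ChatNVIDIA(model="meta/llama3-8b-instruct")    # replaces ChatOpenAI()

# The rest of the RAG pipeline (vector store, retriever, QA chain) is unchanged.
```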
Now that we've familiarized ourselves with the basic setup and functionality, it's time to take the next step and deploy the NeMo Retriever Embedding Microservice (NREM) locally.
This deployment replaces the hosted NVIDIA AI Foundation endpoint used in the previous step. By deploying the embedding model locally, we gain more control and flexibility over its usage and integration within our environment.
{root folder of git codebase}/Step2/NREM.ipynb
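Once the microservice is running, the Langchain wrapper can be pointed at it instead of the hosted endpoint. A minimal sketch, assuming the service listens on localhost:8080; the port and model name depend on how you launch the container:

```python
# Point the Langchain embeddings wrapper at the locally deployed NREM instead
# of the hosted endpoint. Port and model name are assumptions; adjust them to
# match your container configuration.
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embeddings = NVIDIAEmbeddings(
    base_url="http://localhost:8080/v1",  # local microservice endpoint
    model="NV-Embed-QA",                  # example model name
)
print(len(embeddings.embed_query("hello world")))  # sanity check
```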
Building upon our previous deployment of the Embedding Microservice, we are now ready to deploy the retriever. Unlike the previous deployment, the retriever orchestrates multiple containers, including the embedding model, re-ranker, and an accelerated vector database. To streamline this process, we will utilize Docker Compose, allowing us to bring up this complex service with simple commands.
{root folder of git codebase}/Step3/Retriever.ipynb
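The notebook drives this through Docker Compose; the core lifecycle commands look like the following (the compose file name is an assumption here; use whichever file the Step3 notebook ships with):

```bash
# Bring up the retriever stack (embedding model, re-ranker, vector database)
# defined in the playbook's compose file. The file name is an assumption.
docker compose -f docker-compose.yaml up -d

# Check that all containers are healthy, and follow their logs if needed.
docker compose ps
docker compose logs -f

# Tear the stack down when you are done.
docker compose down
```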
TODO
Hugging Face has become a central hub for LLMs, offering a vast array of pre-trained models as well as fine-tuned variants for various tasks. In this notebook, we'll explore how developers can integrate Hugging Face models into NIM.
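While this section is still in progress, a rough sketch of the first step in such a flow is pulling the model weights from the Hugging Face Hub to a local directory that the serving container can mount. The repo id and target path below are placeholders:

```python
# Download a Hugging Face model to a local directory that can later be
# mounted into the serving container. Repo id and path are placeholders.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",  # example model
    local_dir="/models/llama3-8b-instruct",
)
print(f"Model downloaded to {local_dir}")
```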
TODO
This notebook shows how one can use the NeMo Framework to train LoRA adapters and subsequently deploy them for inference using NIM.
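NIM exposes an OpenAI-compatible API, so once a LoRA adapter is deployed it can typically be selected through the model field of a standard request. A minimal sketch, with placeholder host, port, and adapter name:

```python
# Query a locally running NIM through its OpenAI-compatible API, selecting a
# deployed LoRA adapter by name. Host, port, and adapter name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
response = client.chat.completions.create(
    model="llama3-8b-instruct-my-lora",  # name of the deployed LoRA adapter
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
)
print(response.choices[0].message.content)
```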
TODO
Compute performance metrics and run load tests (a measurement sketch follows the list):
- Measure the time from when a request is received by the service until the first token is returned (time to first token, TTFT).
- Measure the time between consecutive generated tokens (inter-token latency).
- Use a load-testing tool to simulate multiple simultaneous requests to the service.
- Execute the load tests and monitor performance metrics such as response time, throughput, error rate, and system resource utilization.
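As a minimal sketch of the first two measurements, the snippet below times a streaming request against an OpenAI-compatible endpoint. The base URL and model name are placeholders, each streamed chunk is treated as one token, and a dedicated tool (e.g. Locust or NVIDIA's GenAI-Perf) would drive the actual load tests:

```python
# Measure time-to-first-token (TTFT) and inter-token latency for a streaming
# request. Base URL and model name are placeholders for your deployment; each
# streamed chunk is treated as one token, which is an approximation.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

# Record an arrival timestamp for every content-bearing chunk.
token_times = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        token_times.append(time.perf_counter())

ttft = token_times[0] - start
inter_token = [b - a for a, b in zip(token_times, token_times[1:])]
print(f"TTFT: {ttft:.3f}s")
print(f"Mean inter-token latency: {sum(inter_token) / len(inter_token):.4f}s")
```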
TODO