A production RAG system is split into three main components:
- ingestion: clean, chunk, embed, and load your data into a vector DB
- retrieval: query your vector DB for relevant context
- generation: attach the retrieved context to your prompt and pass it to an LLM
The ingestion component sits in the feature pipeline, while the retrieval and generation components are implemented inside the inference pipeline.
You can also use the retrieval and generation components in your training pipeline to fine-tune your LLM further on domain-specific prompts.
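To make the retrieval and generation steps concrete, here is a minimal, illustrative sketch. The model, collection, and payload names are placeholders, not this project's actual code:

```python
from openai import OpenAI
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
qdrant = QdrantClient(host="localhost", port=6333)
llm = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer(question: str) -> str:
    # Retrieval: embed the question and fetch similar chunks from the vector DB.
    query_vector = embedder.encode(question).tolist()
    hits = qdrant.search(
        collection_name="vector_articles",  # placeholder collection name
        query_vector=query_vector,
        limit=3,
    )
    context = "\n".join(hit.payload["content"] for hit in hits)  # payload key is a placeholder

    # Generation: attach the retrieved context to the prompt and call the LLM.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = llm.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```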
You can optimize your RAG system at each of these stages: ingestion, retrieval, and generation. Accordingly, advanced RAG techniques fall into three main categories:
- Pre-retrieval optimization [ingestion]: tweak how you create the chunks
- Retrieval optimization [retrieval]: improve the queries to your vector DB
- Post-retrieval optimization [retrieval]: process the retrieved chunks to filter out the noise
You can learn more about RAG from the Decoding ML LLM Twin Course:
- Lesson 4: SOTA Python Streaming Pipelines for Fine-tuning LLMs and RAG — in Real-Time!
- Lesson 5: The 4 Advanced RAG Algorithms You Must Know to Implement
The fine-tuning dataset preparation module automates the generation of datasets specifically formatted for training and fine-tuning Large Language Models (LLMs). It interfaces with Qdrant, sends structured prompts to LLMs, and manages data with Comet ML for experiment tracking and artifact logging. Fine-tuning on such datasets offers several benefits:
- Model Customization: Tailors the LLM's responses to specific domains or tasks.
- Improved Accuracy: Enhances the model's understanding of nuanced language used in specialized fields.
- Efficiency: Reduces the need for extensive post-processing by producing more relevant outputs directly.
- Adaptability: Allows models to continuously learn from new data, staying relevant as language and contexts evolve.
To prepare your environment for these components, install the project dependencies (defined in `pyproject.toml`) with Poetry:

```shell
poetry install
```
To ensure that your Docker containers can communicate with each other, you need to update your /etc/hosts file. Add the following entries to map the MongoDB hostnames to your local machine:

```
# Docker MongoDB Hosts Configuration
127.0.0.1 mongo1
127.0.0.1 mongo2
127.0.0.1 mongo3
```
For Windows users, check this article: https://medium.com/workleap/the-only-local-mongodb-replica-set-with-docker-compose-guide-youll-ever-need-2f0b74dd8384
CometML is a cloud-based platform that provides tools for tracking, comparing, explaining, and optimizing experiments and models in machine learning. CometML helps data scientists and teams to better manage and collaborate on machine learning experiments.
- Experiment Tracking: CometML automatically tracks your code, experiments, and results, allowing you to compare between different runs and configurations visually.
- Model Optimization: It offers tools to compare different models side by side, analyze hyperparameters, and track model performance across various metrics.
- Collaboration and Sharing: Share findings and models with colleagues or the ML community, enhancing team collaboration and knowledge transfer.
- Reproducibility: By logging every detail of the experiment setup, CometML ensures experiments are reproducible, making it easier to debug and iterate.
When integrating CometML into your projects, you'll need to set up several environment variables to manage the authentication and configuration:
- `COMET_API_KEY`: Your unique API key that authenticates your interactions with the CometML API.
- `COMET_PROJECT`: The project name under which your experiments will be logged.
- `COMET_WORKSPACE`: The workspace name that organizes various projects and experiments.
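As a quick sanity check that these variables are picked up, you can open and log to an experiment with the official `comet_ml` client; the logged parameter below is just a placeholder:

```python
import os

from comet_ml import Experiment

# Reads the CometML credentials from the environment variables described above.
experiment = Experiment(
    api_key=os.environ["COMET_API_KEY"],
    project_name=os.environ["COMET_PROJECT"],
    workspace=os.environ["COMET_WORKSPACE"],
)
experiment.log_parameter("chunk_size", 500)  # placeholder parameter
experiment.end()
```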
To access and set up the necessary CometML variables for your project, follow these steps:
1. Create an account or log in: Visit CometML's website and log in if you already have an account, or sign up if you're a new user.
2. Create a new project: Once logged in, navigate to your dashboard. Here, you can create a new project by clicking "New Project" and entering the relevant details.
3. Access your API key: After creating your project, you will need to obtain your API key. Navigate to your account settings by clicking on your profile at the top right corner, select "API Keys" from the menu, and generate or copy your existing API key.
4. Set the environment variables: Add the obtained `COMET_API_KEY` to your environment variables, along with the `COMET_PROJECT` and `COMET_WORKSPACE` names you have set up.
Qdrant is an open-source vector database designed for storing and searching large volumes of high-dimensional vector data, which makes it ideal for machine learning and AI applications such as recommendation systems and image retrieval, where you need to find items similar to a query item. Here are the key reasons for using Qdrant:
- Efficient Searching: Utilizes advanced indexing mechanisms to deliver fast and accurate search capabilities across high-dimensional datasets.
- Scalability: Built to accommodate large-scale data sets, which is critical for enterprise-level deployments.
- Flexibility: Supports a variety of distance metrics and filtering options that allow for precise customization of search results.
- Integration with ML Pipelines: Seamlessly integrates into machine learning pipelines, enabling essential functions like nearest neighbor searches.
Qdrant can be used via its Docker image or through its managed cloud service (https://cloud.qdrant.io/login). This flexibility allows you to choose the environment that best suits your project's needs.
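For illustration, a nearest-neighbor search with the official Python client looks roughly like this; the collection name and query vector are placeholders:

```python
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

# Placeholder 384-dimensional query vector; in practice it comes from an
# embedding model such as SentenceTransformer.
query_vector = [0.1] * 384

hits = client.search(
    collection_name="vector_posts",  # placeholder collection name
    query_vector=query_vector,
    limit=5,
)
for hit in hits:
    print(hit.score, hit.payload)
```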
To configure your environment for Qdrant, set the following variables:
- `QDRANT_HOST`: The hostname or IP address where your Qdrant server is running.
- `QDRANT_PORT`: The port on which Qdrant listens, typically `6333` for Docker setups.
- `QDRANT_CLOUD_URL`: The URL for accessing Qdrant Cloud services.
- `QDRANT_APIKEY`: The API key for authenticating with Qdrant Cloud.
Please check this article to learn how to obtain these variables: https://qdrant.tech/documentation/cloud/quickstart-cloud/?utm_source=decodingml&utm_medium=referral&utm_campaign=llm-course
Additionally, you can control the connection mode (cloud or Docker) using a setting in your configuration file. More details can be found in `db/qdrant.py`:

```
USE_QDRANT_CLOUD: True # Set to False to use the Docker setup
```
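A minimal sketch of what that switch might look like; the function shape here is illustrative, so see `db/qdrant.py` for the project's actual implementation:

```python
from qdrant_client import QdrantClient


def build_qdrant_client(
    use_cloud: bool, host: str, port: int, cloud_url: str, api_key: str
) -> QdrantClient:
    # Cloud mode authenticates against Qdrant Cloud with an API key;
    # Docker mode connects to the locally running container.
    if use_cloud:
        return QdrantClient(url=cloud_url, api_key=api_key)
    return QdrantClient(host=host, port=port)
```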
Before running any commands, ensure your environment variables are correctly set in your `.env` file so that everything works properly. It should include the following configurations:
```
MONGO_DATABASE_HOST="mongodb://localhost:30001,localhost:30002,localhost:30003/?replicaSet=my-replica-set"
MONGO_DATABASE_NAME="scrabble"

# QdrantDB config
QDRANT_DATABASE_HOST="localhost"
QDRANT_DATABASE_PORT=6333
QDRANT_APIKEY=your-key
QDRANT_CLOUD_URL=your-url
USE_QDRANT_CLOUD=False

# MQ config
RABBITMQ_DEFAULT_USERNAME=guest
RABBITMQ_DEFAULT_PASSWORD=guest
RABBITMQ_HOST=localhost
RABBITMQ_PORT=5673

# CometML config
COMET_API_KEY="your-key"
COMET_WORKSPACE="alexandruvesa"
COMET_PROJECT="decodingml"

# OpenAI config
OPENAI_API_KEY="your-key"
```
You need a real dataset to run and test the modules. This section covers additional tools and scripts included in the project that assist with specific tasks, such as data insertion.

The `insert_data_mongo.py` script is designed to manage the automated downloading and insertion of various types of documents (articles, posts, repositories) into a MongoDB database. It facilitates the initial population of the database with structured data for further processing or analysis. Its main features and flow:
- Dataset Downloading: Automatically downloads JSON-formatted data files from Google Drive based on predefined file IDs.
- Dynamic Data Insertion: Inserts different types of documents (articles, posts, repositories) into the MongoDB database, associating each entry with its respective author.
- Download Data: The script first checks whether the specified `output_dir` directory exists and contains any files. If not, it creates the directory and downloads the data files from Google Drive.
- Insert Data: Based on the type specified in the downloaded files, it inserts posts, articles, or repositories into the MongoDB database.
- Logging: After each insertion, the script logs the number of items inserted and their associated author ID to help monitor the process.
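A condensed, illustrative version of that flow is sketched below; the file-ID mapping, database name, and helper are hypothetical, not the script's actual code:

```python
import json
from pathlib import Path

import gdown
from pymongo import MongoClient

FILE_IDS = {"articles": "<google-drive-file-id>"}  # hypothetical mapping


def download_and_insert(output_dir: Path, mongo_uri: str) -> None:
    db = MongoClient(mongo_uri)["scrabble"]
    output_dir.mkdir(parents=True, exist_ok=True)
    for doc_type, file_id in FILE_IDS.items():
        path = output_dir / f"{doc_type}.json"
        if not path.exists():
            # Download the JSON dump from Google Drive by file ID.
            gdown.download(f"https://drive.google.com/uc?id={file_id}", str(path))
        documents = json.loads(path.read_text())
        result = db[doc_type].insert_many(documents)
        print(f"Inserted {len(result.inserted_ids)} {doc_type}")
```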
`query_expansion.py`: Handles the expansion of a given query into multiple variations using language-model-based templates. It integrates the `ChatOpenAI` class from `langchain_openai` and a custom `QueryExpansionTemplate` to generate expanded queries suitable for further processing.
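A stripped-down sketch of the idea; the prompt wording and model name are illustrative, not the project's actual `QueryExpansionTemplate`:

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

template = PromptTemplate.from_template(
    "Generate {n} different versions of the following question, "
    "separated by '{separator}': {question}"
)
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is an assumption

response = (template | model).invoke(
    {"n": 3, "separator": "#next#", "question": "How do I deploy a RAG system?"}
)
expanded_queries = [q.strip() for q in response.content.split("#next#")]
```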
`reranking.py`: Manages the reranking of retrieved documents based on relevance to the original query. It uses a `RerankingTemplate` and the `ChatOpenAI` model to reorder the documents by relevance, making use of language model outputs.
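The underlying pattern looks roughly like this; the prompt is illustrative, not the actual `RerankingTemplate`:

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

rerank_template = PromptTemplate.from_template(
    "Rerank the passages below by relevance to the query and return the top "
    "{keep_top_k}, separated by '#next#'.\nQuery: {question}\nPassages:\n{passages}"
)
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is an assumption

passages = ["passage one ...", "passage two ...", "passage three ..."]
response = (rerank_template | model).invoke(
    {"keep_top_k": 2, "question": "deploying RAG", "passages": "\n".join(passages)}
)
reranked_passages = [p.strip() for p in response.content.split("#next#")]
```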
`retriever.py`: Performs vector-based retrieval of documents from a vector database using query expansion and reranking strategies. It utilizes the `QueryExpansion` and `Reranker` classes, as well as `QdrantClient` for database interactions and `SentenceTransformer` for generating query vectors.
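In essence, the retrieval step embeds the (expanded) query and searches Qdrant; the collection, model, and payload names below are assumptions:

```python
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model is an assumption
client = QdrantClient(host="localhost", port=6333)

query = "How do I deploy a RAG system?"
query_vector = embedder.encode(query).tolist()

hits = client.search(
    collection_name="vector_articles",  # placeholder collection name
    query_vector=query_vector,
    limit=10,
)
contents = [hit.payload["content"] for hit in hits]  # payload key is an assumption
```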
`self_query.py`: Generates metadata attributes related to a query, such as the author ID, using a self-query mechanism. It employs a `SelfQueryTemplate` and the `ChatOpenAI` model to extract the required metadata from the query context.
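Conceptually, the self-query step is just another LLM call that pulls structured metadata out of the free-form query; the prompt is illustrative, not the actual `SelfQueryTemplate`:

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

self_query_template = PromptTemplate.from_template(
    "Extract the user id from the question below. Return only the id, "
    "or 'none' if there is no id.\nQuestion: {question}"
)
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is an assumption

response = (self_query_template | model).invoke(
    {"question": "Could you post my latest article? My user id is 1345."}
)
author_id = response.content.strip()
```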
After you have everything set up (environment variables, a CometML account, and a Qdrant account), it's time to insert the data into your environment.
The workflow is straightforward:
- Start all the services: MongoDB, Qdrant, and RabbitMQ.
- Start the CDC system (for more details, see https://medium.com/decodingml/the-3nd-out-of-11-lessons-of-the-llm-twin-free-course-ba82752dad5a).
- Insert data into MongoDB by running `make insert-data-mongo`.
- Insert data into the Qdrant vector DB by running the Bytewax pipelines.
- Go to `retriever.py` (outside the rag folder) and write your own query, as sketched below.
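For example, a query might look like this. The import path, class, and method names are assumptions based on the modules described above, so adapt them to the actual code:

```python
from retriever import VectorRetriever  # hypothetical import path

query = "Write a LinkedIn post about vector databases and Qdrant."

retriever = VectorRetriever(query=query)
hits = retriever.retrieve_top_k(k=6)  # query expansion + vector search
context = retriever.rerank(hits=hits, keep_top_k=3)  # post-retrieval filtering
```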
The project includes a `Makefile` for easy management of common tasks. Here are the main commands you can use:
- `make help`: Displays help for each make command.
- `make local-start`: Builds and starts MongoDB, RabbitMQ, and Qdrant.
- `make local-test-github`: Inserts data into MongoDB.
- `make local-test-retriever`: Tests RAG retrieval.