- Problem Statement
- Dataset
- Tech Stack
- Must-have
- Good-to-have
- Methodology
- Environment setup
- Running the application using docker compose
- Evaluation Criteria
- Future works
- Credits
- Contact info for peers
credits: xkcd.com
Studying for exams can be overwhelming and time-consuming, especially when there's a large volume of material to review. It's common to need quick access to information on a specific topic but to struggle to find the relevant content. A RAG (Retrieval-Augmented Generation) application built with course materials as its knowledge base can streamline the study process and save time and effort.

While preparing for the Azure Fundamentals AZ-900 exam, I developed the AZ-900 Study Buddy, a RAG application designed to address these challenges. The project not only helps with exam preparation but also serves as a practical example of how RAG technology can be applied to educational use cases in general.
- All data for this project are available in the `data` folder.
- Raw data: my AZ-900 course notes; see the `module_*.md` files.
- `readme_notes_with_ids.json`: raw data loaded and chunked into JSON format.
- `az900_notes_with_vectors.pkl`: the RAG knowledge base source, structured data with vector embeddings.
- `ground-truth-data.pkl`: ground truth data containing the questions used in the retrieval and RAG evaluations.
- `RAG_evaluation.pkl`: LLM-generated responses plus cosine similarity, latency, cost and token usage data from the RAG evaluation.
- `results_backup.json`: a backup of the JSON-format questions generated for the ground truth data (to skip question regeneration if an error occurs during parsing).
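For a quick look at these files, the pickles can be loaded with pandas (a minimal sketch; `pd.read_pickle` returns whatever object was pickled, DataFrame or not):

```python
import pandas as pd

# Load the knowledge base and the ground truth data for inspection.
notes = pd.read_pickle("data/az900_notes_with_vectors.pkl")
ground_truth = pd.read_pickle("data/ground-truth-data.pkl")

print(type(notes), type(ground_truth))
```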
- Python 3.12.
- Docker.
- Elasticsearch for knowledge base retrieval.
- OpenAI or Ollama as the LLM.
- Docker, since the knowledge base is indexed into an Elasticsearch Docker container. You can download it at Download Docker Desktop.
- An OpenAI account with credits, or an Ollama docker container with Phi3 model pulled.
- Basic knowledge of these Docker commands for troubleshooting (annotated below): `docker ps`, `docker ps -a`, `docker start`, `docker compose up`, `docker compose down`.
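For reference, what each of these commands does:

```bash
docker ps                 # list running containers
docker ps -a              # list all containers, including stopped ones
docker start <name>       # start an existing, stopped container
docker compose up         # create and start the services defined in docker-compose
docker compose down       # stop and remove those services
```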
There are four main steps. The first two prepare the knowledge base and pick the best approach:
- Ingestion: load, chunk and embed the raw data into structured data with embeddings (see the sketch after this list).
- Evaluation: evaluate the best retrieval and RAG methods for this use case.

Once the best retrieval and RAG methods are determined, the AZ900 Study Buddy app is built in the remaining two steps:
- Interface: Streamlit application and RAG backend.
- Containerization: dockerize the application, knowledge base and LLM using docker compose.
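To make the ingestion step concrete, here is a simplified sketch of what it does; the embedding model name and the chunking logic are assumptions for illustration (see index_data.py and the notebooks for the actual code):

```python
from pathlib import Path

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")  # model name assumed for illustration
es = Elasticsearch("http://localhost:9200")

for path in sorted(Path("data").glob("module_*.md")):
    # Naive chunking by second-level headings; the real pipeline may chunk differently.
    for i, chunk in enumerate(path.read_text().split("\n## ")):
        doc = {
            "text": chunk,
            "text_vector": model.encode(chunk).tolist(),  # embedding for vector search
        }
        # Assumes an index mapping with a dense_vector field was created beforehand.
        es.index(index="az900_course_notes", id=f"{path.stem}_{i}", document=doc)
```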
One-time setup to reproduce any part of this repo on your workstation. You can skip to Running the application using docker compose if you only want to run the application.
```bash
conda create -n llm-zoomcamp-env python
conda activate llm-zoomcamp-env
conda install pip
pip install pipenv
pipenv install tqdm notebook==7.1.2 openai elasticsearch pandas jupyter sentence_transformers==2.7.0 python-dotenv seaborn streamlit
```
- `pipenv shell`: this allows you to run commands such as `python xxx.py` and `streamlit run xxx.py` in the virtual environment.
- Make sure the Docker service is up and running!
- `git clone` this repo to your local workstation.
- Prepare the `.env` file.
- Start Elasticsearch.
- Refer to ChatGPT's suggestions to view `.env*` files; otherwise you can't proceed with the next step.
- Rename `.env_template` to `.env`.
- (Not applicable for running the application, only for code reproduction) Copy-paste your OpenAI API key into the environment variable `OPENAI_API_KEY`.
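For the reproduction notebooks, the key is presumably read with python-dotenv along these lines (a sketch, not the repo's exact code):

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads the .env file in the current directory
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```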
Not applicable for running the application, as that is handled by docker compose. For some code reproduction, however, you will need the Elasticsearch container to be running.
- To check whether the Elasticsearch container is running, go to http://localhost:9200/.
- If not, either start an existing Elasticsearch container with `docker start elasticsearch`, or start a new one with the following command:
```bash
docker run -it \
    --name elasticsearch \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.4.3
```
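If you prefer to verify from Python rather than the browser, a quick sanity check (sketch):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
print(es.info())  # raises a connection error if the container isn't reachable
```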
This is for reproducing the AZ900 Study Buddy application only (without reproducing the other steps, e.g. ingestion and evaluation).
- Make sure the Docker service is up and running!
- There are two LLM options available in the application: GPT-4o-mini and Ollama Phi3. Decide which one to use and have it ready before starting the application. See the diagram for the setup flow:
- GPT-4o-mini: you will be asked to select GPT-4o-mini and input your OpenAI API key on the application screen. You can proceed directly to the docker compose step.
- Ollama Phi3: you need a running Ollama container with the Phi3 model inside. To check whether you have an existing Ollama container, execute `docker ps -a`.
  - If one exists but is stopped, start it with `docker start ollama`.
  - If there's no existing Ollama container, execute the following:
```bash
docker run -it \
    --rm \
    -v ollama:/root/.ollama \
    -p 11434:11434 \
    --name ollama \
    ollama/ollama
```
- Once the Ollama container is running (you can see it in `docker ps`), check whether it has the Phi3 model by executing:
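```bash
# Assumed standard Ollama CLI: list the models inside the container,
# then pull Phi3 if it's not in the list.
docker exec -it ollama ollama list
docker exec -it ollama ollama pull phi3
```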
- Download the zip file az900_study_buddy.zip to your Desktop.
- Unzip the downloaded `az900_study_buddy.zip`. You should see a new folder `az900_study_buddy` on your Desktop.
- Execute the following in a command prompt:
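```bash
cd Desktop/az900_study_buddy
docker compose up        # add -d to run in detached mode (see the note below)
```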
- Optional health checks:
  - Cross-check that the containers `az900_study_buddy_app` and `elasticsearch` are up with `docker ps`. If you plan to use Ollama, you need to see the `ollama` container as well; refer to the guide Pick your LLM.
  - To ensure that the data are indexed in Elasticsearch, go to http://localhost:9200/_cat/indices?v. You should see `az900_course_notes` under "index".
- To access the application, open http://localhost:8501/ in a browser. If successful, you should now see the following screen:
- Note: if you executed `docker compose up -d` (detached mode), Docker only reports that the containers have started; it doesn't tell you when the Streamlit app is ready, so you might see a 404 Not Found when first loading the application. Try refreshing the page after a minute or so.
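  If you'd rather wait programmatically than refresh by hand, something like this works (assuming curl is available):

  ```bash
  # Poll until the Streamlit app responds successfully
  until curl -sf -o /dev/null http://localhost:8501/; do sleep 5; done
  echo "app is ready"
  ```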
- You must choose an LLM model before you can start asking questions!
  - If you're using GPT-4o-mini, select "OpenAI GPT-4o-mini" in the drop-down list, input your API key, and press Submit.
  - If you're using Ollama, select "Ollama Microsoft Phi3" and press Submit.
- You can now start to ask questions. Here's a screenshot of the interaction using Ollama:
- Execute the following in a command prompt:

```bash
cd Desktop/az900_study_buddy
docker compose down
```
- You can also stop your Ollama container with `docker stop ollama` if you're done with it.
For peer review: the full list of LLM Zoomcamp project evaluation criteria is available here.
- Problem description
- See Problem Statement.
- RAG flow
- Both a knowledge base and an LLM are used in the RAG flow. See Methodology.
- Retrieval evaluation
- Metrics used: hit rate and Mean Reciprocal Rank (MRR); see the sketch below.
- Multiple retrieval approaches are evaluated: text search, vector search and hybrid search, together with fine-tuning of the boost parameter. Refer to the notebook retrieval_evaluation.ipynb for details.
- Hybrid search with document reranking is also evaluated. Refer to the notebook hybrid_search_reranking.ipynb for details.
- Conclusion: Hybrid search (without document reranking) has the best retrieval performance.
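  For reference, the two metrics boil down to the following (a minimal sketch, not the notebook's exact code; `search` stands for any of the evaluated retrieval functions, returning ranked document ids):

  ```python
  def evaluate(ground_truth, search, k=5):
      """ground_truth: list of {"question": ..., "doc_id": ...} records."""
      hits, rr_sum = 0, 0.0
      for item in ground_truth:
          results = search(item["question"])[:k]  # top-k ranked doc ids
          if item["doc_id"] in results:
              hits += 1
              rr_sum += 1.0 / (results.index(item["doc_id"]) + 1)  # reciprocal rank
      n = len(ground_truth)
      return {"hit_rate": hits / n, "mrr": rr_sum / n}
  ```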
- RAG evaluation
- Metrics used: cosine similarity, latency, cost and token usage; see the sketch below.
- Models evaluated: GPT-4o-mini, GPT-3.5-turbo.
- Multiple RAG approaches are evaluated based on the metrics above. Refer to the notebook RAG_evaluation.ipynb.
- Conclusion: the best performer, GPT-4o-mini, is used in the application.
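  The cosine similarity metric compares the embedding of each generated answer with that of the reference answer, roughly like this (a sketch; the model name is an assumption):

  ```python
  import numpy as np
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-mpnet-base-v2")  # model name assumed for illustration

  def cosine_similarity(answer_llm: str, answer_ref: str) -> float:
      v, w = model.encode(answer_llm), model.encode(answer_ref)
      return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))
  ```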
- Interface
- A Streamlit UI is used; see app.py.
- Ingestion pipeline
- Ingestion is automated in the Dockerfile: the entrypoint shell script entrypoint.sh runs the Python script index_data.py, which indexes the data into the Elasticsearch container.
- Monitoring
- Currently unavailable.
- Containerization
- Everything is in docker compose, with an optional setup if you want to use Ollama Phi3 instead of GPT-4o-mini. See Running the application using docker compose.
- Reproducibility
- Best practices
- Hybrid search: refer to the notebook retrieval_evaluation.ipynb.
- Document re-ranking: refer to the notebook hybrid_search_reranking.ipynb.
- Bonus points (not covered in the course)
- Cloud deployment: currently unavailable (see Future works).
- Monitoring using PostgreSQL and Grafana.
- Migrate data to MongoDB.
- Deploy the app to the cloud: HuggingFace Spaces, Streamlit Cloud, or AWS EC2.
- A project report discussing the methodology and findings in more detail.
A big thanks to Alexey Grigorev and the DataTalks.Club for the LLM Zoomcamp course, which made this project possible! 😃
- For any questions regarding peer review, please reach out to me on DataTalks.Club, user "Vivien S.".
- Alternatively, you can create a new issue under Issues.