
Build a local RAG (Retrieval Augmented Generation) system that generates exam questions for the Google Cloud Platform Professional Data Engineer certification.


anquetos/gcp-professional-data-engineer-rag


Local RAG for GCP Professional Data Engineer certification

This project aims to build a local RAG that helps you prepare for the Google Cloud Professional Data Engineer certification by generating exam questions.

Building this RAG covers several topics, including:

  • extracting content from a PDF file;
  • embedding text;
  • generating output based on retrieved context;
  • creating custom prompt templates;
  • building a user interface.
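
The chunking, embedding and retrieval steps listed above can be illustrated with a minimal, standard-library-only sketch. It is not the code in `src/`: the bag-of-words `embed` function is a deliberately simple stand-in for a real embedding model, and the names `chunk_sentences`, `embed`, `cosine` and `retrieve` are hypothetical, chosen for this example. The retrieval logic itself (rank chunks by cosine similarity to the query, keep the top k) is the same whatever embedding model is used.

```python
import math
import re
from collections import Counter


def chunk_sentences(text: str, size: int = 2) -> list[str]:
    """Split text into sentences, then group every `size` sentences into one chunk."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts.

    Hypothetical stand-in for a real sentence-embedding model.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    text = (
        "BigQuery is a serverless data warehouse. It supports partitioned tables. "
        "Pub/Sub is a messaging service. It decouples publishers from subscribers."
    )
    chunks = chunk_sentences(text)
    # The BigQuery chunk shares the most words with the query, so it ranks first.
    print(retrieve("Which service is a data warehouse?", chunks))
```

In a real pipeline, chunk embeddings would be computed once during preprocessing and stored, so that only the query needs embedding at question-generation time.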

Note
🙏 This project would not have been possible without Daniel Bourke's excellent video tutorial, "Local Retrieval Augmented Generation (RAG) from Scratch".

Project structure

.
├── .gitignore
├── README.md
├── notebooks
│   └── rag-building-discovery.ipynb
├── notes.md
├── pdf
│   └── source.pdf
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── generation
│   │   ├── __init__.py
│   │   ├── augment_prompt.py
│   │   ├── generation_pipeline.py
│   │   ├── load_model.py
│   │   └── text_retriever.py
│   ├── helpers
│   │   └── timing_functions.py
│   └── preprocessing
│       ├── pdf_extractor.py
│       ├── preprocessing_pipeline.py
│       └── text_embedder.py
└── templates
    ├── generate_exam_question.yaml
    └── question_answer.yaml
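
The `templates/` folder and `src/generation/augment_prompt.py` suggest that prompts are stored as templates and filled with retrieved context before generation. Below is a minimal sketch of such an augmentation step. The template text, its `$topic` and `$context` placeholders and the `augment_prompt` function are hypothetical: the actual YAML files may use different keys and a different placeholder syntax. In the repository, the template would be read from a YAML file. Here it is inlined so the example stays self-contained.

```python
from string import Template

# Hypothetical template text; the real templates/generate_exam_question.yaml may differ.
EXAM_QUESTION_TEMPLATE = Template(
    "Based on the following context, write one multiple-choice question "
    "for the GCP Professional Data Engineer exam about $topic.\n"
    "Context:\n$context\n"
    "Question:"
)


def augment_prompt(topic: str, context_chunks: list[str]) -> str:
    """Fill the template with the topic and the retrieved chunks, one bullet per chunk."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return EXAM_QUESTION_TEMPLATE.substitute(topic=topic, context=context)


if __name__ == "__main__":
    prompt = augment_prompt(
        "BigQuery",
        ["BigQuery is a serverless data warehouse.", "It supports partitioned tables."],
    )
    print(prompt)
```

The resulting string would then be passed to the locally loaded model. `Template.substitute` raises a `KeyError` when a placeholder is left unfilled, so a mismatch between template and code fails early instead of sending an incomplete prompt.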
