This project implements a Retrieval-Augmented Generation (RAG) system that uses a large language model (LLM) to answer queries over unstructured data (e.g., PDFs). The system provides context-aware responses with privacy-preserving features such as PII redaction.
The architecture integrates several key components:
- Document database (e.g., PDFs)
- PII Redaction using Faker
- Context Construction using RAG
- Large Language Model (LLM): Llama 3.1 (via Ollama)
Each section below provides more details on these components and how they interact.
The system relies on a database containing unstructured documents, such as PDF files. These documents contain the information that the user queries against.
- Data type: PDFs or other unstructured data formats.
- Role: The source of information that feeds into the system for generating responses (a loading sketch follows this list).
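As an illustration of this stage, the sketch below loads a PDF and splits it into chunks for retrieval. It assumes the `pypdf` library and a hypothetical filename; the project's actual loader may differ.

```python
from pypdf import PdfReader  # assumption: pypdf is one common PDF text extractor

def load_pdf_text(path: str) -> str:
    """Extract plain text from every page of a PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_text(text: str, size: int = 500) -> list[str]:
    """Split the document into fixed-size chunks for retrieval."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# "questionnaire.pdf" is a hypothetical filename used for illustration.
chunks = chunk_text(load_pdf_text("questionnaire.pdf"))
```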
Before any retrieved data is passed on for response generation, the system applies a PII (Personally Identifiable Information) redaction step to ensure that sensitive information is not exposed in the final responses.
- Tool used: Faker library for redacting PII.
- Function: Ensures that the context generated from the database is sanitized by replacing sensitive information with fake or anonymized data.
- Input: Context extracted from the database relevant to the query.
- Output: PII-redacted context passed on for further processing (see the sketch below).
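A minimal sketch of this step, assuming regex-based detection of emails and phone numbers (the actual detector may be more sophisticated), with Faker supplying the replacement values:

```python
import re
from faker import Faker

fake = Faker()

# Illustrative patterns only; detecting names would additionally
# require an NER step rather than a regex.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected PII with realistic fake values from Faker."""
    text = PATTERNS["email"].sub(lambda _: fake.email(), text)
    text = PATTERNS["phone"].sub(lambda _: fake.phone_number(), text)
    return text

print(redact("Reach the patient at jane.doe@example.com or +61 2 9999 9999."))
```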
The heart of the system is the RAG (Retrieval-Augmented Generation) model, which constructs a context-aware query by retrieving relevant information from the database.
- Process:
- The query is sent to the RAG system, which retrieves the relevant context from the database; the retrieved context is PII-redacted as described above.
- The context relevant to the query is passed on to the LLM for generating a final response.
- Input: Query from the user.
- Output: A context-aware query sent to the LLM (a retrieval sketch follows this list).
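A minimal sketch of the retrieval step, assuming TF-IDF cosine similarity over the document chunks (an embedding-based search would have the same structure):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k document chunks most similar to the query."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(chunks + [query])  # last row is the query
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top = scores.argsort()[::-1][:k]
    return [chunks[i] for i in top]
```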
The final step in the system involves sending the context-aware query to the LLM (Llama 3.1, served via Ollama). The LLM generates a response based on the provided context.
- Model used: Llama 3.1 (Ollama).
- Function: Processes the context-aware query and generates a natural language response.
- Prompt template: Interpolates the retrieved context into the prompt and adds instructions that refine the LLM's response (see the sketch after this list).
- Input: Context-aware query from the RAG.
- Output: Final response delivered to the user.
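A minimal sketch of this step, calling Ollama's local REST API (`/api/generate` on its default port 11434). The prompt template here is an illustrative placeholder, not the project's actual template:

```python
import requests

# Hypothetical template; the real one may carry different instructions.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def generate(question: str, context: str) -> str:
    """Send the context-aware prompt to the local Llama 3.1 model via Ollama."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1",
            "prompt": PROMPT_TEMPLATE.format(context=context, question=question),
            "stream": False,  # return one complete JSON object instead of a stream
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```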
- User Query: A query is made by the user, which initiates the process.
- Context Retrieval: The system retrieves relevant data from the database.
- PII Redaction: Any sensitive information is redacted from the retrieved data.
- RAG Model: The RAG model constructs a context-aware query from the redacted data.
- LLM Response: The LLM processes the query and provides the final response to the user. The sketch below combines these steps.
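Taken together, the flow reduces to one function; `retrieve`, `redact`, and `generate` refer to the illustrative sketches above:

```python
def answer(query: str, chunks: list[str]) -> str:
    """End-to-end flow: retrieve context, redact PII, then ask the LLM."""
    context = "\n\n".join(retrieve(query, chunks))  # context retrieval
    safe_context = redact(context)                  # PII redaction
    return generate(query, safe_context)            # LLM response
```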
- User asks: "What are the questions involved in general health screening in preemployment health assessment?"
- The system retrieves the relevant section from the patient's PDF file.
- PII redaction replaces the patient's name and other sensitive details.
- The RAG model constructs a query with the anonymized context.
- The LLM generates a response.
- Download Ollama and follow its instructions to run `llama3.1`.
- Install the required packages: `pip install -r requirements.txt`
- Run the Flask API: `python app.py`
- Open `/ui/index.html` in your web browser to interact with the chatbot.
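For quick testing without the UI, a request can also be sent to the API directly. The `/chat` route and `query` field below are hypothetical; check `app.py` for the actual endpoint and payload:

```python
import requests

# Hypothetical route and payload shape, shown for illustration only.
reply = requests.post(
    "http://localhost:5000/chat",  # Flask's default port
    json={"query": "What questions are involved in general health screening?"},
)
print(reply.json())
```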
The PDF data is auto-generated by ChatGPT-4o, based on a sample blank NSW pre-employment questionnaire.