Contextual Doc Retrieval is a Python-based system leveraging OpenAI GPT-4o and Cohere for re-ranking and query expansion, combined with BM25 for accurate document retrieval. It parses PDFs, chunks content contextually, and enhances search precision with AI-powered contextual understanding and re-ranking.

lesteroliver911/contextual-doc-retrieval-opneai-reranker

Contextual Document Retrieval with LlamaParse, OpenAI GPT-4o, Contextual chunking, BM25, Query Expansion and Cohere Re-ranking

This repository provides a Python-based document retrieval system. It uses OpenAI's GPT-4o and Cohere for query expansion and re-ranking, BM25 for keyword-based search, and contextual chunking so that each chunk carries enough surrounding context to be understood on its own. Together these steps aim to return more precise, context-aware results than keyword search alone.

Features

  • PDF Parsing: Extracts content from PDFs using LlamaParse.
  • Contextual Chunking: Splits documents into manageable chunks and generates a contextual summary for each chunk using OpenAI's GPT-4o.
  • BM25 Search: Implements a BM25 search index for efficient keyword-based retrieval.
  • Cohere Re-ranking: Enhances search results by re-ranking them using Cohere's reranking model.
  • Query Expansion: Expands search queries using AI to improve retrieval performance.
  • Error Handling: Robust exception handling ensures reliable document processing.
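
To make the contextual chunking feature more concrete, here is a minimal sketch under stated assumptions, not the repository's actual code. The names `split_into_chunks`, `contextualize`, and `placeholder_describe` are hypothetical, and the word-based chunk size and overlap are illustrative defaults. In the real system an LLM call would presumably take the place of `placeholder_describe`.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word-based chunks that overlap slightly with their neighbours."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def contextualize(chunks: List[str], describe: Callable[[str], str]) -> List[str]:
    """Prefix every chunk with a short context string produced by `describe`."""
    return [f"{describe(chunk)}\n\n{chunk}" for chunk in chunks]


def placeholder_describe(chunk: str) -> str:
    # Hypothetical stand-in for an LLM-generated summary (assumption, not the real call).
    first_words = " ".join(chunk.split()[:5])
    return f"Context: section beginning '{first_words}'"


if __name__ == "__main__":
    sample = " ".join(f"word{i}" for i in range(50))
    for piece in contextualize(split_into_chunks(sample, 20, 5), placeholder_describe):
        print(piece, end="\n---\n")
```

Indexing the context-prefixed text rather than the raw chunk is what lets a keyword search match terms that only appear in the surrounding context.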

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/contextual-doc-retrieval-opneai-reranker.git
    cd contextual-doc-retrieval-opneai-reranker
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    • Create a .env file with the following API keys:
      OPENAI_API_KEY=your_openai_api_key
      COHERE_API_KEY=your_cohere_api_key
      LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
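
Before running the script, it can help to confirm that all three keys are actually set. The following is a small standard-library sketch, not part of the repository. The helper names `parse_env_text` and `missing_keys` are hypothetical, and the parser handles only simple `NAME=value` lines.

```python
import os
from typing import Dict, List, Mapping, Optional

# The three variable names come from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "COHERE_API_KEY", "LLAMA_CLOUD_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple NAME=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are absent or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing API keys:", ", ".join(absent))
    else:
        print("All required API keys are set.")
```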

Usage

  1. Run the Script:

    python main.py
  2. Load PDF Document: The script will prompt you to enter the path to your PDF file.

  3. Perform Document Search: Enter your search queries. The system returns the relevant results from the document, retrieved with BM25 and re-ranked with Cohere.
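
The search step can be pictured as a two-stage pipeline: BM25 selects candidate chunks, then a re-ranker reorders them. The sketch below is a pure-Python illustration under assumptions, not the code in `main.py`. `BM25Index`, `search`, and `overlap_reranker` are hypothetical names, the parameters `k1=1.5` and `b=0.75` are common defaults rather than values taken from the repository, and `overlap_reranker` is a toy stand-in for the spot where a call to Cohere's re-ranking model would go.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class BM25Index:
    """Minimal BM25 index over a non-empty list of text chunks."""

    def __init__(self, chunks: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(tokenize(c)) for c in chunks]
        self.doc_lens = [sum(tf.values()) for tf in self.term_freqs]
        self.avg_len = sum(self.doc_lens) / len(self.doc_lens)
        doc_freq = Counter(term for tf in self.term_freqs for term in tf)
        n = len(self.term_freqs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in doc_freq.items()}

    def score(self, query: str, i: int) -> float:
        tf = self.term_freqs[i]
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avg_len)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query) if t in tf
        )

    def top(self, query: str, k: int) -> List[int]:
        """Indices of the k best-scoring chunks, dropping chunks with no match."""
        scored = [(self.score(query, i), i) for i in range(len(self.term_freqs))]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for s, i in scored[:k] if s > 0]


def overlap_reranker(query: str, texts: Sequence[str]) -> List[int]:
    # Toy stand-in for an external re-ranking model (assumption): order by shared terms.
    q = set(tokenize(query))
    return sorted(range(len(texts)), key=lambda j: -len(q & set(tokenize(texts[j]))))


def search(query: str, chunks: Sequence[str],
           reranker: Callable[[str, Sequence[str]], List[int]],
           candidates: int = 10, top_n: int = 3) -> List[str]:
    """First stage: BM25 candidates. Second stage: reorder them with `reranker`."""
    texts = [chunks[i] for i in BM25Index(chunks).top(query, candidates)]
    return [texts[j] for j in reranker(query, texts)[:top_n]]
```

Keeping the re-ranker as a plain callable means the first stage can be tested offline, with the external model swapped in only when API keys are available.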

Example Query

  • Original query: "Summarize the full document and explain the Fixture limits in detail."
  • The system will provide both an original and expanded version of the query for better retrieval accuracy.
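
One way to use an original query together with its expanded version is to run both searches and merge the two ranked lists. The sketch below illustrates that idea with reciprocal rank fusion. This is a technique chosen here for illustration; the repository is not stated to use it. `expand_query` is a hypothetical stand-in that appends terms from a synonym table, where the real system would presumably ask an AI model for the expansion.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> str:
    """Hypothetical stand-in for AI-based expansion: append known related terms."""
    extra = [alt for word in query.lower().split() for alt in synonyms.get(word, [])]
    return " ".join([query] + extra) if extra else query


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Merge ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by the item's string form for stability.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))


if __name__ == "__main__":
    original = "fixture limits"
    expanded = expand_query(original, {"limits": ["caps", "maximums"]})
    print(expanded)
    # Chunk ids as returned by two separate searches (made-up illustration values).
    print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "d"]]))
```

Items that appear in both result lists rise to the top, while items found only by the expanded query are still kept instead of being discarded.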

Dependencies

The required packages are listed in requirements.txt.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes or enhancements.
