GitHub

# Chat with PDF - Streamlit Application

## Overview

The **Chat with PDF** application is a powerful tool that allows users to upload PDFs, extract their content, and interact with them through an AI-based question-answering system. Powered by **Claude by Anthropic** and **HuggingFace Embeddings**, this app enables efficient extraction and interaction with data from PDF files.

### Key Features:
- **PDF Content Extraction**: Extracts text from PDF files using `pdfplumber`.
- **AI-Powered Interaction**: Ask questions related to the content of the PDF, and get answers using the **Claude AI model**.
- **Text Embeddings**: Converts extracted PDF text into embeddings stored in a **FAISS vector store** for fast and efficient querying.
- **Error Handling**: Built-in error handling to address API issues, PDF extraction failures, and embedding creation problems.
- **Retry Mechanism**: Uses the **Tenacity** library to automatically retry embedding creation in case of failure, with exponential backoff.

## Installation

To get started with the application, follow these steps:

### 1. Clone the Repository
```bash
git clone https://github.com/atharvad38/chat_with_pdf.git
cd chat_with_pdf

2. Install Required Packages

Make sure you have Python 3.x installed, and then install the dependencies using pip:

pip install -r requirements.txt

3. Setup Anthropic API Key

You will need a valid Anthropic API key for the application to work. To get started:

Sign up at Anthropic and generate your API key.
Replace the api_key variable in the app.py file with your API key.

api_key = 'your-anthropic-api-key-here'

4. Run the Application

Once everything is set up, you can start the Streamlit app by running:

streamlit run app.py

This will open the application in your default web browser.

How to Use

Upload a PDF: Select and upload a PDF file using the file uploader in the app.
PDF Processing: The app will extract the text content from the PDF and create embeddings from the text.
Ask Questions: Once the embeddings are created, you can ask questions based on the document. The AI model will generate answers using the extracted content from the PDF.

Code Explanation

Key Components:

PDFProcessor Class:
- extract_pdf_content: This function extracts text from a PDF file using pdfplumber.
- create_embeddings: This function converts the extracted text into embeddings and stores them in a FAISS vector store for efficient similarity search.
- validate_api_key: Verifies that the provided Anthropic API key is valid.
Main Logic:
- Uses Streamlit to create a user interface.
- PDF Upload: Users upload a PDF, and the app processes and creates embeddings for it.
- Question-Answering: Once the PDF is processed, users can interact with the document by asking questions, and the AI provides answers based on the PDF content.

Error Handling:

Uses the Tenacity package for retries, particularly for the embedding creation process. If the creation fails, it retries up to 3 times with an exponential backoff.

Requirements

Python 3.x
Streamlit
pdfplumber (For extracting text from PDFs)
Pandas
Langchain (Anthropic)
HuggingFace Embeddings
FAISS (For efficient vector storage and search)
Tenacity (For retry logic)
Logging (For error and info logging)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

2. Install Required Packages

3. Setup Anthropic API Key

4. Run the Application

How to Use

Code Explanation

Key Components:

Error Handling:

Requirements

About

Releases

Packages

Languages

atharvad38/chat_with_pdf

Folders and files

Latest commit

History

Repository files navigation

2. Install Required Packages

3. Setup Anthropic API Key

4. Run the Application

How to Use

Code Explanation

Key Components:

Error Handling:

Requirements

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages