A PyQt6 application that uses ColPali and OpenAI to demonstrate Efficient Document Retrieval with Vision Language Models

hyun-yang/MyColPali


MyColPali

This application utilizes the ColPali vision language model and OpenAI capabilities to implement various document processing features.

Introduction to ColPali

ColPali Features

  • Efficient document indexing using Vision Language Models.
  • Capable of handling various document types, including text, tables, and images.
  • No need for processes such as OCR, Layout Parsing, Chunking, Captioning, or text embedding models.
  • Relatively fast response processing compared to other RAG systems.

colpali_archtecture

Updates

  • Added the ColQwen2 model.
  • Update the dependencies by running the following command:
pip install -r requirements.txt

Prerequisites

Before you begin, ensure you have met the following requirements:

  1. Python:

    Make sure you have Python 3.10 or later installed. You can download it from the official Python website.

  python --version

  2. pip:

    Ensure you have pip, the package installer for Python, installed.

  3. Git:

    Ensure you have Git installed for version control. You can download it from the official Git website.

  4. Virtual Environment:

    It is recommended to use a virtual environment to manage your project dependencies.

    You can create a virtual environment using venv:

  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`

  5. IDE/Code Editor:

    Use an IDE or code editor of your choice. Popular options include PyCharm, VSCode, and Eclipse.

  6. PlantUML:

    PlantUML is used for generating UML diagrams.

    Download PlantUML from the official PlantUML website, or install it as a PyCharm plugin or Xcode extension.
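The prerequisites above can be checked quickly from Python before installing anything. The snippet below is a minimal sketch, not part of the repository: the function name `check_prerequisites` and the reported keys are hypothetical, and it only tests whether the tools are discoverable on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def check_prerequisites():
    """Report which prerequisites appear to be available (hypothetical helper)."""
    return {
        "python>=3.10": sys.version_info >= MIN_PYTHON,
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        "git": shutil.which("git") is not None,
        # Inside a venv, sys.prefix differs from the base interpreter prefix.
        "venv active": sys.prefix != sys.base_prefix,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" entry for `venv active` is only a hint, since a virtual environment is recommended rather than required.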

Quick Install

  1. Clone the repository
git clone https://github.com/hyun-yang/MyColPali
  2. Install the dependencies with pip:
pip install -r requirements.txt

Or, inside a virtual environment (venv), use this command:

python -m pip install -r requirements.txt
  3. Run main.py
python main.py
  4. Configure API Key

    • Open the 'Setting' menu and set the API key.
  5. Re-run main.py

python main.py

ColPali/ColQwen2 Model Download

Make sure to download the ColPali/ColQwen2 model prior to using the application.

The total download size is over 5 GB for ColPali and over 8 GB for ColQwen2. Depending on your network speed, this may take some time.

Choose one of the methods below to download:

  1. Use the download tool from Hugging Face vidore/colpali-v1.2 to download.
  2. Use the download tool from Hugging Face vidore/colqwen2-v0.1 to download.
  3. Open the Jupyter notebook file download_model/download_colpali_model.ipynb and run it.
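For a scripted download, one option is `snapshot_download` from the `huggingface_hub` package. The following is a sketch under assumptions: the `models` target directory and both helper names are hypothetical, the application may expect the files in a different location, and the call fetches only the files in the named repository. The notebook in `download_model/` remains the reference for what the application actually needs.

```python
from pathlib import Path

# Repository IDs as named in the list above.
MODEL_REPOS = {
    "colpali": "vidore/colpali-v1.2",
    "colqwen2": "vidore/colqwen2-v0.1",
}


def local_model_dir(name, root="models"):
    """Hypothetical target directory; the app may expect a different path."""
    return Path(root) / MODEL_REPOS[name].split("/")[-1]


def download_model(name, root="models"):
    """Download every file of the chosen repository into local_model_dir()."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(name, root)
    snapshot_download(repo_id=MODEL_REPOS[name], local_dir=str(target))
    return target


# Uncomment to start the multi-gigabyte download:
# download_model("colpali")
```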

PyTorch Installation

To utilize the GPU, you need to install a version of PyTorch that is compatible with your operating system and the CUDA version supported by your GPU.

If PyTorch is not installed correctly, or if you do not have a GPU, the application runs in CPU mode, which is slower.

Please refer to the Utility.get_torch_device method in the util folder for more information.
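To see which mode your installation will end up in, a check along these lines can help. This is an illustrative sketch and not the repository's `Utility.get_torch_device`; the function name `describe_torch` and the returned keys are hypothetical.

```python
def describe_torch():
    """Summarize whether the installed PyTorch build can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all.
        return {"device": "cpu", "torch": None, "cuda_build": None}
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return {
        "device": device,
        "torch": torch.__version__,
        # None means a CPU-only build of PyTorch was installed.
        "cuda_build": torch.version.cuda,
    }


print(describe_torch())
```

If `cuda_build` is `None` on a machine with an Nvidia GPU, the installed PyTorch wheel is most likely a CPU-only build and should be replaced with one that matches your CUDA version.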

Screenshots

  • First Run

mycolpali_first_run_2

  • Setting

mycolpali_first_run_1

UML Diagram

  • Main Class Diagram

MyColPali-UML-3

  • Vision Presenter / ColPaliVLMModel Diagram

MyColPali-UML-2

ColPali Question/Answer Test

This is the system information used for this test.

  • OS : Windows 11
  • CPU : Ryzen 7 7800X3D
  • RAM : 64GB DDR5 Corsair 6000MT/s
  • GPU : Nvidia GeForce RTX 4070 Ti Super - 16GB VRAM
  • CUDA : 12.1

1) ColPali Efficient Document Retrieval with Vision Language Models Question/Answer

The document referenced in the questions and answers below is the paper "ColPali: Efficient Document Retrieval with Vision Language Models".

This document is 20 pages long and includes text, graphs, and images.

  • File indexing time : 17 seconds
  • Total pages : 20 pages
  • File size : 8.9 MB

colpai_pdf_time

English Questions

  1. Summarize this document.
  2. What is the purpose of the ViDoRe benchmark?
  3. Why is the ColPali model superior to existing document retrieval systems?
  4. What is the importance of visual cues in document retrieval systems?
  5. How is the training dataset for the ColPali model composed?
  6. How does the late interaction mechanism of the ColPali model work?
  7. What evaluation metrics does the ViDoRe benchmark use?
  8. What comparative models were used to evaluate the performance of the ColPali model?
  9. How has the indexing speed of the ColPali model been improved?
  10. What methods are used to reduce the memory usage of the ColPali model?

For the question "What evaluation metrics does the ViDoRe benchmark use?", note that the quality of the response differs depending on whether 5 or 10 images are used.

Korean Questions

  1. 이 문서를 요약해주세요.
  2. ViDoRe 벤치마크의 목적은 무엇인가요?
  3. ColPali 모델이 기존 문서 검색 시스템보다 우수한 이유는 무엇인가요?
  4. 문서 검색 시스템에서 시각적 단서의 중요성은 무엇인가요?
  5. ColPali 모델의 학습 데이터셋은 어떻게 구성되었나요?
  6. ColPali 모델의 늦은 상호작용 메커니즘은 어떻게 작동하나요?
  7. ViDoRe 벤치마크는 어떤 평가 메트릭을 사용하나요?
  8. ColPali 모델의 성능을 평가하기 위해 어떤 비교 모델이 사용되었나요?
  9. ColPali 모델의 인덱싱 속도는 어떻게 개선되었나요?
  10. ColPali 모델의 메모리 사용량을 줄이기 위한 방법은 무엇인가요?

Q/A Result

  1. Summarize this document.

colpai_summarize this document

  2. What is the purpose of the ViDoRe benchmark?

colpai_What is the purpose of the ViDoRe benchmark2

  3. Why is the ColPali model superior to existing document retrieval systems?

colpai_Why is the ColPali model superior to existing document retrieval systems

  4. What is the importance of visual cues in document retrieval systems?

colpai_What is the importance of visual cues in document retrieval systems

  5. What evaluation metrics does the ViDoRe benchmark use?

  • Using 5 images
    colpai_What evaluation metrics does the ViDoRe benchmark use

  • Using 10 images
    colpai_What evaluation metrics does the ViDoRe benchmark use-10images

  6. What methods are used to reduce the memory usage of the ColPali model?

colpai_What methods are used to reduce the memory usage of the ColPali model

  7. How has the indexing speed of the ColPali model been improved?

colpai_How has the indexing speed of the ColPali model been improved

  8. What comparative models were used to evaluate the performance of the ColPali model?

colpai_What comparative models were used to evaluate the performance of the ColPali model

  9. ColPali 모델이 기존 문서 검색 시스템보다 우수한 이유는 무엇인가요?

colpai_Why is the ColPali model superior to existing document retrieval systems-kor

  10. 문서 검색 시스템에서 시각적 단서의 중요성은 무엇인가요?

colpai_What is the importance of visual cues in document retrieval systems-kor

  11. ColPali 모델의 학습 데이터셋은 어떻게 구성되었나요?

colpai_How is the training dataset for the ColPali model composed2-kor

2) Data and AI Trends Report 2024 Question/Answer

The document referenced in the question/answer below is the Data and AI Trends Report 2024.

This report is 44 pages long and includes text, graphs, and images.

  • File indexing time : 242 seconds
  • Total pages : 44 pages
  • File size : 23.7 MB
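Normalizing the two indexing runs reported in this README by page count makes them easier to compare: about 0.85 seconds per page for the 20-page ColPali paper versus 5.5 seconds per page for this 44-page report. The short sketch below only reproduces that arithmetic from the figures listed in the two tests; the `RUNS` structure and function name are illustrative.

```python
# Figures copied from the two test sections of this README.
RUNS = {
    "ColPali paper": {"seconds": 17, "pages": 20, "size_mb": 8.9},
    "Data and AI Trends Report 2024": {"seconds": 242, "pages": 44, "size_mb": 23.7},
}


def seconds_per_page(run):
    """Average indexing time per page for one run."""
    return run["seconds"] / run["pages"]


for name, run in RUNS.items():
    mb_per_page = run["size_mb"] / run["pages"]
    print(f"{name}: {seconds_per_page(run):.2f} s/page, {mb_per_page:.2f} MB/page")
```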

ai_trend_report_44pages

English Questions

  1. Explain the Top 5 trends.
  2. What is RAG, and how can it be utilized?
  3. Explain why we should learn AI.

Korean Questions

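  1. 우리가 AI를 배워야 하는 이유를 설명해줘. (Explain why we should learn AI.)
  2. AI를 사용해서 데이터 통합을 하려고 하는 기업의 비율은 얼마나 될까? (What percentage of companies intend to use AI for data integration?)
  3. RAG와 같은 AI 모델을 활용한 기술을 사용하여 데이터베이스 관리에 사용하고 싶은 기업의 비율은 얼마나 될까? (What percentage of companies want to use techniques built on AI models, such as RAG, for database management?)
  4. RAG가 어떤 기술이고 어떻게 활용할 수 있어? (What is RAG, and how can it be utilized?)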

Q/A Result

  1. Explain the Top 5 trends.

mycolpali_eng_qa_1

  2. What is RAG, and how can it be utilized?

mycolpali_eng_qa_2

  3. Explain why we should learn AI.

mycolpali_eng_qa_3

  4. 우리가 AI를 배워야 하는 이유를 설명해줘.

mycolpali_han_qa_1

  5. AI를 사용해서 데이터 통합을 하려고 하는 기업의 비율은 얼마나 될까?
  6. RAG와 같은 AI 모델을 활용한 기술을 사용하여 데이터베이스 관리에 사용하고 싶은 기업의 비율은 얼마나 될까?

mycolpali_han_qa_2

  7. RAG가 어떤 기술이고 어떻게 활용할 수 있어?

mycolpali_han_qa_3

  • Question/Answer List

mycolpali_list

Important Notes

  • When you select a size in the Image Size settings, the app resizes the images returned by ColPali so that the longer side (width or height) matches the selected size.
  • If the Image Size checkbox is selected, the app uses the images returned by ColPali without resizing them.
  • The larger the image size, the more tokens will be used.
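Resizing by the longer side can be sketched as follows. This is an illustration of the rule described above, assuming the aspect ratio is preserved; the function name `scaled_size` is hypothetical and is not taken from the application's code.

```python
def scaled_size(width, height, long_side):
    """Scale (width, height) so the longer side equals long_side, keeping aspect ratio."""
    if min(width, height, long_side) <= 0:
        raise ValueError("dimensions must be positive")
    scale = long_side / max(width, height)
    # Guard against a side collapsing to 0 for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(scaled_size(2000, 1000, 512))  # landscape page -> (512, 256)
print(scaled_size(1000, 2000, 512))  # portrait page  -> (256, 512)
```

Because both sides shrink by the same factor, halving the selected size cuts the pixel count to roughly a quarter, which is why smaller sizes use fewer tokens.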

License

Distributed under the MIT License.
