This application uses the ColPali vision language model together with OpenAI capabilities to implement various document processing features.
- ColPali is a document retrieval model built on Vision Language Models (VLMs).
- arXiv paper: ColPali: Efficient Document Retrieval with Vision Language Models
- Efficient document indexing using Vision Language Models.
- Capable of handling various document types, including text, tables, and images.
- No need for processes such as OCR, Layout Parsing, Chunking, Captioning, or text embedding models.
- Relatively fast response processing compared to other RAG systems.
- Added the ColQwen2 model.
- Update the dependencies by running the following command:
pip install -r requirements.txt
Before you begin, ensure you have met the following requirements:
- Python:
Make sure you have Python 3.10 or later installed. You can download it from the official Python website.
python --version
- pip:
Ensure you have pip installed, which is the package installer for Python.
- Git:
Ensure you have Git installed for version control. You can download it from the official Git website.
- Virtual Environment:
It is recommended to use a virtual environment to manage your project dependencies.
You can create a virtual environment using venv:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- IDE/Code Editor:
Use an IDE or code editor of your choice. Popular options include PyCharm, VSCode, and Eclipse.
- PlantUML:
PlantUML is used for generating UML diagrams.
Download PlantUML from the official PlantUML website, or install it as a PyCharm plugin or Xcode extension.
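As a quick sanity check for the Python and virtual environment requirements above, a short script along these lines can be run before installing dependencies. This is only a sketch; the function name `check_environment` is hypothetical and is not part of this repository.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def check_environment() -> dict:
    """Report whether the current interpreter meets the stated requirements."""
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= MIN_PYTHON,
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `python_ok` is `False`, install a newer Python before running `pip install -r requirements.txt`.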
- Clone repository
git clone https://github.com/hyun-yang/MyColPali
- With pip:
pip install -r requirements.txt
Or, inside a virtual environment (venv), use this command:
python -m pip install -r requirements.txt
- Run main.py
python main.py
- Configure API Key
  - Open the 'Setting' menu and set the API key.
- Re-run main.py
python main.py
Make sure to download the ColPali/ColQwen2 model before using the application.
The total download size is over 5 GB (ColPali) or 8 GB (ColQwen2). Depending on your network speed, this may take some time.
Choose one of the two methods below to download:
- Use the Hugging Face download tool to download vidore/colpali-v1.2 (ColPali) or vidore/colqwen2-v0.1 (ColQwen2).
- Open the Jupyter notebook file download_model/download_colpali_model.ipynb and run it.
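For the Hugging Face route, a download could look roughly like the sketch below, which uses `snapshot_download` from the `huggingface_hub` package. The helper `download_model`, the `MODEL_REPOS` mapping, and the `models/` target folder are assumptions made for illustration; the project's notebook may place files elsewhere or fetch additional weights.

```python
from pathlib import Path

# Repository IDs taken from the download options listed above.
MODEL_REPOS = {
    "colpali": "vidore/colpali-v1.2",
    "colqwen2": "vidore/colqwen2-v0.1",
}


def download_model(name: str, target_root: str = "models") -> str:
    """Download one model snapshot and return the local folder path."""
    if name not in MODEL_REPOS:
        raise ValueError(
            f"unknown model {name!r}; expected one of {sorted(MODEL_REPOS)}"
        )
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    local_dir = Path(target_root) / name  # hypothetical layout, not the app's
    return snapshot_download(repo_id=MODEL_REPOS[name], local_dir=str(local_dir))
```

Calling `download_model("colpali")` starts the multi-gigabyte download, so run it only once and on a stable connection.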
To utilize the GPU, you need to install a version of PyTorch that is compatible with your operating system and the CUDA version supported by your GPU.
If PyTorch is not installed correctly or no GPU is available, the application runs in CPU mode, which is slower.
Please refer to the Utility.get_torch_device method in the util folder for more information.
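The GPU-or-CPU fallback described above can be sketched as follows. This is not the code of `Utility.get_torch_device`; the function name `pick_torch_device` is hypothetical, and the sketch only assumes the standard `torch.cuda.is_available()` check.

```python
def pick_torch_device() -> str:
    """Return "cuda" when a CUDA-capable PyTorch build can see a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing entirely, so only the CPU label makes sense here.
        return "cpu"
    # is_available() is False for CPU-only builds and for machines without a GPU.
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(f"Selected device: {pick_torch_device()}")
```

If this prints `cpu` on a machine that has an NVIDIA GPU, the installed PyTorch build most likely does not match the local CUDA version.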
- First Run
- Main Class Diagram
- Vision Presenter / ColPaliVLMModel Diagram
This is the system information used for this test.
- OS : Windows 11
- CPU : Ryzen 7 7800X3D
- RAM : 64GB DDR5 Corsair 6000MT/s
- GPU : Nvidia GeForce RTX 4070 Ti Super - 16GB VRAM
- CUDA : 12.1
The document referenced in the question/answer below is the paper ColPali: Efficient Document Retrieval with Vision Language Models.
This document is 20 pages long and includes text, graphs, and images.
- File indexing time : 17 seconds
- Total pages : 20 pages
- File size : 8.9 MB
- Summarize this document.
- What is the purpose of the ViDoRe benchmark?
- Why is the ColPali model superior to existing document retrieval systems?
- What is the importance of visual cues in document retrieval systems?
- How is the training dataset for the ColPali model composed?
- How does the late interaction mechanism of the ColPali model work?
- What evaluation metrics does the ViDoRe benchmark use?
- What comparative models were used to evaluate the performance of the ColPali model?
- How has the indexing speed of the ColPali model been improved?
- What methods are used to reduce the memory usage of the ColPali model?
Note that for the question "What evaluation metrics does the ViDoRe benchmark use?", the quality of the response differs depending on whether 5 images or 10 images are used to answer it.
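- 이 문서를 요약해주세요. (Summarize this document.)
- ViDoRe 벤치마크의 목적은 무엇인가요? (What is the purpose of the ViDoRe benchmark?)
- ColPali 모델이 기존 문서 검색 시스템보다 우수한 이유는 무엇인가요? (Why is the ColPali model superior to existing document retrieval systems?)
- 문서 검색 시스템에서 시각적 단서의 중요성은 무엇인가요? (What is the importance of visual cues in document retrieval systems?)
- ColPali 모델의 학습 데이터셋은 어떻게 구성되었나요? (How is the training dataset for the ColPali model composed?)
- ColPali 모델의 늦은 상호작용 메커니즘은 어떻게 작동하나요? (How does the late interaction mechanism of the ColPali model work?)
- ViDoRe 벤치마크는 어떤 평가 메트릭을 사용하나요? (What evaluation metrics does the ViDoRe benchmark use?)
- ColPali 모델의 성능을 평가하기 위해 어떤 비교 모델이 사용되었나요? (What comparative models were used to evaluate the performance of the ColPali model?)
- ColPali 모델의 인덱싱 속도는 어떻게 개선되었나요? (How has the indexing speed of the ColPali model been improved?)
- ColPali 모델의 메모리 사용량을 줄이기 위한 방법은 무엇인가요? (What methods are used to reduce the memory usage of the ColPali model?)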
- Summarize this document.
- What is the purpose of the ViDoRe benchmark?
- Why is the ColPali model superior to existing document retrieval systems?
- What is the importance of visual cues in document retrieval systems?
- What evaluation metrics does the ViDoRe benchmark use?
- What methods are used to reduce the memory usage of the ColPali model?
- How has the indexing speed of the ColPali model been improved?
- What comparative models were used to evaluate the performance of the ColPali model?
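- ColPali 모델이 기존 문서 검색 시스템보다 우수한 이유는 무엇인가요? (Why is the ColPali model superior to existing document retrieval systems?)
- 문서 검색 시스템에서 시각적 단서의 중요성은 무엇인가요? (What is the importance of visual cues in document retrieval systems?)
- ColPali 모델의 학습 데이터셋은 어떻게 구성되었나요? (How is the training dataset for the ColPali model composed?)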
The document referenced in the question/answer below is the Data and AI Trends Report 2024.
This report is 44 pages long and includes text, graphs, and images.
- File indexing time : 242 seconds
- Total pages : 44 pages
- File size : 23.7 MB
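To compare the two test documents on equal footing, the reported indexing times can be normalized per page. The snippet below only performs that arithmetic on the figures listed in this README; the helper name `seconds_per_page` is illustrative.

```python
def seconds_per_page(indexing_seconds: float, pages: int) -> float:
    """Average indexing time per page for one document."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return indexing_seconds / pages


# (indexing time in seconds, page count) as reported for each test document
runs = {
    "ColPali paper": (17, 20),
    "Data and AI Trends Report 2024": (242, 44),
}

for name, (seconds, pages) in runs.items():
    print(f"{name}: {seconds_per_page(seconds, pages):.2f} s/page")
```

This works out to 0.85 s/page for the ColPali paper and 5.50 s/page for the trends report on the test machine described above.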
- Explain the Top 5 trends.
- What is RAG, and how can it be utilized?
- Explain why we should learn AI.
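- 우리가 AI를 배워야 하는 이유를 설명해줘. (Explain why we should learn AI.)
- AI를 사용해서 데이터 통합을 하려고 하는 기업의 비율은 얼마나 될까? (What percentage of companies intend to use AI for data integration?)
- RAG와 같은 AI 모델을 활용한 기술을 사용하여 데이터베이스 관리에 사용하고 싶은 기업의 비율은 얼마나 될까? (What percentage of companies want to use AI-model-based techniques such as RAG for database management?)
- RAG가 어떤 기술이고 어떻게 활용할 수 있어? (What kind of technology is RAG, and how can it be used?)
- Explain the Top 5 trends.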
- What is RAG, and how can it be utilized?
- Explain why we should learn AI.
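- 우리가 AI를 배워야 하는 이유를 설명해줘. (Explain why we should learn AI.)
- AI를 사용해서 데이터 통합을 하려고 하는 기업의 비율은 얼마나 될까? (What percentage of companies intend to use AI for data integration?)
- RAG와 같은 AI 모델을 활용한 기술을 사용하여 데이터베이스 관리에 사용하고 싶은 기업의 비율은 얼마나 될까? (What percentage of companies want to use AI-model-based techniques such as RAG for database management?)
- RAG가 어떤 기술이고 어떻게 활용할 수 있어? (What kind of technology is RAG, and how can it be used?)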
- Question/Answer List
- When a size is selected in the Image Size settings, the app resizes the images returned from ColPali so that the longer side (width or height) matches the selected size.
- If the Image Size checkbox is selected, the app uses the images returned from ColPali without resizing them.
- The larger the image size, the more tokens will be used.
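The longer-side rule above can be illustrated with a small calculation that keeps the aspect ratio. This is a sketch under the assumption that the image is scaled so its longer side equals the selected size; `fit_longer_side` is a hypothetical helper, not the app's own function.

```python
def fit_longer_side(width: int, height: int, target: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals target, keeping aspect ratio."""
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")
    longer = max(width, height)
    # Multiply before dividing to limit rounding error; keep each side at least 1 px.
    new_width = max(1, round(width * target / longer))
    new_height = max(1, round(height * target / longer))
    return new_width, new_height


print(fit_longer_side(2000, 1000, 512))  # landscape page
print(fit_longer_side(1000, 2000, 512))  # portrait page
```

Because the pixel count grows with the square of the scale factor, doubling the selected size roughly quadruples the image area sent to the model, which is consistent with larger sizes consuming more tokens.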
Distributed under the MIT License.