python src/analyze.py path/to/directory "Describe the image"
Welcome to the CogVLM2 Autocaptioning Tools repository! This project sets up tools for autocaptioning with the state-of-the-art CogVLM2 model.
✅ Chat Mode ✅ Caption Mode ✅ FastAPI Application
CogVLM2 is an open-source vision-language model (VLM) whose performance approaches that of GPT-4V. This repository sets up the necessary environment and a few tools for putting the CogVLM2 model to work. The model was created and released by the Knowledge Engineering Group (KEG) & Data Mining group (THUDM) at Tsinghua University: https://huggingface.co/THUDM.
(TESTED ON UBUNTU 22.04 | CUDA 12.1 | Torch 2.3.0+cu121 w/ Xformers)
For Windows, let me know how it goes. I'll make a pull request to actually test it, but it should work fine.
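To compare a local environment against the tested configuration, a small check along these lines may help. It is only a sketch: the version numbers come from the tested-on line and the pip install command in this README, while the helper names `installed_version` and `compare_versions` are made up for illustration and are not part of this repository.

```python
from importlib import metadata

# Versions taken from this README (tested-on line and pip install command).
TESTED = {"torch": "2.3.0", "torchvision": "0.18.0", "torchaudio": "2.3.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(tested, lookup=installed_version):
    """Map each package name to 'ok', 'missing', or 'differs (<found>)'."""
    report = {}
    for package, wanted in tested.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        # Wheels from the cu121 index report versions such as '2.3.0+cu121',
        # so only the part before '+' is compared.
        elif found.split("+")[0] == wanted:
            report[package] = "ok"
        else:
            report[package] = f"differs ({found})"
    return report


if __name__ == "__main__":
    for package, status in compare_versions(TESTED).items():
        print(f"{package}: {status}")
```

A "differs" result does not necessarily mean the tools will fail; it only flags a setup other than the one that was tested.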
Follow the steps below to set up the project:
- Download and Run the Shell Script (Linux):
    wget https://raw.githubusercontent.com/C0nsumption/Consume-CogVLM2/main/setup/setup.sh
    chmod +x setup.sh
    ./setup.sh
- Download and Run the Batch Script (Windows):
    curl -o setup.bat https://raw.githubusercontent.com/C0nsumption/Consume-CogVLM2/main/setup/setup.bat
    setup.bat
Manual Installation
- Clone this Repo and Navigate to the Project Directory:
    git clone https://github.com/C0nsumption/Consume-CogVLM2.git
    cd Consume-CogVLM2
- Set Up a Virtual Environment:
    python -m venv venv
    source venv/bin/activate # For Linux/Mac
    venv\Scripts\activate # For Windows
- Initialize Git LFS (make sure it is installed first; ask ChatGPT if you need help):
    git lfs install
- Clone the Model Repository:
    git clone https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B-int4
- Install Dependencies:
    pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
- Run Tests:
    python test/test.py
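One thing that can go wrong in the manual steps is cloning the model without Git LFS active, which leaves small pointer files in place of the real weights. The sketch below looks for such files. The directory name is assumed from the clone command above, the pointer prefix is the first line defined by the Git LFS pointer format, and `is_lfs_pointer` and `find_lfs_pointers` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(model_dir):
    """List files under model_dir (relative paths) that are still LFS pointers."""
    root = Path(model_dir)
    pointers = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip Git's own bookkeeping directory.
        if path.is_file() and ".git" not in relative.parts and is_lfs_pointer(path):
            pointers.append(str(relative))
    return sorted(pointers)


if __name__ == "__main__":
    # Directory name assumed from the clone command above.
    model_dir = Path("cogvlm2-llama3-chat-19B-int4")
    if not model_dir.is_dir():
        print(f"{model_dir} not found; clone the model repository first.")
    else:
        stubs = find_lfs_pointers(model_dir)
        if stubs:
            print("Still LFS pointers (try `git lfs pull` inside the model directory):")
            for name in stubs:
                print(f"  {name}")
        else:
            print("No LFS pointer files found.")
```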
After setting up the environment, you can start using the CogVLM2 autocaptioning tools. Detailed usage instructions and examples can be found in the Usage Guide.
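As a rough illustration of scripted use, the sketch below wraps the command shown at the top of this README (`python src/analyze.py path/to/directory "Describe the image"`) so it can be run over several directories. It assumes the script takes exactly those two positional arguments, as in that line; the Usage Guide remains the reference for the actual options. `build_caption_command` and `caption_directories` are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys


def build_caption_command(image_dir, prompt, script="src/analyze.py"):
    """Assemble the argument list for the analyze script.

    Assumes the script accepts a directory and a prompt as positional
    arguments, as in the command at the top of this README.
    """
    return [sys.executable, script, str(image_dir), prompt]


def caption_directories(image_dirs, prompt, run=subprocess.run):
    """Run the analyze script once per directory, stopping at the first failure."""
    for image_dir in image_dirs:
        # check=True makes subprocess.run raise if the script exits non-zero.
        run(build_caption_command(image_dir, prompt), check=True)
```

Run from the repository root with the virtual environment active, `caption_directories(["path/to/directory"], "Describe the image")` would issue the same command as the one-liner. Using `sys.executable` keeps the call inside the interpreter of the active environment.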
I welcome contributions from the community! If you'd like to contribute, please fork the repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Feel free to reach out if you have any questions or need further assistance! But give me time, very busy:
accelerating 🫡