Releases: illuin-tech/colpali
v0.3.3
[0.3.3] - 2024-10-29
Added
- Add BiQwen2 model
Changed
- Modified ColQwen and BiQwen to prevent the useless forward pass in the last layer of the original model (classification head)
- Bumped "breaking" dependencies on MTEB and Transformers version and made the corresponding changes in the code
- Cast image dtype in ColPali to handle the breaking Transformers 4.46 update
- Added a "num_image_tokens" kwarg to the ColQwen2Processor to allow for different image resolutions
Fixed
- Fix wrong variable name for ColPaliProcessor's prefixes
Full Changelog: v0.3.2...v0.3.3
v0.3.2: The interpretability update
Description
✨ This release brings the interpretability module to colpali-engine and adds support for generating similarity maps with the ColQwen2 model.
🛠️ We’ve also made several code improvements and added tests for ColQwen2 to ensure better performance and reliability.
Features
Added
- Restore, refactor, and improve the interpretability module for generating similarity maps
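
As a rough illustration of what a similarity map is, the sketch below scores one query-token embedding against every image-patch embedding and lays the scores out on the patch grid. It is a minimal pure-Python sketch, not the colpali-engine implementation: the function name `similarity_map`, the dot-product scoring, and the row-major patch layout are all assumptions made for this example.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_map(
    query_token: Vector,
    patch_embeddings: List[Vector],
    n_rows: int,
    n_cols: int,
) -> List[List[float]]:
    """Hypothetical helper: score one query token against each image patch.

    Assumes patches are stored in row-major order over an n_rows x n_cols grid.
    """
    if len(patch_embeddings) != n_rows * n_cols:
        raise ValueError("number of patches must equal n_rows * n_cols")
    scores = [dot(query_token, patch) for patch in patch_embeddings]
    # Reshape the flat score list back onto the 2D patch grid.
    return [scores[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
    for row in similarity_map([1.0, 0.0], patches, n_rows=2, n_cols=2):
        print(row)
```

High values in the resulting grid mark the image regions that match the chosen query token most strongly, which is what makes the map useful for interpretability.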
Changed
- Remove dummy image from ColPaliProcessor.process_queries
Fixed
- Fix the compute_hardnegs.py script
Tests
- Add missing model.eval() in tests
- Add tests for ColQwen2
Full Changelog: v0.3.1...v0.3.2
v0.3.1: ColQwen2
[0.3.1] - 2024-09-27
Added
- Add module-level imports for collators
- Add sanity check in the run inference example script
- Add E2E test for ColPali
- Add Qwen2-VL support
Changed
- Improve code clarity of the run inference example script
- Subset the example dataset in the run inference example script
- Rename scorer test to test_processing_utils
- Greatly simplify routing logic in Trainer selection and when feeding arguments to the model forward pass (refacto)
- Removed class ContrastiveNegativeTrainer, which is now integrated in ContrastiveTrainer. This should not affect the user-facing API.
- Bumped transformers version to 4.45.0 to get Qwen2-VL support
Fixed
- Import HardNegCollator at module-level if and only if datasets is available
- Remove the need for typer in the run inference example script
- Fix edge case when empty suffix "" given to processor
- Fix bug in HardNegCollator since 0.3.0
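
The "import at module level if and only if datasets is available" fix follows a common optional-dependency pattern. The sketch below shows one way to do it with the standard library; it is not the package's actual code. The helper names `is_available` and `exported_names` and the placeholder `BaseCollator` are hypothetical, and only `HardNegCollator` and `datasets` come from the notes above.

```python
import importlib.util
from typing import List


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def exported_names(optional_dependency: str = "datasets") -> List[str]:
    """Hypothetical helper: build the list of names a package would export.

    "BaseCollator" is a placeholder for collators with no optional dependency.
    """
    names = ["BaseCollator"]
    # Only expose the collator that needs the optional dependency when present.
    if is_available(optional_dependency):
        names.append("HardNegCollator")
    return names


if __name__ == "__main__":
    print(exported_names())
```

Checking with `find_spec` avoids importing the optional package just to test for it, so the base package still imports cleanly when the dependency is missing.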
Full Changelog: v0.3.0...v0.3.1
v0.3.0: Extensive package refacto
Description
✨ This release is an extensive package refacto, making ColPali more modular and easier to use.
🚨 It is NOT backward-compatible with previous versions.
Features
Added
- Restructure the utils module
- Restructure the model training code
- Add custom Processor classes to easily process images and/or queries
- Enable module-level imports
- Add scoring to processor
- Add CustomRetrievalEvaluator
- Add missing typing
- Add tests for model, processor, scorer, and collator
- Lint Changelog
- Add missing docstrings
- Add "Ruff" and "Test" CI pipelines
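
For readers unfamiliar with the scoring mentioned above, the sketch below shows late-interaction (MaxSim) scoring between multi-vector embeddings, assuming that this is the kind of scoring the processor exposes, as in the ColPali paper. It is a pure-Python illustration; `maxsim_score` is a hypothetical name and not the library's API.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: List[Vector], document: List[Vector]) -> float:
    """Hypothetical helper: late-interaction score between two embedding sets.

    For each query-token vector, keep its best match among the document
    vectors, then sum those maxima over all query tokens.
    """
    if not query or not document:
        raise ValueError("query and document must each contain a vector")
    return sum(max(dot(q, d) for d in document) for q in query)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    document = [[1.0, 0.0], [0.5, 0.5]]
    print(maxsim_score(query, document))
```

A practical implementation would batch this as tensor operations; the nested loops here only make the max-then-sum structure explicit.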
Changed
- Restructure all modules to closely follow the transformers architecture
- Hugely simplify the collator implementation to make it model-agnostic
- ColPaliProcessor's process_queries doesn't need a mock image input anymore
- Clean pyproject.toml
- Loosen the required dependencies
- Replace black with the ruff linter
Removed
- Remove interpretability and eval_manager modules
- Remove unused utils
- Remove TextRetrieverCollator
- Remove HardNegDocmatixCollator
Fixed
- Fix wrong PIL import
- Fix dependency issues
Full Changelog: v0.2.2...v0.3.0
v0.2.2
v0.2.1
[0.2.1] - 2024-09-02
Patch a misalignment between the query preprocessing helper function and the training scheme.
Fixed
- Add 10 extra pad tokens by default to the query to act as reasoning buffers. This was added in the collator but not in the external helper function used for inference.
v0.2.0
[0.2.0]
Large refactoring to address several issues and add features. This release is not backward compatible with previous versions.
The models trained under this version will exhibit degraded performance if used with the previous version of the code and vice versa.
Added
- Added multiple options for training with hard negatives. This leads to better model performance!
- Added options for restarting training from a checkpoint.
Changed
- Optionally load ColPali models from pre-initialized backbones of the same shape to remove any stochastic initialization when loading adapters. This fixes #11 and #17.
Fixed
- Set padding side to right in the tokenizer to fix a misalignment issue between different query lengths in the same batch. Fixes #12.
- Add 10 extra pad tokens by default to the query to act as reasoning buffers. This enables the above fix to be made without degrading performance and cleans up the old technique of using tokens.
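
The two fixes above can be pictured with a small token-level sketch: each query first receives a fixed number of buffer pad tokens, then the batch is right-padded to a common length so that real tokens start at position 0 in every row. This is an illustration only; `pad_queries`, the pad id of 0, and operating on raw token-id lists are assumptions for this example, not the package's implementation.

```python
from typing import List


def pad_queries(
    batch: List[List[int]],
    pad_id: int = 0,
    n_buffer: int = 10,
) -> List[List[int]]:
    """Hypothetical helper: add buffer pad tokens, then right-pad the batch.

    pad_id=0 is an assumed placeholder; a real tokenizer defines its own id.
    """
    # Append the fixed-size buffer of pad tokens to every query.
    buffered = [ids + [pad_id] * n_buffer for ids in batch]
    max_len = max((len(ids) for ids in buffered), default=0)
    # Right padding: extra pads go after the tokens, never before them.
    return [ids + [pad_id] * (max_len - len(ids)) for ids in buffered]


if __name__ == "__main__":
    for row in pad_queries([[5, 6], [7]], n_buffer=2):
        print(row)
```

With right padding, the token at a given index means the same thing in every row of the batch, regardless of query length.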
v0.1.1: Reference release for the ColPali paper
Initial release
This release contains the code of reference for the ColPali arXiv paper [url]. In particular, it contains the model architecture, the loss function, and the trainer used for training ColPali.
To use this version of colpali-engine, install the package with:
pip install colpali-engine==0.1.1
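
Since several releases above are not backward compatible, it can help to confirm which version is actually installed before running older code. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("colpali-engine")
    if found is None:
        print("colpali-engine is not installed")
    else:
        print(f"colpali-engine {found} is installed")
```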