Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure package (cont.) #36

Merged
merged 83 commits into from
Sep 10, 2024
Merged

Restructure package (cont.) #36

merged 83 commits into from
Sep 10, 2024

Conversation

tonywu71
Copy link
Collaborator

@tonywu71 tonywu71 commented Sep 5, 2024

Description

Continue package restructuration from #28.

Added

  • Add custom Processor classes to easily process images and/or queries
  • Enable module-level imports
  • Add scoring to processor
  • Add CustomRetrievalEvaluator
  • Add missing typing
  • Add tests for model, processor, scorer, and collator
  • Add missing docstrings
  • Add "Ruff" and "Test" CI pipelines

Changed

  • Restructure all modules to closely follow the transformers architecture
  • Hugely simplify the collator implementation to make it model-agnostic
  • Clean pyproject.toml
  • Loosen the required dependencies
  • Replace black with the ruff linter

Removed

  • Remove TextRetrieverCollator
  • Remove HardNegDocmatixCollator

Fixed

  • Fix wrong PIL import
  • Fix dependency issues

@tonywu71 tonywu71 requested a review from ManuelFay September 5, 2024 10:18
Copy link
Collaborator

@ManuelFay ManuelFay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments - essentially what we were saying, might as well have the processing funcs in each model's processor

colpali_engine/collators/text_retriever_collator.py Outdated Show resolved Hide resolved
colpali_engine/models/__init__.py Outdated Show resolved Hide resolved
colpali_engine/trainer/colmodel_training.py Outdated Show resolved Hide resolved
scripts/compute_hardnegs.py Outdated Show resolved Hide resolved
scripts/compute_hardnegs.py Outdated Show resolved Hide resolved
scripts/compute_hardnegs.py Outdated Show resolved Hide resolved
scripts/infer/run_inference_with_python.py Outdated Show resolved Hide resolved
scripts/infer/run_inference_with_python.py Outdated Show resolved Hide resolved
@tonywu71 tonywu71 changed the title Fix typing issues ing the restructured package Restructure package (cont.) Sep 6, 2024
@tonywu71 tonywu71 self-assigned this Sep 6, 2024
@ManuelFay
Copy link
Collaborator

ManuelFay commented Sep 6, 2024

If code is tested and runs, LGTM.
Otherwise, let's check first if everything is smooth sailing

@tonywu71 tonywu71 force-pushed the fix-restructured-package branch 4 times, most recently from 54cc1fd to 30ea174 Compare September 6, 2024 15:31
@tonywu71
Copy link
Collaborator Author

tonywu71 commented Sep 7, 2024

@ManuelFay Made some extra code cleanup, and mostly importantly I've added some tests! 🎉

As discussed, I'll create a dev pre-release to test our package before shipping it to vidore-benchmark and byaldi.
On your side, can you do one more review and help me to test 1-2 train SLURM jobs please?

@ManuelFay
Copy link
Collaborator

Main comment here is that since it's mainly aesthetic, we can merge into main but no need to push a new tag until we have real breaking functionality (like configs). until then, I don't think it's worth the hassle for people to bump their version since everything is basically iso

.vscode/settings.json Outdated Show resolved Hide resolved
colpali_engine/utils/torch_utils.py Show resolved Hide resolved
colpali_engine/utils/torch_utils.py Show resolved Hide resolved
@tonywu71 tonywu71 requested a review from ManuelFay September 10, 2024 08:23
pyproject.toml Show resolved Hide resolved
scripts/compute_hardnegs.py Outdated Show resolved Hide resolved
@tonywu71 tonywu71 force-pushed the fix-restructured-package branch from 06377ee to 67135cc Compare September 10, 2024 09:45
@tonywu71 tonywu71 force-pushed the fix-restructured-package branch from 67135cc to ec32192 Compare September 10, 2024 10:03
@tonywu71 tonywu71 requested a review from ManuelFay September 10, 2024 11:54
* add: scorer in processor

* fix: lint

* fix: tests

* fix: bugs

* fix: tests pass

* fix: lint

* fix: tony's coms
Copy link
Collaborator

@ManuelFay ManuelFay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - you decide what to do with the device = "auto"
but we can merge this bug PR and get on with our lives haha

@tonywu71
Copy link
Collaborator Author

LGTM - you decide what to do with the device = "auto" but we can merge this bug PR and get on with our lives haha

In the end I went with your way of doing it, I just made it a bit more clear in the code 👍🏼

@tonywu71
Copy link
Collaborator Author

tonywu71 commented Sep 10, 2024

I've added some minor fixes. Once the test have passed, I'll squash, merge the PR and finally release v0.3.0! 😉

@tonywu71 tonywu71 merged commit 2c75550 into main Sep 10, 2024
5 checks passed
@tonywu71 tonywu71 deleted the fix-restructured-package branch October 13, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants