Aliproduct-BLIP-cvpr2023

This is our solution for the AliProduct Large-Scale Competition at the CVPR 2023 workshop.

We have finished the AliProduct competition and uploaded our solution to GitHub. The report will follow soon.

Result

Our solution achieves an average recall of 0.76 on the validation set without pre-trained models. It requires neither additional dirty-data pre-processing nor multi-stage training, and it runs at 0.16 s per image per GPU. We train the model on 8 × A100 40G GPUs with the AliProduct dataset, which includes 4 million image-context pairs.
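
As a rough illustration of the metric, the sketch below computes recall@K from ranked retrieval results and averages it over several values of K. The function names, the choice of K values, and the toy data are assumptions made for illustration only; the competition's exact definition of average recall may differ.

```python
def recall_at_k(ranked_ids, true_id, k):
    """Return 1.0 if the ground-truth id is among the top-k ranked ids, else 0.0."""
    return 1.0 if true_id in ranked_ids[:k] else 0.0


def average_recall(rankings, ground_truth, ks=(1, 5, 10)):
    """Mean recall@k over all queries, averaged over the given k values.

    rankings: dict mapping query id -> list of candidate ids, best first.
    ground_truth: dict mapping query id -> the correct candidate id.
    The default ks are an assumption, not taken from the competition rules.
    """
    per_k = []
    for k in ks:
        hits = [recall_at_k(rankings[q], ground_truth[q], k) for q in ground_truth]
        per_k.append(sum(hits) / len(hits))
    return sum(per_k) / len(per_k)


# Toy example (hypothetical data): two context queries, three candidate images.
rankings = {
    "q1": ["img_a", "img_b", "img_c"],
    "q2": ["img_c", "img_a", "img_b"],
}
ground_truth = {"q1": "img_a", "q2": "img_b"}
score = average_recall(rankings, ground_truth, ks=(1, 2, 3))
```

Here `q1` is a hit at every K, while `q2` only becomes a hit at K = 3, so the averaged score rewards rankings that place the correct image early.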

How to use?

1. Install the environment

conda create -n air python=3.9
conda activate air
pip install -r requirements.txt

2. Train the model

You can change some hyperparameters in train_retrieval.py before running.

bash run.sh

3. Validate and predict

After finishing the training step, you can use itm_predict.py or itc_predict.py to predict the results. To test the performance, run:

bash test.sh

test.sh computes the itm_score or itc_score for the top_k image-context pairs. The start and end values give the range of image indices to process, so the work can be split across multiple GPUs. Every 10 image-context pairs take 3.6 s on an A100 80G GPU.
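
One way to choose the start and end values for each GPU is to split the image indices into contiguous, nearly equal ranges. The sketch below is only an illustration: `shard_ranges` and `estimate_seconds` are hypothetical helpers that are not part of this repository, and it assumes that end is exclusive, which should be checked against test.sh.

```python
def shard_ranges(num_images, num_gpus):
    """Split [0, num_images) into contiguous (start, end) ranges, one per GPU.

    Assumes end is exclusive; earlier GPUs take one extra image when the
    count does not divide evenly.
    """
    base, extra = divmod(num_images, num_gpus)
    ranges = []
    start = 0
    for rank in range(num_gpus):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def estimate_seconds(num_pairs, seconds_per_10_pairs=3.6):
    """Estimate scoring time from the 3.6 s per 10 pairs figure quoted above."""
    return num_pairs / 10 * seconds_per_10_pairs


# Hypothetical workload: 1000 images spread over 8 GPUs.
ranges = shard_ranges(num_images=1000, num_gpus=8)
for rank, (start, end) in enumerate(ranges):
    print(f"gpu {rank}: start={start} end={end}")
```

Each (start, end) pair can then be passed to a separate run on its own GPU, and `estimate_seconds` gives a rough per-GPU time budget once the number of image-context pairs in a shard is known.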

Visualization

  1. context: "M & D Simple Modern Light Luxury Comfort Good Quality Living Room with a Double Motor Lounge Chair Sofa TE04"
  2. context: "er tong hua xing che fang ce fan niu niu che 1-3 sui bao bao wan ju che yin le ke zuo ke qi si lun lium che"
  3. context: "feiyangg/LP Paragraph Style Electric Guitar Tiger Veneer Factory Direct Color Can Be Customized"