We have finished the AliProduct competition and uploaded our solution to GitHub; the report will come soon.
Our solution achieves an average recall of 0.76 on the validation set without pre-trained models, requires no extra dirty-data pre-processing or multi-stage training, and runs at 0.16 s per image per GPU. We train the model on 8×A100 40G GPUs with the AliProduct dataset, which includes 4 million image-context pairs.
conda create -n air python=3.9
conda activate air
pip install -r requirements.txt
You can change some hyperparameters in train_retrieval.py before running; a sketch of the kind of settings involved is given after the command below.
bash run.sh
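As a hypothetical sketch of the kind of hyperparameters exposed in train_retrieval.py (the real option names and defaults in this repo may differ):

```python
import argparse

# Hypothetical sketch only: the actual option names and defaults in
# train_retrieval.py may differ.
parser = argparse.ArgumentParser(description="retrieval training settings")
parser.add_argument("--image_size", type=int, default=384)      # input resolution
parser.add_argument("--batch_size", type=int, default=64)       # per-GPU batch size
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--init_lr", type=float, default=1e-5)      # peak learning rate
parser.add_argument("--weight_decay", type=float, default=0.05)
args = parser.parse_args()
```

If you change the number of GPUs, the per-GPU batch size and learning rate are the usual settings to revisit.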
After finishing the training steps, you can use itm_predict.py or itc_predict.py to predict the results. To test the performance, run:
bash test.sh
test.sh computes the itm_score or itc_score for the top_k image-context pairs. The start and end arguments select the range of image indices handled by one process, so evaluation can be accelerated by splitting the work across multiple GPUs. Scoring 10 image-context pairs takes about 3.6 s on an A100 80G GPU.
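As a rough sketch of what this two-stage scoring does (not the actual code in itm_predict.py or itc_predict.py; the function names and arguments below are assumptions), the cheap contrastive (ITC) similarity shortlists the top_k contexts for each image, and the heavier image-text matching (ITM) head reranks only that shortlist:

```python
import torch

def rescore_topk(itc_sim, itm_score_fn, top_k=10, start=0, end=None):
    """Illustrative only. itc_sim: [num_images, num_contexts] similarity
    matrix from the contrastive (ITC) head; itm_score_fn(i, j) returns the
    image-text matching (ITM) score for image i and context j. start/end
    restrict which image indices this process handles, so several GPUs can
    split the work."""
    end = itc_sim.size(0) if end is None else end
    results = {}
    for i in range(start, end):
        # Cheap stage: keep only the top_k contexts by contrastive similarity.
        topk_scores, topk_idx = itc_sim[i].topk(top_k)
        # Expensive stage: rerank those candidates with the ITM head.
        itm_scores = torch.tensor([itm_score_fn(i, j.item()) for j in topk_idx])
        order = itm_scores.argsort(descending=True)
        results[i] = topk_idx[order].tolist()
    return results
```

Running several such processes with different start/end ranges is how the work is spread over multiple GPUs. For reference, here are a few raw context strings from the dataset, showing the kind of noisy captions the model is trained on: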
- context: "M & D Simple Modern Light Luxury Comfort Good Quality Living Room with a Double Motor Lounge Chair Sofa TE04"
- context: "er tong hua xing che fang ce fan niu niu che 1-3 sui bao bao wan ju che yin le ke zuo ke qi si lun lium che"
- context: "feiyangg/LP Paragraph Style Electric Guitar Tiger Veneer Factory Direct Color Can Be Customized"