If you have any questions, please feel free to contact me via e-mail (liyunxin987@163.com) or Twitter (@LyxTg).
The main contributions are:
- We present a product description generation paradigm based only on the product image and several marketing keywords. For this new setting, we propose a straightforward and effective multimodal in-context tuning approach, named ModICT, which integrates the strengths of a frozen language model and a visual encoder (a minimal sketch follows this list).
- Our work is the first to investigate leveraging the in-context learning and text generation capabilities of various frozen language models for multimodal E-commerce product description generation. ModICT can be plugged into various types of language models, and the training process is parameter-efficient.
- We conduct extensive experiments on our newly built three-category product datasets. The results show that the proposed method achieves state-of-the-art performance on a wide range of evaluation metrics, and that with the proposed multimodal in-context tuning technique, small models achieve performance competitive with LLMs.
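To make the wiring concrete, below is a minimal PyTorch-style sketch of the parameter-efficient setup described above: a large pretrained language model is kept frozen, and only a small projector that maps visual features into the LM's embedding space is trained. The module names, dimensions, the `inputs_embeds` calling convention, and the choice to also freeze the visual encoder are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class ModICTSketch(nn.Module):
    """Hypothetical sketch of parameter-efficient multimodal tuning:
    frozen vision encoder + frozen language model, with only a small
    visual projector left trainable."""

    def __init__(self, vision_encoder, language_model, vis_dim=768, lm_dim=1024):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.language_model = language_model
        # Freeze both large pretrained components (an assumption for the
        # visual encoder; the paper's setup freezes the language model).
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.language_model.parameters():
            p.requires_grad = False
        # The only learnable block: project visual features into the
        # language model's embedding space.
        self.projector = nn.Linear(vis_dim, lm_dim)

    def forward(self, image, keyword_embeds):
        # Encode the product image with the frozen visual encoder.
        with torch.no_grad():
            vis_feats = self.vision_encoder(image)       # (B, N, vis_dim)
        vis_tokens = self.projector(vis_feats)           # (B, N, lm_dim)
        # Prepend the projected visual tokens to the keyword embeddings and
        # let the frozen LM generate the description (HF-style call is an
        # assumption about the backbone's interface).
        inputs = torch.cat([vis_tokens, keyword_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```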
Figure: The overall workflow of ModICT. The left part depicts the process of in-context reference construction. The right part shows the efficient multimodal in-context tuning for (1) the sequence-to-sequence language model and (2) the autoregressive language model. Blocks outlined in red are learnable.
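As a rough illustration of the in-context reference construction on the left of the figure, the sketch below assembles a text prompt from one retrieved similar product (its keywords and description serve as the in-context demonstration) followed by the target product's keywords; the projected visual tokens would be injected at the embedding level as in the sketch above. The template strings and the retrieval step are assumptions for illustration only.

```python
def build_in_context_prompt(ref_keywords, ref_description, target_keywords):
    """Hypothetical template: one retrieved reference product acts as the
    in-context demonstration before the target product's keywords."""
    ref_block = (
        "Keywords: " + ", ".join(ref_keywords) + "\n"
        "Description: " + ref_description + "\n"
    )
    target_block = "Keywords: " + ", ".join(target_keywords) + "\nDescription:"
    return ref_block + target_block

# Example usage with made-up marketing keywords:
prompt = build_in_context_prompt(
    ["leather", "waterproof", "large capacity"],
    "This spacious leather bag keeps your essentials dry in any weather.",
    ["canvas", "lightweight", "travel"],
)
print(prompt)
```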
MD2T is a new setting for multimodal E-commerce description generation based on structured marketing keywords and product images.
| MD2T | Cases&Bags | Clothing | Home Appliances |
| --- | --- | --- | --- |
| #Train | 18,711 | 200,000 | 86,858 |
| #Dev | 983 | 6,120 | 1,794 |
| #Test | 1,000 | 8,700 | 2,200 |
| Avg_N #MP | 5.41 | 6.57 | 5.48 |
| Avg_L #MP | 13.50 | 20.34 | 18.30 |
| Avg_L #Desp | 80.05 | 79.03 | 80.13 |
Table: Detailed statistics of MD2T. Avg_N and Avg_L denote the average number and the average length, respectively; MP and Desp denote the marketing keywords and the description.
Our preprocessed data (Text + Images) can be downloaded from https://huggingface.co/datasets/YunxinLi/MD2T.
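A quick way to fetch the preprocessed splits is via the Hugging Face `datasets` library. The sketch below assumes the default configuration; the actual split and configuration names should be checked against the dataset card.

```python
from datasets import load_dataset

# Load the MD2T dataset from the Hugging Face Hub.
# (Split/configuration names are assumptions; see the dataset card.)
md2t = load_dataset("YunxinLi/MD2T")
print(md2t)
```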
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝.
@inproceedings{li2024multimodal,
  title={A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation},
  author={Li, Yunxin and Hu, Baotian and Luo, Wenhan and Ma, Lin and Ding, Yuxin and Zhang, Min},
  booktitle={LREC-COLING},
  year={2024}
}