Merge branch 'main' of https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
YingqingHe · Aug 29, 2024 · 652bb17
2 parents 7190ed0 + 018b4bf
Showing 1 changed file with 22 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 
 <div align="center">
 <h2> LLMs Meet Multimodal Generation and Editing: A Survey </h2> 
-<a href='https://arxiv.org/abs/2405.19334'><img src='https://img.shields.io/badge/ArXiv-2211.14758-red'></a>
+<a href='https://arxiv.org/abs/2405.19334'><img src='https://img.shields.io/badge/ArXiv-2405.19334-red'></a>
 </div>
 
 # 🤗 Introduction
@@ -1958,6 +1958,27 @@ Dongchao Yang, Jinchuan Tian, Xu Tan\
 
 # 📍 Multimodal Understanding with LLMs
 ## Image Understanding
+
++ **Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions** (11 June 2024)<details><summary>Renjie Pi, Jianshu Zhang, Jipeng Zhang et al.</summary>Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang</details>
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07502)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/91b4f447bb06d081a7947b42df57491a04fa46f9)
+
+
+
++ **T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text** (11 June 2024)<details><summary>[ACL 2024] Aoxiong Yin, Haoyuan Li, Kai Shen et al.</summary>Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang</details>
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07119)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/186910d697bf7eb605aa055aee78fd91ce3ce9fe)
+
+
++ **Open-World Human-Object Interaction Detection via Multi-modal Prompts** (11 June 2024)<details><summary>Jie Yang, Bingliang Li, Ailing Zeng et al.</summary>Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang</details>
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07119)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/186910d697bf7eb605aa055aee78fd91ce3ce9fe)
+
+
++ **Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?** (11 June 2024)<details><summary>Xingyu Fu, Muyu He, Yujie Lu et al.</summary>Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth</details>
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07221v1)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/f0acf2a2293d963c3786e83bb198c75612adc446)
+
 + **InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks** (21 Dec 2023)<details><summary>Zhe Chen, Jiannan Wu, Wenhai Wang, et al.</summary>Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai</details>
 [![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.14238)
 [![citation](https://img.shields.io/badge/citation-10-blue.svg?paper=6a33e58ef961a3a0a5657518b2be86395eb7c8d0)](https://www.semanticscholar.org/paper/6a33e58ef961a3a0a5657518b2be86395eb7c8d0)