From b98d459714526b96ad9d3b188037ae366af24796 Mon Sep 17 00:00:00 2001
From: xiaoweichi <94704855+litwellchi@users.noreply.github.com>
Date: Sat, 29 Jun 2024 16:47:21 +0800
Subject: [PATCH 1/2] Update README.md

---
 README.md | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/README.md b/README.md
index 2179b20..0acff70 100644
--- a/README.md
+++ b/README.md
@@ -83,27 +83,6 @@
 ### 🔅 LLM-based
-+ **Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions** (11 June 2024)
Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang
-[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07502)
-[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/91b4f447bb06d081a7947b42df57491a04fa46f9)
-
-
-
-+ **T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text** (11 June 2024)
[ACL 2024] Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
-[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07119)
-[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/186910d697bf7eb605aa055aee78fd91ce3ce9fe)
-
-
-+ **Open-World Human-Object Interaction Detection via Multi-modal Prompts** (11 June 2024)
Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang
-[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07119)
-[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/186910d697bf7eb605aa055aee78fd91ce3ce9fe)
-
-
-+ **Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?** (11 June 2024)
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth
-[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07221v1)
-[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/f0acf2a2293d963c3786e83bb198c75612adc446)
-
-
 + **An Image is Worth 32 Tokens for Reconstruction and Generation** (11 June 2024)
Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen
 [![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07550)
@@ -1955,6 +1934,27 @@ Dongchao Yang, Jinchuan Tian, Xu Tan\
 # 📍 Multimodal Understanding with LLMs
 ## Image Understanding
+
++ **Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions** (11 June 2024)
Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07502)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/91b4f447bb06d081a7947b42df57491a04fa46f9)
+
+
+
++ **T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text** (11 June 2024)
[ACL 2024] Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07119)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/186910d697bf7eb605aa055aee78fd91ce3ce9fe)
+
+
++ **Open-World Human-Object Interaction Detection via Multi-modal Prompts** (11 June 2024)
Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07119)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/186910d697bf7eb605aa055aee78fd91ce3ce9fe)
+
+
++ **Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?** (11 June 2024)
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth
+[![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.07221v1)
+[![citation](https://img.shields.io/badge/citation-0-blue.svg?paper=da0d382c7fa981ba185ca633868442b75cb76de6)](https://www.semanticscholar.org/paper/f0acf2a2293d963c3786e83bb198c75612adc446)
+
+
 + **InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks** (21 Dec 2023)
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai
 [![Paper](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.14238)
 [![citation](https://img.shields.io/badge/citation-10-blue.svg?paper=6a33e58ef961a3a0a5657518b2be86395eb7c8d0)](https://www.semanticscholar.org/paper/6a33e58ef961a3a0a5657518b2be86395eb7c8d0)

From 018b4bf9c6cba80dcdb8dafdcaeaa93c26d990c0 Mon Sep 17 00:00:00 2001
From: Yazhou Xing
Date: Mon, 8 Jul 2024 20:32:04 +0800
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 0acff70..0c84826 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@

LLMs Meet Multimodal Generation and Editing: A Survey

-
+
# 🤗 Introduction