Awesome-LLM-paper

Awesome License: MIT Made With Love

This repository contains papers related to all kinds of LLMs.

We strongly encourage researchers to contribute, in the hope of advancing their excellent work.


Contents


Resources

Workshops and Tutorials

Theme Source Link Other
…… …… …… ……
Descriptions ……

Papers

Survey

Paper Source Link Other
A Survey on Multimodal Large Language Models for Autonomous Driving WACV2023'Purdue University Bilibili: MLM for Autonomous Driving Survey Github: MLM for Autonomous Driving Survey
Descriptions This survey systematically reviews how multimodal large language models can be applied to autonomous driving, covering the relevant background, existing models, datasets, and tools, as well as open challenges and future research directions.
Retrieval-Augmented Generation for Large Language Models: A Survey Arxiv2023'Tongji University …… ……
Descriptions This paper provides a comprehensive overview of the integration of retrieval mechanisms with generative processes within large language models to enhance their performance and knowledge capabilities.

Benchmark and Evaluation

Paper Source Link Other
…… …… …… ……
Descriptions ……

RAG

Paper Source Link Other
Improving Text Embeddings with Large Language Models Arxiv2024'Microsoft …… Hugging Face: e5-mistral-7b-instruct
Descriptions The core of this work (E5-mistral) is synthetic training data generated through a two-stage prompting process:
  1. The first stage generates a pool of candidate tasks through brainstorming prompts.
  2. The second stage synthesizes data from the pool of candidate tasks.

The authors categorize the synthetic tasks into two major types: asymmetric tasks, where the retrieved pair consists of a query and a document playing different roles (subdivided by length into short-long, long-short, long-long, and short-short matches, with web search as the classic short-long example), and symmetric tasks, where the two texts are semantically related pieces of comparable roles.

The paper shows that large language models (LLMs) can significantly improve the quality of text embeddings, thanks partly to the synthetic data and partly to the LLMs' autoregressive pretraining. Moreover, the approach collapses the usual multi-stage embedding training pipeline into a single fine-tuning stage, simplifying the training process.
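A minimal sketch of that two-stage pipeline, assuming a hypothetical chat(prompt) helper that wraps whatever instruction-tuned LLM is used for data synthesis (the prompts below are illustrative, not the paper's templates):

```python
import json

def chat(prompt: str) -> str:
    """Hypothetical helper wrapping a call to an instruction-tuned LLM."""
    raise NotImplementedError("plug in your LLM client here")

def brainstorm_tasks(n: int = 8) -> list[str]:
    # Stage 1: brainstorm a pool of candidate retrieval tasks.
    prompt = f"Brainstorm {n} diverse text retrieval tasks. Return a JSON list of short task descriptions."
    return json.loads(chat(prompt))

def synthesize_example(task: str, query_len: str = "short", doc_len: str = "long") -> dict:
    # Stage 2: synthesize one (query, positive, hard negative) example for a sampled task.
    prompt = (
        f"Task: {task}\n"
        f"Write a {query_len} user query, a {doc_len} relevant document, and a {doc_len} "
        f"hard-negative document. Return JSON with keys 'user_query', "
        f"'positive_document', 'hard_negative_document'."
    )
    return json.loads(chat(prompt))

# Example usage (requires a real LLM behind `chat`):
# tasks = brainstorm_tasks()
# example = synthesize_example(tasks[0], query_len="short", doc_len="long")
```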

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems NAACL 2024 bilibili Code: stanford-futuredata/ARES
Descriptions ARES, an Automated RAG Evaluation System, efficiently evaluates retrieval-augmented generation systems across multiple tasks using synthetic data and minimal human annotations, maintaining accuracy even with domain shifts.

Embedding

Paper Source Link Other
C-Pack: Packaged Resources To Advance General Chinese Embedding Arxiv2023'BAAI Bilibili: C-Pack Github: C-Pack
Descriptions BAAI, together with Hugging Face, introduces C-Pack, a package of resources for general Chinese text embeddings that includes a comprehensive benchmark, a massive training dataset, and a family of embedding models which significantly outperform existing Chinese embedding models.

LLM

Paper Source Link Other
Llama 2: Open Foundation and Fine-Tuned Chat Models Arxiv2023'Meta bilibili Github: Llama
Descriptions This is the technical report of Llama 2 from Meta, one of the leading contributors to the open-source LLM community. The greatest contribution of Llama 2 is a range of pretrained and fine-tuned large language models (LLMs) that not only outperform existing open-source chat models on various benchmarks but are also optimized for dialogue scenarios. These models perform well in human evaluations of helpfulness and safety, potentially serving as effective substitutes for closed-source models. The report also describes the fine-tuning process and safety enhancements in detail, aiming to foster further development by the community and contribute to the responsible development of large language models.

Higher Layers Need More LoRA Experts Arxiv2024'Northwestern University …… ……
Descriptions The paper finds that higher Transformer layers benefit from more LoRA (Low-Rank Adaptation) experts and proposes allocating experts non-uniformly across layers to improve the model's expressive power and adaptability.
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression Arxiv2023'Microsoft …… ……
Descriptions LongLLMLingua shows that compressing prompts is an effective way to both accelerate inference and enhance the performance of large language models (LLMs) in long-context scenarios.
Can AI Assistants Know What They Don't Know? Arxiv2024'Fudan University …… Code: Say-I-Dont-Know
Descriptions The paper explores whether AI assistants can identify when they don't know something, creating an "I don't know" dataset to teach this ability, resulting in fewer false answers and increased accuracy.
Code Llama: Open Foundation Models for Code Arxiv2023'Meta AI bilibili codellama
Descriptions The article introduces Code Llama, a family of large language models for code developed by Meta AI and built on Llama 2, offering state-of-the-art performance among open models, support for large input contexts, and zero-shot instruction-following ability for programming tasks.
Are Emergent Abilities of Large Language Models a Mirage? NeurIPS2023'Stanford University bilibili ……
Descriptions The article challenges the notion that large language models (LLMs) exhibit "emergent abilities," suggesting that these abilities may be an artifact of the metrics chosen by researchers rather than inherent properties of the models themselves. Through mathematical modeling, empirical testing, and meta-analysis, the authors demonstrate that alternative metrics or improved statistical methods can eliminate the perception of emergent abilities, casting doubt on their existence as a fundamental aspect of scaling AI models.
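As a toy numerical illustration of this metric argument (numbers invented for illustration, not taken from the paper), a smoothly improving per-token accuracy can look like a sudden jump once it is scored with an all-or-nothing metric such as exact match over a long answer:

```python
# Toy numbers (not from the paper): per-token accuracy improves smoothly with scale,
# but the all-or-nothing exact-match metric over a 20-token answer looks "emergent".
seq_len = 20
for params_b, token_acc in [(0.1, 0.80), (1, 0.90), (10, 0.97), (100, 0.995)]:
    exact_match = token_acc ** seq_len          # probability that every token is right
    print(f"{params_b:>5}B params | per-token acc {token_acc:.3f} | exact match {exact_match:.3f}")
```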
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models Arxiv2023'MEGVII Technology …… VaryBase
Descriptions The article introduces Vary, a method for expanding the visual vocabulary of Large Vision-Language Models (LVLMs) to enhance dense and fine-grained visual perception capabilities for specific visual tasks, such as document-level OCR or chart understanding.
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks Arxiv2019'UKP Lab bilibili sentence-transformers
Descriptions The paper introduces Sentence-BERT (SBERT), a modification of the BERT network that employs siamese and triplet network structures to produce semantically meaningful sentence embeddings that can be compared using cosine similarity, thereby significantly enhancing the efficiency of sentence similarity search and clustering tasks.
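A minimal usage sketch with the linked sentence-transformers library (the checkpoint name below is one of the library's standard SBERT models, chosen here just for illustration):

```python
from sentence_transformers import SentenceTransformer, util

# Encode sentences into fixed-size embeddings and compare them with cosine similarity.
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["A man is eating food.", "Someone is having a meal.", "The sky is blue today."]
embeddings = model.encode(sentences, convert_to_tensor=True)
print(util.cos_sim(embeddings, embeddings))  # pairwise cosine-similarity matrix
```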
Mixtral of Experts Arxiv2024'Mistral AI …… Mixtral of Experts
Descriptions Mixtral is a model based on the Transformer architecture with two key differences:

  1. Mixtral supports a full dense context length of up to 32,000 tokens;
  2. It utilizes a Mixture of Experts (MoE) layer instead of the traditional feedforward network blocks.

The model follows the Mistral 7B architecture, but each layer contains eight feedforward blocks ("experts"). At every layer, a router network selects two experts to process each token and merges their outputs. Although only two experts process each token, different experts may be chosen at each timestep; as a result, each token has access to 47B parameters but only 13B active parameters are used during inference. Mixtral was trained with a 32k-token context and outperforms or matches Llama 2 70B and GPT-3.5 on benchmarks, particularly excelling in mathematics, code generation, and multilingual tasks. Additionally, an instruction-tuned variant, Mixtral 8x7B – Instruct, surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B chat on human evaluation benchmarks.
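The routing described above can be sketched as follows; this is an illustrative re-implementation of a top-2-of-8 sparse MoE layer, not Mistral's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-2-of-8 mixture-of-experts feed-forward block."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        logits = self.router(x)                            # score every expert per token
        weights, idx = logits.topk(self.top_k, dim=-1)     # keep only the top-2 experts
        weights = F.softmax(weights, dim=-1)               # normalise over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                      # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoELayer(d_model=64, d_ff=256)
print(layer(torch.randn(10, 64)).shape)                    # torch.Size([10, 64])
```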

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model …… Github: https://github.com/Chinese-Tiny-LLM/Chinese-Tiny-LLM ……
Descriptions This work introduces CT-LLM, a large language model that prioritizes Chinese, pretrained on a 1.2-trillion-token corpus of which 800 billion tokens are Chinese. By training from scratch primarily on Chinese data, CT-LLM shows strong ability to understand and process Chinese, further improved through alignment techniques. The model performs well on CHC-Bench, excels on Chinese tasks, and also demonstrates proficiency on English tasks. The study challenges the prevailing practice of training LLMs mainly on English corpora and opens new directions for training methodology. By open-sourcing the full training process and related resources (such as MAP-CC and CHC-Bench), the authors hope to foster further exploration and innovation in academia and industry and to promote more inclusive and versatile language models.
Confident Adaptive Language Modeling NeurIPS2022'Google Research …… ……
Descriptions Transformer-based large language models (LLMs) have achieved remarkable performance gains in recent years, but these gains come with growing model sizes and inference costs. In practice, the sequences an LLM generates contain predictions of varying difficulty: some require the model's full capacity, while others can be completed with far less computation. This work proposes Confident Adaptive Language Modeling (CALM), a framework that dynamically allocates compute per input and per generation timestep. It addresses the challenges of early-exit decoding, including the choice of confidence measure, connecting sequence-level constraints to per-token exit decisions, and handling the hidden representations that are missing because of early exits. Theoretical analysis and experiments show that the framework reduces computation by up to 3x while maintaining high performance.
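A schematic sketch of the confidence-based early-exit control flow (helper names and the fixed threshold are illustrative; the actual method uses trained per-layer exit classifiers, calibrated thresholds, and state propagation for skipped layers):

```python
import torch

def decode_step_with_early_exit(hidden, layers, lm_head, threshold=0.9):
    """One decoding step (batch size 1): run decoder layers until the intermediate
    prediction is confident enough, then skip the remaining layers."""
    token, depth_used = None, 0
    for layer in layers:
        hidden = layer(hidden)
        depth_used += 1
        probs = torch.softmax(lm_head(hidden[:, -1]), dim=-1)   # prediction from this depth
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= threshold:                      # confident: exit early
            break
    return token, depth_used
```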

Fine-tuning

Paper Source Link Other
Towards a Unified View of Parameter-Efficient Transfer Learning ICLR2022'Carnegie Mellon University …… unify-parameter-efficient-tuning
Descriptions This paper presents a unified framework for understanding and improving various parameter-efficient transfer learning methods by modifying specific hidden states in pre-trained models, defining a set of design dimensions to differentiate between methods, and experimentally demonstrating the framework's ability to identify important design choices in previous methods and instantiate new parameter-efficient tuning methods that are more effective with fewer parameters.
QLoRA: Efficient Finetuning of Quantized LLMs NeurIPS2023'University of Washington bilibili Github: QLoRA
Descriptions This paper introduces QLoRA, a method for fine-tuning LLMs that significantly reduces memory usage. QLoRA achieves this by:
  • Using a new data type called 4-bit NormalFloat (NF4) for weights, which is efficient for storing normally distributed weight values.
  • Applying "double quantization" to compress the size of quantization constants.
  • Employing "paged optimizers" to manage memory spikes during training.

These innovations enable QLoRA to fine-tune large models (e.g., 65B parameters) on a single GPU with limited memory (48 GB). The resulting models achieve state-of-the-art performance on chatbot benchmarks, in some cases even surpassing earlier models such as ChatGPT.
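A hedged configuration sketch of the same recipe using the Hugging Face transformers, peft, and bitsandbytes stack (the model id is a placeholder and exact flag names can vary across library versions):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 with double quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",          # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# Train only small LoRA adapters on top of the frozen 4-bit weights.
lora_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Paged optimizers are selected at training time, e.g. optim="paged_adamw_32bit" in TrainingArguments.
```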

Prefix-Tuning: Optimizing Continuous Prompts for Generation Arxiv2021'Stanford University bilibili ……
Descriptions This paper introduces prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks. Unlike fine-tuning, which modifies all language model parameters, prefix-tuning keeps them frozen and optimizes a small continuous task-specific vector (called the prefix). This allows prefix-tuning to be more efficient than fine-tuning, especially in low-data settings.

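A minimal PyTorch sketch of the idea, assuming a frozen base LM that accepts input embeddings directly; the actual method inserts trained key/value prefixes at every attention layer rather than a single embedding-level prefix:

```python
import torch
import torch.nn as nn

class PrefixTunedModel(nn.Module):
    """Freeze the base LM and learn only a short continuous prefix."""

    def __init__(self, base_lm: nn.Module, d_model: int, prefix_len: int = 10):
        super().__init__()
        self.base_lm = base_lm
        for p in self.base_lm.parameters():
            p.requires_grad = False                       # all LM parameters stay frozen
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor):        # (batch, seq, d_model)
        batch = input_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        return self.base_lm(torch.cat([prefix, input_embeds], dim=1))
```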

P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks ACL2022'Tsinghua University …… Github: ……
Descriptions The author introduces P-tuning v2, which utilizes deep prompt optimization techniques, such as Prefix Tuning, to improve upon Prompt Tuning and P-Tuning as a universal solution across scales and NLU tasks. Compared to P-tuning, this method incorporates prompt tokens at every layer rather than just at the input layer, bringing two main benefits:

  1. More learnable parameters (increasing from 0.01% in P-tuning and Prompt Tuning to 0.1%-3%), while still being efficient.
  2. Embedding prompts into deeper structural layers has a more direct impact on model predictions.
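A minimal sketch of the deep-prompt idea, keeping one trainable prompt table per layer (illustrative only, not the authors' implementation):

```python
import torch
import torch.nn as nn

class DeepPrompts(nn.Module):
    """Trainable prompt states for every transformer layer (P-Tuning v2 style sketch)."""

    def __init__(self, n_layers: int, prompt_len: int, d_model: int):
        super().__init__()
        # One learnable prompt table per layer, not just at the input embedding layer.
        self.prompts = nn.Parameter(torch.randn(n_layers, prompt_len, d_model) * 0.02)

    def for_layer(self, layer_idx: int, batch_size: int) -> torch.Tensor:
        # Prompt states to prepend to this layer's attention keys/values.
        return self.prompts[layer_idx].unsqueeze(0).expand(batch_size, -1, -1)

prompts = DeepPrompts(n_layers=24, prompt_len=20, d_model=1024)
print(prompts.for_layer(0, batch_size=2).shape)          # torch.Size([2, 20, 1024])
print(sum(p.numel() for p in prompts.parameters()))      # only the prompt parameters are trained
```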

Prompt/Context

Paper Source Link Other
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning EMNLP2023'Peking University bilibili Github: ICL
Descriptions This paper sheds light on the inner workings of in-context learning (ICL) in LLMs. While ICL has shown promise in enabling LLMs to perform various tasks through demonstrations, the mechanism behind this learning has been unclear. The authors investigate this mechanism through the lens of information flow and discover that labels in the demonstrations act as anchors. These labels serve two key functions: 1) During initial processing, semantic information accumulates within the representations of these label words. 2) This consolidated information acts as a reference point for the LLMs' final predictions. Based on these findings, the paper introduces three novel contributions: 1) An anchor re-weighting method to enhance ICL performance, 2) A demonstration compression technique to improve efficiency, and 3) An analysis framework to diagnose ICL errors in GPT2-XL. The effectiveness of these contributions validates the proposed mechanism and paves the way for future research in ICL.

Ad click prediction

Paper Source Link Other
Deep & Cross Network for Ad Click Predictions …… …… ……
Descriptions Feature engineering is critical to the success of prediction models but often requires manual effort or exhaustive search. Although DNNs can learn feature interactions automatically, they are not efficient at learning every type of interaction. This paper proposes the Deep & Cross Network (DCN), which keeps the advantages of DNNs while adding a novel cross network that learns certain feature interactions efficiently. DCN performs explicit feature crossing at every layer, requires no manual feature engineering, and adds only minimal complexity. Experiments show that DCN outperforms state-of-the-art algorithms in both model accuracy and memory usage on CTR prediction and dense classification datasets.
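A short sketch of the explicit cross layer that gives DCN its name, computing x_{l+1} = x_0 * (x_l · w_l) + b_l + x_l (an illustrative re-implementation):

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One explicit feature-crossing layer: x_{l+1} = x_0 * (x_l . w) + b + x_l."""

    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(dim))

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:  # (batch, dim)
        return x0 * (xl @ self.w).unsqueeze(-1) + self.b + xl

x0 = torch.randn(32, 16)                    # stacked embedded + dense input features
x = x0
for layer in [CrossLayer(16) for _ in range(3)]:
    x = layer(x0, x)                        # the cross network stacks a few such layers
print(x.shape)                              # torch.Size([32, 16])
```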

Agent

Paper Source Link Other
ChatDev: Communicative Agents for Software Development …… Github: https://github.com/OpenBMB/ChatDev ……
Descriptions Software development is a complex task that requires many skills to work in concert. Traditional deep learning approaches suffer from technical inconsistencies across the stages of the waterfall model (such as design, coding, and testing), making the development process inefficient. This paper proposes ChatDev, a chat-based software development framework driven by large language models that uses natural language and programming language as a unified communication medium to coordinate multiple agents across the design, coding, and testing phases, improving development efficiency. ChatDev produces solutions through multi-turn dialogue and demonstrates the potential of language as a bridge for multi-agent collaboration.

Tool Learning

Paper Source Link Other
…… …… …… ……
Descriptions ……

MMLM

Paper Source Link Other
…… …… …… ……
Descriptions ……

Reinforcement Learning

Paper Source Link Other
Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning ICLR2023 …… ……
Descriptions Diffusion models are used as a highly expressive policy class in offline reinforcement learning to improve learning efficiency and decision-making performance.
Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling ICLR2023 …… ……
Descriptions Reinforcement learning (RL) agents typically have no prior knowledge and learn from scratch. The authors propose using few-shot prompting of large language models (LLMs) to hypothesize and verify an Abstract World Model (AWM), improving the sample efficiency of RL agents. The DECKARD agent crafts items in Minecraft in two phases: a Dream phase, in which the agent uses the LLM to decompose the task into subgoals and form the AWM, and a Wake phase, in which the agent learns a policy for each subgoal and verifies the AWM. This approach not only greatly improves sample efficiency but also corrects errors in the LLM, successfully combining the LLM's noisy knowledge with knowledge grounded in environment dynamics.

🌟 Contributors

Star History

Star History Chart
