LLM Agent

- Survey
- Agents
- LLM OS
- Auto GPT
- Tool GPT
- UI Agent
- Multi Modal
- Evaluation
- Projects
- Products
- Misc

Survey

🌟 GUI Agents: A Survey, arXiv, 2412.13501, arxiv, pdf, citation: -1

Dang Nguyen, Jian Chen, Yu Wang, ..., Ryan A. Rossi, Franck Dernoncourt
Large Language Model-Brained GUI Agents: A Survey, arXiv, 2411.18279, arxiv, pdf, citation: -1

Chaoyun Zhang, Shilin He, Jiaxu Qian, ..., Dongmei Zhang, Qi Zhang
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey, arXiv, 2411.02006, arxiv, pdf, citation: -1

Biao Wu, Yanda Li, Meng Fang, ..., Yunchao Wei, Ling Chen · (awesome-mobile-agents - aialt)
GUI Agents with Foundation Models: A Comprehensive Survey, arXiv, 2411.04890, arxiv, pdf, citation: -1

Shuai Wang, Weiwen Liu, Jingxuan Chen, ..., Yasheng Wang, Ruiming Tang

Agents

Agent Laboratory: Using LLM Agents as Research Assistants, arXiv, 2501.04227, arxiv, pdf, citation: -1

Samuel Schmidgall, Yusheng Su, Ze Wang, ..., Zicheng Liu, Emad Barsoum · (agentlaboratory.github)
Agents

· (𝕏)
Agents Are Not Enough, arXiv, 2412.16241, arxiv, pdf, citation: -1

Chirag Shah, Ryen W. White · (𝕏)
Can Large Language Models Adapt to Other Agents In-Context?, arXiv, 2412.19726, arxiv, pdf, citation: -1

Matthew Riemer, Zahra Ashktorab, Djallel Bouneffouf, ..., Justin D. Weisz, Murray Campbell
Introducing smolagents, a simple library to build agents 🤗

· (smolagents - huggingface)
Building effective agents
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows, arXiv, 2403.11322, arxiv, pdf, citation: -1

Yiran Wu, Tianwei Yue, Shaokun Zhang, ..., Chi Wang, Qingyun Wu · (stateflow - yiranwu0)
AI PERSONA: Towards Life-long Personalization of LLMs, arXiv, 2412.13103, arxiv, pdf, citation: -1

Tiannan Wang, Meiling Tao, Ruoyu Fang, ..., Yuchen Eleanor Jiang, Wangchunshu Zhou
voyage-3 & voyage-3-lite: A new generation of small yet mighty general-purpose embedding models
DynaSaur: Large Language Agents Beyond Predefined Actions, arXiv, 2411.01747, arxiv, pdf, citation: -1

Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, ..., Franck Dernoncourt, Tianyi Zhou · (dynasaur - adobe-research)
🌟 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level, arXiv, 2411.03562, arxiv, pdf, citation: -1

Antoine Grosnit, Alexandre Maraval, James Doran, ..., Haitham Bou-Ammar, Jun Wang
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant, arXiv, 2410.18603, arxiv, pdf, citation: -1

Chengyou Jia, Minnan Luo, Zhuohang Dang, ..., Tianbao Xie, Zhiyong Wu · (chengyou-jia.github)
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions, arXiv, 2410.20424, arxiv, pdf, citation: -1

Ziming Li, Qianbo Zang, David Ma, ..., Wenhao Huang, Ge Zhang · (AutoKaggle - multimodal-art-projection)
Agentic Information Retrieval, arXiv, 2410.09713, arxiv, pdf, citation: 1

Weinan Zhang, Junwei Liao, Ning Li, ..., Kounianhua Du

LLM OS

/dev/agents came out of stealth with $56M to build an OS for AI agents 𝕏

Auto GPT

Tool GPT

Introducing the Model Context Protocol (MCP) 𝕏
🌟 xgrammar - mlc-ai

· (blog - mlc-ai)
outlines - dottxt-ai
cfahlgren1 / qwen-2.5-code-interpreter 🤗

· (x)
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models, arXiv, 2410.11710, arxiv, pdf, citation: -1

Pei Wang, Yanan Wu, Zekun Wang, ..., Wenbo Su, Bo Zheng
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models, arXiv, 2410.11805, arxiv, pdf, citation: -1

Han Han, Tong Zhu, Xiang Zhang, ..., Hao Xiong, Wenliang Chen

UI Agent

browserless - browserless
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, arXiv, 2501.04575, arxiv, pdf, citation: -1

Yuhang Liu, Pengxiang Li, Zishu Wei, ..., Hongxia Yang, Fei Wu · (InfiGUIAgent - Reallm-Labs)
web-ui - browser-use
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents, arXiv, 2410.05243, arxiv, pdf, citation: -1

Boyu Gou, Ruohan Wang, Boyuan Zheng, ..., Huan Sun, Yu Su · (osu-nlp-group.github) · (UGround - OSU-NLP-Group) · (arxiv) · (huggingface)
A3: Android Agent Arena for Mobile GUI Agents, arXiv, 2501.01149, arxiv, pdf, citation: -1

Yuxiang Chai, Hanhao Li, Jiayu Zhang, ..., Siyuan Huang, Hongsheng Li · (yuxiangchai.github)
browser-use-webui - warmshao
🌟 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis, arXiv, 2412.19723, arxiv, pdf, citation: -1

Qiushi Sun, Kanzhi Cheng, Zichen Ding, ..., Yu Qiao, Zhiyong Wu · (qiushisun.github)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments, arXiv, 2404.07972, arxiv, pdf, citation: -1

Tianbao Xie, Danyang Zhang, Jixuan Chen, ..., Victor Zhong, Tao Yu · (os-world.github) · (OSWorld - xlang-ai)
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World, arXiv, 2412.17589, arxiv, pdf, citation: -1

Yanheng He, Jiahe Jin, Shijie Xia, ..., Xiangkun Hu, Pengfei Liu · (gair-nlp.github) · (PC-Agent - GAIR-NLP)
Aria-UI: Visual Grounding for GUI Instructions, arXiv, 2412.16256, arxiv, pdf, citation: -1

Yuhao Yang, Yue Wang, Dongxu Li, ..., Chao Huang, Junnan Li · (ariaui.github) · (Aria-UI - AriaUI)
🌟 CogAgent - THUDM

· (cogagent.aminer)
AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation, arXiv, 2412.18116, arxiv, pdf, citation: -1

Hao Wen, Shizuo Tian, Borislav Pavlov, ..., Ya-Qin Zhang, Yuanchun Li
SeleniumBase - seleniumbase

Web Crawling / Testing / Scraping / Stealth
Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents, arXiv, 2412.13194, arxiv, pdf, citation: -1

Yifei Zhou, Qianlan Yang, Kaixiang Lin, ..., Sergey Levine, Erran Li · (yanqval.github)
Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning, arXiv, 2412.10840, arxiv, pdf, citation: -1

Hai-Ming Xu, Qi Chen, Lei Wang, ..., Lingqiao Liu
Large Action Models: From Inception to Implementation, arXiv, 2412.10047, arxiv, pdf, citation: -1

Lu Wang, Fangkai Yang, Chaoyun Zhang, ..., Dongmei Zhang, Qi Zhang · (microsoft.github)
helium - mherrmann
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials, arXiv, 2412.09605, arxiv, pdf, citation: -1

Yiheng Xu, Dunjie Lu, Zhennan Shen, ..., Caiming Xiong, Tao Yu · (easyref-gen.github)
🌟 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction, arXiv, 2412.04454, arxiv, pdf, citation: -1

Yiheng Xu, Zekun Wang, Junli Wang, ..., Tao Yu, Caiming Xiong · (aguvis-project.github)
ShowUI: One Vision-Language-Action Model for GUI Visual Agent, arXiv, 2411.17465, arxiv, pdf, citation: -1

Kevin Qinghong Lin, Linjie Li, Difei Gao, ..., Lijuan Wang, Mike Zheng Shou
PTA-1: Controlling Computers with Small Models 🤗

· (huggingface)
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use, arXiv, 2411.10323, arxiv, pdf, citation: -1

Siyuan Hu, Mingyu Ouyang, Difei Gao, ..., Mike Zheng Shou · (computer_use_ootb - showlab)
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents, arXiv, 2411.06559, arxiv, pdf, citation: -1

Yu Gu, Boyuan Zheng, Boyu Gou, ..., Huan Sun, Yu Su · (WebDreamer - OSU-NLP-Group)
Sharingan: Extract User Action Sequence from Desktop Recordings, arXiv, 2411.08768, arxiv, pdf, citation: -1

Yanting Chen, Yi Ren, Xiaoting Qin, ..., Saravan Rajmohan, Qi Zhang
🌟 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning, arXiv, 2411.02337, arxiv, pdf, citation: -1

Zehan Qi, Xiao Liu, Iat Long Iong, ..., Jie Tang, Yuxiao Dong
🌟 AutoGLM: Autonomous Foundation Agents for GUIs, arXiv, 2411.00820, arxiv, pdf, citation: -1

Xiao Liu, Bo Qin, Dongzhu Liang, ..., Yuxiao Dong, Jie Tang
🌟 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents, arXiv, 2410.23218, arxiv, pdf, citation: -1

Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, ..., Paul Pu Liang, Yu Qiao · (osatlas.github) · (OS-Atlas - OS-Copilot)
jadechoghari / OmniParser 🤗
skyvern - Skyvern-AI

· (skyvern)
Agent S: An Open Agentic Framework that Uses Computers Like a Human, arXiv, 2410.08164, arxiv, pdf, citation: -1

Saaket Agashe, Jiuzhou Han, Shuyu Gan, ..., Ang Li, Xin Eric Wang · (Agent-S - simular-ai)
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting, arXiv, 2410.17856, arxiv, pdf, citation: -1

Shaofei Cai, Zihao Wang, Kewei Lian, ..., Anji Liu, Yitao Liang · (craftjarvis.github)
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization, arXiv, 2410.19609, arxiv, pdf, citation: -1

Hongliang He, Wenlin Yao, Kaixin Ma, ..., Zhenzhong Lan, Dong Yu
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring 🤗
agent.exe - corbt

The easiest way to let Claude's new computer use capabilities take over your computer!
computer_use_ootb - showlab
OmniParser - microsoft

Screen Parsing tool for Pure Vision Based GUI Agent · (arxiv)
Developing a computer use model
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation, arXiv, 2410.13232, arxiv, pdf, citation: -1

Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, ..., Dongha Lee, Jinyoung Yeo
MobA: A Two-Level Agent System for Efficient Mobile Task Automation, arXiv, 2410.13757, arxiv, pdf, citation: -1

Zichen Zhu, Hao Tang, Yansi Li, ..., Lu Chen, Kai Yu · (MobA - OpenDFM)

Multi Modal

OmAgent - om-ai-lab
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines, arXiv, 2410.21220, arxiv, pdf, citation: -1

Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, ..., Xiangyu Yue · (arxiv) · (cnzzx.github) · (VSA - cnzzx)

Evaluation

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks, arXiv, 2412.14161, arxiv, pdf, citation: -1

Frank F. Xu, Yufan Song, Boxuan Li, ..., Shuyan Zhou, Graham Neubig
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents, arXiv, 2410.24024, arxiv, pdf, citation: -1

Yifan Xu, Xiao Liu, Xueqiao Sun, ..., Jie Tang, Yuxiao Dong · (Android-Lab - THUDM)
Agent-as-a-Judge: Evaluate Agents with Agents, arXiv, 2410.10934, arxiv, pdf, citation: -1

Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, ..., Vikas Chandra, Jürgen Schmidhuber

Projects

eliza - elizaOS
crewAI - crewAIInc

Production-grade framework for orchestrating sophisticated AI agent systems.
Agentarium - Thytu
PraisonAI - MervinPraison
smolagents - huggingface
python-sdk - modelcontextprotocol
dynasaur - adobe-research
steel-browser - steel-dev
Qwen-Agent - QwenLM
postbot3000 - ahmad2b
🌟 LLM agent memory frameworks are compared across multiple GitHub projects

· (reddit)
letta - letta-ai
Athene-V2-Agent is an open-source Agent LLM that surpasses the state-of-the-art in function calling and agentic capabilities. 🤗

· (nexusflow)
composio - ComposioHQ
taskgen - simbianai

· (taskgen - tanchongmin)
bee-agent-framework - i-am-bee

Products

Common Sense Agents, a new backbone for agentic creative computing 𝕏

Misc

Agents
Anthropic sums up a year of experience with agents: the most successful ≠ the most complex

Multi Agent

Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains, arXiv, 2501.05707, arxiv, pdf, citation: -1

Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, ..., Shuang Li, Igor Mordatch · (llm-multiagent-ft.github)
swarms - kyegomez
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration, arXiv, 2412.04440, arxiv, pdf, citation: -1

Kaiyi Huang, Yukun Huang, Xuefei Ning, ..., Yu Wang, Xihui Liu · (karine-h.github) · (arxiv)
MALT: Improving Reasoning with Multi-Agent LLM Training, arXiv, 2412.01928, arxiv, pdf, citation: -1

Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, ..., Ronald Clark, Christian Schroeder de Witt
Generative Agent Simulations of 1,000 People, arXiv, 2411.10109, arxiv, pdf, citation: -1

Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, ..., Percy Liang, Michael S. Bernstein · (𝕏) · (genagents - joonspk-research) · (mp.weixin.qq)
multi-agent-orchestrator - awslabs
The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation

· (𝕏)
TinyTroupe - microsoft
ag2 - ag2ai
autogen - microsoft

· (microsoft)
Project Sid: Many-agent simulations toward AI civilization, arXiv, 2411.00114, arxiv, pdf, citation: -1

Altera.AL, Andrew Ahn, Nic Becker, ..., Feitong Yang, Guangyu Robert Yang
