-
🌟 GUI Agents: A Survey,
arXiv, 2412.13501
, arxiv, pdf, citation: -1
Dang Nguyen, Jian Chen, Yu Wang, ..., Ryan A. Rossi, Franck Dernoncourt
-
Large Language Model-Brained GUI Agents: A Survey,
arXiv, 2411.18279
, arxiv, pdf, citation: -1
Chaoyun Zhang, Shilin He, Jiaxu Qian, ..., Dongmei Zhang, Qi Zhang
-
Foundations and Recent Trends in Multimodal Mobile Agents: A Survey,
arXiv, 2411.02006
, arxiv, pdf, citation: -1
Biao Wu, Yanda Li, Meng Fang, ..., Yunchao Wei, Ling Chen · (awesome-mobile-agents - aialt)
-
GUI Agents with Foundation Models: A Comprehensive Survey,
arXiv, 2411.04890
, arxiv, pdf, citation: -1
Shuai Wang, Weiwen Liu, Jingxuan Chen, ..., Yasheng Wang, Ruiming Tang
-
Agent Laboratory: Using LLM Agents as Research Assistants,
arXiv, 2501.04227
, arxiv, pdf, citation: -1
Samuel Schmidgall, Yusheng Su, Ze Wang, ..., Zicheng Liu, Emad Barsoum · (agentlaboratory.github)
-
· (𝕏)
-
Agents Are Not Enough,
arXiv, 2412.16241
, arxiv, pdf, citation: -1
Chirag Shah, Ryen W. White · (𝕏)
-
Can Large Language Models Adapt to Other Agents In-Context?,
arXiv, 2412.19726
, arxiv, pdf, citation: -1
Matthew Riemer, Zahra Ashktorab, Djallel Bouneffouf, ..., Justin D. Weisz, Murray Campbell
-
Introducing smolagents, a simple library to build agents 🤗
· (smolagents - huggingface)
-
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows,
arXiv, 2403.11322
, arxiv, pdf, citation: -1
Yiran Wu, Tianwei Yue, Shaokun Zhang, ..., Chi Wang, Qingyun Wu · (stateflow - yiranwu0)
-
AI PERSONA: Towards Life-long Personalization of LLMs,
arXiv, 2412.13103
, arxiv, pdf, citation: -1
Tiannan Wang, Meiling Tao, Ruoyu Fang, ..., Yuchen Eleanor Jiang, Wangchunshu Zhou
-
voyage-3 & voyage-3-lite: A new generation of small yet mighty general-purpose embedding models
-
DynaSaur: Large Language Agents Beyond Predefined Actions,
arXiv, 2411.01747
, arxiv, pdf, citation: -1
Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, ..., Franck Dernoncourt, Tianyi Zhou · (dynasaur - adobe-research)
-
🌟 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level,
arXiv, 2411.03562
, arxiv, pdf, citation: -1
Antoine Grosnit, Alexandre Maraval, James Doran, ..., Haitham Bou-Ammar, Jun Wang
-
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant,
arXiv, 2410.18603
, arxiv, pdf, citation: -1
Chengyou Jia, Minnan Luo, Zhuohang Dang, ..., Tianbao Xie, Zhiyong Wu · (chengyou-jia.github)
-
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions,
arXiv, 2410.20424
, arxiv, pdf, citation: -1
Ziming Li, Qianbo Zang, David Ma, ..., Wenhao Huang, Ge Zhang · (AutoKaggle - multimodal-art-projection)
-
Agentic Information Retrieval,
arXiv, 2410.09713
, arxiv, pdf, citation: 1
Weinan Zhang, Junwei Liao, Ning Li, ..., Kounianhua Du
-
🌟 xgrammar - mlc-ai
· (blog - mlc-ai)
-
outlines - dottxt-ai
-
cfahlgren1 / qwen-2.5-code-interpreter 🤗
· (x)
-
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models,
arXiv, 2410.11710
, arxiv, pdf, citation: -1
Pei Wang, Yanan Wu, Zekun Wang, ..., Wenbo Su, Bo Zheng
-
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models,
arXiv, 2410.11805
, arxiv, pdf, citation: -1
Han Han, Tong Zhu, Xiang Zhang, ..., Hao Xiong, Wenliang Chen
-
browserless - browserless
-
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection,
arXiv, 2501.04575
, arxiv, pdf, citation: -1
Yuhang Liu, Pengxiang Li, Zishu Wei, ..., Hongxia Yang, Fei Wu · (InfiGUIAgent - Reallm-Labs)
-
web-ui - browser-use
-
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents,
arXiv, 2410.05243
, arxiv, pdf, citation: -1
Boyu Gou, Ruohan Wang, Boyuan Zheng, ..., Huan Sun, Yu Su · (osu-nlp-group.github) · (UGround - OSU-NLP-Group) · (arxiv) · (huggingface)
-
A3: Android Agent Arena for Mobile GUI Agents,
arXiv, 2501.01149
, arxiv, pdf, citation: -1
Yuxiang Chai, Hanhao Li, Jiayu Zhang, ..., Siyuan Huang, Hongsheng Li · (yuxiangchai.github)
-
browser-use-webui - warmshao
-
🌟 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis,
arXiv, 2412.19723
, arxiv, pdf, citation: -1
Qiushi Sun, Kanzhi Cheng, Zichen Ding, ..., Yu Qiao, Zhiyong Wu · (qiushisun.github)
-
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments,
arXiv, 2404.07972
, arxiv, pdf, citation: -1
Tianbao Xie, Danyang Zhang, Jixuan Chen, ..., Victor Zhong, Tao Yu · (os-world.github) · (OSWorld - xlang-ai)
-
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World,
arXiv, 2412.17589
, arxiv, pdf, citation: -1
Yanheng He, Jiahe Jin, Shijie Xia, ..., Xiangkun Hu, Pengfei Liu · (gair-nlp.github) · (PC-Agent - GAIR-NLP)
-
Aria-UI: Visual Grounding for GUI Instructions,
arXiv, 2412.16256
, arxiv, pdf, citation: -1
Yuhao Yang, Yue Wang, Dongxu Li, ..., Chao Huang, Junnan Li · (ariaui.github) · (Aria-UI - AriaUI)
-
🌟 CogAgent - THUDM
· (cogagent.aminer)
-
AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation,
arXiv, 2412.18116
, arxiv, pdf, citation: -1
Hao Wen, Shizuo Tian, Borislav Pavlov, ..., Ya-Qin Zhang, Yuanchun Li
-
SeleniumBase - seleniumbase
Web Crawling / Testing / Scraping / Stealth
-
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents,
arXiv, 2412.13194
, arxiv, pdf, citation: -1
Yifei Zhou, Qianlan Yang, Kaixiang Lin, ..., Sergey Levine, Erran Li · (yanqval.github)
-
Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning,
arXiv, 2412.10840
, arxiv, pdf, citation: -1
Hai-Ming Xu, Qi Chen, Lei Wang, ..., Lingqiao Liu
-
Large Action Models: From Inception to Implementation,
arXiv, 2412.10047
, arxiv, pdf, citation: -1
Lu Wang, Fangkai Yang, Chaoyun Zhang, ..., Dongmei Zhang, Qi Zhang · (microsoft.github)
-
helium - mherrmann
-
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials,
arXiv, 2412.09605
, arxiv, pdf, citation: -1
Yiheng Xu, Dunjie Lu, Zhennan Shen, ..., Caiming Xiong, Tao Yu · (easyref-gen.github)
-
🌟 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction,
arXiv, 2412.04454
, arxiv, pdf, citation: -1
Yiheng Xu, Zekun Wang, Junli Wang, ..., Tao Yu, Caiming Xiong · (aguvis-project.github)
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent,
arXiv, 2411.17465
, arxiv, pdf, citation: -1
Kevin Qinghong Lin, Linjie Li, Difei Gao, ..., Lijuan Wang, Mike Zheng Shou
-
PTA-1: Controlling Computers with Small Models 🤗
· (huggingface)
-
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use,
arXiv, 2411.10323
, arxiv, pdf, citation: -1
Siyuan Hu, Mingyu Ouyang, Difei Gao, ..., Mike Zheng Shou · (computer_use_ootb - showlab)
-
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents,
arXiv, 2411.06559
, arxiv, pdf, citation: -1
Yu Gu, Boyuan Zheng, Boyu Gou, ..., Huan Sun, Yu Su · (WebDreamer - OSU-NLP-Group)
-
Sharingan: Extract User Action Sequence from Desktop Recordings,
arXiv, 2411.08768
, arxiv, pdf, citation: -1
Yanting Chen, Yi Ren, Xiaoting Qin, ..., Saravan Rajmohan, Qi Zhang
-
🌟 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning,
arXiv, 2411.02337
, arxiv, pdf, citation: -1
Zehan Qi, Xiao Liu, Iat Long Iong, ..., Jie Tang, Yuxiao Dong
-
🌟 AutoGLM: Autonomous Foundation Agents for GUIs,
arXiv, 2411.00820
, arxiv, pdf, citation: -1
Xiao Liu, Bo Qin, Dongzhu Liang, ..., Yuxiao Dong, Jie Tang
-
🌟 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents,
arXiv, 2410.23218
, arxiv, pdf, citation: -1
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, ..., Paul Pu Liang, Yu Qiao · (osatlas.github) · (OS-Atlas - OS-Copilot)
-
skyvern - Skyvern-AI
· (skyvern)
-
Agent S: An Open Agentic Framework that Uses Computers Like a Human,
arXiv, 2410.08164
, arxiv, pdf, citation: -1
Saaket Agashe, Jiuzhou Han, Shuyu Gan, ..., Ang Li, Xin Eric Wang · (Agent-S - simular-ai)
-
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting,
arXiv, 2410.17856
, arxiv, pdf, citation: -1
Shaofei Cai, Zihao Wang, Kewei Lian, ..., Anji Liu, Yitao Liang · (craftjarvis.github)
-
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization,
arXiv, 2410.19609
, arxiv, pdf, citation: -1
Hongliang He, Wenlin Yao, Kaixin Ma, ..., Zhenzhong Lan, Dong Yu
-
Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring 🤗
-
agent.exe - corbt
the easiest way to let Claude's new computer use capabilities take over your computer!
-
computer_use_ootb - showlab
-
OmniParser - microsoft
Screen Parsing tool for Pure Vision Based GUI Agent · (arxiv)
-
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation,
arXiv, 2410.13232
, arxiv, pdf, citation: -1
Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, ..., Dongha Lee, Jinyoung Yeo
-
MobA: A Two-Level Agent System for Efficient Mobile Task Automation,
arXiv, 2410.13757
, arxiv, pdf, citation: -1
Zichen Zhu, Hao Tang, Yansi Li, ..., Lu Chen, Kai Yu · (MobA - OpenDFM)
-
OmAgent - om-ai-lab
-
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines,
arXiv, 2410.21220
, arxiv, pdf, citation: -1
Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, ..., Xiangyu Yue · (arxiv) · (cnzzx.github) · (VSA - cnzzx)
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks,
arXiv, 2412.14161
, arxiv, pdf, citation: -1
Frank F. Xu, Yufan Song, Boxuan Li, ..., Shuyan Zhou, Graham Neubig
-
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents,
arXiv, 2410.24024
, arxiv, pdf, citation: -1
Yifan Xu, Xiao Liu, Xueqiao Sun, ..., Jie Tang, Yuxiao Dong · (Android-Lab - THUDM)
-
Agent-as-a-Judge: Evaluate Agents with Agents,
arXiv, 2410.10934
, arxiv, pdf, citation: -1
Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, ..., Vikas Chandra, Jürgen Schmidhuber
-
eliza - elizaOS
-
crewAI - crewAIInc
Production-grade framework for orchestrating sophisticated AI agent systems.
-
Agentarium - Thytu
-
PraisonAI - MervinPraison
-
smolagents - huggingface
-
python-sdk - modelcontextprotocol
-
dynasaur - adobe-research
-
steel-browser - steel-dev
-
Qwen-Agent - QwenLM
-
postbot3000 - ahmad2b
-
🌟 LLM agent memory frameworks are compared across multiple GitHub projects
· (reddit)
-
letta - letta-ai
-
· (nexusflow)
-
composio - ComposioHQ
-
taskgen - simbianai
· (taskgen - tanchongmin)
-
bee-agent-framework - i-am-bee
-
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains,
arXiv, 2501.05707
, arxiv, pdf, citation: -1
Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, ..., Shuang Li, Igor Mordatch · (llm-multiagent-ft.github)
-
swarms - kyegomez
-
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration,
arXiv, 2412.04440
, arxiv, pdf, citation: -1
Kaiyi Huang, Yukun Huang, Xuefei Ning, ..., Yu Wang, Xihui Liu · (karine-h.github) · (arxiv)
-
MALT: Improving Reasoning with Multi-Agent LLM Training,
arXiv, 2412.01928
, arxiv, pdf, citation: -1
Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, ..., Ronald Clark, Christian Schroeder de Witt
-
Generative Agent Simulations of 1,000 People,
arXiv, 2411.10109
, arxiv, pdf, citation: -1
Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, ..., Percy Liang, Michael S. Bernstein · (𝕏) · (genagents - joonspk-research) · (mp.weixin.qq)
-
multi-agent-orchestrator - awslabs
-
The Virtual Lab: AI Agents Design New SARS-CoV-2 Nanobodies with Experimental Validation
· (𝕏)
-
TinyTroupe - microsoft
-
ag2 - ag2ai
-
autogen - microsoft
· (microsoft)
-
Project Sid: Many-agent simulations toward AI civilization,
arXiv, 2411.00114
, arxiv, pdf, citation: -1
Altera. AL, Andrew Ahn, Nic Becker, ..., Feitong Yang, Guangyu Robert Yang