LLM Agent

Survey

  • 🌟 GUI Agents: A Survey, arXiv, 2412.13501, arxiv, pdf, citation: -1

    Dang Nguyen, Jian Chen, Yu Wang, ..., Ryan A. Rossi, Franck Dernoncourt

  • Large Language Model-Brained GUI Agents: A Survey, arXiv, 2411.18279, arxiv, pdf, citation: -1

    Chaoyun Zhang, Shilin He, Jiaxu Qian, ..., Dongmei Zhang, Qi Zhang

  • Foundations and Recent Trends in Multimodal Mobile Agents: A Survey, arXiv, 2411.02006, arxiv, pdf, citation: -1

    Biao Wu, Yanda Li, Meng Fang, ..., Yunchao Wei, Ling Chen · (awesome-mobile-agents - aialt) Star

  • GUI Agents with Foundation Models: A Comprehensive Survey, arXiv, 2411.04890, arxiv, pdf, citation: -1

    Shuai Wang, Weiwen Liu, Jingxuan Chen, ..., Yasheng Wang, Ruiming Tang

Agents

  • Agent Laboratory: Using LLM Agents as Research Assistants, arXiv, 2501.04227, arxiv, pdf, citation: -1

    Samuel Schmidgall, Yusheng Su, Ze Wang, ..., Zicheng Liu, Emad Barsoum · (agentlaboratory.github)

  • Agents

    · (𝕏)

  • Agents Are Not Enough, arXiv, 2412.16241, arxiv, pdf, citation: -1

    Chirag Shah, Ryen W. White · (𝕏)

  • Can Large Language Models Adapt to Other Agents In-Context?, arXiv, 2412.19726, arxiv, pdf, citation: -1

    Matthew Riemer, Zahra Ashktorab, Djallel Bouneffouf, ..., Justin D. Weisz, Murray Campbell

  • Introducing smolagents, a simple library to build agents 🤗

    · (smolagents - huggingface) Star

  • Building effective agents

  • StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows, arXiv, 2403.11322, arxiv, pdf, citation: -1

    Yiran Wu, Tianwei Yue, Shaokun Zhang, ..., Chi Wang, Qingyun Wu · (stateflow - yiranwu0) Star

  • AI PERSONA: Towards Life-long Personalization of LLMs, arXiv, 2412.13103, arxiv, pdf, citation: -1

    Tiannan Wang, Meiling Tao, Ruoyu Fang, ..., Yuchen Eleanor Jiang, Wangchunshu Zhou

  • voyage-3 & voyage-3-lite: A new generation of small yet mighty general-purpose embedding models

  • DynaSaur: Large Language Agents Beyond Predefined Actions, arXiv, 2411.01747, arxiv, pdf, citation: -1

    Dang Nguyen, Viet Dac Lai, Seunghyun Yoon, ..., Franck Dernoncourt, Tianyi Zhou · (dynasaur - adobe-research) Star

  • 🌟 Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level, arXiv, 2411.03562, arxiv, pdf, citation: -1

    Antoine Grosnit, Alexandre Maraval, James Doran, ..., Haitham Bou-Ammar, Jun Wang

  • AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant, arXiv, 2410.18603, arxiv, pdf, citation: -1

    Chengyou Jia, Minnan Luo, Zhuohang Dang, ..., Tianbao Xie, Zhiyong Wu · (chengyou-jia.github)

  • AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions, arXiv, 2410.20424, arxiv, pdf, citation: -1

    Ziming Li, Qianbo Zang, David Ma, ..., Wenhao Huang, Ge Zhang · (AutoKaggle - multimodal-art-projection) Star

  • Agentic Information Retrieval, arXiv, 2410.09713, arxiv, pdf, citation: 1

    Weinan Zhang, Junwei Liao, Ning Li, ..., Kounianhua Du

LLM OS

Auto GPT

Tool GPT

UI Agent

  • browserless - browserless Star

  • InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, arXiv, 2501.04575, arxiv, pdf, citation: -1

    Yuhang Liu, Pengxiang Li, Zishu Wei, ..., Hongxia Yang, Fei Wu · (InfiGUIAgent - Reallm-Labs) Star

  • web-ui - browser-use Star

  • Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents, arXiv, 2410.05243, arxiv, pdf, citation: -1

    Boyu Gou, Ruohan Wang, Boyuan Zheng, ..., Huan Sun, Yu Su · (osu-nlp-group.github) · (UGround - OSU-NLP-Group) Star · (arxiv) · (huggingface)

  • A3: Android Agent Arena for Mobile GUI Agents, arXiv, 2501.01149, arxiv, pdf, citation: -1

    Yuxiang Chai, Hanhao Li, Jiayu Zhang, ..., Siyuan Huang, Hongsheng Li · (yuxiangchai.github)

  • browser-use-webui - warmshao Star

  • 🌟 OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis, arXiv, 2412.19723, arxiv, pdf, citation: -1

    Qiushi Sun, Kanzhi Cheng, Zichen Ding, ..., Yu Qiao, Zhiyong Wu · (qiushisun.github)

  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments, arXiv, 2404.07972, arxiv, pdf, citation: -1

    Tianbao Xie, Danyang Zhang, Jixuan Chen, ..., Victor Zhong, Tao Yu · (os-world.github) · (OSWorld - xlang-ai) Star

  • PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World, arXiv, 2412.17589, arxiv, pdf, citation: -1

    Yanheng He, Jiahe Jin, Shijie Xia, ..., Xiangkun Hu, Pengfei Liu · (gair-nlp.github) · (PC-Agent - GAIR-NLP) Star

  • Aria-UI: Visual Grounding for GUI Instructions, arXiv, 2412.16256, arxiv, pdf, citation: -1

    Yuhao Yang, Yue Wang, Dongxu Li, ..., Chao Huang, Junnan Li · (ariaui.github) · (Aria-UI - AriaUI) Star

  • 🌟 CogAgent - THUDM Star

    · (cogagent.aminer)

  • AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation, arXiv, 2412.18116, arxiv, pdf, citation: -1

    Hao Wen, Shizuo Tian, Borislav Pavlov, ..., Ya-Qin Zhang, Yuanchun Li

  • SeleniumBase - seleniumbase Star

    Web Crawling / Testing / Scraping / Stealth

  • Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents, arXiv, 2412.13194, arxiv, pdf, citation: -1

    Yifei Zhou, Qianlan Yang, Kaixiang Lin, ..., Sergey Levine, Erran Li · (yanqval.github)

  • Attention-driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models without Fine-Tuning, arXiv, 2412.10840, arxiv, pdf, citation: -1

    Hai-Ming Xu, Qi Chen, Lei Wang, ..., Lingqiao Liu

  • Large Action Models: From Inception to Implementation, arXiv, 2412.10047, arxiv, pdf, citation: -1

    Lu Wang, Fangkai Yang, Chaoyun Zhang, ..., Dongmei Zhang, Qi Zhang · (microsoft.github)

  • helium - mherrmann Star

  • AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials, arXiv, 2412.09605, arxiv, pdf, citation: -1

    Yiheng Xu, Dunjie Lu, Zhennan Shen, ..., Caiming Xiong, Tao Yu · (easyref-gen.github)

  • 🌟 Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction, arXiv, 2412.04454, arxiv, pdf, citation: -1

    Yiheng Xu, Zekun Wang, Junli Wang, ..., Tao Yu, Caiming Xiong · (aguvis-project.github)

  • ShowUI: One Vision-Language-Action Model for GUI Visual Agent, arXiv, 2411.17465, arxiv, pdf, citation: -1

    Kevin Qinghong Lin, Linjie Li, Difei Gao, ..., Lijuan Wang, Mike Zheng Shou

  • PTA-1: Controlling Computers with Small Models 🤗

    · (huggingface)

  • The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use, arXiv, 2411.10323, arxiv, pdf, citation: -1

    Siyuan Hu, Mingyu Ouyang, Difei Gao, ..., Mike Zheng Shou · (computer_use_ootb - showlab) Star

  • Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents, arXiv, 2411.06559, arxiv, pdf, citation: -1

    Yu Gu, Boyuan Zheng, Boyu Gou, ..., Huan Sun, Yu Su · (WebDreamer - OSU-NLP-Group) Star

  • Sharingan: Extract User Action Sequence from Desktop Recordings, arXiv, 2411.08768, arxiv, pdf, citation: -1

    Yanting Chen, Yi Ren, Xiaoting Qin, ..., Saravan Rajmohan, Qi Zhang

  • 🌟 WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning, arXiv, 2411.02337, arxiv, pdf, citation: -1

    Zehan Qi, Xiao Liu, Iat Long Iong, ..., Jie Tang, Yuxiao Dong

  • 🌟 AutoGLM: Autonomous Foundation Agents for GUIs, arXiv, 2411.00820, arxiv, pdf, citation: -1

    Xiao Liu, Bo Qin, Dongzhu Liang, ..., Yuxiao Dong, Jie Tang

  • 🌟 OS-ATLAS: A Foundation Action Model for Generalist GUI Agents, arXiv, 2410.23218, arxiv, pdf, citation: -1

    Zhiyong Wu, Zhenyu Wu, Fangzhi Xu, ..., Paul Pu Liang, Yu Qiao · (osatlas.github) · (OS-Atlas - OS-Copilot) Star

  • jadechoghari / OmniParser 🤗

  • skyvern - Skyvern-AI Star

    · (skyvern)

  • Agent S: An Open Agentic Framework that Uses Computers Like a Human, arXiv, 2410.08164, arxiv, pdf, citation: -1

    Saaket Agashe, Jiuzhou Han, Shuyu Gan, ..., Ang Li, Xin Eric Wang · (Agent-S - simular-ai) Star

  • ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting, arXiv, 2410.17856, arxiv, pdf, citation: -1

    Shaofei Cai, Zihao Wang, Kewei Lian, ..., Anji Liu, Yitao Liang · (craftjarvis.github)

  • OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization, arXiv, 2410.19609, arxiv, pdf, citation: -1

    Hongliang He, Wenlin Yao, Kaixin Ma, ..., Zhenzhong Lan, Dong Yu

  • Ferret-UI is the first UI-centric multimodal large language model (MLLM) designed for referring 🤗

  • agent.exe - corbt Star

    the easiest way to let Claude's new computer use capabilities take over your computer!

  • computer_use_ootb - showlab Star

  • OmniParser - microsoft Star

    Screen Parsing tool for Pure Vision Based GUI Agent · (arxiv)

  • Developing a computer use model

  • Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation, arXiv, 2410.13232, arxiv, pdf, citation: -1

    Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, ..., Dongha Lee, Jinyoung Yeo

  • MobA: A Two-Level Agent System for Efficient Mobile Task Automation, arXiv, 2410.13757, arxiv, pdf, citation: -1

    Zichen Zhu, Hao Tang, Yansi Li, ..., Lu Chen, Kai Yu · (MobA - OpenDFM) Star

Multi Modal

  • OmAgent - om-ai-lab Star

  • Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines, arXiv, 2410.21220, arxiv, pdf, citation: -1

    Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding, ..., Xiangyu Yue · (arxiv) · (cnzzx.github) · (VSA - cnzzx) Star

Evaluation

  • TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks, arXiv, 2412.14161, arxiv, pdf, citation: -1

    Frank F. Xu, Yufan Song, Boxuan Li, ..., Shuyan Zhou, Graham Neubig

  • AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents, arXiv, 2410.24024, arxiv, pdf, citation: -1

    Yifan Xu, Xiao Liu, Xueqiao Sun, ..., Jie Tang, Yuxiao Dong · (Android-Lab - THUDM) Star

  • Agent-as-a-Judge: Evaluate Agents with Agents, arXiv, 2410.10934, arxiv, pdf, citation: -1

    Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, ..., Vikas Chandra, Jürgen Schmidhuber

Projects

Products

Misc

Multi Agent