A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

PengBoUESTC/UI-TARS-desktop

 
 

Important

[2025-03-18] We released a technical preview version of a new desktop app - Agent TARS, a multimodal AI agent that leverages browser operations by visually interpreting web pages and seamlessly integrating with command lines and file systems.

UI-TARS

UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)

Showcases

  • Instruction: Get the current weather in SF using the web browser
    Video: new_mac_action_weather.mp4
  • Instruction: Send a twitter with the content "hello world"
    Video: new_send_twitter_windows.mp4

News

  • [2025-02-20] - πŸ“¦ Introduced UI TARS SDK, is a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - πŸš€ We updated the Cloud Deployment section in the δΈ­ζ–‡η‰ˆ: GUIζ¨‘εž‹ιƒ¨η½²ζ•™η¨‹ with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing

Quick Start

See Quick Start.

Deployment

See Deployment.

Contributing

See CONTRIBUTING.md.

SDK (Experimental)

See @ui-tars/sdk

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and citing our work 📝

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

Languages

  • TypeScript 93.9%
  • JavaScript 3.2%
  • Less 0.9%
  • SCSS 0.9%
  • HTML 0.6%
  • CSS 0.3%
  • Other 0.2%