v.2.0.0

Latest

yadong-lu released this 13 Feb 01:06

81 commits to master since this release

dcca70c

What's new in V2.0.0?

A larger and cleaner icon caption and grounding dataset
60% improvement in latency compared to V1 model checkpoints
Strong performance: 39.6 average accuracy on ScreenSpot Pro
Your agent only needs one tool: OmniTool. Control a Windows 11 VM with OmniParser and your vision model of choice. OmniTool supports the following models out of the box: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use.
