Releases · microsoft/OmniParser
v1.5.0
Starting with this version, we adopt semantic versioning for a faster and friendlier development experience.
What's Changed
- Adding Microsoft SECURITY.MD by @microsoft-github-policy-service in #2
- nit: fix spelling for requirements.txt by @nmstoker in #17
- Update requirement.txt by @krishna2 in #16
- Updated einops typo in requirements.txt by @redron in #28
- Add torch.inference mode by @aliencaocao in #29
- Add PaddleOCR option by @aliencaocao in #53
- Add icon detect image size option by @aliencaocao in #72
- version 1.5 by @yadong-lu in #94
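One of the changes above, "Add torch.inference mode" (#29), refers to wrapping the model's forward pass in PyTorch's `torch.inference_mode()` context, which disables autograd bookkeeping to cut memory use and latency. A minimal sketch of the idea (the `Linear` model here is a stand-in, not OmniParser's actual detector):

```python
import torch

# Stand-in model; OmniParser's real checkpoints are loaded differently.
model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# torch.inference_mode() disables gradient tracking and version counting,
# so inference runs faster and allocates less memory than a plain forward pass.
with torch.inference_mode():
    out = model(x)

print(out.requires_grad)  # False: no gradient tracking inside the context
```

`torch.inference_mode()` is stricter than `torch.no_grad()`: tensors created inside it cannot later be used in autograd, which is exactly the trade-off you want for a pure-inference serving path.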
New Contributors
- @microsoft-github-policy-service made their first contribution in #2
- @nmstoker made their first contribution in #17
- @krishna2 made their first contribution in #16
- @redron made their first contribution in #28
- @aliencaocao made their first contribution in #29
- @yadong-lu made their first contribution in #94
Full Changelog: https://github.com/microsoft/OmniParser/commits/v1.5.0
v2.0.0
What's new in v2.0.0?
- Larger and cleaner set of icon caption + grounding dataset
- 60% improvement in latency compared to V1 model checkpoints
- Strong performance: 39.6 average accuracy on ScreenSpot Pro
- Your agent only needs one tool: OmniTool. Control a Windows 11 VM with OmniParser + your vision model of choice. Out of the box, OmniTool supports the following large language models: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use.