Releases · microsoft/OmniParser
v1.5.0
Starting with this version, we adopt semantic versioning for a faster and friendlier development experience.
What's Changed
- Adding Microsoft SECURITY.MD by @microsoft-github-policy-service in #2
- nit: fix spelling for requirements.txt by @nmstoker in #17
- Update requirement.txt by @krishna2 in #16
- Updated einops typo in requirements.txt by @redron in #28
- Add torch.inference mode by @aliencaocao in #29
- Add PaddleOCR option by @aliencaocao in #53
- Add icon detect image size option by @aliencaocao in #72
- version 1.5 by @yadong-lu in #94
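One of the changes above, "Add torch.inference mode" (#29), refers to wrapping the model's forward pass in PyTorch's `torch.inference_mode()` context, which disables autograd bookkeeping to cut memory use and latency. A minimal sketch of the idea (the `Linear` model here is a stand-in, not OmniParser's actual detector):

```python
import torch

# Stand-in model; OmniParser's real checkpoints are loaded differently.
model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# torch.inference_mode() disables gradient tracking and version counting,
# so inference runs faster and allocates less memory than a plain forward pass.
with torch.inference_mode():
    out = model(x)

print(out.requires_grad)  # False: no gradient tracking inside the context
```

`torch.inference_mode()` is stricter than `torch.no_grad()`: tensors created inside it cannot later be used in autograd, which is exactly the trade-off you want for a pure-inference serving path.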
New Contributors
- @microsoft-github-policy-service made their first contribution in #2
- @nmstoker made their first contribution in #17
- @krishna2 made their first contribution in #16
- @redron made their first contribution in #28
- @aliencaocao made their first contribution in #29
- @yadong-lu made their first contribution in #94
Full Changelog: https://github.com/microsoft/OmniParser/commits/v1.5.0
v2.0.0
What's new in v2.0.0?
- Larger and cleaner set of icon caption + grounding dataset
- 60% improvement in latency compared to V1 model checkpoints
- Strong performance: 39.6 average accuracy on ScreenSpot Pro
- Your agent only needs one tool: OmniTool. Control a Windows 11 VM with OmniParser + your vision model of choice. Out of the box, OmniTool supports the following large language models: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use.