docs/visual_actions_comparison.md (+9 -9)
@@ -1,4 +1,4 @@
-# Visual Actions Comparison: GroundingDINO+EasyOCR vs YOLO-based Detection
+# Visual Actions Comparison: GroundingDINO+EasyOCR vs OmniParser Detection
 
 This document provides a detailed comparison between the two visual action approaches used in the Crab framework for GUI element detection and interaction.
 
@@ -11,7 +11,7 @@ This document provides a detailed comparison between the two visual action appro
 - [EasyOCR](https://github.com/JaidedAI/EasyOCR) for text recognition
 - **Primary Use Case**: General-purpose object detection with text recognition
 
-### New Approach (YOLO-based Detection)
+### New Approach (OmniParser Detection)
 - **Implementation**: Located in `crab/actions/omniparser_visual_actions.py`
 - **Core Technologies**:
 - Custom YOLO model optimized for GUI element detection
@@ -22,7 +22,7 @@ This document provides a detailed comparison between the two visual action appro
from crab.actions.omniparser_visual_actions import detect_and_annotate_gui_elements
@@ -247,12 +247,12 @@ The comparison tests evaluate:
 
 ## Conclusion
 
-The YOLO-based detection now offers a complete alternative to the legacy approach:
+The OmniParser Detection now offers a complete alternative to the legacy approach:
 - Faster processing times (2-3x speedup)
 - Smaller core model size (30x smaller)
 - Choice of OCR engines
 - Better GUI element detection accuracy
 - Enhanced box filtering with OCR awareness
 - Confidence-based classification
 
-While the legacy approach still has some unique capabilities (multi-image processing, general object detection), the YOLO-based approach provides a more efficient and specialized solution for GUI automation tasks. Future improvements will focus on adding multi-image support and enhancing semantic understanding using OmniParser's capabilities.
+While the legacy approach still has some unique capabilities (multi-image processing, general object detection), the OmniParser Detection provides a more efficient and specialized solution for GUI automation tasks. Future improvements will focus on adding multi-image support and enhancing semantic understanding using OmniParser's capabilities.