LION is an add-on that performs automatic Optical Character Recognition (OCR) on specific screen areas at predefined intervals.
Why is it called "intelligent"? Not because I developed it, nor because the "i" makes a clever acronym.
Because it scans the same screen area repeatedly, it would normally read identical text over and over, which isn't ideal. To solve this, I implemented a mechanism that suppresses speech when newly recognized text closely matches the previous result.
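One way to picture this duplicate filter is a similarity check between the previous and the new OCR result. The following is a minimal sketch, not the add-on's actual code: the helper name `should_speak`, the use of Python's `difflib.SequenceMatcher`, and the strict "less than" comparison are assumptions made for illustration. The 0.5 default mirrors the similarity threshold default listed in the change notes.

```python
from difflib import SequenceMatcher


def should_speak(previous: str, current: str, threshold: float = 0.5) -> bool:
    """Return True if `current` differs enough from `previous` to be spoken.

    Hypothetical helper: texts whose similarity ratio reaches `threshold`
    are treated as repeats and stay silent.
    """
    if not current.strip():
        return False  # nothing recognized, nothing to say
    ratio = SequenceMatcher(None, previous, current).ratio()
    return ratio < threshold


# A subtitle that only gained punctuation is treated as a repeat,
# while a completely different line is spoken.
print(should_speak("Hello there", "Hello there!"))  # False
print(should_speak("abc", "xyz"))  # True
```

A ratio-based comparison tolerates the small differences OCR tends to produce between scans of the same subtitle, such as a missing comma or a misread letter.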
My main reason for developing this add-on was reading subtitles. Because it works on whatever is displayed on screen, it can read many kinds of subtitles, including those on YouTube, Netflix, and Bilibili, embedded subtitles in AVI files, and even live TV captions!
When using LION, always put the video in full-screen mode. OCR behaves much like human eyesight: the larger the text, the better the recognition, though accuracy is never perfect. For best results:
- Enlarge subtitle fonts when possible
- Use high-resolution displays
The OCR engine isn't flawless and may struggle with certain graphics.
Beyond subtitles, it can monitor screen text that isn't directly accessible, like video game menus. However, it cannot recognize highlighted text selections.
To start with the default settings, press NVDA+ALT+N. LION then performs full-screen OCR once per second and speaks only when the text changes.
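The default behavior described above can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `run_ocr_loop`, `recognize`, `speak`, `max_scans`, and the injectable `sleep` are hypothetical names, standing in for the OCR engine and the screen reader's speech output rather than reproducing LION's real internals.

```python
import time
from typing import Callable, Optional


def run_ocr_loop(
    recognize: Callable[[], str],
    speak: Callable[[str], None],
    interval: float = 1.0,
    max_scans: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Scan repeatedly and speak only when the recognized text changes.

    `recognize` and `speak` are hypothetical stand-ins for the OCR engine
    and the screen reader's speech output.
    """
    previous = ""
    scans = 0
    while max_scans is None or scans < max_scans:
        text = recognize().strip()
        if text and text != previous:
            speak(text)
            previous = text
        scans += 1
        sleep(interval)  # wait one interval before the next scan


# Usage with canned data instead of a real OCR engine.
frames = iter(["Hello", "Hello", "Goodbye"])
spoken = []
run_ocr_loop(lambda: next(frames), spoken.append, max_scans=3, sleep=lambda s: None)
print(spoken)  # ['Hello', 'Goodbye']
```

The second "Hello" is skipped because it is identical to the text spoken just before it.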
For customization:
- Navigate to NVDA menu > Preferences > LION Settings
- Example use case: video files may display a logo in the top-left corner that gets read along with the subtitles. The settings listed next can prevent this.
Available settings:
- OCR Interval: Time between OCR scans (0.1 to 10 seconds)
- OCR Target: Screen area to scan (Options: Current Control/Current Window/Navigation Object/Full Screen)
- Crop Pixels (Top/Bottom/Right/Left): Trims unwanted areas in Full Screen and Current Window modes. Useful for ignoring persistent logos: for example, cropping 10% from the top removes a logo in the top-left corner. For efficiency, you could crop 70% from the top, since subtitles usually sit in the lower third of the screen.
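To make the crop arithmetic concrete, here is a small sketch that turns crop percentages into the rectangle that would remain to be scanned. The function `crop_region` and its parameter names are hypothetical, and treating each value as a percentage of the full width or height is an assumption based on the examples in the list above.

```python
from typing import Tuple


def crop_region(
    width: int,
    height: int,
    top: int = 0,
    bottom: int = 0,
    left: int = 0,
    right: int = 0,
) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the area left after cropping.

    Each crop argument is a percentage (0-100) of the full width or height.
    """
    if top + bottom >= 100 or left + right >= 100:
        raise ValueError("crop values leave nothing to scan")
    x = width * left // 100
    y = height * top // 100
    new_width = width - x - width * right // 100
    new_height = height - y - height * bottom // 100
    return x, y, new_width, new_height


# On a 1920x1080 screen:
print(crop_region(1920, 1080, top=10))  # (0, 108, 1920, 972)
print(crop_region(1920, 1080, top=70))  # (0, 756, 1920, 324)
```

Cropping 70% from the top leaves a 324-pixel strip at the bottom of a 1080-pixel-high screen, which is much less image data for the OCR engine to process on every scan.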
- Complete OCR engine overhaul using PaddleOCR-json for improved accuracy
- Implemented add-on template for easier compilation
- Adapted for NVDA 2022.1 compatibility
- Completed interface translations
- Added shortcut customization in Input Gestures dialog
- Added warnings when initiating OCR during screensaver/black screen
- Changed default shortcut to NVDA+ALT+N
- Ensured compatibility with NVDA 2021.1
- Fixed various bugs
- Updated activation/deactivation sounds
- Fixed a major YouTube full-screen mode bug
- Crop settings now affect Current Window mode
- Added similarity threshold (0-1) for gaming scenarios:
  - Compares current text with previous output
  - 0: All texts considered identical (unusable)
  - 1: Always speaks (even duplicates)
  - Default: 0.5
- Initial release