Munji language text detection.
The detector was created for the printed texts in the book "Грюнберг А.Л. — Мунджанский язык Тексты" (A. L. Grünberg, "The Munji Language. Texts").
The detector is built on Google Cloud Vision text detection, with additional heuristics that recognize characters specific to the Munji language. These heuristics include matching special characters and signs, and replacing some of the letters returned by Google text detection (see detector/mapping.py).
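To illustrate the letter-replacement idea, here is a minimal Python sketch. The table `HYPOTHETICAL_MAPPING` and the function `apply_mapping` are invented for this example and do not reproduce the actual rules in detector/mapping.py; the sketch only shows character-for-character substitution with `str.translate`.

```python
# Hypothetical replacement table: OCR output character -> intended character.
# These entries are illustrative only; the real rules live in detector/mapping.py.
HYPOTHETICAL_MAPPING: dict[str, str] = {
    "3": "з",  # digit three misread instead of Cyrillic ze
    "6": "б",  # digit six misread instead of Cyrillic be
}


def apply_mapping(text: str, mapping: dict[str, str]) -> str:
    """Replace every character of text that is a key in mapping."""
    # str.maketrans requires every dict key to be a single character.
    return text.translate(str.maketrans(mapping))


print(apply_mapping("3-6", HYPOTHETICAL_MAPPING))
```

Because `str.maketrans` only accepts single-character keys, corrections that span several characters would need another approach, for example ordered `str.replace` calls.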
Install requirements
pip install -r requirements.txt
To use Google Cloud Vision, you need service account credentials: set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON key file.
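A quick pre-flight check can catch a missing or mistyped credentials path before any API call is made. This is a hypothetical helper (`credentials_path` is not part of this project); it only inspects the environment variable and the file system.

```python
import os
from pathlib import Path


def credentials_path() -> Path:
    """Return the key file named by GOOGLE_APPLICATION_CREDENTIALS, or raise."""
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

On Windows (cmd), the variable can be set for the current session with `set GOOGLE_APPLICATION_CREDENTIALS=C:\path\to\key.json`.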
Get the Google Cloud Vision text detection response for an image:
python detector/google_ocr.py --path tests/page148/img.png
If the command succeeds, the response is saved as a .pickle file next to the image (for example, tests/page148/img.pickle).
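The step above might look roughly like the following sketch, which uses the documented `vision.ImageAnnotatorClient.text_detection` call. The helper names are hypothetical, and storing the response as serialized protobuf bytes (`vision.AnnotateImageResponse.serialize`) inside the pickle is an assumption: detector/google_ocr.py may pickle a different object, so a file written this way is not guaranteed to be readable by `python -m detector`.

```python
import pickle
from pathlib import Path


def save_pickle_next_to_image(obj, image_path: str) -> Path:
    """Pickle obj to a file with the image's base name and a .pickle suffix."""
    out = Path(image_path).with_suffix(".pickle")
    with out.open("wb") as f:
        pickle.dump(obj, f)
    return out


def detect_and_save(image_path: str) -> Path:
    """Run Google Cloud Vision text detection on one image and save the result."""
    # Imported lazily so the helper above works without google-cloud-vision.
    from google.cloud import vision

    # Uses Application Default Credentials (GOOGLE_APPLICATION_CREDENTIALS).
    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=Path(image_path).read_bytes())
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(f"Vision API error: {response.error.message}")
    # Assumption: store serialized protobuf bytes rather than the message object.
    payload = vision.AnnotateImageResponse.serialize(response)
    return save_pickle_next_to_image(payload, image_path)
```

With this layout, a saved file can be turned back into a response object with `vision.AnnotateImageResponse.deserialize(pickle.load(f))`.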
Get the Munji text from the image and the Google response:
python -m detector tests/page148/img.png tests/page148/img.pickle
or simply
python -m detector tests/page148/img.png
if the response file is in the same directory as the image and has the same file name.
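The optional second argument can be resolved with `argparse` as sketched below. This is a hypothetical reconstruction of the command-line behavior described above, not the project's actual entry point; it assumes the default response path is the image path with its extension replaced by `.pickle`.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse `image [response]`, defaulting response to <image>.pickle."""
    parser = argparse.ArgumentParser(prog="detector")
    parser.add_argument("image", type=Path, help="page image, e.g. img.png")
    parser.add_argument(
        "response", type=Path, nargs="?", default=None,
        help="pickled Google response (default: image path with .pickle suffix)",
    )
    args = parser.parse_args(argv)
    if args.response is None:
        # Same directory, same base name as the image.
        args.response = args.image.with_suffix(".pickle")
    return args
```

For example, `parse_args(["tests/page148/img.png"]).response` resolves to `tests/page148/img.pickle`, matching the short form of the command.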
Result
The detected text is saved to a .txt file next to the .pickle file.
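The recognized text contains non-ASCII characters, so the output file should be written with an explicit encoding; on Windows, Python 3.11 otherwise falls back to the locale encoding. A minimal sketch, assuming the .txt file shares the base name of the .pickle file (the helper name `save_result_text` is hypothetical):

```python
from pathlib import Path


def save_result_text(text: str, response_path: Path) -> Path:
    """Write text as UTF-8 to <response base name>.txt beside the .pickle file."""
    out = Path(response_path).with_suffix(".txt")
    out.write_text(text, encoding="utf-8")
    return out
```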
| Component | Version |
|---|---|
| Windows | 11 |
| Python | 3.11 |
| pip | 23.0 |
| numpy | 1.24.1 |
| opencv-python | 4.7.0.68 |
|  | 3.0.0 |
| google-cloud-vision | 3.3.1 |