This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

Text detection of printed text in the Munji language.


annndruha/OCR-Munji


OCR-Munji

Munji language text detection.

The detector was created for the printed text of the book "Грюнберг А.Л. — Мунджанский язык Тексты" (A. L. Grünberg, "The Munji Language. Texts").


Algorithm

The detector is based on Google Cloud Vision text detection, with additional heuristics that recognize characters specific to the Munji language. Several heuristics are used, such as matching special characters and signs, and replacing some of the letters returned by Google text detection (see detector/mapping.py).
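To illustrate the kind of letter-replacement heuristic described above, here is a minimal sketch. It assumes one plausible rule: OCR sometimes returns Latin look-alike letters inside Cyrillic words, and those are swapped for their Cyrillic counterparts. The table and the names LATIN_TO_CYRILLIC, fix_word and fix_text are hypothetical and are not taken from detector/mapping.py.

```python
import re

# Hypothetical replacement table (NOT the contents of detector/mapping.py):
# Latin letters that look identical to Cyrillic ones. Cyrillic targets are
# written as escapes because they are visually indistinguishable from Latin.
LATIN_TO_CYRILLIC = {
    "a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440",
    "c": "\u0441", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "E": "\u0415", "H": "\u041d",
    "K": "\u041a", "M": "\u041c", "O": "\u041e", "P": "\u0420",
    "C": "\u0421", "T": "\u0422", "X": "\u0425",
}
_TABLE = str.maketrans(LATIN_TO_CYRILLIC)


def has_cyrillic(word: str) -> bool:
    # Basic Cyrillic block: U+0400..U+04FF.
    return any("\u0400" <= ch <= "\u04ff" for ch in word)


def fix_word(word: str) -> str:
    # Only touch words that already contain Cyrillic letters, so purely
    # Latin tokens (abbreviations, numbers with letters) stay unchanged.
    return word.translate(_TABLE) if has_cyrillic(word) else word


def fix_text(text: str) -> str:
    # Rewrite each whitespace-separated token; spaces and line breaks are kept.
    return re.sub(r"\S+", lambda m: fix_word(m.group(0)), text)
```

Restricting the replacement to words that already contain Cyrillic is a design choice of this sketch: it avoids corrupting tokens that are genuinely Latin.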

Usage

Step 0

Install the requirements:

pip install -r requirements.txt

Step 1

To use Google Cloud Vision, obtain a service account key (a JSON file) and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of that file.
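Before calling the API, it can help to fail early with a clear message if the variable is missing or points to a nonexistent file. A minimal sketch; the helper name credentials_path is hypothetical. The Google client library reads the variable on its own, so this check is only a convenience.

```python
import os
from pathlib import Path


def credentials_path() -> Path:
    """Return the key file named by GOOGLE_APPLICATION_CREDENTIALS, or raise."""
    value = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path
```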

Step 2

Get the Google Cloud Vision text detection response for an image:

python detector/google_ocr.py --path tests/page148/img.png

If the command succeeds, the response is saved as a .pickle file.
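For reference, a text detection request with the google-cloud-vision client looks roughly like the sketch below. It is not the code of detector/google_ocr.py: the helper names detect_text, save_response and load_response are hypothetical, and saving the response with pickle assumes the response object is picklable, as the repository's .pickle output suggests. The client import sits inside detect_text so the save/load helpers work without the library installed.

```python
import pickle
from pathlib import Path


def detect_text(image_path):
    """Send one image to Cloud Vision text detection and return the response."""
    from google.cloud import vision  # pip install google-cloud-vision

    # The client picks up credentials from GOOGLE_APPLICATION_CREDENTIALS.
    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=Path(image_path).read_bytes())
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response


def save_response(response, out_path):
    with open(out_path, "wb") as f:
        pickle.dump(response, f)


def load_response(path):
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Usage: save_response(detect_text("tests/page148/img.png"), "tests/page148/img.pickle").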

Step 3

Extract the Munji text from the image and the Google response:

python -m detector tests/page148/img.png tests/page148/img.pickle

or simply

python -m detector tests/page148/img.png

if the .pickle response is in the same directory as the image and has the same base name (here, tests/page148/img.pickle).
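The optional second argument can be resolved as in the following sketch, which derives the default response path by swapping the image's extension for .pickle. It mirrors the naming convention described above but is a hypothetical reconstruction; parse_paths is not the actual argument parsing in the detector package.

```python
import argparse
from pathlib import Path


def parse_paths(argv=None):
    """Return (image_path, response_path); the response defaults to <image>.pickle."""
    parser = argparse.ArgumentParser(prog="python -m detector")
    parser.add_argument("image", type=Path,
                        help="page image, e.g. tests/page148/img.png")
    parser.add_argument("response", type=Path, nargs="?",
                        help="pickled Cloud Vision response (default: next to the image)")
    args = parser.parse_args(argv)
    if args.response is not None:
        response = args.response
    else:
        # Same directory, same base name, .pickle extension.
        response = args.image.with_suffix(".pickle")
    return args.image, response
```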

Result

The detected text is saved to a .txt file next to the .pickle file.
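Because the output is mostly Cyrillic text with special characters, it is safest to write and read the .txt file explicitly as UTF-8: on Windows, Python's default text encoding is often a legacy code page. A minimal sketch, assuming the .txt file shares the .pickle file's base name (the original only says it is located next to it); result_path and save_result are hypothetical names.

```python
from pathlib import Path


def result_path(pickle_path) -> Path:
    # Assumption: same directory and base name as the .pickle file.
    return Path(pickle_path).with_suffix(".txt")


def save_result(text: str, pickle_path) -> Path:
    out = result_path(pickle_path)
    # Explicit UTF-8 so Cyrillic and diacritics survive regardless of OS locale.
    out.write_text(text, encoding="utf-8")
    return out
```

When reading the result back, pass encoding="utf-8" as well.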

Tested on

Component            Version
Windows              11
Python               3.11
pip                  23.0
numpy                1.24.1
opencv-python        4.7.0.68
google               3.0.0
google-cloud-vision  3.3.1
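To compare a local environment with the versions listed above, the installed package versions can be looked up with the standard library's importlib.metadata. A small sketch; the TESTED dictionary simply restates the list above, and installed_versions is a hypothetical helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Package versions from the "Tested on" list above.
TESTED = {
    "pip": "23.0",
    "numpy": "1.24.1",
    "opencv-python": "4.7.0.68",
    "google": "3.0.0",
    "google-cloud-vision": "3.3.1",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} (tested: 3.11)")
    for name, found in installed_versions(TESTED).items():
        print(f"{name}: {found or 'not installed'} (tested: {TESTED[name]})")
```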
