GitHub - annndruha/OCR-Munji: Text detection of printed text in Munji language.

OCR-Munji

Munji language text detection.

The detector was created for the printed text of the book "Грюнберг А.Л. — Мунджанский язык Тексты" (A. L. Grünberg, "The Munji Language: Texts").

Algorithm

The detector is based on Google Cloud Vision text detection, with additional heuristics that recognize the characters of the Munji language. These heuristics include matching special characters and signs by correlation and replacing some of the letters returned by Google text detection (see detector/mapping.py).
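The letter-replacement heuristic can be pictured as a lookup table applied to the OCR output. The sketch below is only an illustration of that idea: the function name `apply_replacements` and the entries in `EXAMPLE_TABLE` are hypothetical and are not taken from detector/mapping.py, which holds the actual rules.

```python
def apply_replacements(text: str, table: dict[str, str]) -> str:
    """Replace every occurrence of each key in `table` with its value."""
    # Apply longer keys first so multi-character rules win over
    # single-character ones. Rules are applied one after another, so the
    # output of an earlier rule can be rewritten by a later one.
    for src in sorted(table, key=len, reverse=True):
        text = text.replace(src, table[src])
    return text


# Hypothetical rules, for illustration only (not the project's real mapping).
EXAMPLE_TABLE = {
    "3": "ʒ",   # a digit misread in place of a letter
    "s^": "š",  # a letter and a detached diacritic merged into one character
}

if __name__ == "__main__":
    print(apply_replacements("s^a3", EXAMPLE_TABLE))
```

A plain dictionary keeps the rules easy to review and extend, which suits a heuristic that is tuned by hand against one book.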

Usage

Step 0

Install requirements

pip install -r requirements.txt

Step 1

To use Google Cloud Vision, obtain a service account credentials file and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to its path.
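Before calling the API it can help to confirm that the variable is set and points to an existing file. The following is a minimal sketch of such a check; the helper name `find_credentials` is hypothetical and is not part of this repository.

```python
import os
from pathlib import Path


def find_credentials(env=None) -> Path:
    """Return the credentials path from the environment or raise RuntimeError."""
    # `env` defaults to the real process environment; passing a dict
    # makes the check easy to try out without changing global state.
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value)
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

This check only verifies that the file exists. Whether the key is valid and has access to the Vision API is determined when the request is made.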

Step 2

Get the Google Cloud Vision text detection response for an image:

python detector/google_ocr.py --path tests/page148/img.png

If the command succeeds, the response is saved as a .pickle file.
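To inspect a saved response by hand, the file can be read back with the standard `pickle` module. This is a sketch under assumptions: the helper name `load_response` is hypothetical, and the structure of the stored object is not documented here, so the code makes no claim about its fields.

```python
import pickle
from pathlib import Path


def load_response(pickle_path):
    """Load and return whatever object was stored in the .pickle file."""
    # Only unpickle files you created yourself: loading a pickle can
    # execute arbitrary code embedded in the file.
    with Path(pickle_path).open("rb") as fh:
        return pickle.load(fh)
```

If the stored object is an instance of a class from an installed package, that package must be importable when the file is loaded, so run this in the same environment that produced the file.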

Step 3

Get the Munji text from the image and the Google response:

python -m detector tests/page148/img.png tests/page148/img.pickle

or simply

python -m detector tests/page148/img.png

if the response is located in the same directory as the image and has the same filename.
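The fallback described above (same directory, same filename, .pickle extension) can be sketched with `pathlib`. The function name `resolve_response_path` is hypothetical; this shows the rule as stated here, not the code the detector actually uses.

```python
from pathlib import Path
from typing import Optional


def resolve_response_path(image_path: str, response_path: Optional[str] = None) -> Path:
    """Return the explicit response path, or derive it from the image path."""
    if response_path is not None:
        return Path(response_path)
    # Same directory and filename as the image, with a .pickle extension.
    return Path(image_path).with_suffix(".pickle")
```

For example, `resolve_response_path("tests/page148/img.png")` yields the path `tests/page148/img.pickle`, which matches the file used in the explicit form of the command.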

Result

The detected text is written to a .txt file next to the .pickle file.

Tested on

Component	Version
Windows	11
Python	3.11
pip	23.0
numpy	1.24.1
opencv-python	4.7.0.68
google	3.0.0
google-cloud-vision	3.3.1
