A wrapper around Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.
Updated Jul 6, 2024 - Python
This project uses the Tesseract OCR library to extract text from images, then parses that text with regular expressions to pull out the numbers and writes them to a text file in the output directory. To use it, place the input images in the input_images directory and run the Python script.
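The pipeline described above (OCR, then a regular expression, then a text file) might look roughly like the following. This is a minimal sketch and not the project's actual code: the function names, the number pattern, the list of image extensions, and the `output/numbers.txt` file name are all assumptions made for illustration. Only `pytesseract.image_to_string` and `PIL.Image.open` are real library calls, and they need the Tesseract binary installed to run.

```python
import re
from pathlib import Path

# Assumed pattern: optional minus sign, digits, optional decimal part.
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")

# Assumed set of image extensions to process.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def extract_numbers(text):
    """Return every number found in the OCR text, as strings."""
    return NUMBER_PATTERN.findall(text)


def ocr_image(image_path):
    """Run Tesseract on one image and return the recognized text."""
    # Imported lazily so the parsing helpers work without Tesseract installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def process_directory(input_dir="input_images", output_dir="output", ocr=ocr_image):
    """OCR each image in input_dir and write its numbers to one text file.

    The output file name (numbers.txt) is hypothetical; each line has the
    form "<image name>: <numbers separated by spaces>".
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "numbers.txt"

    lines = []
    for image_path in sorted(Path(input_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        numbers = extract_numbers(ocr(image_path))
        lines.append(f"{image_path.name}: {' '.join(numbers)}")

    out_file.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return out_file
```

Calling `process_directory()` with no arguments would read from `input_images` and write to `output/numbers.txt`. The `ocr` parameter is there only so the OCR step can be swapped out, for example with a stub when Tesseract is not available.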
This project extracts text from images (OCR) using the Azure Cognitive Services Computer Vision API.