Simple code that converts PDFs to image files and runs Tesseract OCR on those images to extract their text. The code focuses on extracting batch numbers ("Batch No.") from pharmacy bills using regular expressions. None of the actual PDFs or output files are included, because all of the data used was real and sensitive.
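Since none of the original files are included, the pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the `pdf2image` and `pytesseract` packages are used for the conversion and OCR steps, and the function names (`ocr_pdf`, `find_batch_numbers`, `extract_batch_numbers`) and the regular expression are hypothetical. Real bills will likely need a pattern tuned to their layout.

```python
import re

# Hypothetical pattern (an assumption, not the repository's regex): matches
# "Batch No", "Batch No.", "Batch Number" or "Batch #", an optional ':' or '-',
# and captures the alphanumeric code that follows.
BATCH_PATTERN = re.compile(
    r"\bBatch\s*(?:No\b\.?|Number\b|#)\s*[:\-]?\s*([A-Z0-9][A-Z0-9/\-]*)",
    re.IGNORECASE,
)


def find_batch_numbers(text):
    """Return every batch code found in a block of OCR text."""
    return BATCH_PATTERN.findall(text)


def ocr_pdf(pdf_path, dpi=300):
    """Render each PDF page to an image and run Tesseract OCR on it."""
    # Imported here so the regex helper above works without these packages.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def extract_batch_numbers(pdf_path):
    """Full pipeline: PDF -> page images -> OCR text -> batch codes."""
    return find_batch_numbers(ocr_pdf(pdf_path))
```

Running `ocr_pdf` requires the Tesseract binary and Poppler to be installed on the system, in addition to the two Python packages. The regex step is kept separate so it can be tried on plain text, for example `find_batch_numbers("Batch No: AB1234")`.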

avinxxsh/realDataOCR

Latest commit: e3954d9 · Jul 5, 2022 (12 commits)
