Image Pre-processing in ImageTrans #127
Comments
There are several preprocessing options:
There is no mechanism to use Python for preprocessing. I need to add a new plugin type for this.
Hi, I think DPI matters. I previously tried an image with a height of 1800 px and a DPI of 366, and the OCR was really good. Then I upscaled the image 2x with waifu2x; the DPI was then 72, and the OCR made many more errors.
And why are you saving the OCR image as JPEG? Wouldn't PNG be better? I am training a new bubble model right now; I will let you know how it is when it's done.
Okay. I will try out different DPIs and file formats. Related issue: tesseract-ocr/tesseract#1702
I've found a way to set the DPI of images to 300, but the result is not good for Tesseract, so I will not add this for now. https://www.b4x.com/android/forum/threads/save-images-in-300-dpi.137269/
I have tried some things too and found that the OCR is better when I use a 1280x1820 image with a DPI over 300. I would like to have this feature: save the image as you do now, but when the path to the ImageMagick convert exe is added to the ImageTrans settings, produce a 300 DPI OCR image. You can do it with this CLI command:
The first density is for the input image and the second density is for the output; this will not change the resolution, just the DPI. I hope you can add that as a pre-processing script too. Something like this. I think this will help you with running an exec from B4J: https://www.b4x.com/android/forum/threads/jshell-library.34661/#content
Which OCR software do you use? If you are using Tesseract, it has an option to specify the DPI. DPI is just metadata. If the resolution is the same, the image pixels should be the same. Specifying a DPI of 300 may therefore give the same result.
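As a rough illustration of the option mentioned above, the sketch below builds a Tesseract command line that passes `--dpi 300`. It assumes Tesseract 4 or later is installed as `tesseract` on the PATH. The helper names `build_tesseract_command` and `run_ocr` and the file names are hypothetical, and this is not how ImageTrans itself invokes Tesseract.

```python
import subprocess


def build_tesseract_command(image_path, output_base, dpi=300, lang="eng"):
    """Return the argument list for a Tesseract run with an explicit DPI.

    `--dpi` tells Tesseract which resolution to assume for the input,
    regardless of what the image metadata says.
    """
    return ["tesseract", image_path, output_base, "--dpi", str(dpi), "-l", lang]


def run_ocr(image_path, output_base, dpi=300, lang="eng"):
    """Run Tesseract; the text is written to `<output_base>.txt`."""
    command = build_tesseract_command(image_path, output_base, dpi, lang)
    return subprocess.run(command, check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(build_tesseract_command("bubble.png", "bubble_text")))
```

Running the same image once with `--dpi 96` and once with `--dpi 300` would be one way to check whether the declared DPI alone changes the recognition result.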
I will use Tesseract; I think WinRT and ABBYY are good too. No: when the resolution is the same but the DPI in the metadata is 300, the image is bigger; it is just displayed at that resolution.
It is like this: at 96 DPI there is 1 pixel, and at 300 DPI there are about 4 pixels in that 1 pixel, so the OCR gets better. The OCR tools themselves say that they prefer 300 DPI for best results.
DPI is a concept for printing and scanning. The tools say it is better to scan documents at 300 DPI or more, but I think adjusting the DPI of already scanned images won't have much effect.
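To make the point about metadata concrete, here is a minimal sketch using only the Python standard library. It builds a tiny grayscale PNG in memory twice, once without and once with a `pHYs` chunk declaring 300 DPI, and compares the pixel data. The helper names and the 4x3 test pattern are made up for illustration; real images would normally be handled with an imaging library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_gray_png(width, height, dpi=None):
    """Create a small 8-bit grayscale PNG, optionally tagged with a DPI value."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0) followed by `width` gray pixels.
    raw = b"".join(
        b"\x00" + bytes((x * 7 + y * 13) % 256 for x in range(width))
        for y in range(height)
    )
    chunks = [make_chunk(b"IHDR", ihdr)]
    if dpi is not None:
        # pHYs stores pixels per metre; 300 DPI is about 11811 pixels per metre.
        ppm = round(dpi / 0.0254)
        chunks.append(make_chunk(b"pHYs", struct.pack(">IIB", ppm, ppm, 1)))
    chunks.append(make_chunk(b"IDAT", zlib.compress(raw)))
    chunks.append(make_chunk(b"IEND", b""))
    return PNG_SIGNATURE + b"".join(chunks)


def read_chunks(png_bytes):
    """Return {chunk_type: data} for a PNG held in memory."""
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        chunks[chunk_type] = png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks


plain = read_chunks(make_gray_png(4, 3))
tagged = read_chunks(make_gray_png(4, 3, dpi=300))

# Compare the compressed pixel data and check which file carries pHYs.
print("same pixel data:", plain[b"IDAT"] == tagged[b"IDAT"])
print("pHYs present:", b"pHYs" in plain, b"pHYs" in tagged)
```

The declared DPI only changes the implied physical size: a 1280x1820 image is about 4.27 by 6.07 inches at 300 DPI and about 13.3 by 19.0 inches at 96 DPI, with the same number of pixels either way.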
You are right, DPI is for the scan; 72 DPI will work too when the resolution is OK. Removing noise, rescaling, and so on can then be done.
It is now possible to use the pure-text images manager for customized image preprocessing:
Hi, does ImageTrans pre-process OCR images before running OCR?
For example, converting the OCR image to 300 DPI and so on.
I would like to know whether there is a way to run a Python script before ImageTrans runs OCR on the image.
It would be great to have a way to pre-process the image.
What I see is that when I press OCR with a plugin, the saved image is just 96 DPI, while my original image was over 300 DPI.
It would be best if your tool kept the DPI of the original image, up to a maximum of 300 DPI.
I don't know how you handle DPI with Tesseract.