Image Pre-processing in ImageTrans #127

Closed
hentaitaku opened this issue Dec 31, 2021 · 14 comments

Comments

@hentaitaku

hentaitaku commented Dec 31, 2021

Hi, does ImageTrans pre-process images before OCR?

For example, converting the OCR image to 300 DPI and so on.

I'd like to know whether there is a way to run a Python script before ImageTrans OCRs the image.
It would be great to have a way to pre-process the image.

What I see is that when I press OCR with a plugin, the saved image is just 96 DPI, while my original image was over 300 DPI.
It would be best if your tool kept the DPI of the original image, up to a maximum of 300 DPI.

I don't know how you handle DPI with Tesseract.

@xulihang
Owner

xulihang commented Dec 31, 2021

There are several preprocessing steps:
It crops the text area.
It removes Japanese furigana.
It can remove the background.
It can convert vertical Japanese to horizontal.
It can extract lines for tesseract to read.
As for DPI, I think it is the image resolution that matters. DPI is just metadata. If the width or height is smaller than 50, ImageTrans will scale the image up.

@xulihang
Owner

There is no mechanism to use Python for preprocessing. I need to add a new plugin type for this.

@hentaitaku
Author

Hi, I think DPI matters. I previously tried an image with a height of 1800px and a DPI of 366, and the OCR was really good. Then I upscaled the image 2x with waifu2x, after which the DPI was 72, and the OCR made many more errors.

@hentaitaku
Author

And why do you save the OCR image as JPEG? Wouldn't PNG be better?

I'm training a new bubble model right now; I'll let you know how it turns out when it's done.

@xulihang
Owner

xulihang commented Dec 31, 2021

Okay. I will try out different DPIs and file formats.

Related issues: tesseract-ocr/tesseract#1702

@xulihang
Owner

xulihang commented Jan 2, 2022

After some experiments, I think DPI and file format do not affect the result very much, although a better benchmark should be made for comparison. Maybe once importing OCRed text from exported text areas is implemented, you can do your own preprocessing that way.

Some test images: test, 2, test300dpi.

@xulihang
Owner

xulihang commented Jan 2, 2022

I've found a way to set the DPI of images to 300, but the result is not good for tesseract, so I will not add this for now.

https://www.b4x.com/android/forum/threads/save-images-in-300-dpi.137269/

@hentaitaku
Author

hentaitaku commented Jan 2, 2022

I have tried some things too and found that when I use a 1280x1820 image with a DPI over 300, the OCR is better.
But when I save the same image at 1280x1820 with 96 DPI, the OCR is not as good.

I would like to have this feature:

Save the image as you do now, but when the path to the ImageMagick convert executable is set in the ImageTrans settings, produce a 300 DPI OCR image.
The source needs to be the 300 DPI image, not a 72 DPI one. Some tools get this wrong: they read the image as 72 DPI and then save it as 300 DPI.
I think it is not hard to add: just an if-exists check, an else branch, and an exec of the CLI.

You can do it with this CLI command:

convert -density 300 -units PixelsPerInch IMAGE_IN -density 300 -colorspace sRGB -sampling-factor 4:2:2 -quality 100 -strip -interlace none -format jpg IMAGE_OUT

The first -density applies to the input image and the second to the output. This does not change the resolution, only the DPI.
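To illustrate the claim that only the DPI changes and not the pixels, here is a minimal Python sketch. It works on PNG rather than the JPEG that the convert command writes, because a PNG keeps its resolution in a separate pHYs chunk that can be rewritten with the standard library alone. The helper names (make_gray_png, iter_chunks, set_png_dpi) are made up for this example and are not part of ImageTrans or ImageMagick.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind, data):
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def iter_chunks(png):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def make_gray_png(width, height):
    """Build a tiny all-black 8-bit grayscale PNG (test input only)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by the pixel bytes.
    raw = b"".join(b"\x00" + bytes(width) for _ in range(height))
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


def set_png_dpi(png, dpi):
    """Return a copy of the PNG whose pHYs chunk declares the given DPI."""
    pixels_per_meter = round(dpi / 0.0254)  # pHYs stores pixels per meter
    phys = _chunk(b"pHYs", struct.pack(">IIB", pixels_per_meter, pixels_per_meter, 1))
    out = [PNG_SIGNATURE]
    for kind, data in iter_chunks(png):
        if kind == b"pHYs":
            continue  # drop any existing density
        if kind == b"IDAT" and phys is not None:
            out.append(phys)  # pHYs must come before the first IDAT
            phys = None
        out.append(_chunk(kind, data))
    return b"".join(out)


original = make_gray_png(4, 3)
tagged = set_png_dpi(original, 300)
before, after = dict(iter_chunks(original)), dict(iter_chunks(tagged))
print(before[b"IDAT"] == after[b"IDAT"])  # pixel data is byte-identical
```

The header and the compressed pixel data come out unchanged; only the density field differs, which is the sense in which DPI is metadata.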

I hope you can also add a way to run a pre-processing script.
Add a box in the ImageTrans settings where the user can enter a CLI command that runs after the image is saved.
When the box is empty, the exec is skipped.

Something like this: "/path/python.exe /path/your-script.py"

I think this exec library for B4J will help you: https://www.b4x.com/android/forum/threads/jshell-library.34661/#content
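The requested behaviour (run the configured command after the image is saved, skip it when the box is empty) could look roughly like the Python sketch below. This is only an illustration of the idea, not how ImageTrans works: the function name run_preprocess_hook, the timeout, and the choice to pass the saved image path as the last argument are all assumptions.

```python
import shlex
import subprocess
import sys


def run_preprocess_hook(command, image_path, timeout=60.0):
    """Run the user-configured command on a saved image.

    Returns False when no command is configured, True when it ran
    successfully. The image path is appended as the final argument
    (an assumption made for this sketch).
    """
    if not command.strip():
        return False  # empty settings box: skip the exec
    # shlex.split uses POSIX quoting rules, so Windows paths are safest
    # written with forward slashes, as in "/path/python.exe".
    args = shlex.split(command) + [image_path]
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(args, check=True, timeout=timeout)
    return True


if __name__ == "__main__":
    # Hypothetical usage: a one-line "script" that just echoes the path.
    demo = shlex.join([sys.executable, "-c", "import sys; print(sys.argv[1])"])
    run_preprocess_hook(demo, "ocr-image.jpg")
```

The script would be expected to overwrite the image in place, so the OCR step that follows reads the pre-processed file.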

One more thing: I'm training a speech bubble model with darknet right now and found that darknetV4 works but the new Scaled YOLO v4 does not. There are many new things (CSP, Mish, Swish, P5) at https://github.com/AlexeyAB/darknet and the mAP is better, so I hope you can add support when you have time. As far as I know, the cfg now uses activation=logistic where it used to be activation=linear. When I use linear in the model.cfg it works but is buggy and gives wrong sizes, I think because I trained with logistic.

@xulihang
Owner

xulihang commented Jan 2, 2022

Which OCR software do you use?

If using tesseract, it has an option to specify the DPI. DPI is just metadata. If the resolution is the same, the image pixels should be the same. Specifying a DPI of 300 may give the same result.
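For reference, the option in question is the --dpi flag of the tesseract command line. A minimal Python sketch of calling it is below; the function names, the default language "eng", and the file names are assumptions for illustration, and the tesseract executable must be on the PATH for the second function to work.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, dpi=300, lang="eng"):
    """Build a tesseract CLI call that overrides the image's DPI metadata."""
    return ["tesseract", image_path, output_base, "--dpi", str(dpi), "-l", lang]


def ocr_with_dpi(image_path, output_base, dpi=300, lang="eng"):
    """Run tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, dpi, lang), check=True)
    return output_base + ".txt"  # tesseract appends .txt to the output base


print(" ".join(tesseract_command("bubble.jpg", "bubble", dpi=300)))
```

Running the same image with and without --dpi would be one way to check whether the declared DPI alone changes the OCR output.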

@hentaitaku
Author

I will use tesseract; I think WinRT and ABBYY are good too.

No. When the resolution is the same but the DPI in the metadata is 300, the image is bigger; it is just shown at that resolution.

@hentaitaku
Author

hentaitaku commented Jan 2, 2022

It is like this: at 96 DPI one pixel is one pixel, while at 300 DPI there are about 4 pixels in one pixel, so the OCR gets better.

The OCR tools themselves say they prefer 300 DPI for best results.
It is a smaller resolution but with more detail.
When you make the resolution bigger, the OCR tool just sees a 72 DPI image and doesn't know what to do with it.

@xulihang
Owner

xulihang commented Jan 2, 2022

DPI is a concept for printing and scanning. The tools say it is better to scan documents at 300 DPI or higher, but I think adjusting the DPI of already scanned images won't make much difference.
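A quick way to see why DPI belongs to printing and scanning is to work out the physical size it implies. The Python sketch below uses the 1280x1820 image mentioned earlier in the thread; the function name is made up for the example.

```python
def printed_size_inches(width_px, height_px, dpi):
    """Physical size implied by a pixel size and a declared DPI."""
    return width_px / dpi, height_px / dpi


for dpi in (96, 300):
    w_in, h_in = printed_size_inches(1280, 1820, dpi)
    # The pixel count never changes; only the implied paper size does.
    print(f"{dpi} DPI: {1280 * 1820} pixels, {w_in:.2f} x {h_in:.2f} in")
```

At 96 DPI the image would print at about 13.33 x 18.96 inches and at 300 DPI at about 4.27 x 6.07 inches, with the same 2,329,600 pixels in both cases.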

@hentaitaku
Author

You are right, DPI is for scanning; 72 DPI will work too when the resolution is OK.
But pre-processing like I described, where users can write their own script, is a good thing.

Noise removal, rescaling and so on could then be done.
Tesseract hates noise, so it is better to remove it.

Add a box in the ImageTrans settings where the user can enter a CLI command that runs after the image is saved.
When the box is empty, the exec is skipped.

Something like this: "/path/python.exe /path/your-script.py"

I think this exec library for B4J will help you: https://www.b4x.com/android/forum/threads/jshell-library.34661/#content

@xulihang
Owner

It is now possible to use the pure-text images manager for customized image preprocessing:

#199
