Invalid resolution 0 dpi warning in stderr #6

Closed
timvisee opened this issue Oct 11, 2019 · 8 comments · Fixed by #8
Labels
enhancement New feature or request

Comments

@timvisee
Collaborator

timvisee commented Oct 11, 2019

When using this crate, I occasionally receive a warning in stderr when opening/reading an image. I assume this is produced by the leptess/tesseract library.

This is what it looks like:

Warning: Invalid resolution 0 dpi. Using 70 instead.

It does not look like it is possible to disable this behavior through the current API. Are there any plans to implement a toggle for this?

@houqp
Owner

houqp commented Oct 12, 2019

Based on the discussion in tesseract-ocr/tesseract#1702, I added a new set_source_resolution method. Could you give that a try?

@timvisee
Collaborator Author

Thank you, didn't notice this option could be a solution.

I'm wondering: would this override the resolution when the image resolution is already known? I can imagine that setting the resolution to 70 for all images could cause undesirable behavior, since the resolution might be known for some of them.

@houqp
Owner

houqp commented Oct 13, 2019

Yeah, good point. I added get_source_y_resolution and set_fallback_source_resolution methods. Could you give those a try?
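For readers landing here later, the intended fallback behavior can be sketched roughly as below. This is only an illustration and not the crate's actual source: the `SourceResolution` trait and the `FakeApi` type are hypothetical stand-ins so the snippet compiles without Tesseract, and the `<= 0` check is an assumption about what counts as "unknown". Only the three method names come from this thread.

```rust
/// Hypothetical stand-in for the OCR engine wrapper. Only the method names
/// are taken from this thread; the trait itself is an assumption.
pub trait SourceResolution {
    fn get_source_y_resolution(&self) -> i32;
    fn set_source_resolution(&mut self, dpi: i32);
}

/// Apply `dpi` only when the image carries no resolution of its own
/// (assumed here to be reported as 0 or less). A known resolution is kept.
pub fn set_fallback_source_resolution<T: SourceResolution>(api: &mut T, dpi: i32) {
    if api.get_source_y_resolution() <= 0 {
        api.set_source_resolution(dpi);
    }
}

/// In-memory fake used to exercise the helper without linking Tesseract.
pub struct FakeApi {
    pub dpi: i32,
}

impl SourceResolution for FakeApi {
    fn get_source_y_resolution(&self) -> i32 {
        self.dpi
    }

    fn set_source_resolution(&mut self, dpi: i32) {
        self.dpi = dpi;
    }
}
```

With this shape, an image reporting 300 dpi keeps its value, while an image reporting 0 dpi gets the fallback. A bogus but nonzero value such as 1 dpi would pass through unchanged.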

@timvisee
Collaborator Author

Using set_fallback_source_resolution did the trick. No warnings show up anymore.

Thanks for the rapid addition!

@timvisee
Collaborator Author

timvisee commented Oct 13, 2019

I just noticed an interesting edge case. It appears that some images have a DPI of 1 defined (and yes, that's incorrect). tesseract produces a warning for this as well:

Warning: Invalid resolution 1 dpi. Using 70 instead.

It's interesting, because this isn't covered by the set_fallback_source_resolution function.
Don't worry, it's not much of a problem. I'm just posting this so that others who experience the same thing can see that it isn't solved yet. I might open an issue for this on tesseract in the future.

In case you're wondering: I'm scanning all images, stickers, videos and such from Telegram groups (for smart spam prevention). As you can probably imagine, I'm receiving a wide spectrum of images, image types, sizes and formats. That's why I'm seeing these weird edge cases.

@houqp
Owner

houqp commented Oct 13, 2019

That's interesting. I wonder what range of DPI values tesseract considers invalid. If it can't work with 1 dpi images, then it makes sense to cover that case in the fallback method.

@timvisee
Collaborator Author

> That's interesting. I wonder what range of DPI values tesseract considers invalid. If it can't work with 1 dpi images, then it makes sense to cover that case in the fallback method.

I hadn't noticed that the fallback method exists only in this library; I thought it was provided by tesseract. I'll look up the range and update the function.

@timvisee
Collaborator Author

The original warning appears to come from the following section, which changes the DPI if the detected DPI is outside a specified range:
https://github.com/tesseract-ocr/tesseract/blob/247cd0edc44e0a4b6cf46f1faccdb5d1557ed1f0/src/api/baseapi.cpp#L2017-L2031

The allowed DPI range is defined here:
https://github.com/tesseract-ocr/tesseract/blob/247cd0edc44e0a4b6cf46f1faccdb5d1557ed1f0/src/ccstruct/publictypes.h#L33-L39

Note that it only changes the DPI automatically (to the lowest value in the allowed range) if the user didn't specify a DPI themselves. It does not change the DPI if the user explicitly set it to something outside the allowed range.
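To make the behavior described above easier to follow, here is my reading of that logic modeled as a small Rust function. It is a sketch of the decision only, not a translation of the linked C++. The lower bound of 70 matches the warning text; the upper bound of 2400 is what I recall from publictypes.h and should be treated as an assumption. The function and constant names are made up for illustration, and a user DPI of 0 stands for "not set".

```rust
/// Lower bound of the credible range; matches "Using 70 instead" in the warning.
pub const MIN_CREDIBLE_RESOLUTION: i32 = 70;
/// Upper bound of the credible range. Assumption: recalled from publictypes.h.
pub const MAX_CREDIBLE_RESOLUTION: i32 = 2400;

/// True when `dpi` lies inside the credible range (inclusive on both ends).
pub fn is_credible(dpi: i32) -> bool {
    (MIN_CREDIBLE_RESOLUTION..=MAX_CREDIBLE_RESOLUTION).contains(&dpi)
}

/// Model of which DPI ends up being used.
/// `user_dpi == 0` means the user did not set one.
pub fn effective_dpi(user_dpi: i32, detected_dpi: i32) -> i32 {
    if user_dpi != 0 {
        // An explicit user value always wins, even outside the credible range.
        user_dpi
    } else if !is_credible(detected_dpi) {
        // This is the branch that prints "Invalid resolution N dpi".
        MIN_CREDIBLE_RESOLUTION
    } else {
        detected_dpi
    }
}
```

Under this model both 0 dpi and 1 dpi end up at 70 when no user value is set, which is consistent with the two warnings quoted earlier in this thread.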

I'll look into improving this crate for these findings now.
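One possible shape for a range-aware fallback is sketched below. This is not the change that was eventually merged; `fallback_override` is a hypothetical helper, and the upper bound of 2400 is again an assumption recalled from publictypes.h. The idea is to decide whether the detected value needs overriding at all, so that a caller only calls set_source_resolution when it does.

```rust
/// Assumed credible range; 70 matches the warning text, 2400 is an assumption.
pub const MIN_CREDIBLE_DPI: i32 = 70;
pub const MAX_CREDIBLE_DPI: i32 = 2400;

/// Returns `Some(fallback)` when the detected DPI is outside the credible
/// range and should be overridden, or `None` when the detected DPI can stay.
pub fn fallback_override(detected_dpi: i32, fallback_dpi: i32) -> Option<i32> {
    if detected_dpi < MIN_CREDIBLE_DPI || detected_dpi > MAX_CREDIBLE_DPI {
        Some(fallback_dpi)
    } else {
        None
    }
}
```

A caller would pass the result of get_source_y_resolution as `detected_dpi` and, on `Some(dpi)`, hand that value to set_source_resolution. That covers the 1 dpi case as well as the 0 dpi case.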
