Simple monospace text not correctly interpreted #2820
Comments
It looks like it's detecting the text as |
I am also trying to use Tesseract to OCR random strings of mixed letters and numbers, and I have the same general problem as you describe: Tesseract mixes up 'S' and '5', and also '1' and 'I'. Tesseract is primarily designed to recognize words, and it decides which characters are present partly by what would make the word valid, so it doesn't naturally deal well with non-word strings.
The only suggestion I have is the following list of config file parameters that I am using to try to prevent Tesseract from using the word-matching method and instead use a character-by-character recognition approach:
tessedit_flip_0O 0
To be honest, I don't even know if this makes any difference, or whether the LSTM engine (which I am using) pays attention to these settings.
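For anyone who wants to try a parameter like this without editing a config file, here is a minimal sketch that builds the equivalent command line. It assumes the standard `tesseract` CLI options `--psm` and `-c VAR=VALUE`; the helper name `build_tesseract_command`, the file name `uid.png`, and the choice of page segmentation mode 7 (single text line) are my own illustrations, not something from this thread. Nothing is executed, the function only returns the argument list.

```python
import shlex


def build_tesseract_command(image_path, output_base="stdout", psm=7, variables=None):
    """Return an argv list for the tesseract CLI (hypothetical helper, runs nothing)."""
    # Basic form: tesseract IMAGE OUTPUTBASE [options]
    cmd = ["tesseract", image_path, output_base, "--psm", str(psm)]
    # Each config variable is passed as its own "-c NAME=VALUE" pair.
    for name, value in (variables or {}).items():
        cmd += ["-c", f"{name}={value}"]
    return cmd


# "uid.png" is a placeholder file name used only for illustration.
cmd = build_tesseract_command("uid.png", variables={"tessedit_flip_0O": 0})
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` if Tesseract is installed. Whether the LSTM engine honours the variable is still the open question raised above.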
Hi Tesseract-ocr Team, I am facing similar challenges to @woodjohndavid. I am trying to recognise and extract a random 50+ character string (UID) from images that are uploaded in my workflow. The result needs to be 100% correct in order to find the correct UID. In my current OCR results, I get random spaces and incorrectly recognised characters, as per @woodjohndavid's explanation.
See below my example baseline image in order to get the best results:
My quick and dirty bash test (FYI: building a web app with Python that will be the finished product):
My bash script for testing:
Any thoughts or suggestions you might have pertaining to this issue? Is it possible for Tesseract-ocr to recognise these long UIDs? Looking forward to your response, thank you.
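Until the recognition itself improves, one stopgap for the Python side is to clean the OCR output and refuse anything that does not look like a UID. The sketch below is only an assumption about what such a check could look like: the alphabet (ASCII letters and digits), the helper name `clean_uid`, and the optional length check are all hypothetical and would need adjusting to the real UID format. It removes the stray spaces but cannot detect a wrong character that is still in the allowed alphabet.

```python
import string

# Assumption: UIDs consist of ASCII letters and digits only.
ALLOWED = set(string.ascii_letters + string.digits)


def clean_uid(raw, expected_length=None):
    """Strip whitespace from OCR text and report whether it looks like a UID.

    Returns (uid, ok, bad_chars). Hypothetical helper for illustration.
    """
    # OCR output may contain spaces and newlines inside the string.
    uid = "".join(raw.split())
    # Collect every character outside the assumed alphabet.
    bad_chars = sorted({ch for ch in uid if ch not in ALLOWED})
    length_ok = expected_length is None or len(uid) == expected_length
    return uid, (not bad_chars and length_ok), bad_chars


uid, ok, bad = clean_uid("AB12 cd34\n", expected_length=8)
print(uid, ok, bad)
```

A result with `ok` set to `False` could be sent back for a re-scan or manual review instead of being used for the lookup.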
about the whitelist issue. |
The better reference would be here – the reason for the current behaviour of white/blacklisting – which is indeed of little practical use – is the narrowness of the default beam in the LSTM decoder. The |
Environment
Current Behavior:
Config file:
Image:
Command-line usage:
Expected Behavior:
i.e.
To elaborate; I would like to detect only a single hex number rendered in monospace in a PNG.
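Since a hex number can only contain 0-9 and A-F, some of the confusions mentioned in the comments ('S' vs '5', 'I' vs '1', 'O' vs '0') can be corrected after recognition rather than inside Tesseract. The following is a rough sketch under that assumption; the function name `normalize_hex` and the substitution table are my own, and the 'L' to '1' entry is an extra guess that is not reported in this thread. It does not handle a `0x` prefix.

```python
HEX_DIGITS = set("0123456789ABCDEF")

# Letters that are not valid hex digits, mapped to the digit they resemble.
# Assumption: these are the likely confusions; adjust for the actual font.
CONFUSIONS = {"O": "0", "I": "1", "L": "1", "S": "5"}


def normalize_hex(raw):
    """Return the OCR text as an uppercase hex string, or None if it cannot be one."""
    # Drop whitespace inserted by the OCR step and unify the case.
    text = "".join(raw.split()).upper()
    # Replace look-alike letters that can never occur in a hex number.
    fixed = "".join(CONFUSIONS.get(ch, ch) for ch in text)
    if not fixed or any(ch not in HEX_DIGITS for ch in fixed):
        return None
    return fixed


print(normalize_hex("IA SF"))
```

This only works because the substituted letters are outside the hex alphabet. A pair like 'B' and '8' cannot be fixed this way, since both are valid hex digits.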