[BUG] -oem option has no effect with tesseract v4 #1264

Closed
hamelg opened this issue Apr 29, 2020 · 1 comment

hamelg commented Apr 29, 2020

CCExtractor version: 0.88
tesseract version: 4.1.1
leptonica version: 1.79.0

  • Is this a regression (i.e. did it work before)? NO
  • What platform did you use? Linux

Video links

https://app.box.com/s/mhu17q37hc4ofprneydfailktp70pi4l

Additional information

I found that the -oem option has no effect: if tesseract v4 is installed, ccextractor silently forces the oem parameter to 1.

https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ocr.c#L182

With oem=1, I get bad results.

I fixed that locally by setting the tesseract oem from the -oem option passed on the CLI (ccx_options.ocr_oem), and now with oem=0 the result is near perfect. On my subs, oem=0 gives the best results; oem=1 and oem=2 are useless.

$ tesseract --help-oem
OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.
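
A minimal sketch of the idea behind the fix described above, not the actual patch: honor the mode the user requested, and fall back to a version-based default only when no valid mode was given. The names `select_oem` and `tess_major_version` are hypothetical, and so is the assumption that an unset option is represented by an out-of-range value such as -1.

```c
#include <stdlib.h> /* strtol */

/* OCR engine modes, numbered as in `tesseract --help-oem`. */
enum { OEM_LEGACY = 0, OEM_LSTM = 1, OEM_LEGACY_LSTM = 2, OEM_DEFAULT = 3 };

/* Hypothetical helper: major version from a string such as "4.1.1". */
int tess_major_version(const char *version)
{
    /* strtol stops at the first '.', so "4.1.1" yields 4. */
    return (int)strtol(version, NULL, 10);
}

/*
 * Hypothetical helper: pick the engine mode to pass to tesseract.
 * A valid user request (0..3) always wins. Anything else (for example
 * -1 meaning "not set", an assumption of this sketch) falls back to
 * LSTM on v4 and later, and to the legacy engine on older versions.
 */
int select_oem(int requested_oem, const char *tess_version)
{
    if (requested_oem >= OEM_LEGACY && requested_oem <= OEM_DEFAULT)
        return requested_oem;
    return tess_major_version(tess_version) >= 4 ? OEM_LSTM : OEM_LEGACY;
}
```

The returned value would then be passed as the engine mode when initializing tesseract, in place of a hard-coded 1, so that `-oem 0` still reaches the legacy engine on v4.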

cfsmp3 commented May 7, 2020

@hamelg this seems trivial to fix, send a PR? :-)
