[BUG] -oem option has no effect with tesseract v4 #1264

hamelg · 2020-04-29T19:38:35Z

CCExtractor version: 0.88
tesseract version: 4.1.1
leptonica version: 1.79.0

Is this a regression (i.e. did it work before)? NO
What platform did you use? Linux

Video links

https://app.box.com/s/mhu17q37hc4ofprneydfailktp70pi4l

Additional information

I found that the -oem option doesn't work: if tesseract v4 is installed, CCExtractor silently forces the oem parameter to 1.

https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ocr.c#L182

With oem=1, I get bad results.

I changed it to set the tesseract OEM from the -oem option passed on the command line (ccx_options.ocr_oem), and now with oem=0 the result is near perfect. On my subs, oem=0 gives the best results; oem=1 and oem=2 are useless.

$ tesseract --help-oem
OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.
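To illustrate the change described above, here is a minimal C sketch of honouring the -oem value instead of hardcoding 1 on tesseract v4. The enum names, the `select_oem` helper, its `fallback_oem` parameter and the use of -1 to mean "-oem not given" are all hypothetical and are not CCExtractor's actual code. The returned value is what would be handed to tesseract's engine-mode argument (for example the `oem` argument of `TessBaseAPIInit2` in tesseract's C API).

```c
/* OCR engine modes, mirroring `tesseract --help-oem` (names are illustrative). */
enum oem_mode {
    OEM_MODE_LEGACY   = 0, /* Legacy engine only */
    OEM_MODE_LSTM     = 1, /* Neural nets LSTM engine only */
    OEM_MODE_COMBINED = 2, /* Legacy + LSTM engines */
    OEM_MODE_DEFAULT  = 3  /* Default, based on what is available */
};

/*
 * Hypothetical helper: pick the OEM value to pass to tesseract.
 * The behaviour reported in this issue amounts to always using
 * OEM_MODE_LSTM (1) on tesseract v4. This sketch instead returns the
 * user's -oem value whenever it is a valid mode (0-3), and uses
 * fallback_oem only when the value is out of range, e.g. a -1
 * sentinel meaning "-oem was not given" (an assumption).
 */
static int select_oem(int requested_oem, int fallback_oem)
{
    if (requested_oem >= OEM_MODE_LEGACY && requested_oem <= OEM_MODE_DEFAULT)
        return requested_oem;
    return fallback_oem;
}
```

With this shape, `select_oem(0, OEM_MODE_LSTM)` yields 0, so passing -oem 0 selects the legacy engine, while an unset or invalid value still falls back to 1.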


cfsmp3 · 2020-05-07T11:41:17Z

@hamelg this seems trivial to fix, send a PR? :-)

cfsmp3 added the difficulty: easy label May 7, 2020

hamelg mentioned this issue May 8, 2020

[FIX] Allow all oem modes with tesseract v4 #1267

Merged


cfsmp3 closed this as completed May 9, 2020
