For reference, these are the engine modes Tesseract lists:
$ tesseract --help-oem
OCR Engine modes:
0 Legacy engine only.
1 Neural nets LSTM engine only.
2 Legacy + LSTM engines.
3 Default, based on what is available.
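The four modes above map to the integers 0 to 3. As a minimal sketch (the enum and `parse_oem_arg` are hypothetical names, not CCExtractor's or Tesseract's API), an -oem argument could be validated against that range before it is used:

```c
#include <stdlib.h>

/* Hypothetical enum mirroring the four modes printed by
 * `tesseract --help-oem`; the values 0..3 match that listing. */
typedef enum {
    OCR_OEM_LEGACY_ONLY = 0, /* Legacy engine only */
    OCR_OEM_LSTM_ONLY   = 1, /* Neural nets LSTM engine only */
    OCR_OEM_LEGACY_LSTM = 2, /* Legacy + LSTM engines */
    OCR_OEM_DEFAULT     = 3  /* Default, based on what is available */
} ocr_oem_mode;

/* Parse the argument of a hypothetical -oem option.
 * Returns 0..3 on success, or -1 if the text is missing, is not a
 * whole decimal number, or falls outside the range listed above. */
int parse_oem_arg(const char *arg)
{
    char *end = NULL;
    long value;

    if (arg == NULL || *arg == '\0')
        return -1;

    value = strtol(arg, &end, 10);
    if (*end != '\0')
        return -1; /* trailing garbage, e.g. "abc" or "1x" */
    if (value < OCR_OEM_LEGACY_ONLY || value > OCR_OEM_DEFAULT)
        return -1; /* not one of the four documented modes */

    return (int)value;
}
```

For example, `parse_oem_arg("0")` returns 0, while `"4"` and `"abc"` are both rejected with -1.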
CCExtractor version: 0.88
tesseract version: 4.1.1
leptonica version: 1.79.0
Video links
https://app.box.com/s/mhu17q37hc4ofprneydfailktp70pi4l
Additional information
I found that the -oem option doesn't work: if Tesseract v4 is installed, CCExtractor silently forces the oem parameter to 1.
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ocr.c#L182
With oem=1, I get bad results.
I fixed this so that Tesseract's oem is set from the -oem option passed on the command line (ccx_options.ocr_oem), and with oem=0 the result is now near perfect. On my subtitles, oem=0 gives the best results; oem=1 and oem=2 are useless.
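A minimal sketch of the selection logic this fix implies: a value given with -oem always wins, and a version-based default applies only when none was given. It assumes a sentinel of -1 means "-oem not given"; `select_oem`, `OEM_NOT_SET` and the fallback to mode 3 for Tesseract older than v4 are hypothetical, not taken from CCExtractor. The chosen value would then be handed to Tesseract, for example through the C API's `TessBaseAPIInit2(handle, datapath, language, oem)`, though CCExtractor's actual call site may differ.

```c
/* Hypothetical sentinel: the -oem option was not given on the command line. */
#define OEM_NOT_SET (-1)

/* Mode 1 ("Neural nets LSTM engine only"): the value the issue reports
 * being forced whenever Tesseract 4 is installed. */
#define OEM_LSTM_ONLY_VALUE 1

/* Mode 3 ("Default, based on what is available"): assumed fallback for
 * older Tesseract versions in this sketch. */
#define OEM_DEFAULT_VALUE 3

/* Choose the OCR engine mode. A user-supplied value (stored in a field
 * such as ccx_options.ocr_oem) takes precedence; only when it is unset
 * does the version-based default apply. */
int select_oem(int user_oem, int tesseract_major_version)
{
    if (user_oem != OEM_NOT_SET)
        return user_oem;            /* respect the explicit -oem choice */
    if (tesseract_major_version >= 4)
        return OEM_LSTM_ONLY_VALUE; /* keep the previous v4 default */
    return OEM_DEFAULT_VALUE;
}
```

With this ordering, `-oem 0` on Tesseract 4 yields 0 instead of being overridden to 1, while users who pass nothing keep the old behavior.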