Turning on legacy OCR engine mode #39
The main problem is that v3 uses a different training-data format than v4, so we would need to manage two different sets of training data within a single R package.
I see. This is pretty unfortunate. Maybe you could tag the latest stable version from before the engine update and invite people to optionally install it with …
The easiest way to install an old version of the R package is to use MRAN snapshots:

    install.packages('tesseract', repos = 'https://cran.microsoft.com/snapshot/2018-09-01/')
The whitelist/blacklist options are now supported in tesseract 4.1 (on CRAN).
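A minimal sketch of using a character whitelist with tesseract 4.1 or later, assuming English training data is installed. The whitelist is passed as the Tesseract variable `tessedit_char_whitelist` through the `options` argument of `tesseract::tesseract()`. The wrapper name `ocr_digits` and the image file name in the usage line are hypothetical, chosen only for illustration.

```r
# Hypothetical wrapper: OCR an image while restricting output to digits.
# Assumes the tesseract R package >= 4.1, where `options` accepts Tesseract
# variables such as tessedit_char_whitelist (and tessedit_char_blacklist).
ocr_digits <- function(image) {
  engine <- tesseract::tesseract(
    language = "eng",
    options = list(tessedit_char_whitelist = "0123456789")
  )
  tesseract::ocr(image, engine = engine)
}
```

Usage (with a hypothetical file): `ocr_digits("receipt.png")`. Building the engine inside the function keeps the sketch simple; if you OCR many images, create the engine once and reuse it.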
Whitelist and blacklist are not implemented in version 4.0 (issue), and user patterns do not work (issue). In this issue, people recommend turning on the legacy OCR engine mode (OEM) with the --oem 0 option flag. This option is not part of the configs; rather, it belongs to the engine itself, like the language. Could we please expose more OCR options as arguments in tesseract::tesseract(), including oem, so that we can temporarily switch to the older version of the engine?
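Until the R package exposes the engine mode, one possible workaround is to call the Tesseract command-line tool, where --oem is set at startup alongside the language. The sketch below assumes the `tesseract` CLI (4.x) is on the PATH and that legacy-compatible traineddata is installed, since --oem 0 needs it. Both helper functions are hypothetical and are not part of the tesseract R package.

```r
# Hypothetical helper: build the CLI argument vector. Like the language (-l),
# the engine mode (--oem) is an init-time setting passed on the command line.
tesseract_cli_args <- function(image, outbase, lang = "eng", oem = NULL) {
  args <- c(image, outbase, "-l", lang)
  if (!is.null(oem)) {
    # 0 = legacy engine, 1 = LSTM only, 2 = both, 3 = default
    args <- c(args, "--oem", as.character(oem))
  }
  args
}

# Hypothetical helper: run the CLI and read back the text it writes to
# <outbase>.txt. Stops if the command fails (e.g. not installed).
run_tesseract_cli <- function(image, lang = "eng", oem = NULL) {
  outbase <- tempfile()
  status <- system2("tesseract", tesseract_cli_args(image, outbase, lang, oem))
  if (status != 0) stop("tesseract exited with status ", status)
  readLines(paste0(outbase, ".txt"), warn = FALSE)
}
```

Usage (with a hypothetical file): `run_tesseract_cli("scan.png", oem = 0)`. Keeping the argument builder separate from the `system2()` call makes the flag handling easy to check without running Tesseract.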