Turning on legacy OCR engine mode #39

Closed
dmypstl opened this issue Feb 9, 2019 · 4 comments

Comments


dmypstl commented Feb 9, 2019

The whitelist and blacklist options are not implemented in Tesseract 4.0 (issue), and user patterns do not work (issue). In that issue, people recommend switching to the legacy OCR engine mode with the --oem 0 flag. This option is not part of the configs; rather, it belongs to the engine itself, like the language.

Could we please expose more OCR options as arguments in tesseract::tesseract(), including oem, so that users can temporarily switch to the older version of the engine?

Member

jeroen commented Feb 9, 2019

The main problem is that v3 uses a different training-data format than v4, so we would need to manage two different sets of training data within a single R package.

Author

dmypstl commented Feb 9, 2019

I see, that is unfortunate. Maybe you could tag the latest stable version from before the engine update and invite people to optionally install it with remotes::install_github("ropensci/tesseract@v3.0.4"), or whatever the last version before the 4.0 update was. I think I found the relevant commit now, but it is awkward to refer to it.

Member

jeroen commented Feb 10, 2019

The easiest way to install an old version of the R package is using MRAN snapshots:

install.packages('tesseract', repos = 'https://cran.microsoft.com/snapshot/2018-09-01/')

Member

jeroen commented Jul 25, 2019

The whitelist and blacklist options are now supported in tesseract 4.1 (on CRAN).
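For anyone landing here later, a minimal sketch of passing a whitelist through the R package's options argument, assuming tesseract 4.1 or later from CRAN with the English training data installed. The character set and the image path "receipt.png" are placeholders chosen for illustration, not values from this thread; tessedit_char_blacklist can be passed the same way.

```r
library(tesseract)

# Build an engine that only recognises digits and the decimal point.
# tessedit_char_whitelist is passed as an engine option, not a config file.
numbers <- tesseract(language = "eng",
                     options = list(tessedit_char_whitelist = ".0123456789"))

# "receipt.png" is a hypothetical placeholder; substitute your own image.
if (file.exists("receipt.png")) {
  text <- ocr("receipt.png", engine = numbers)
  cat(text)
}
```

Creating the engine once and reusing it across calls avoids reloading the training data for every image.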

jeroen closed this as completed Jul 25, 2019
2 participants