
Options #18

Closed
under-score opened this issue Oct 17, 2017 · 7 comments

Comments

@under-score

It seems that options are not recognized when I try:

options = list(classify_bln_numeric_mode = "0")

@eipi10

eipi10 commented Dec 4, 2017

I'm having this issue as well. For example, when I try x = ocr(img, options=list(textord_tabfind_find_tables="1")) (in order to OCR a table from an image), I get the following error:

Error in ocr(img, options = list(textord_tabfind_find_tables = "1")) : 
  unused argument (options = list(textord_tabfind_find_tables = "1"))

Would it be possible to add an example or two to the help for ?ocr to show how to use the options argument? The help page has a link to the tesseract engine options page, but no information about how to correctly specify these options in the ocr function.

@jeroen
Member

jeroen commented Feb 23, 2018

I have overhauled the way parameters are used. Can you try the dev version of tesseract and see if this works better now?

devtools::install_github("ropensci/tesseract")

@jeroen
Member

jeroen commented Mar 4, 2018

This should be fixed in tesseract 2.0 on CRAN. Please open a new issue if you still find problems.
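For reference, here is a minimal sketch of how options are passed in tesseract 2.0 and later, assuming the English ("eng") training data is installed. Options such as tessedit_char_whitelist go to the tesseract() engine constructor rather than to ocr() itself, and tesseract_params() can be used to look up valid parameter names. The image path in the last line is a hypothetical placeholder.

```r
# Sketch assuming tesseract >= 2.0 with the "eng" model available
library(tesseract)

# Search the engine's parameter list for names containing "whitelist"
params <- tesseract_params("whitelist")

# Options are set when the engine is created, not passed to ocr() directly
engine <- tesseract("eng", options = list(tessedit_char_whitelist = "0123456789"))

# The configured engine is then handed to ocr() via its engine argument
# ("path/to/image.png" is a hypothetical placeholder)
# text <- ocr("path/to/image.png", engine = engine)
```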

@jeroen jeroen closed this as completed Mar 4, 2018
@qarmitage

I'm having what I believe is the same issue with the newest release.

engine <- tesseract(options = list(tessedit_char_whitelist = "0123456789"))
vs.
engine <- tesseract(options = list(tessedit_char_whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

cell_text <- ocr(file, engine = engine, HOCR = FALSE)

These two engines appear to generate the same results.
Perhaps I'm confused.

@jeroen
Member

jeroen commented Jan 9, 2019

You're not confused; unfortunately, this option is currently unsupported in Tesseract 4. See tesseract-ocr/tesseract#751 or tesseract-ocr/tesseract#2066.

Also this wiki page: https://github.com/tesseract-ocr/tesseract/wiki/Planning#features-from-30x-which-are-missing-for-lstm

Hopefully the tesseract people will fix this soon.

@sowla

sowla commented Mar 13, 2019

Thank you for a really awesome package! Bartłomiej Uliasz says here:

To use whitelist in a config file or using the -c tessedit_char_whitelist=... command-line switch, in the newest 4.0 version you will have to set OCR Engine mode to the "Original Tesseract only".

I don't know whether it applies to the R version or how difficult it is to implement, but I stumbled across the post and thought I'd share it here in case someone who can judge whether this information is useful might see it :)

@jeroen
Member

jeroen commented Jul 25, 2019

The whitelist option is now supported again in Tesseract 4.1 (on CRAN now).
