OCR_ENGINE=None Doesn't work #256

svmrw · 2024-08-16T19:22:24Z

Hello. The Readme says the following:

By default, marker will use surya for OCR. Surya is slower on CPU, but more accurate than tesseract. If you want faster OCR, set OCR_ENGINE to ocrmypdf. This also requires external dependencies (see above).
If you don't want OCR at all, set OCR_ENGINE to None.

export OCR_ENGINE=None
marker_single ./file.pdf ./marker

Running the command gives the following:

pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
OCR_ENGINE
  Input should be 'surya' or 'ocrmypdf' [type=literal_error, input_value='None', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
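
For context, the error above is consistent with the environment variable arriving as the four-character string 'None' rather than Python's None, since environment variables are always strings. Below is a minimal sketch of how such a value could be normalized before validation. The helper name `parse_ocr_engine` and the constant `ALLOWED_ENGINES` are hypothetical and are not taken from marker's code; the allowed values come from the error message, and the default of surya from the Readme text quoted above.

```python
from typing import Optional

# Values accepted according to the validation error above.
ALLOWED_ENGINES = ("surya", "ocrmypdf")


def parse_ocr_engine(raw: Optional[str], default: str = "surya") -> Optional[str]:
    """Hypothetical helper: turn the raw OCR_ENGINE env value into an engine name or None.

    raw is the string read from the environment, or None if the variable is unset.
    """
    if raw is None:
        # Variable not set at all: fall back to the documented default.
        return default
    value = raw.strip()
    if value.lower() == "none":
        # The string "None" is interpreted as "no OCR engine".
        return None
    if value not in ALLOWED_ENGINES:
        raise ValueError(
            f"OCR_ENGINE should be one of {ALLOWED_ENGINES} or None, got {value!r}"
        )
    return value
```

With `export OCR_ENGINE=None`, calling `parse_ocr_engine(os.environ.get("OCR_ENGINE"))` would return Python's None instead of failing validation. Whether the rest of the pipeline then actually skips OCR is a separate question that this sketch does not address.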

I want to convert PDF to Markdown without using OCR.
Almost all PDF files have text that can be selected and copied, and embedded images should be kept as they are. It seems to me that the whole document does not need to be recognized as an image when the text is easy to copy.

Is this possible?
Was it supported before and no longer is?
Or am I doing something wrong?
Thanks.


svmrw · 2024-08-18T13:05:52Z

#257
I applied the changes from your commit manually.
The error no longer appears, but Surya OCR still loads and recognizes the whole file.
That is, OCR_ENGINE=None and OCR_ENGINE=surya behave the same; I see no difference.
Most likely I am doing something wrong, so please check it yourself.

kyr0 · 2024-09-17T22:54:16Z

Running into the same problem. Since OCR pushes my machine to maximum memory, I have to use different software now. Dead end.

Zxilly mentioned this issue Aug 18, 2024

fix: None env parse for OCR_ENGINE #257

Closed
