By default, marker will use surya for OCR. Surya is slower on CPU, but more accurate than tesseract. If you want faster OCR, set OCR_ENGINE to ocrmypdf. This also requires external dependencies (see above). If you don't want OCR at all, set OCR_ENGINE to None.
pydantic_core._pydantic_core.ValidationError: 1 validation error for Settings
OCR_ENGINE
Input should be 'surya' or 'ocrmypdf' [type=literal_error, input_value='None', input_type=str]
For further information visit https://errors.pydantic.dev/2.8/v/literal_error
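The error above is consistent with how environment variables work: they are always strings, so `export OCR_ENGINE=None` hands the program the four-character string `'None'`, not a null value, and that string is not one of the two allowed literals. Below is a minimal stdlib-only sketch of that check. It is not marker's actual settings code: `OcrEngine`, `strict_parse`, `parse_ocr_engine`, and the choice to treat an unset, empty, or `none` value as "OCR disabled" are assumptions made for illustration.

```python
from typing import Literal, Optional, get_args

# Allowed values, taken from the validation error message above.
OcrEngine = Literal["surya", "ocrmypdf"]


def strict_parse(raw: str) -> str:
    """Mimic a strict literal check: only the exact allowed strings pass."""
    allowed = get_args(OcrEngine)
    if raw not in allowed:
        raise ValueError(f"Input should be one of {allowed}, got {raw!r}")
    return raw


def parse_ocr_engine(raw: Optional[str]) -> Optional[str]:
    """Hypothetical lenient parser: unset, empty, or 'none' means OCR is off."""
    if raw is None or raw.strip().lower() in ("", "none"):
        return None
    return strict_parse(raw.strip())
```

With this sketch, `strict_parse("None")` raises a `ValueError`, mirroring the reported failure, while `parse_ocr_engine(os.environ.get("OCR_ENGINE"))` would return `None` for the same input. Whether a returned `None` actually skips OCR depends on the code that consumes the setting.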
I want to convert PDF to Markdown without using OCR.
Almost all PDF files have text that can be selected and copied, and the embedded images should be kept as they are. It seems to me that a document does not need to be recognized as an image when its text can simply be copied.
Is this possible?
Was it supported before and no longer is?
Or am I doing something wrong?
Thanks.
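The idea in the question (use the embedded text layer when it exists and fall back to OCR only when it does not) can be sketched as a per-page decision. This is a hypothetical heuristic, not how marker decides: `needs_ocr`, `pages_to_ocr`, and the thresholds `min_chars` and `max_garbage_ratio` are illustrative, and the page text is assumed to have been extracted already by some PDF library.

```python
from typing import List


def needs_ocr(page_text: str, min_chars: int = 20,
              max_garbage_ratio: float = 0.3) -> bool:
    """Return True when a page's embedded text looks missing or unusable."""
    text = page_text.strip()
    # Too little extractable text: likely a scanned page with no text layer.
    if len(text) < min_chars:
        return True
    # Count replacement characters and non-printable, non-whitespace characters.
    garbage = sum(
        1 for ch in text
        if ch == "\ufffd" or (not ch.isprintable() and not ch.isspace())
    )
    return garbage / len(text) > max_garbage_ratio


def pages_to_ocr(page_texts: List[str]) -> List[int]:
    """Return zero-based indices of the pages that would still need OCR."""
    return [i for i, text in enumerate(page_texts) if needs_ocr(text)]
```

Under these assumptions, a document whose pages all carry a clean text layer yields an empty list, so no OCR model would need to be loaded at all.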
#257
I tried making the changes manually based on your commit.
The error is gone, but Surya OCR still loads and processes the whole file.
That is, OCR_ENGINE=None and OCR_ENGINE=surya behave the same; I see no difference.
I may well be doing something wrong, so could you please check it yourself?
Hello. The Readme says to set OCR_ENGINE to None to disable OCR (quoted above), so I ran:
export OCR_ENGINE=None
marker_single ./file.pdf ./marker
Running these commands gives the validation error shown above.