Tesseract can't use traineddata #2927

Closed
royudev opened this issue Mar 19, 2020 · 6 comments

royudev commented Mar 19, 2020

I trained Tesseract using tesstrain and got a traineddata file from it. I then copied that file to /usr/local/share/tessdata/

But when I tried to extract the text from an image I had used for training, I got this error:

Error: Tesseract (legacy) engine requested, but components are not present in /usr/local/share/tessdata/bar.traineddata!!
Failed loading language 'bar'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Here's the command I used:

tesseract data/bar-ground-truth/alexis_ruhe01_1852_0018_022.tif stdout -l bar

The Tesseract version is:

tesseract 5.0.0-alpha-635-g90405
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE
 Found OpenMP 201307

Here's the /usr/local/share/tessdata directory:

drwxr-xr-x 4 root root     4096 Mar 19 17:11 ./
drwxr-xr-x 8 root root     4096 Mar  5 16:48 ../
-rwxr-xr-x 1 root root     2364 Mar 19 17:11 bar.traineddata*
drwxr-xr-x 2 root root     4096 Mar  5 16:48 configs/
-rwxr-xr-x 1 root root 15400601 Mar  6 15:27 eng.traineddata*
-rw-r--r-- 1 root root      572 Mar  5 16:48 pdf.ttf
drwxr-xr-x 2 root root     4096 Mar  5 16:48 tessconfigs/

As you can see, bar.traineddata is in that directory, but Tesseract can't seem to use it.
What could be wrong with it?
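One thing the listing above suggests checking: bar.traineddata is 2364 bytes while eng.traineddata is about 15 MB, which hints that the copied file may not hold a full model. The following is a minimal sketch of such a check, not an official diagnostic. The function name `inspect_traineddata` is made up for illustration; `combine_tessdata -d` is the Tesseract tool option that lists the components packed in a traineddata file, and it is only called if the tool is on the PATH.

```shell
# Hypothetical helper: report the size of a traineddata file and, when
# combine_tessdata is installed, list the components it contains.
inspect_traineddata() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # wc -c gives the byte count; tr strips padding some platforms add.
    size=$(wc -c < "$file" | tr -d ' ')
    echo "size: $size bytes"
    if command -v combine_tessdata >/dev/null 2>&1; then
        # Lists entries such as the LSTM model, unicharset and recoder.
        combine_tessdata -d "$file"
    else
        echo "combine_tessdata not found; skipping component listing"
    fi
}

# Usage:
#   inspect_traineddata /usr/local/share/tessdata/bar.traineddata
```

A file of only a few kilobytes with no LSTM model among its components would be consistent with the "Failed loading language" error shown above.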

royudev changed the title from "Tesseract can't used traineddata" to "Tesseract can't use traineddata" Mar 19, 2020

royudev commented Mar 19, 2020

tesseract --list-langs gives

List of available languages (3):
bar
eng
foo

I've also tried adding the OCR engine mode option, --oem 0 through 3, but none of them work.

royudev commented Mar 19, 2020

Figured it out.

I had copied the traineddata at data/bar/bar.traineddata, not the one generated at /data/bar.traineddata
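The fix described above can be sketched as follows. This assumes the two paths are relative to the tesstrain working directory, so that the generated model is `data/bar.traineddata`; the comment does not state this outright. The function name `install_model` is hypothetical.

```shell
# Hypothetical helper: copy a generated model into a tessdata directory.
# Assumed layout, taken from the comment above:
#   data/bar/bar.traineddata   the file that failed to load
#   data/bar.traineddata       the generated model that worked
install_model() {
    src="$1"
    dest_dir="$2"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    cp "$src" "$dest_dir/" && echo "installed $(basename "$src") into $dest_dir"
}

# Usage (writing to /usr/local/share/tessdata may need root privileges):
#   install_model data/bar.traineddata /usr/local/share/tessdata
#   tesseract data/bar-ground-truth/alexis_ruhe01_1852_0018_022.tif stdout -l bar
#
# Alternatively, --tessdata-dir lets tesseract read the model in place:
#   tesseract data/bar-ground-truth/alexis_ruhe01_1852_0018_022.tif stdout --tessdata-dir data -l bar
```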

royudev closed this as completed Mar 19, 2020

TheSYNcoder commented

@royudev
I first tried copying /data/lang.traineddata to /usr/local/share/tessdata; however, it said it could not find the LSTM dictionaries.
Then I tried copying /data/data/lang.traineddata, but it gave the same error as yours.
Could you please elaborate on how exactly you solved it?

Shreeshrii (Collaborator) commented

> however, it said it could not find the LSTM dictionaries.

If you did not give a wordlist during training, then the LSTM dictionary will not be there. It is not a required item, and you should be able to use the traineddata for recognition without it.
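To see whether a given traineddata actually carries an LSTM word dictionary, the component listing from `combine_tessdata -d` can be scanned for it. This is a rough sketch: the function name `report_lstm_wordlist` is made up, and it assumes the word dictionary shows up in the listing under the component name `lstm-word-dawg`.

```shell
# Hypothetical helper: read a `combine_tessdata -d` listing on stdin and
# report whether an LSTM word dictionary component appears in it.
# Assumption: the component is listed under the name "lstm-word-dawg".
report_lstm_wordlist() {
    if grep -q 'lstm-word-dawg'; then
        echo "LSTM word dictionary present"
    else
        echo "no LSTM word dictionary (optional)"
    fi
}

# Usage (2>&1 because the listing may be written to stderr):
#   combine_tessdata -d /usr/local/share/tessdata/lang.traineddata 2>&1 | report_lstm_wordlist
```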

TheSYNcoder commented

@Shreeshrii I haven't given a wordlist during training; however, I still can't follow why it is showing me these errors. Can you please look at this error?

ConfuzedCoder commented

I found the solution in the forum. Check out this discussion: https://groups.google.com/g/tesseract-ocr/c/KKTiag0VCFE/m/A4jpS6i6CwAJ

Let me know whether it solves your issue or not.
