Tesseract can't use traineddata #2927

royudev · 2020-03-19T17:24:58Z

I trained Tesseract using tesstrain and got a traineddata file from it. I then copied that file to /usr/local/share/tessdata/

But when I tried to extract the text from the image I used for training, I got this error:

Error: Tesseract (legacy) engine requested, but components are not present in /usr/local/share/tessdata/bar.traineddata!!
Failed loading language 'bar'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Here's the command I used:

tesseract data/bar-ground-truth/alexis_ruhe01_1852_0018_022.tif stdout -l bar

The tesseract version is:

tesseract 5.0.0-alpha-635-g90405
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE
 Found OpenMP 201307

Here's the /usr/local/share/tessdata directory:

drwxr-xr-x 4 root root     4096 Mar 19 17:11 ./
drwxr-xr-x 8 root root     4096 Mar  5 16:48 ../
-rwxr-xr-x 1 root root     2364 Mar 19 17:11 bar.traineddata*
drwxr-xr-x 2 root root     4096 Mar  5 16:48 configs/
-rwxr-xr-x 1 root root 15400601 Mar  6 15:27 eng.traineddata*
-rw-r--r-- 1 root root      572 Mar  5 16:48 pdf.ttf
drwxr-xr-x 2 root root     4096 Mar  5 16:48 tessconfigs/

As you can see, bar.traineddata is in that directory, but Tesseract can't seem to use it.
What could possibly be wrong with it?


royudev · 2020-03-19T17:41:07Z

tesseract --list-langs gives

List of available languages (3):
bar
eng
foo

I've also tried adding the OCR option --oem with values 0 to 3, but none of them work.

royudev · 2020-03-19T17:47:56Z

Figured it out.

I had copied the traineddata from data/bar/bar.traineddata, not the one generated at /data/bar.traineddata.
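For anyone hitting the same error, here is a minimal sketch of the fix described above. It assumes the two paths are relative to the tesstrain checkout, and the function name `install_model` and its arguments are made up for illustration. It only copies the top-level file and refuses to fall back to the one inside the model subdirectory.

```shell
# Copy the final tesstrain model into a tessdata directory.
# Hypothetical helper for this sketch:
#   $1 = tesstrain data directory, $2 = model name, $3 = tessdata directory
install_model() (
  data_dir=$1
  model=$2
  tessdata_dir=$3

  # The file that worked according to this thread.
  final="$data_dir/$model.traineddata"
  # The file that produced the "legacy engine requested" error.
  other="$data_dir/$model/$model.traineddata"

  if [ ! -f "$final" ]; then
    echo "final model not found: $final" >&2
    [ -f "$other" ] && echo "note: $other exists, but it is not the file to install" >&2
    exit 1   # leaves only the subshell that forms the function body
  fi

  cp "$final" "$tessdata_dir/"
)
```

With the layout from this issue, the call would look like `install_model data bar /usr/local/share/tessdata`, probably run with root privileges since the directory is owned by root.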

TheSYNcoder · 2020-04-06T14:38:14Z

@royudev
I first tried copying /data/lang.traineddata to /usr/local/share/tessdata; however,
it said it could not find the LSTM dictionaries.
Then I tried copying /data/data/lang.traineddata, but it gave the same error as yours,
so could you please elaborate on how exactly you solved it?

Shreeshrii · 2020-04-06T14:45:19Z

> however, it said it could not find the LSTM dictionaries.

If you did not give a wordlist during training then the lstm dictionary will not be there. It is not a required item and you should be able to use the traineddata for recognition without it.
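One way to see which components a traineddata file actually contains is `combine_tessdata -d`, which lists them. The sketch below is only an illustration: the helper name `has_component` is made up, and it assumes each listing line has the form `NN:name:size=..., offset=...`, so treat the pattern as an assumption and check it against your own output.

```shell
# Succeeds if a component listing read from stdin contains the named component.
# Assumes listing lines look like "17:lstm:size=..., offset=..." (an assumption).
has_component() (
  grep -q ":$1:"
)

# Hypothetical path; adjust to wherever the model was installed.
model=/usr/local/share/tessdata/bar.traineddata

# Only runs when the tool and the file are both present.
if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$model" ]; then
  # 2>&1 because the listing may be written to stderr rather than stdout.
  if combine_tessdata -d "$model" 2>&1 | has_component lstm; then
    echo "lstm component present"
  else
    echo "no lstm component found in $model"
  fi
fi
```

The same helper can be asked about `lstm-word-dawg` to check whether a word dictionary was packed in, which is the optional item mentioned above.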

TheSYNcoder · 2020-04-06T15:20:07Z

@Shreeshrii I haven't given a wordlist during training; however, I still can't follow why it is showing me the errors. Can you please look at this error?

ConfuzedCoder · 2021-05-06T12:52:15Z

Found the solution in the forum. Check out this discussion: https://groups.google.com/g/tesseract-ocr/c/KKTiag0VCFE/m/A4jpS6i6CwAJ

Let me know whether it solves your issue or not.

royudev changed the title ~~Tesseract can't used traineddata~~ Tesseract can't use traineddata Mar 19, 2020

royudev closed this as completed Mar 19, 2020
