Failed loading language Tesseract couldn't load any languages! #34

Closed
eshvan opened this issue Apr 25, 2016 · 35 comments

@eshvan

eshvan commented Apr 25, 2016

Tess4J version - 3.1.0
Tesseract version - Tesseract Open Source OCR Engine v3.04.01 with Leptonica
OS - OS X 10.11.3 El Capitan

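```java
import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrExample {
    // Location of the language .traineddata files
    private static final String TESS4J_FOLDER_PATH = "/usr/local/Cellar/tesseract/3.04.01_1/share/";

    public static String ocr(String filePath) {
        Tesseract instance = new Tesseract();
        instance.setDatapath(TESS4J_FOLDER_PATH);
        instance.setLanguage("chi_tra");

        String result = "";
        File imageFile = new File(filePath);
        try {
            result = instance.doOCR(imageFile);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
        return result;
    }
}
```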

I got this error:

```
Failed loading language 'chi_tra'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000012408d311, pid=68062, tid=0x0000000000001703
#
# JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.dylib+0x13311]  _ZN9tesseract9Tesseract15recog_all_wordsEP8PAGE_RESP10ETEXT_DESCPK4TBOXPKci+0xb9
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
```
@nguyenq
Owner

nguyenq commented Apr 25, 2016

What is the path to your tessdata folder, which should already have the .traineddata files?

@eshvan
Author

eshvan commented Apr 25, 2016

Same as the TESS4J_FOLDER_PATH value above:
`/usr/local/Cellar/tesseract/3.04.01_1/share/`

@nguyenq
Owner

nguyenq commented Apr 26, 2016

Can you load any other language, for instance, eng?

@eshvan
Author

eshvan commented Apr 26, 2016

Yes, any other language works fine. The problem seems to be only with two of them: chi_tra and chi_sim.

@nguyenq
Owner

nguyenq commented Apr 26, 2016

So the problem is with specific .traineddata files. They might have been corrupted during download. I do not have any issue with the copy I currently have.
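To rule out a corrupted download, one can compare checksums of the failing file with a copy that is known to work (for example, from a machine where the same language loads). A minimal Java sketch; the class name is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: prints SHA-256 checksums so two copies of a .traineddata file can be compared. */
final class TraineddataChecksum {
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide SHA-256
            throw new IllegalStateException(e);
        }
        // Stream the file so large (20+ MB) data files are not read into memory at once
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TraineddataChecksum chi_tra.traineddata [more files...]
        for (String arg : args) {
            System.out.println(sha256Hex(Paths.get(arg)) + "  " + arg);
        }
    }
}
```

Identical hashes on both machines mean the two copies are effectively the same file, so download corruption can be ruled out.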


@eshvan
Author

eshvan commented Apr 26, 2016

  1. Re-downloading the traineddata from https://github.com/tesseract-ocr/tessdata didn't help.
  2. I checked again: it fails with all large traineddata files (more than ~20 MB).
  3. It works from the console if I simply run
     `tesseract imagefile output -l chi_tra`
  4. And it works on a Windows machine.

Any suggestions?
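The size pattern in item 2 can be checked quickly by listing which language files exceed the ~20 MB mark. A minimal sketch; the class name is hypothetical, and the default path (a `tessdata` folder under the datapath from the first post) is an assumption:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists .traineddata files larger than a size threshold. */
final class LargeTraineddataFinder {
    // ~20 MB, the size above which loading was reported to fail in this thread
    static final long THRESHOLD_BYTES = 20L * 1024 * 1024;

    static List<String> findLarge(Path tessdataDir, long thresholdBytes) throws IOException {
        List<String> large = new ArrayList<>();
        // The glob restricts the listing to language data files
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdataDir, "*.traineddata")) {
            for (Path file : files) {
                if (Files.size(file) > thresholdBytes) {
                    large.add(file.getFileName().toString());
                }
            }
        }
        Collections.sort(large); // directory order is unspecified, so sort for stable output
        return large;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real tessdata folder as the first argument
        Path dir = Paths.get(args.length > 0 ? args[0]
                : "/usr/local/Cellar/tesseract/3.04.01_1/share/tessdata");
        for (String name : findLarge(dir, THRESHOLD_BYTES)) {
            System.out.println(name);
        }
    }
}
```

If exactly the files that fail to load show up in this list, that supports the size correlation rather than a problem with specific languages.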

@nguyenq
Owner

nguyenq commented Apr 26, 2016

I just tested the latest chi_*.traineddata versions with VietOCR and had no problems. The program ran with the -Xms128m -Xmx1024m options. Since those data files are large, you may want to give your program more memory.
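Whether the -Xms128m -Xmx1024m options actually reach the JVM that runs Tess4J (an IDE run configuration can differ from the command line) can be confirmed by printing the maximum heap at startup. A minimal sketch; the class name is hypothetical:

```java
/** Hypothetical helper: reports the heap limit of the running JVM. */
final class HeapCheck {
    static long maxHeapMiB() {
        // maxMemory() reflects -Xmx when given, otherwise the JVM's default limit
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

With -Xmx1024m the printed value should be 1024 or somewhat less, because some garbage collectors exclude part of the heap from `maxMemory()`.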

@4F2E4A2E
Collaborator

@eshvan is the problem solved now?

@eshvan
Author

eshvan commented Apr 28, 2016

Nope, still no luck.
I tried assigning -Xms128m -Xmx1024m to the Java Runtime Environment, but it didn't help. I use IntelliJ IDEA to run the code, and it has the same settings.

@nguyenq
Owner

nguyenq commented Apr 29, 2016

We still can't reproduce your issue. The Tess4J unit tests run on Windows 10 without any problem. We don't have a Mac to try on. Can you try executing the unit tests from the console?

@dong77

dong77 commented May 2, 2016

I ran into the very same problem on my mac - cannot load 'chi_sim' at all.

@dong77

dong77 commented May 2, 2016

`mvn test` fails on OS X El Capitan:

```
21:18:20.159 [main] INFO n.sourceforge.tess4j.Tesseract1Test - createDocuments for an image
21:18:20.308 [main] ERROR net.sourceforge.tess4j.Tesseract1 - Invalid calling convention 63
java.lang.IllegalArgumentException: Invalid calling convention 63
    at com.sun.jna.Native.createNativeCallback(Native Method)
    at com.sun.jna.CallbackReference.<init>(CallbackReference.java:239)
    at com.sun.jna.CallbackReference.getFunctionPointer(CallbackReference.java:413)
    at com.sun.jna.CallbackReference.getFunctionPointer(CallbackReference.java:395)
    at com.sun.jna.Function.convertArgument(Function.java:541)
    at com.sun.jna.Function.invoke(Function.java:305)
    at com.sun.jna.Library$Handler.invoke(Library.java:236)
    at com.sun.proxy.$Proxy10.gsapi_set_stdio(Unknown Source)
    at org.ghost4j.Ghostscript.initialize(Ghostscript.java:323)
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:103)
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Tiff(PdfUtilities.java:48)
    at net.sourceforge.tess4j.Tesseract1.createDocuments(Tesseract1.java:500)
    at net.sourceforge.tess4j.Tesseract1Test.testCreateDocuments(Tesseract1Test.java:211)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Warning in pixReadMemPng: work-around: writing to a temp file
Can not open file "/pdf.ttf"!
21:18:20.320 [main] INFO n.sourceforge.tess4j.Tesseract1Test - doOCR on a PDF document
21:18:20.320 [main] INFO n.sourceforge.tess4j.Tesseract1Test - doOCR on a PNG image with UNLV zone file .uzn
21:18:20.571 [main] INFO n.sourceforge.tess4j.Tesseract1Test - & duck/goose, as 12.5% of E-mail
```

@eshvan
Author

eshvan commented May 2, 2016

Attached the full log:
log.txt

@dracupid

This issue seems just like #26. I came across the same problem under OS X 10.11 (Tesseract 3.04). No language data larger than about 20-25 MB can be loaded. However, it works well under Ubuntu 16.04 with Tesseract 3.03.

  • I have tried running Tesseract (3.04, OS X 10.11) directly as a C++ library, and it works well. Is it possible that this issue is caused by JNA? I have tried -Xms128m -Xmx1024m, but it has no effect.
  • Has anyone tried an older version of Tesseract (<= 3.03) under OS X?

@tonydeng

@dracupid
It seems to be purely a Tess4J problem; running the Tesseract command directly works without any problem.

@nguyenq
Owner

nguyenq commented Jul 28, 2016

@dracupid, I suspect something with JNA as well, because that, besides the Tesseract binary, is the piece that has platform-specific components. Tess4J works fine with any language data on Windows and Linux. We do not have an OS X system to test on, so that testing would have to be carried out by users.

I suggest that you download the JNA source and step through it to debug the issue. Hopefully the problem lies there; otherwise the investigation may even require stepping into Tesseract's native code.

How about trying this before looking into JNA? Play with "eng" first. Once you get the program to work, rename the .traineddata file you have problems with to eng.traineddata and run the program again. Do not try it on a PDF; stay with a simple image.
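The rename experiment can be done without touching the real tessdata folder by copying the suspect file into a scratch folder as eng.traineddata. A minimal sketch; the class name and scratch layout are hypothetical, and it assumes, as in the original post, that `setDatapath()` takes the parent of `tessdata/`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for the "rename it to eng" experiment. */
final class EngRenameExperiment {
    /**
     * Copies the suspect file to scratchRoot/tessdata/eng.traineddata and returns scratchRoot.
     * Copying instead of renaming leaves the real tessdata folder untouched.
     */
    static Path stageAsEng(Path suspectFile, Path scratchRoot) {
        try {
            Path tessdata = Files.createDirectories(scratchRoot.resolve("tessdata"));
            Files.copy(suspectFile, tessdata.resolve("eng.traineddata"),
                    StandardCopyOption.REPLACE_EXISTING);
            // Return the parent of tessdata/, matching the datapath used in the original post
            return scratchRoot;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path would then go to `instance.setDatapath(...)` together with `instance.setLanguage("eng")` and a simple image. If this fails while the genuine eng.traineddata works, the data file itself (or its size) is implicated rather than the language code.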

@smartlan

smartlan commented Dec 5, 2016

It seems it cannot load chi_sim data on OS X!

@dongqiangqiang

I also have this problem. I downloaded chi_sim.traineddata, but the problem still shows up.

@vitasoft

vitasoft commented Mar 16, 2017

I'm using Tess4J 3.3.0 on a Mac with Java 8.

"kor.traineddata" has the same problem!

Failed loading language 'kor'

eng, spa, and other Western languages work well.

But East Asian languages don't work: "kor", "chi_sim", and "chi_tra" all fail.

Is this issue only a problem for Mac users?

Has anybody found a solution?

For reference, the Tesseract command works well with every language.

@nuclearg

Now I'm hitting the same problem. On OS X, there is only a simple message:

Failed loading language 'chi_sim'

and the tesseract command works very well.

Can anyone help?

@Mao-x-w

Mao-x-w commented Jan 10, 2018

I ran into the same problem on a Mac. If the problem has been solved, please tell me.

@justinchuntingho

I have the same problem too.

@NoobDoesMC

NoobDoesMC commented Jan 14, 2018

I get this error

@nguyenq
Owner

nguyenq commented Jan 14, 2018

It may be related to Tesseract issue #1250.

@delonzhou

I have the same issue too.

@nguyenq
Owner

nguyenq commented Aug 5, 2018

Did anyone try changing the locale, as suggested in the Tesseract issue?

```
export LC_ALL=C
```
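The locale suggestion relies on the standard POSIX precedence for locale variables: a non-empty LC_ALL overrides LC_CTYPE, which in turn overrides LANG. To see which character-type locale the JVM running Tess4J actually inherited, a small check like this may help (a sketch; the class name is hypothetical, and the `"C"` fallback is an assumption about the platform default when nothing is set):

```java
import java.util.Map;

/** Hypothetical helper: resolves the effective LC_CTYPE value from environment variables. */
final class LocaleCheck {
    static String effectiveCtype(Map<String, String> env) {
        // POSIX precedence: LC_ALL, then LC_CTYPE, then LANG
        for (String name : new String[] {"LC_ALL", "LC_CTYPE", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) { // unset or empty variables are skipped
                return value;
            }
        }
        return "C"; // assumption: platform default when no variable is set
    }

    public static void main(String[] args) {
        // The environment this JVM (and any native library it loads) inherited
        System.out.println("Effective LC_CTYPE: " + effectiveCtype(System.getenv()));
    }
}
```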

@hyunseo0404

hyunseo0404 commented Aug 24, 2018

Just tried setting LC_CTYPE env variable to C (LC_CTYPE=C) and it worked!
I've tried all of the above mentioned language data (kor, chi_sim, chi_tra) and they now all seem to work fine after changing the locale.

I suggest that everyone else having this problem try this as well.
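Java offers no supported way to change the running JVM's own environment, so LC_CTYPE=C has to be in place before the JVM that loads Tesseract starts, for example `LC_CTYPE=C java ...` in the shell or the environment variables of the IDE run configuration. If the Tess4J code is launched from another Java process, a sketch along these lines could set it; the class name is hypothetical, and LC_ALL is removed only because a non-empty LC_ALL would override LC_CTYPE:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical launcher: runs a command (e.g. a JVM that uses Tess4J) with LC_CTYPE=C. */
final class CLocaleLauncher {
    static ProcessBuilder withCCtype(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of this process's environment
        Map<String, String> env = pb.environment();
        env.remove("LC_ALL");     // otherwise LC_ALL would take precedence over LC_CTYPE
        env.put("LC_CTYPE", "C");
        return pb.inheritIO();    // child output goes to this process's console
    }
}
```

Calling `.start()` on the returned builder launches the child; setting `LC_ALL=C` instead, as suggested earlier, would also work.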

@parmarmanoj007

Get the location of your tessdata folder by running this at the command prompt:

```
$ brew list tesseract
```

In my case:

```
/usr/local/Cellar/tesseract/3.05.01/bin/tesseract
/usr/local/Cellar/tesseract/3.05.01/include/tesseract/ (27 files)
/usr/local/Cellar/tesseract/3.05.01/lib/libtesseract.3.dylib
/usr/local/Cellar/tesseract/3.05.01/lib/pkgconfig/tesseract.pc
/usr/local/Cellar/tesseract/3.05.01/lib/ (2 other files)
/usr/local/Cellar/tesseract/3.05.01/share/man/ (11 files)
/usr/local/Cellar/tesseract/3.05.01/share/tessdata/ (28 files)
```

Now:

```python
from PIL import Image
from pytesseract import image_to_string

tessdata_dir_config = r'--tessdata-dir "/usr/local/Cellar/tesseract/3.05.01/share/tessdata"'
img = Image.open("image.png")  # hypothetical input image
txt = image_to_string(img, lang='eng', config=tessdata_dir_config)
```

@zcaudate

Can someone explain why `export LC_CTYPE=C` works? In the issue, someone also mentioned that there may be messy side effects in Java and Python. Does anyone know what those might be?

@4F2E4A2E
Collaborator

4F2E4A2E commented Apr 9, 2019

This is the best explanation I've found so far: https://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html

@iseegr8tfuldeadppl

iseegr8tfuldeadppl commented May 6, 2019

Make sure the environment variable TESSDATA_PREFIX is set to your tessdata directory
(for example, `C:\msys64\mingw32\share\tessdata`).
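To confirm that TESSDATA_PREFIX actually points where Tesseract will look, one can check for the expected .traineddata file from Java. As far as I know, Tesseract 4.x expects the variable to name the tessdata folder itself (as in the example above), whereas 3.0x expected its parent, so this sketch checks both; the class name is hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

/** Hypothetical check: does TESSDATA_PREFIX lead to <lang>.traineddata? */
final class TessdataPrefixCheck {
    static Optional<Path> findTraineddata(Map<String, String> env, String lang) {
        String prefix = env.get("TESSDATA_PREFIX");
        if (prefix == null || prefix.isEmpty()) {
            return Optional.empty();
        }
        Path base = Paths.get(prefix);
        // Try the prefix itself (4.x layout), then prefix/tessdata (3.0x layout)
        for (Path dir : new Path[] {base, base.resolve("tessdata")}) {
            Path candidate = dir.resolve(lang + ".traineddata");
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String lang = args.length > 0 ? args[0] : "eng";
        System.out.println(findTraineddata(System.getenv(), lang)
                .map(p -> "Found " + p)
                .orElse("No " + lang + ".traineddata under TESSDATA_PREFIX"));
    }
}
```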

@nguyenq
Owner

nguyenq commented Jul 26, 2019

According to tesseract-ocr/tesseract#1250, the problem has been resolved in Tesseract 4.1.0 for macOS. Can someone verify with Tess4J 4.4.0?

@nguyenq nguyenq closed this as completed Jul 31, 2019
@lokesh-stack

Can someone help me with the following error?

```
lokesh@Biu:~$ tesseract /home/lokesh/Desktop/tess_imgs/1.tif ~/Desktop/out -l foo
Failed loading language 'foo'
Tesseract couldn't load any languages!
Could not initialize tesseract.
```

@johnpili

@lokesh-stack I had a similar problem too. I resolved it by downloading the raw tessdata file:

https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata?raw=true


@EliasPereirah

EliasPereirah commented Dec 10, 2023

This happened to me. In my case I was downloading the trained data with curl, and the problem was that I was downloading:
https://github.com/tesseract-ocr/tessdata_best/blob/main/example.traineddata
instead of:
https://github.com/tesseract-ocr/tessdata_best/raw/main/example.traineddata

Just change /blob/ to /raw/ in the URL; this may be the cause for someone else too.
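Both download fixes in this thread (`?raw=true` and `/blob/` to `/raw/`) come down to fetching the file itself rather than GitHub's HTML page for it. A small sketch that rewrites a /blob/ URL and flags a download that starts like an HTML page; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helpers for the /blob/ vs /raw/ download mistake. */
final class TessdataDownload {
    /** Rewrites a github.com ".../blob/<branch>/<path>" page URL to its ".../raw/..." form. */
    static String toRawUrl(String url) {
        String marker = "/blob/";
        int i = url.indexOf(marker);
        if (!url.startsWith("https://github.com/") || i < 0) {
            return url; // not a GitHub blob URL; leave unchanged
        }
        return url.substring(0, i) + "/raw/" + url.substring(i + marker.length());
    }

    /** True if the first bytes look like an HTML page, i.e. probably not real .traineddata. */
    static boolean looksLikeHtml(byte[] head) {
        String start = new String(head, 0, Math.min(head.length, 256), StandardCharsets.ISO_8859_1)
                .trim()
                .toLowerCase(Locale.ROOT); // locale-independent lowercasing
        return start.startsWith("<!doctype html") || start.startsWith("<html");
    }
}
```

Running `looksLikeHtml` on the first bytes of a downloaded file before pointing Tesseract at it would catch this mistake early.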
