could not initialize class com.optimaize.langdetect.profiles.BuiltInLanguages when running on spark #72
Comments
A bit of late help for anybody who runs into the same problem and finds this issue (as I did). It's based on everything I've been able to find, and I hope it's complete enough to be useful: quite a few people hit this problem, but not many complete solutions get posted. Since it was my first time with Spark and I was also wrestling with non-serializable tasks, trying to instantiate things on the workers themselves, and more, it wasn't easy to pinpoint this as the real problem; it could have been a lot of things. Be aware that I'm by no means a Spark expert, though.

The short version: it's a Guava version conflict. BuiltInLanguages has a static initializer along these lines:
static {
List<LdLocale> names = new ArrayList<>();
names.add(LdLocale.fromString("af"));
names.add(LdLocale.fromString("an"));
[...]
}

and LdLocale.fromString(String) does something like:

[...]
List<String> strings = Splitter.on('-').splitToList(string);
[...]
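The problem: Splitter.splitToList(CharSequence) only exists in Guava 15 and newer, while Spark ships with an older Guava (14.0.1 in the Spark versions I've seen) that wins on the executor classpath. The static initializer dies with a NoSuchMethodError the first time it runs, and every later use of BuiltInLanguages then surfaces as the "Could not initialize class" NoClassDefFoundError from the title. If you want to confirm it, here is a minimal diagnostic sketch (not from the original thread; the class name is mine) that asks the JVM which Guava it actually loaded. Run the same checks inside a Spark task to get the executor-side answer:

import com.google.common.base.Splitter;

public class GuavaVersionCheck {
    public static void main(String[] args) {
        // Where did Splitter come from? On an executor this typically points
        // at Spark's bundled Guava rather than the one you packaged.
        System.out.println(Splitter.class.getProtectionDomain()
                .getCodeSource().getLocation());
        try {
            // splitToList(CharSequence) was added in Guava 15.
            Splitter.class.getMethod("splitToList", CharSequence.class);
            System.out.println("splitToList present (Guava >= 15): BuiltInLanguages can initialize");
        } catch (NoSuchMethodException e) {
            System.out.println("splitToList missing (Guava < 15): the static initializer will fail");
        }
    }
}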
One commonly suggested fix is to force a newer Guava from your own build:

<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>19.0</version>
</dependency>
</dependencies>
</dependencyManagement>
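On Spark that pin alone often isn't enough, though, because the executors put Spark's own Guava first on the classpath. A more robust option is to shade and relocate Guava inside your application jar so it can never clash with Spark's copy. A minimal maven-shade-plugin sketch (the shadedPattern prefix is arbitrary; pick your own):

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.4</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Rewrites com.google.common.* in the shaded jar (including
                   the copies language-detector calls into) to a private
                   package, so the JVM never resolves it against Spark's
                   bundled Guava. -->
              <relocation>
                <pattern>com.google.common</pattern>
                <shadedPattern>shaded.com.google.common</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

With the relocation in place, the Guava 19 classes travel inside your jar under the shaded package name, so whatever Guava version Spark bundles becomes irrelevant to language-detector.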
Finally, there is a patched fork of language-detector that avoids the Guava 15+ API entirely, so the library also works against Spark's older Guava. The code they used (replacing the Guava Splitter call in LdLocale.fromString with plain String.split):

//List<String> strings = Splitter.on('-').splitToList(string);
List<String> strings = new ArrayList<String>();
String[] stringParts = string.split("-");
for (String stringPart : stringParts) {
    strings.add(stringPart);
}

The repository with the code:
The Jira issue where I found the repository:
I tried the "userClassPathFirst" strategy (Spark 2.3.1), but unfortunately adding that config seemed to bork something else unrelated in Spark. Possibly Spark depends on the behavior of the older version of Guava, and going up to Guava 19 makes it blow up? Hard to tell.
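For reference, the flags in question are real Spark configuration keys (documented as experimental): spark.driver.userClassPathFirst and spark.executor.userClassPathFirst. They are usually passed to spark-submit with --conf, but they can also be set programmatically, as in this minimal sketch (the app name is made up, and the driver-side flag generally only takes effect when supplied at submit time in cluster mode):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class UserClassPathFirstExample {
    public static void main(String[] args) {
        // Ask Spark to resolve classes from the user's jars before its own
        // classpath, so a packaged Guava 19 wins over the bundled Guava.
        SparkConf conf = new SparkConf()
                .setAppName("language-detect-job")
                .set("spark.driver.userClassPathFirst", "true")
                .set("spark.executor.userClassPathFirst", "true");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code here ...
        sc.stop();
    }
}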
The GitHub fork in the comment by @DanielGSM has a jar file that can be dropped into projects to solve this. Probably the easiest solution.
I get the error java.lang.NoClassDefFoundError: Could not initialize class com.optimaize.langdetect.profiles.BuiltInLanguages when using language-detector in Spark. I'm using the suggested method to load the profiles:
List<LanguageProfile> languageProfiles = new LanguageProfileReader().readAllBuiltIn();
LanguageDetector languageDetector = LanguageDetectorBuilder.create(NgramExtractors.standard())
        .withProfiles(languageProfiles)
        .build();
Here is the relevant stack trace:
16/11/30 17:36:05 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(0,0,ResultTask,ExceptionFailure(java.lang.NoClassDefFoundError,Could not initialize class com.optimaize.langdetect.profiles.BuiltInLanguages,[Ljava.lang.StackTraceElement;@4a5ae036,java.lang.NoClassDefFoundError: Could not initialize class com.optimaize.langdetect.profiles.BuiltInLanguages
    at com.optimaize.langdetect.profiles.LanguageProfileReader.readAllBuiltIn(LanguageProfileReader.java:118)
    at com.ebdesk.ph.nlp_sentence.TwitterWord2Vec$1.call(TwitterWord2Vec.java:86)
    at com.ebdesk.ph.nlp_sentence.TwitterWord2Vec$1.call(TwitterWord2Vec.java:1)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1028)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:919)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)