
Spark Issue + Could not initialize class org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter #2261

Closed
mokumar1202 opened this issue May 12, 2020 · 7 comments

@mokumar1202

Hi All,

I would appreciate some guidance on this, please. I upgraded Spark from 2.1 to 2.3, and since the upgrade I am unable to write data to Hadoop (I get the error below). I found that the Parquet jar files bundled with this Spark are version 1.8.2; I have also tried v1.10.0, but no luck.

Caused by: java.lang.NoSuchMethodError: org.apache.parquet.column.ParquetProperties.&lt;init&gt;(ILorg/apache/parquet/column/ParquetProperties$WriterVersion;

I also get the below error.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$
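For reference, this kind of NoSuchMethodError usually means the Parquet jars actually on the classpath don't match the ones Spark was compiled against. A quick way to check which Parquet jars the Spark installation ships (a sketch, assuming a standard binary distribution with SPARK_HOME set; the jar filename used below is only an example):

```shell
# List the Parquet jars on Spark's classpath; the path assumes a
# standard binary distribution with SPARK_HOME set.
if [ -d "$SPARK_HOME/jars" ]; then
  ls "$SPARK_HOME/jars" | grep -i parquet
fi

# Pull the version out of a jar filename, e.g. parquet-column-1.8.2.jar
jar=parquet-column-1.8.2.jar
version="${jar##*-}"        # strips everything up to the last '-'
version="${version%.jar}"   # strips the .jar suffix
echo "$version"
```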

@heuermh
Member

heuermh commented May 12, 2020

Hello @mokumar1202, thank you for submitting this issue!

Yes, we have had a lot of trouble with conflicting Parquet (and Avro) dependencies in Spark over the years; I see you already found issue #1742.

Which Spark version are you using (2.3.4 is the most recent 2.3.x)?

Which ADAM version are you using? Git head builds against Parquet 1.10.1; we're blocked from upgrading to Parquet 1.11.x (see #2245)
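If your own build pulls in Parquet transitively, one common workaround is to pin Parquet to the version your target Spark was built against so the transitive version can't win. A sketch, assuming a Maven build (the artifacts shown are the real org.apache.parquet modules; 1.8.2 is the version bundled with Spark 2.3.0):

```xml
<!-- Hypothetical pom.xml fragment: pin Parquet to the version the
     target Spark distribution was built against. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```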

@mokumar1202
Author

Yes, I was on version 2.1 and all was well. I am an administrator and am doing this upgrade so the user community in my organisation can benefit from the new methods/functions in Spark 2.3.0.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.

@mokumar1202
Author

Also, I should have said this before: I am not using the ADAM shell. I am testing the Spark query from RStudio, so just a bash shell. Have I raised this issue in the wrong place?

@heuermh
Copy link
Member

heuermh commented May 12, 2020

Thank you for the clarification! Does the Spark query in RStudio use ADAM as a dependency?

If not, then you might want to try the Apache Spark user mailing list. There are a lot of helpful folks on there. See https://spark.apache.org/community.html

@mokumar1202
Author

No, it does not use ADAM. I will reach out to the Apache Spark community.

@heuermh heuermh added this to the 0.32.0 milestone May 12, 2020
@heuermh
Member

heuermh commented May 12, 2020

Ah, ok, thank you. Hope that helps!

@heuermh heuermh closed this as completed May 12, 2020
@mokumar1202
Author

I have found a solution: Parquet jar files of version 1.8.3 fixed it, and I am now able to write data to Hadoop. Thought I would post here in case it helps someone else.
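For anyone else who ends up here, the jar swap can be sketched like this. The directories below are throwaway stand-ins for the demo; in a real install SPARK_JARS would be "$SPARK_HOME/jars" and NEW_JARS wherever the downloaded 1.8.3 jars live, and the module list matches the Parquet 1.8.x artifacts Spark bundles:

```shell
# Throwaway directories standing in for the real Spark jars dir and
# the downloaded replacement jars.
SPARK_JARS=$(mktemp -d)
NEW_JARS=$(mktemp -d)

# Stand-ins for the Parquet 1.8.2 jars shipped with Spark 2.3.0 and
# the replacement 1.8.3 jars.
for mod in parquet-column parquet-common parquet-encoding \
           parquet-hadoop parquet-jackson; do
  touch "$SPARK_JARS/${mod}-1.8.2.jar" "$NEW_JARS/${mod}-1.8.3.jar"
done

# The actual swap: keep a rollback copy, then drop in the new jars.
for mod in parquet-column parquet-common parquet-encoding \
           parquet-hadoop parquet-jackson; do
  mv "$SPARK_JARS/${mod}-1.8.2.jar" "$SPARK_JARS/${mod}-1.8.2.jar.bak"
  cp "$NEW_JARS/${mod}-1.8.3.jar" "$SPARK_JARS/"
done

# Count the replacement jars now in place.
ls "$SPARK_JARS" | grep -c '1\.8\.3'
```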
