
Spark Issue + Could not initialize class org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter #2261

Closed
mokumar1202 opened this issue May 12, 2020 · 7 comments

@mokumar1202

Hi All,

I would appreciate some guidance on this, please. I upgraded Spark from 2.1 to 2.3, and since the upgrade I am unable to write data to Hadoop (I get the error below). I found that the Parquet jar files bundled with this Spark are version 1.8.2; I have also tried v1.10.0, but no luck.

Caused by: java.lang.NoSuchMethodError: org.apache.parquet.column.ParquetProperties.&lt;init&gt;(ILorg/apache/parquet/column/ParquetProperties$WriterVersion;

I also get the below error.
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$
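For reference, this kind of NoSuchMethodError usually means the Parquet jars actually on the classpath don't match the ones Spark was compiled against. A quick way to check which Parquet jars the Spark installation ships (a sketch, assuming a standard binary distribution with SPARK_HOME set; the jar filename used below is only an example):

```shell
# List the Parquet jars on Spark's classpath; the path assumes a
# standard binary distribution with SPARK_HOME set.
if [ -d "$SPARK_HOME/jars" ]; then
  ls "$SPARK_HOME/jars" | grep -i parquet
fi

# Pull the version out of a jar filename, e.g. parquet-column-1.8.2.jar
jar=parquet-column-1.8.2.jar
version="${jar##*-}"        # strips everything up to the last '-'
version="${version%.jar}"   # strips the .jar suffix
echo "$version"
```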

@heuermh
Member

heuermh commented May 12, 2020

Hello @mokumar1202, thank you for submitting this issue!

Yes, we have had a lot of trouble with conflicting Parquet (and Avro) dependencies in Spark over the years; I see you already found issue #1742.

Which Spark version are you using (2.3.4 is the most recent 2.3.x)?

Which ADAM version are you using? Git head builds against Parquet 1.10.1; we're blocked from upgrading to Parquet 1.11.x (see #2245)
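If your own build pulls in Parquet transitively, one common workaround is to pin Parquet to the version your target Spark was built against so the transitive version can't win. A sketch, assuming a Maven build (the artifacts shown are the real org.apache.parquet modules; 1.8.2 is the version bundled with Spark 2.3.0):

```xml
<!-- Hypothetical pom.xml fragment: pin Parquet to the version the
     target Spark distribution was built against. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.parquet</groupId>
      <artifactId>parquet-column</artifactId>
      <version>1.8.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```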

@mokumar1202
Author

Yes, I was on version 2.1 and all was well. I am an administrator and am doing this upgrade so the user community in my organisation can benefit from the new methods/functions in Spark 2.3.0.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.

@mokumar1202
Author

Also, I should have said this before: I am not using the ADAM shell. I am testing the Spark query from RStudio, so just a bash shell. Have I raised this issue in the wrong place?

@heuermh
Copy link
Member

heuermh commented May 12, 2020

Thank you for the clarification! Does the Spark query in RStudio use ADAM as a dependency?

If not, then you might want to try the Apache Spark user mailing list. There are a lot of helpful folks on there. See https://spark.apache.org/community.html

@mokumar1202
Author

No, it does not use ADAM. I will reach out to the Apache Spark community.

@heuermh heuermh added this to the 0.32.0 milestone May 12, 2020
@heuermh
Member

heuermh commented May 12, 2020

Ah, ok, thank you. Hope that helps!

@heuermh heuermh closed this as completed May 12, 2020
@mokumar1202
Author

I have found a solution: Parquet jar files of version 1.8.3 fixed it, and I am now able to write data to Hadoop. Thought I would post here in case it helps someone else.
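For anyone else who ends up here, the jar swap can be sketched like this. The directories below are throwaway stand-ins for the demo; in a real install SPARK_JARS would be "$SPARK_HOME/jars" and NEW_JARS wherever the downloaded 1.8.3 jars live, and the module list matches the Parquet 1.8.x artifacts Spark bundles:

```shell
# Throwaway directories standing in for the real Spark jars dir and
# the downloaded replacement jars.
SPARK_JARS=$(mktemp -d)
NEW_JARS=$(mktemp -d)

# Stand-ins for the Parquet 1.8.2 jars shipped with Spark 2.3.0 and
# the replacement 1.8.3 jars.
for mod in parquet-column parquet-common parquet-encoding \
           parquet-hadoop parquet-jackson; do
  touch "$SPARK_JARS/${mod}-1.8.2.jar" "$NEW_JARS/${mod}-1.8.3.jar"
done

# The actual swap: keep a rollback copy, then drop in the new jars.
for mod in parquet-column parquet-common parquet-encoding \
           parquet-hadoop parquet-jackson; do
  mv "$SPARK_JARS/${mod}-1.8.2.jar" "$SPARK_JARS/${mod}-1.8.2.jar.bak"
  cp "$NEW_JARS/${mod}-1.8.3.jar" "$SPARK_JARS/"
done

# Count the replacement jars now in place.
ls "$SPARK_JARS" | grep -c '1\.8\.3'
```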
