Issue while reading xls file in Pyspark #592
Unanswered
sarveshsood
asked this question in
Q&A
Replies: 1 comment
-
Can you try |
0 replies
-
Hi All,
While running the code below, I get an error that I can't figure out how to resolve.
from pyspark.sql import SparkSession

# Jars passed to Spark through spark.jars
jars = [
    r"C:\Users\Jars\Spark_jar_files\spark-excel_2.12-3.2.1_0.17.1.jar",
    r"C:\Users\Jars\Spark_jar_files\poi-ooxml-schemas-4.1.2.jar",
    r"C:\Users\Jars\Spark_jar_files\commons-collections4-4.4.jar",
    r"C:\Users\Jars\Spark_jar_files\xmlbeans-3.1.0.jar",
    r"C:\Users\Jars\Spark_jar_files\mysql-connector-java-8.0.29.jar",
]

# Parentheses let the chained builder calls span several lines
spark = (
    SparkSession.builder
    .appName("XLS Read")
    .config("spark.jars", ",".join(jars))
    .getOrCreate()
)

# Read the 'Data' sheet starting at cell A2, with the first row as the header
df = (
    spark.read.format("excel")
    .option("dataAddress", "'Data'!A2")
    .option("header", "true")
    .option("inferSchema", "false")
    .load(r"C:\Users\Jars\data_file.xlsx")
)
df.show(10)
Error:
py4j.protocol.Py4JJavaError: An error occurred while calling o35.load.
: java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
at shadeio.poi.util.IOUtils.<clinit>(IOUtils.java:43)
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:222)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)
I want to read the 'Data' sheet from multiple Excel files. Please help me resolve this issue, or let me know if there is any working code for reading data from an xls file.
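A `NoClassDefFoundError` like the one above generally means the named class is not in any jar on the classpath. As a rough diagnostic sketch (not part of spark-excel; the helper name `jars_containing` is made up for illustration), the jar list can be scanned with Python's standard `zipfile` module to see whether any of the files actually contains `org.apache.logging.log4j.LogManager`:

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_paths):
    """Return the jars whose archive holds the given fully qualified class.

    Hypothetical helper for illustration: a jar is a zip archive, so the
    class org.example.Foo is stored as the entry org/example/Foo.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in jar_paths:
        path = Path(jar)
        if not path.is_file():
            continue  # skip paths that do not exist on this machine
        with zipfile.ZipFile(path) as archive:
            if entry in archive.namelist():
                hits.append(str(path))
    return hits
```

Calling `jars_containing("org.apache.logging.log4j.LogManager", jars)` with the `jars` list from the code above returns the matching jar paths. If the result is empty, none of the listed jars provides that class. It normally ships in the `log4j-api` artifact, so adding a `log4j-api` jar to `spark.jars` would be the first thing to try. Which version fits this spark-excel build is not something this sketch can determine.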