APache-sedona Failure #1688

Closed
tony189 opened this issue Nov 21, 2024 · 6 comments · Fixed by #1692

Comments

@tony189
tony189 commented Nov 21, 2024

Installing JAR libraries from an init script and then installing apache-sedona 1.6.0 or 1.6.1 makes it impossible to execute any notebook. Every attempt throws this error:

Failure starting repl. Try detaching and re-attaching the notebook.

at com.databricks.spark.chauffeur.ExecContextState.processInternalMessage(ExecContextState.scala:347)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)

It doesn't matter whether I reattach, create a new notebook, or restart: it always fails. Without apache-sedona everything works.

Settings

Sedona version = sedona-spark-shaded-3.4_2.12-1.6.1.jar geotools-wrapper-1.6.1-28.2.jar

Apache Spark version = 3.5.0

Environment = Azure, Databricks

@Kontinuation
Member

Do you have additional Python libraries installed (including the apache-sedona Python library)? I've seen similar issues before and resolved them by adding two more dependencies that pin the versions of numpy and pandas:

  • numpy<1.24
  • pandas==1.5.3

According to the linked issue, installing rasterio<1.4.0 before installing Sedona also resolves this problem. If neither fix works, check the "Driver logs" of your Databricks cluster to gather more information.
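For anyone who wants to script the install order described above, here is a minimal sketch. The pin strings come from this comment; the function name, the constants, and the default Sedona version are only illustrative, and the thread reports either set of pins as a fix, not necessarily both together. The script only prints the commands rather than running them.

```python
import shlex
import sys

# Pins quoted in this thread; the constant names are hypothetical.
NUMPY_PANDAS_PINS = ["numpy<1.24", "pandas==1.5.3"]
RASTERIO_PIN = ["rasterio<1.4.0"]


def build_pip_commands(pins, sedona_version="1.6.1"):
    """Return two pip commands: the pins first, then apache-sedona."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + list(pins),
        pip + [f"apache-sedona=={sedona_version}"],
    ]


if __name__ == "__main__":
    for cmd in build_pip_commands(NUMPY_PANDAS_PINS):
        # shlex.join quotes "<" so the printed line is safe to paste into a shell.
        print(shlex.join(cmd))
```

To actually install, each command list could be passed to `subprocess.run(cmd, check=True)`; in a notebook the same order can be followed with two separate install cells.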

@jiayuasu
Member

In addition, if you use Spark 3.5.0, the Sedona jar should be sedona-spark-shaded-3.5_2.12-1.6.1.jar, not the 3.4 build.
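A quick way to catch this kind of mismatch is to compare the Spark version embedded in the jar file name with the running Spark version. The sketch below assumes the naming pattern seen in the two jar names in this thread (Spark major.minor, then Scala version, then Sedona version); the helper name and regex are hypothetical, not part of Sedona.

```python
import re

# Assumed pattern, inferred from the jar names in this thread:
# sedona-spark-shaded-<spark major.minor>_<scala version>-<sedona version>.jar
JAR_PATTERN = re.compile(
    r"^sedona-spark-shaded-(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)"
    r"-(?P<sedona>[\w.\-]+)\.jar$"
)


def jar_matches_spark(jar_name, spark_version):
    """Return True if the jar's Spark major.minor equals that of spark_version."""
    match = JAR_PATTERN.match(jar_name)
    if match is None:
        raise ValueError(f"unrecognized jar name: {jar_name}")
    # Keep only major.minor, e.g. "3.5.0" -> "3.5".
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    return match.group("spark") == spark_major_minor


if __name__ == "__main__":
    for jar in (
        "sedona-spark-shaded-3.4_2.12-1.6.1.jar",
        "sedona-spark-shaded-3.5_2.12-1.6.1.jar",
    ):
        print(jar, jar_matches_spark(jar, "3.5.0"))
```

In a notebook, the second argument could come from `spark.version` instead of a hard-coded string.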

@tony189
Author

tony189 commented Nov 22, 2024

I hadn't noticed what you pointed out, @jiayuasu. I changed it, but it still didn't work.

Installing numpy and pandas as @Kontinuation said fixes the problem, even though both libraries are included in every Databricks cluster as standard config with those exact conditions.

However, this way the cluster takes ages to start. I think I'm going to stick with SQL and forget about Python.
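Since the preinstalled versions are in question here, one way to confirm what the Python environment actually has is to compare the installed versions against the two pins from this thread. This is only a sketch: the helper names are hypothetical, and the version parsing is deliberately simple (numeric release parts only).

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '1.23.5' into (1, 23, 5); stop at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def unmet_pins(installed):
    """Return package names that violate numpy<1.24 or pandas==1.5.3.

    `installed` maps package name -> version string, or None when missing.
    """
    problems = []
    numpy_version = installed.get("numpy")
    if numpy_version is None or release_tuple(numpy_version)[:2] >= (1, 24):
        problems.append("numpy")
    if installed.get("pandas") != "1.5.3":
        problems.append("pandas")
    return problems


def installed_versions(names=("numpy", "pandas")):
    """Look up installed versions; a missing package maps to None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    current = installed_versions()
    print(current, "unmet pins:", unmet_pins(current))
```

Running it in a notebook cell before and after installing apache-sedona would show whether that install changed either package.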

@jiayuasu
Member

jiayuasu commented Nov 22, 2024

@Kontinuation I think it might be best if we remove rasterio from the mandatory dependencies of Sedona 1.7.0?

@Kontinuation
Member

> @Kontinuation I think it might be best if we remove rasterio from the mandatory dependencies of Sedona 1.7.0?

Good point. Users have reported many other rasterio-related issues because rasterio and GDAL are hard to install in some environments.
