[SEDONA-406] Raster deserializer for PySpark #1281
Conversation
The R test failure is unrelated to this PR. The recent updates of sparklyr or dbplyr caused this problem. See sparklyr/sparklyr#3429.
Force-pushed from f903332 to 5aea3b1
2. Compile the Sedona Scala and Java code with `-Dgeotools`, then copy `sedona-spark-shaded-{{ sedona.current_version }}.jar` to the `SPARK_HOME/jars/` folder.
3. Put the JAI jars into the `SPARK_HOME/jars/` folder:
```
export JAI_CORE_VERSION="1.1.3"
```
Should we put these jars in geotools-wrapper?
These jars are already in geotools-wrapper, so we can instead put the geotools-wrapper jar into the `$SPARK_HOME/jars/` folder and build the spark-shaded jar without `-Dgeotools`. However, that won't let us test dependency changes, such as adding jiffle as a new dependency.

I can update the document to use geotools-wrapper instead of using the JAI jars directly, since it is much easier (no need to rebuild with `-Dgeotools` for testing Sedona Python) and covers most cases.
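A minimal sketch of the geotools-wrapper alternative described above. The jar file name below is a placeholder (not a version pinned by this PR); substitute the geotools-wrapper release matching your Sedona/GeoTools versions.

```shell
# Placeholder jar name, not pinned by this PR -- use the geotools-wrapper
# release that matches your Sedona/GeoTools versions.
GEOTOOLS_WRAPPER_JAR="geotools-wrapper-1.5.1-28.2.jar"

# geotools-wrapper already bundles the JAI jars, so one copy into Spark's
# classpath replaces copying the individual JAI jars.
if [ -n "${SPARK_HOME:-}" ] && [ -f "$GEOTOOLS_WRAPPER_JAR" ]; then
  cp "$GEOTOOLS_WRAPPER_JAR" "$SPARK_HOME/jars/"
else
  # Dry-run message when Spark or the jar is not present locally.
  echo "would copy $GEOTOOLS_WRAPPER_JAR to \$SPARK_HOME/jars/"
fi
```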
This is fine then. We don't need to update this.
@@ -583,6 +583,44 @@ SELECT RS_AsPNG(raster)

Please refer to [Raster writer docs](../../api/sql/Raster-writer) for more details.
## Collecting raster DataFrames and working with them locally in Python
Can you add one more section explaining how to write a regular Python User Defined Function (not a Pandas UDF) that works on the raster type? I understand that the UDF cannot return a raster type directly since we only have a Python deserializer, but with `RS_MakeRaster()` + a NumPy array we can still construct the raster type. It is important to show this workflow. Maybe we can show this in a separate doc PR?
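The workflow requested above could be sketched roughly as follows. The `raster_serde.deserialize` function is the one this PR adds; the `.as_numpy()` accessor is an assumption for illustration, so check the `sedona.raster` package for the actual attribute names. The Spark wiring is kept in comments; the pure-NumPy core is runnable on its own.

```python
import numpy as np

def band_mean(band: np.ndarray) -> float:
    # Pure-NumPy core of the UDF: mean of one band, treating NaN as nodata.
    return float(np.nanmean(band))

# In Spark, a regular (row-at-a-time) Python UDF would wrap this core.
# `raster_serde.deserialize` comes from this PR; `.as_numpy()` is an
# assumed accessor used here only for illustration:
#
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import DoubleType
#   from sedona.raster import raster_serde
#
#   @udf(returnType=DoubleType())
#   def rast_band_mean(raster_bytes):
#       raster = raster_serde.deserialize(raster_bytes)
#       return band_mean(raster.as_numpy()[0])   # first band
#
# Since the UDF cannot return a raster directly, hand the NumPy array
# back to SQL and rebuild the raster there with RS_MakeRaster().

print(band_mean(np.array([[1.0, 2.0], [3.0, np.nan]])))  # 2.0
```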
New section added.
Force-pushed from 5aea3b1 to b7c4881
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
What changes were proposed in this PR?
API changes
This PR adds a new class `SedonaRaster` to the Sedona Python package. Raster objects in Sedona will be converted to `SedonaRaster` objects in Python when collecting raster objects in PySpark.

Users can define Pandas UDFs taking a raster object as a parameter. Please use the `deserialize` function in the `sedona.raster.raster_serde` module to deserialize the bytes into a `SedonaRaster` object before processing it. Please note that this only works with Spark >= 3.4.0.

Internal changes
How was this patch tested?
Added new tests
Did this PR include necessary documentation updates?
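The Pandas UDF workflow described in the API changes above could be sketched as follows. The `deserialize` function in `sedona.raster.raster_serde` is the one this PR adds; the `.width` attribute is an assumption for illustration. The Spark wiring (which requires Spark >= 3.4.0, as noted above) is kept in comments, while the batch-mapping core is runnable on plain pandas.

```python
import pandas as pd

def apply_per_raster(serialized: pd.Series, fn) -> pd.Series:
    # Core of a Pandas UDF: map a per-raster function over one batch
    # (a pandas Series) of serialized rasters.
    return serialized.map(fn)

# Spark wiring (Spark >= 3.4.0). `raster_serde.deserialize` comes from
# this PR; the `.width` attribute is an assumed property for illustration:
#
#   from pyspark.sql.functions import pandas_udf
#   from sedona.raster import raster_serde
#
#   @pandas_udf("int")
#   def raster_width(rasters: pd.Series) -> pd.Series:
#       return apply_per_raster(
#           rasters, lambda b: raster_serde.deserialize(b).width)
#
#   df.select(raster_width("rast")).show()

# Stand-in demo on plain bytes: length plays the role of a raster property.
print(apply_per_raster(pd.Series([b"\x00", b"\x01\x02"]), len).tolist())  # [1, 2]
```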