-
Notifications
You must be signed in to change notification settings - Fork 697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SEDONA-227] Implemented Python geometry serializer as a native extension #767
[SEDONA-227] Implemented Python geometry serializer as a native extension #767
Conversation
The Github Action for building wheels is unable to run. Seems that Github does not allow running If using |
cb4bb57
to
634e196
Compare
634e196
to
a86c268
Compare
Just create https://issues.apache.org/jira/browse/INFRA-24203 Let's see if the ASF infra team will allow this GitHub action. |
@douglasdennis @umartin Doug and Martin, since you were following the new serializer improvement, any comment on this PR? |
I haven't worked with C in 20 years, but from my understanding, everything looks good. I don't have any objections to merging. |
python/src/geos_c_dyn_funcs.h
Outdated
|
||
/* The following are function pointers to GEOS C APIs provided by | ||
* libgeos_c. These functions must be called after a successful invocation of | ||
* `load_geos_c_functions` */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find this function. I probably missed it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function to initialize geos_c_dyn should be load_geos_c_library
or load_geos_c_from_handle
. I've fixed the comment.
python/src/geos_c_dyn_funcs.h
Outdated
* libgeos_c. These functions must be called after a successful invocation of | ||
* `load_geos_c_functions` */ | ||
|
||
#include "geos_c_dyn.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These headers include each other. Is that due to some nuance of C?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the recursive inclusion since geos_c_dyn_funcs.h is not meant to be a self-contained header file.
@Kontinuation Will this work out of the box with a pip install? Like, once this gets pushed to pypi, will I be able to just do |
In most cases, yes. We will release precompiled wheels for commonly used platforms. You'll still encounter problems on platforms not having matching wheel releases, such as Windows running on ARM64. If you do not have toolchains installed on your system, |
That's awesome! I'm excited to be able to use it. Thanks for putting so much work into this, this will all help me out quite a bit. |
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
[SEDONA-XXX] my subject
.What changes were proposed in this PR?
This patch provides a Python geometry serializer implemented as a native extension. The original pure python serializer was still kept around as a fallback implementation when the native extension failed to load.
Please be acknowledged that the existence of native extensions will complicate the release process of apache-sedona python packages, since we have to build wheels for various CPython versions and platforms. There is a newly added Github Action for building wheels as a reference approach.
For platforms not covered by prebuilt wheels, users can easily install the package using the source distribution since the newly added extension does not require any third-party libraries to build.
How was this patch tested?
Multi-platform Compatibility
We've added a Github Action for testing the native extension on various platforms. We've also run tests on Apple M1, so it is also expected to work on various ARM64 platforms.
Performance
This is the result of running benchmarking code in #745 on an ECS instance with Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz. The python environment running the benchmark has shapely 2.0.0 installed.
Here is the benchmark result in a more comprehensive format (obtained by running rich-bench):
We've also run the benchmark with shapely 1.8.5. The performance improvement of deserialization was not that significant:
Did this PR include necessary documentation updates?