[SEDONA-429][SEDONA-430] Support specifying GeoParquet spec version number and CRS #1162
@@ -14,12 +14,7 @@
package org.apache.spark.sql.execution.datasources.parquet
Could we add a new data source that reads only the metadata of a Parquet file? This is important for entry-level users who want to explore an unknown Parquet file, including a GeoParquet file. In the GeoParquet case, it would let users see the PROJJSON value, since we are not able to reliably parse it into a known EPSG code.
I understand that the only metadata a Spark DataFrame carries is its schema, which cannot hold this kind of information.
So I suggest adding a new data source named geoparquet.metadata, which loads this metadata using ParquetFileReader. DuckDB offers a good example: https://duckdb.org/docs/data/parquet/metadata.html
This can be addressed in a separate PR.
Created a JIRA ticket for this: https://issues.apache.org/jira/browse/SEDONA-455
Let's address this in a separate PR.
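As a rough illustration of the ParquetFileReader approach suggested above, the sketch below opens a file, reads only its footer, and returns the JSON stored under the `geo` key, which is where the GeoParquet spec keeps its metadata (per-column `crs` PROJJSON objects live under `columns` in that JSON). It is a hypothetical helper, not the geoparquet.metadata data source tracked in SEDONA-455; the object and method names are made up, and it assumes parquet-hadoop and hadoop-common are on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Hypothetical helper; not part of Sedona.
object GeoParquetFooter {
  // The GeoParquet spec stores its metadata as a JSON string under this footer key.
  val GeoKey = "geo"

  // Pure lookup on the footer's key-value metadata, testable without a Parquet file.
  def geoEntry(keyValue: java.util.Map[String, String]): Option[String] =
    Option(keyValue.get(GeoKey))

  // Opens the file, takes the footer parsed by ParquetFileReader.open, and never reads row groups.
  def readGeoMetadata(file: String, conf: Configuration = new Configuration()): Option[String] = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(new Path(file), conf))
    try geoEntry(reader.getFooter.getFileMetaData.getKeyValueMetaData)
    finally reader.close()
  }

  def main(args: Array[String]): Unit =
    readGeoMetadata(args(0)) match {
      case Some(json) => println(json)
      case None       => println(s"No '$GeoKey' key in the footer of ${args(0)}")
    }
}
```

A data source built on this idea would wrap the same footer lookup and expose the key-value pairs as rows, similar to DuckDB's `parquet_kv_metadata`.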
...c/main/scala/org/apache/spark/sql/execution/datasources/parquet/GeoParquetWriteSupport.scala
…umber and CRS (apache#1162)
* Support the geoparquet.version and geoparquet.crs options for Spark 3.0 to 3.3
* Add tests for the geoparquet.version and geoparquet.crs options for Spark 3.0 to 3.3
* Add documentation for the geoparquet.version and geoparquet.crs options
* Apply this patch to Spark 3.4 and Spark 3.5
* Remove `Configuration.getPropsWithPrefix` to be compatible with older versions of Hadoop
* Add notes about crs metadata in GeoParquet files
* Allow omitting CRS by setting geoparquet.crs to "" (empty string)
* Set default crs metadata to null
* Apply to Spark 3.4 and Spark 3.5
* Explain the behavior of the geoparquet.crs option
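A minimal sketch of how the two writer options might be passed from Scala, assuming Sedona 1.4.1 or later (for `SedonaContext`) and the `geoparquet` data source on the classpath. The helper `geoParquetOptions` and the object name are hypothetical; the comments restate the crs behavior listed in the commit notes above (unset means a null crs, an empty string omits the field, a PROJJSON string is written as the crs). The PROJJSON document is read from a file path given on the command line, which is also an assumption of this sketch.

```scala
import org.apache.sedona.spark.SedonaContext
import org.apache.spark.sql.DataFrame

// Hypothetical example object; not part of Sedona.
object GeoParquetCrsExample {
  // Builds the writer options added by this PR.
  // crs = None           -> geoparquet.crs not set; crs metadata defaults to null
  // crs = Some("")       -> empty string; the crs field is omitted
  // crs = Some(projjson) -> the PROJJSON object is written as the crs field
  def geoParquetOptions(version: String, crs: Option[String]): Map[String, String] =
    Map("geoparquet.version" -> version) ++ crs.map(c => "geoparquet.crs" -> c)

  def write(df: DataFrame, path: String, version: String, crs: Option[String]): Unit =
    df.write
      .format("geoparquet")
      .options(geoParquetOptions(version, crs))
      .mode("overwrite")
      .save(path)

  def main(args: Array[String]): Unit = {
    // args(0): output directory; args(1), optional: file containing a PROJJSON document
    val config = SedonaContext.builder().master("local[*]").appName("geoparquet-crs").getOrCreate()
    val sedona = SedonaContext.create(config)
    val df = sedona.sql("SELECT ST_Point(1.0, 2.0) AS geometry")
    val projjson =
      if (args.length > 1) {
        val src = scala.io.Source.fromFile(args(1))
        try Some(src.mkString) finally src.close()
      } else None
    write(df, args(0), "1.0.0", projjson)
    sedona.stop()
  }
}
```

Keeping the option construction in a small pure function makes the three crs cases easy to check without starting Spark.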
Note: This PR depends on #1161
Did you read the Contributor Guide?
Is this PR related to a JIRA ticket?
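What changes were proposed in this PR?
* Update the GeoParquet spec version from `1.0.0-beta.1` to `1.0.0`
* Add a `geoparquet.version` option for specifying the GeoParquet spec version number
* Add a `geoparquet.crs` option for specifying the CRS
How was this patch tested?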
Add new tests for GeoParquet metadata.
Did this PR include necessary documentation updates?