rewrite_position_delete_files leads to error #8045
Comments
Looks like this was caused by a partition transformation/hidden partition on a column. If you can, please share your original rewrite_position_delete_files command so we can try to repro. |
I think I know the issue. It is in the code that does 'removeDanglingDeletes'. For each partition of delete files, I try to find 'live' data files so I can do the cleanup. In this method, https://github.com/apache/iceberg/blob/master/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackPositionDeletesRewriter.java#L122 , I use the DeleteFile's partition data directly to query the data_files table. I thought it would work, as the data_files table uses the transformed partition values, just as the DeleteFile partition data should. But the partition data of a DeleteFile is not the same type as what is exposed in the Spark metadata table... In particular, there is a difference between the logical and real Avro types as defined in the spec: https://iceberg.apache.org/spec/#avro Summary: the issue does not specifically affect partition transforms. It affects partitions whose Avro type != logical Avro type, i.e. date, time, etc. I'm investigating a fix that adds a conversion from the Avro type to the logical type. |
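To illustrate the type mismatch described above, here is a minimal sketch using only the JDK. It is not the code from the fix, and the class and method names (PartitionValueSketch, daysToDate, microsToTime) are hypothetical. It assumes the encoding given in the Iceberg spec, where a date is stored as an int counting days from 1970-01-01 and a time as a long counting microseconds from midnight, while the metadata table exposes the logical value:

```java
import java.time.LocalDate;
import java.time.LocalTime;

// Hypothetical illustration only; not Iceberg's API and not the actual fix.
class PartitionValueSketch {

    // Raw Avro int (days since 1970-01-01) -> logical date value.
    static LocalDate daysToDate(int daysFromEpoch) {
        return LocalDate.ofEpochDay(daysFromEpoch);
    }

    // Raw Avro long (microseconds since midnight) -> logical time value.
    static LocalTime microsToTime(long microsFromMidnight) {
        // multiplyExact guards against silent overflow on bad input.
        return LocalTime.ofNanoOfDay(Math.multiplyExact(microsFromMidnight, 1_000L));
    }

    public static void main(String[] args) {
        int rawDatePartition = 18262; // what the delete file's partition data would hold
        // Comparing the raw int against a date-typed column is the kind of mismatch
        // described above; converting first gives a comparable value.
        System.out.println(daysToDate(rawDatePartition)); // 2020-01-01
        System.out.println(microsToTime(3_600_000_000L));
    }
}
```

A filter built from the raw value 18262 would be compared against a date-typed partition column, which is why a conversion step like this is needed before querying the metadata table.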
No argument for |
I have a fix here: #8059 |
@szehon-ho Thanks for the fix. I am facing the same issue on Iceberg 1.3.0 while trying to remove delete files using proc. So my question to you is: how can we remove delete files if we are still using 1.3.0? Is it somehow possible to manually remove the references to delete files without corrupting the metadata? Thanks for your help.
|
And we are still using Spark 3.3.1, so is there any way to get around this issue without upgrading to Spark 3.4 and Iceberg 1.3.1? |
Apache Iceberg version
1.3.0 (latest release)
Query engine
Spark
Please describe the bug 🐞
While testing https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_position_delete_files I see the following error:
Partition specification:
Source fields: