-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add iceberg table properties for maintenance of previous metadata files #23260
Add iceberg table properties for maintenance of previous metadata files #23260
Conversation
6126969
to
e7c204e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the doc! Suggested rephrasing, please adapt my suggestions to fit the column formatting, and let me know if my suggestions change the meaning in a way you're not comfortable with.
e7c204e
to
0d8c08a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Thanks for the update! Pulled updated branch, new local doc build, looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor nits to make the core logic more understandable for future readers, otherwise LGTM
// TableMetadata#addPreviousFile builds up the metadata log and uses | ||
// TableProperties.METADATA_PREVIOUS_VERSIONS_MAX to determine how many files should stay in | ||
// the log, thus we don't include metadata.previousFiles() for deletion - everything else can | ||
// be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was confusing to me at first because TableMetadata#addPreviousFile
isn't referenced in this logic, but rather indirectly when generating previousFiles
. I think we can rewrite it to be a bit more clear
// TableMetadata#addPreviousFile builds up the metadata log and uses | |
// TableProperties.METADATA_PREVIOUS_VERSIONS_MAX to determine how many files should stay in | |
// the log, thus we don't include metadata.previousFiles() for deletion - everything else can | |
// be removed | |
// metadata.previousFiles() is populated using TableProperties.METADATA_PREVIOUS_VERSIONS_MAX | |
// thus we don't include metadata.previousFiles() for deletion - everything else can be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, it makes the meaning of the comment more clear. Fixed, please take a look when available!
presto-iceberg/src/main/java/com/facebook/presto/iceberg/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
3e2b4b3
0d8c08a
to
3e2b4b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the addition. Have suggested small doc level changes, please have a look.
Otherwise LGTM
presto-iceberg/src/main/java/com/facebook/presto/iceberg/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/HiveTableOperations.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Config("iceberg.metadata-previous-versions-max") | ||
@ConfigDescription("The max number of previous metadata files exist in metadata log") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : Would it make more clear if we has a description The max number of old metadata files to keep
since this is something now user can control with this config property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it seems better, I have changed here as well as the corresponding description in IcebergTableProperties
and docs
. Please take a look when convenient. @agrawalreetika @steveburnett @ZacBlanco
ac93001
3e2b4b3
to
ac93001
Compare
@@ -355,6 +361,12 @@ Property Name Description | |||
``delete_mode`` Optionally specifies the write delete mode of the Iceberg ``merge-on-read`` | |||
specification to use for new tables, either ``copy-on-write`` | |||
or ``merge-on-read``. | |||
|
|||
``metadata_previous_versions_max`` Optionally specifies the max number of old metadata files to ``100`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would happen if metadata_delete_after_commit = false but metadata_previous_versions_max = 100? Would the files be deleted after 100? If not, shall we specify that metadata_previous_versions_max only works if metadata_delete_after_commit is set to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metadata_delete_after_commit = false
means that Iceberg wouldn't automatically delete the old and untracked metadata files after each commit. It wouldn't affect the functionality of metadata_previous_versions_max
, which specify how many previous metadata files would be tracked in current metadata.
For example, if we set metadata_previous_versions_max = 1
and metadata_delete_after_commit = false
on the creation of an Iceberg table, and execute several insert operations which will lead in several metadata files.
We will get only 1 previous metadata entry in current metadata's log ( through tableMetadata.previousFiles()
). But find that none of the metadata files were actually deleted.
On the other way, if we set metadata_previous_versions_max = 1
and metadata_delete_after_commit = true
and do the same things as above, we will find that only 2 metadata files remain.
ac93001
to
b7b287f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local docs build. Looks good, thanks!
Description
This PR enables to set Iceberg table properties
metadata_previous_versions_max
andmetadata_delete_after_commit
to specify the number of previous metadata files that need to be tracked, and whether to delete the metadata files that no longer be tracked after each commit.Besides, this PR also enable our own version of
HiveTableOperations
to support deleting old metadata files after commit based on the table properties. This feature has been supported by Iceberg's native version ofHiveTableOperations
.Motivation and Context
The metadata files of an iceberg table will grow to a very large amount, as almost each operation will lead in a new one.
So we need to provide a method to delete unnecessary metadata files.
Impact
When executing
show create table ...
, two additional table properties would be shown. Except that, there is no impact on existing use cases, as the default values are the same as before.Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.