Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: initial implementation of table features. #1755

Closed
wants to merge 0 commits into from

Conversation

hntd187
Copy link
Collaborator

@hntd187 hntd187 commented Oct 22, 2023

Description

Added initial support for parsing table features out and updating table state with the information for them. Also be aware the table features are only informational right now, they are not used or enforced anywhere currently.

Related Issue(s)

None I am aware of.

Documentation

https://delta.io/blog/2023-07-27-delta-lake-table-features/
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features

@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Oct 22, 2023
@github-actions
Copy link

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@michaelbrichko
Copy link

FYI, there is already an open PR on this: #1754

@MrPowers
Copy link
Collaborator

Thanks for opening this! It seems like this PR puts us on the right path. From what I can tell, it captures all the complexities of table features (e.g. columnMapping is a reader & writer feature whereas appendOnly is only a writer feature). It looks like the aliases line up with the aliases used in delta-io/delta.

It looks like we'll be able to use these to give a better user experience. Here's the current error when you try to read a Delta table with deletion vectors enabled: "DeltaProtocolError: The table's minimum reader version is 3but deltalake only supports up to version 1."

We'd much rather give an error message like this: "You are trying to read a Delta table with deletion vectors enabled, but read support for deletion vectors isn't supported yet".

It looks like there was a similar PR opened around the same time, which is also good, but this PR captures the nitty-gritty details that are important for us.

Whenever the transaction log protocol is updated, we'll need to manually update this file as well. That's pretty annoying, but I don't think there is any way around that.

@hntd187
Copy link
Collaborator Author

hntd187 commented Oct 26, 2023

Generally the Into impls are meant to define the minimum protocol (either reader or writer) necessary to support everything in a table. It lets you do things like

get_writer_features().map(Into::into).min() to find the minimum writer very needed for a table.

@rtyler rtyler changed the title Initial implementation of table features. feat: initial implementation of table features. Oct 26, 2023
Copy link

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just glanced through the PR. One comment about liquid table feature.

ICEBERG_COMPAT_V1,
/// liquid clustering support
#[serde(alias = "liquid")]
LIQUID,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a liquid table feature? I don't see it in the protocol.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only added it because I saw a test that read a liquid table and that table had liquid in the table features.

Copy link
Collaborator

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this looks reasonable. My one request is we make it so it's not an immediate parsing errors (and definitely not a panic) if there are features we don't recognize. We should expect there will be new features added, and we should give an appropriate error (ProtocolNotSupported or something like that) when this happens.

/// Features table readers can support as well as let users know
/// what is supported
#[derive(Serialize, Deserialize, Debug, Copy, Clone, Eq, PartialEq, Hash)]
pub enum ReaderFeatures {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having these features, but it does mean we error when we see a feature we don't recognize when parsing, right? There is a serde::other tag that I think we could use here. (docs).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like that lemme update for that

"deletionVectors" => ReaderFeatures::DELETION_VECTORS,
"timestampNtz" => ReaderFeatures::TIMESTAMP_WITHOUT_TIMEZONE,
"v2Checkpoint" => ReaderFeatures::V2_CHECKPOINT,
f => panic!("Unknown reader feature encountered: {}", f),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should warrant a panic. Just unsupported table error. I'd convert this into some Other variant and maybe let the error be handled elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the not having these functions panic.

Comment on lines 157 to 158
assert!(_table.get_state().reader_features().is_some());
assert!(_table.get_state().writer_features().is_some());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to assert the specific features here, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have more/better tests I'm going to check in prolly tomorrow.

@hntd187
Copy link
Collaborator Author

hntd187 commented Nov 2, 2023

I was doing a merge from main and did this, apologies re-opening another PR shortly.

wjones127 pushed a commit that referenced this pull request Nov 3, 2023
# Description
Hi again, this is table features again, but albeit with less work done
on the main branch like the smart guy I am.

# Related Issue(s)
#1755 prior review and information

# Documentation
https://delta.io/blog/2023-07-27-delta-lake-table-features/
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features

---------

Co-authored-by: Stephen Carman <stephen.carman@databricks.com>
natinimni pushed a commit to natinimni/delta-rs that referenced this pull request Jan 31, 2024
# Description
Hi again, this is table features again, but albeit with less work done
on the main branch like the smart guy I am.

# Related Issue(s)
delta-io#1755 prior review and information

# Documentation
https://delta.io/blog/2023-07-27-delta-lake-table-features/
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#table-features

---------

Co-authored-by: Stephen Carman <stephen.carman@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants