Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Python connector kernel integration for CDF reads prototype #609

Conversation

PatrickJin-db
Copy link
Collaborator

@PatrickJin-db PatrickJin-db commented Dec 6, 2024

Known issues:
Timestamps may be incorrect, this is an issue with the cdf_prototype branch of delta-kernel-rs.

Tested via the following:

  1. cargo update && maturin develop in python/delta-kernel-rust-sharing-wrapper
  2. pip install . in python
  3. Ran some load_table_changes_as_pandas calls, using both versions and timestamps. For more details, see the script on slack.

Added some unit and integration tests too

@PatrickJin-db PatrickJin-db changed the title [DO NOT MERGE] Python connector kernel integration for CDF reads [DO NOT MERGE] Python connector kernel integration for CDF reads prototype Dec 6, 2024
@PatrickJin-db PatrickJin-db force-pushed the PatrickJin-db/python-delta-kernel-cdf_prototype branch from 737d42e to 86319f3 Compare December 6, 2024 02:31
@PatrickJin-db PatrickJin-db force-pushed the PatrickJin-db/python-delta-kernel-cdf_prototype branch from 86319f3 to c401e39 Compare December 6, 2024 02:33
Copy link
Collaborator

@linzhou-db linzhou-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good!
The next steps:

  1. what's the plan for unit test? is there a way to unit test without support delta format sharing in the oss server cdf?
  2. what's a list of tables we want to test?
  3. what's the largest table we want to test on?
  4. what's the release plan?

# Construct map from version to actions that took place in that version
for line in lines:
line_json = loads(line)
file = line_json["file"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there could be metadata in lines, that needs to be handled.
https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#metadata-in-delta-format

@PatrickJin-db PatrickJin-db force-pushed the PatrickJin-db/python-delta-kernel-cdf_prototype branch from 8749e73 to ae1bdef Compare December 10, 2024 10:57
@PatrickJin-db PatrickJin-db force-pushed the PatrickJin-db/python-delta-kernel-cdf_prototype branch from ae1bdef to 2b96cfe Compare December 11, 2024 01:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants