feat: upgrade to Arrow 37 and Datafusion 23 #1314

Merged (10 commits) on Apr 29, 2023

@rtyler rtyler (Member) commented Apr 28, 2023

Lots of fun API changes; Fields has the biggest impact in terms of lines of code, however.

BREAKING CHANGE: new major versions for arrow and datafusion

This has lots of API changes and will need follow-up work.
This feature makes it much easier to read/write JSON to Parquet. This is
immediately useful in kafka-delta-ingest, but I believe it will be much more
generally useful for all consumers of the package.
I figure we can use this feature for other JSONy things too.
@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Apr 28, 2023
@github-actions

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

Most of what appears to have changed can be solved by calling .into() on a
Vec<ArrowField> to convert it to the Fields abstraction.
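
For readers unfamiliar with the pattern, here is a rough, std-only sketch of what that conversion looks like. The `Field`, `Fields`, and `Schema` types below are simplified stand-ins written for illustration, not the real arrow types, and `example_schema` is a hypothetical call site. The sketch assumes, as I understand the Arrow 37 design, that `Fields` is a shared slice of reference-counted fields and that a schema constructor accepts anything convertible into it.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Simplified stand-in for arrow's `Field` (name and nullability only).
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, nullable: bool) -> Self {
        Field { name: name.to_string(), nullable }
    }
}

/// Simplified stand-in for arrow's `Fields`: an immutable list of shared
/// fields that is cheap to clone because only reference counts are bumped.
#[derive(Debug, Clone, PartialEq)]
pub struct Fields(Arc<[Arc<Field>]>);

impl From<Vec<Field>> for Fields {
    fn from(fields: Vec<Field>) -> Self {
        // Wrap each field in an Arc, then collect into one shared slice.
        Fields(fields.into_iter().map(Arc::new).collect())
    }
}

impl Deref for Fields {
    type Target = [Arc<Field>];

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

/// Stand-in schema whose constructor takes anything convertible to `Fields`.
pub struct Schema {
    pub fields: Fields,
}

impl Schema {
    pub fn new(fields: impl Into<Fields>) -> Self {
        Schema { fields: fields.into() }
    }
}

/// Hypothetical call site: code that used to hold a `Vec<Field>` can keep
/// building the Vec and let the conversion happen at the boundary.
pub fn example_schema() -> Schema {
    let columns = vec![Field::new("id", false), Field::new("value", true)];
    Schema::new(columns)
}
```

Where a function signature requires `Fields` directly, the explicit form would be `let fields: Fields = columns.into();`. With the real crate the same shape should apply, but check the arrow-schema documentation for the exact conversions available in the version in use.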
@rtyler rtyler changed the title Upgrade to Arrow 37 and Datafusion 23 feat: Upgrade to Arrow 37 and Datafusion 23 Apr 28, 2023
@rtyler rtyler changed the title feat: Upgrade to Arrow 37 and Datafusion 23 feat: upgrade to Arrow 37 and Datafusion 23 Apr 28, 2023
@github-actions github-actions bot added the binding/python Issues for the Python package label Apr 28, 2023
This tracks changes in arrow 37 which switched to basically relying on
Arc<Field>
@wjones127 wjones127 self-requested a review April 28, 2023 23:47
Comment on lines -324 to -334
pub fn arrow_schema_json(&self) -> PyResult<String> {
    let schema = self
        ._table
        .get_schema()
        .map_err(PyDeltaTableError::from_raw)?;
    serde_json::to_string(
        &<ArrowSchema as TryFrom<&deltalake::Schema>>::try_from(schema)
            .map_err(PyDeltaTableError::from_arrow)?,
    )
    .map_err(|_| PyDeltaTableError::new_err("Got invalid table schema"))
}
Collaborator

I don't think this was used in our public API, and I know the arrow-rs folks wanted to remove the unofficial JSON schema serialization from their public API as well. I think it should be fine to drop this.

@rtyler rtyler marked this pull request as ready for review April 29, 2023 04:43
@rtyler rtyler enabled auto-merge (rebase) April 29, 2023 04:43
@rtyler rtyler merged commit 01b5994 into main Apr 29, 2023
@rtyler rtyler deleted the arrow-37 branch April 29, 2023 16:16