Update to Spark 3.5 and Delta 3.0 [Spark] [Delta] #220

Closed
osopardo1 opened this issue Oct 23, 2023 · 4 comments
@osopardo1
Member

Following the recent Delta and Spark releases, we should update the qbeast-spark libraries to keep up with the latest features.

Here is a summary of the new developments in each release.

Spark 3.5

  • Scala and Go client support in Spark Connect.
  • Structured Streaming support for Spark Connect in Python and Scala.
  • Arrow Python UDFs.
  • PyTorch-based distributed ML support.

Read the full notes here.

Delta 3.0

  • Delta Universal Format (UniForm): lets you read Delta tables with Hudi and Iceberg clients.
  • Delta Kernel: decouples reading and writing, for building Delta connectors.
  • Delta Spark: the package was renamed to delta-spark (previously delta-core).
  • Support for Spark 3.5.
  • Better performance for Deletion Vectors.

Read the full notes here.
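
As a rough sketch, the dependency change in build.sbt could look like the fragment below. The patch versions, the choice of spark-sql as the Spark module, and the Provided scope are assumptions for illustration and are not taken from the qbeast-spark build; only the delta-core to delta-spark rename comes from the release summary above.

```scala
// build.sbt (sketch): versions follow the issue title; patch versions are assumed
val sparkVersion = "3.5.0"
val deltaVersion = "3.0.0"

libraryDependencies ++= Seq(
  // Spark is usually supplied by the cluster, hence Provided (an assumption here)
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  // Delta 3.0 publishes the Spark connector as delta-spark; earlier releases used delta-core
  "io.delta" %% "delta-spark" % deltaVersion
)
```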

@osopardo1
Member Author

For the record: this update would be for main-1.0.0

osopardo1 self-assigned this Dec 4, 2023
@osopardo1
Member Author

osopardo1 commented Dec 15, 2023

Some issues I am running into with the version upgrade:

  • The withNewTransaction method from Delta is deprecated. Committing to a table now requires passing a series of new arguments, each as an Option:
    • CatalogTable (if it exists)
    • Snapshot (if it exists)
    • I still need to read the code and understand its role in the Metadata Writer.
  • Properties/options are no longer passed as usual to the saveAsTable / Catalog code. In other words, columnsToIndex is not propagated properly, and writing to Qbeast throws an AnalysisException.
  • Delta has a new trait called TableChanges that can be confused with our own TableChanges interface in package core. Be careful when importing.
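
On the last point, one way to avoid the clash is to rename one of the traits at import time. Below is a minimal, self-contained sketch. The objects `deltalib` and `qbeastcore` are hypothetical stand-ins for the real packages (their exact paths are not given in this thread), and the `source` method exists only so the demo has something to print.

```scala
// Stand-ins for the two libraries. The real package paths are not stated
// in this thread, so these names are hypothetical and for illustration only.
object deltalib {
  trait TableChanges { def source: String = "delta" }
}

object qbeastcore {
  trait TableChanges { def source: String = "qbeast" }
}

object ImportClashDemo {
  // Rename Delta's trait on import so the unqualified name
  // TableChanges always refers to our own interface.
  import deltalib.{TableChanges => DeltaTableChanges}
  import qbeastcore.TableChanges

  val ours: TableChanges = new TableChanges {}
  val theirs: DeltaTableChanges = new DeltaTableChanges {}

  def main(args: Array[String]): Unit = {
    println(ours.source)   // our interface
    println(theirs.source) // Delta's trait, under its alias
  }
}
```

With the alias in place, an accidental wildcard import from the other library no longer changes which TableChanges the code refers to, since every use of Delta's trait has to go through the explicit DeltaTableChanges name.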

@osopardo1
Member Author

This feature is merged in 1.0.0-main!

@osopardo1
Member Author

Merged on #284
