2023 H1 Roadmap #1128

Closed
3 of 21 tasks
wjones127 opened this issue Feb 5, 2023 · 7 comments
Labels
help wanted Extra attention is needed

Comments

@wjones127
Collaborator

wjones127 commented Feb 5, 2023

Work committed to

These are projects current contributors are working on.

  • (P0) Data Acceptance Tests running in CI (@wjones127)
  • (P0) Fully protocol-compliant optimistic commit protocol with conflict resolution (feat: optimistic transaction protocol #632) (@roeap)
  • (P0) ADBC driver: create / read / append / overwrite (@wjones127)
    • Lay foundation for DuckDB plugin, more language bindings (R), and cross-language Polars support (R and JavaScript, in addition to Python)
  • (P1) Python bindings integrated with ADBC driver (@wjones127)
    • ADBC to supersede PyArrow-based reader / writer.
  • (P0) Remove experimental marker from Python writer (@wjones127)
  • (P0) Writer version 2 support in operation module (@wjones127)
  • (TBD) Provide async features in the Python binding (@fvaleye)
  • (TBD) Airbyte <> Delta Lake integration (@fvaleye)
  • More Rust documentation
    • Figure out where to host
    • Figure out SEO
    • Probably migrate off of github.io
  • Blog posts (@MrPowers)
    • PyO3 blog post good for Rust audience
    • Content for Azure. Developer advocacy arm of Azure is very impressive. They spread this message.
    • Usage of the Python module is more compelling
    • Kafka-delta-ingest reduced writer cost 25 times. Christian & Tyler co-authors.
  • Purge Ruby bindings. They’re not usable.
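
For readers new to the optimistic commit item in the list above, here is a minimal sketch of the general idea, not the delta-rs implementation. It assumes a local directory standing in for `_delta_log`, uses `O_EXCL` file creation as a stand-in for an atomic put-if-absent, and applies one deliberately simplistic conflict rule (a winning commit removed a file we read). The helper names `try_commit`, `read_commit`, and `commit_with_retry` are hypothetical; the real protocol checks many more conditions.

```python
import json
import os


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to claim commit `version`; return False if another writer got there first."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file already exists.
        # Assumption: this stands in for the store's atomic put-if-absent.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    return True


def read_commit(log_dir: str, version: int) -> list:
    """Read the newline-delimited JSON actions of an existing commit."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def commit_with_retry(log_dir, read_version, actions, files_read, max_attempts=5):
    """Commit on top of `read_version`, retrying at later versions after lost races."""
    version = read_version + 1
    for _ in range(max_attempts):
        if try_commit(log_dir, version, actions):
            return version
        # Lost the race for this version: inspect the commit that won.
        winner = read_commit(log_dir, version)
        removed = {a["remove"]["path"] for a in winner if "remove" in a}
        # Simplistic rule (assumption): conflict only if a file we read was removed.
        if removed & set(files_read):
            raise RuntimeError(f"conflict with commit {version}")
        version += 1
    raise RuntimeError("gave up after too many attempts")
```

Two writers that both read version 0 and only append would land on versions 1 and 2 in whichever order they arrive; a writer whose input files were removed by a concurrent commit gets an error instead of silently committing.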

Projects seeking contributors

In addition to smaller issues labelled good-first-issue, these are some larger projects that we could use some help on. Most of them will be implemented as part of the operations module in the Rust source and can later be exposed to Python and other bindings.

wjones127 pinned this issue Feb 5, 2023
houqp added the help wanted label Feb 5, 2023
@MrPowers
Contributor

MrPowers commented Feb 6, 2023

This looks great! Really excited!

Some blog post ideas:

  • deltalake 0.7.0 post explaining the new features
  • Delta Lake + AWS Lambda (from the aws-sdk-pandas work being done by @nkarpov)
  • Why delta-rs is switching to ADBC (I think the Rust data community would be interested in this one)

Let me know if I should make issues for the blog posts. I'm fine tracking them elsewhere too. I'll want delta-rs community reviews, but we can just do those in the Slack chat. Thanks for putting this together.

@saivarunk

@MrPowers I'm interested in taking up the Delta Lake + AWS Lambda blog post. Can you help me out with the process?

@ion-elgreco
Collaborator

@wjones127 maybe a silly question, but why would you still need the Operations API, which only uses DataFusion (in Rust), after introducing the ADBC API?

From the design document I can see any query engine can potentially be used with ADBC.

@FlavioDiasPs

Why implement optimize and zorder when Databricks is going the opposite way with Liquid Clustering? By the time delta-rs implements this, Databricks will have made Liquid Clustering the default.

@ion-elgreco
Collaborator

Why implement optimize and zorder when Databricks is going the opposite way with Liquid Clustering? By the time delta-rs implements this, Databricks will have made Liquid Clustering the default.

But they are already implemented in delta-rs.

@andreale28

andreale28 commented Aug 2, 2023

Why implement optimize and zorder when Databricks is going the opposite way with Liquid Clustering? By the time delta-rs implements this, Databricks will have made Liquid Clustering the default.

The delta-rs team actually implemented these two features before the announcement of Delta 3.0 and Liquid Clustering. To be honest, Delta 3.0 and Liquid Clustering came out kind of unexpectedly.

rtyler closed this as completed Oct 25, 2023
rtyler unpinned this issue Oct 26, 2023
@sim-san

sim-san commented Aug 22, 2024

@rtyler
Do you plan to support Generated Columns (Writer Version 4) in delta-rs?

9 participants