Incorporate object_store into arrow-rs repository #2030

Closed
14 tasks done
alamb opened this issue Jul 8, 2022 · 8 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Contributor

alamb commented Jul 8, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
As described in influxdata/object_store_rs#41 and on https://lists.apache.org/thread/l2103pl85xkyq10c96z73d5t68f6tthd, there appears to be consensus for donating the object_store implementation used in DataFusion (and possibly, eventually, in the parquet reader) to Apache.

This ticket tracks the work required to do so, so that the current status is transparent and can be followed by anyone who is interested.

For clarity, here is the rationale copied from influxdata/object_store_rs#41:

Rationale

  1. A common, high-quality object store abstraction for communicating with various remote object stores is useful for a range of projects and use cases.
  2. A library with a common API to access remote object stores is directly aligned with the Arrow mission of providing building blocks for modern high-performance analytics systems.
  3. The clear governance of Apache Arrow offers the best chance to build a unified and strong community around this crate, hopefully both increasing its adoption and attracting community contributions for its long-term evolution and maintenance.

Background

Object stores are increasingly important for analytic systems as more data is stored in them. @yjshen donated an object store abstraction to Arrow DataFusion to allow DataFusion to read from local files, S3, HDFS, and others. In apache/datafusion#2489, the DataFusion community proposes migrating from this original object store abstraction, which is part of the DataFusion project (itself part of Apache Arrow), to the code in this crate.

Provenance

The code in this crate was originally developed by InfluxData, largely by @carols10cents, for InfluxDB IOx. @tustvold has since extracted the code and released it as its own crate. As described above, we believe that making it an official part of Arrow would benefit the long-term health of both this code and the arrow-rs and arrow-datafusion projects, so we would like to donate it to the community.

Additional background is available in apache/datafusion#2677 (comment).

Plan

@alamb
Contributor Author

alamb commented Jul 22, 2022

@tustvold, can you please add the owners of the arrow crate (https://crates.io/crates/arrow) as owners of the object_store crate (https://crates.io/crates/object_store) on crates.io as well?

@alamb
Contributor Author

alamb commented Jul 22, 2022

I plan to complete the other tasks on this ticket this weekend or early next week.

@alamb
Contributor Author

alamb commented Jul 23, 2022

@alamb
Contributor Author

alamb commented Jul 23, 2022

I have ported the integration tests in #2148.

@alamb
Contributor Author

alamb commented Jul 26, 2022

I have ported all the tickets and added the object-store label to them: https://github.com/apache/arrow-rs/issues?q=is%3Aissue+is%3Aopen+label%3Aobject-store

@alamb
Contributor Author

alamb commented Aug 13, 2022

All that is left here is a blog post. @tustvold and I have written one about this donation for the InfluxData blog, which I will propose to repost on arrow.apache.org/blog once it is published.

@alamb
Contributor Author

alamb commented Sep 8, 2022

We wrote a blog post here: https://www.influxdata.com/blog/rust-object-store-donation/

I have been quite conflicted about repeating the content on the Arrow blog. I think a better option may be a brief blog post about new improvements to object_store in later releases (such as reduced dependencies) that links to the original post.

So with that I am claiming this task is done. 😅

@alamb alamb closed this as completed Sep 8, 2022
@alamb
Contributor Author

alamb commented Sep 8, 2022

We are in the process of creating the second object_store release under the ASF process: #2620
