-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganize the project folders #2081
Conversation
data-access/README.md
Outdated
This module contains an `async` API for the [DataFusion][df] to access data, either remotely or locally. | ||
|
||
[df]: https://crates.io/crates/datafusion | ||
This module contains an `async` API for accessing data based on object store interfaces, either remotely or locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i understand the intent of this correctly, i think the data access layer could expose other data access methods than just object store (although they may not exist yet). For example a streaming provider, or database provider. If this is correct i think we would just want to make the description more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Although currently we only support object store, we can make description not based on the object store.
I think its very clean / organized and i do prefer this proposed structure. But im curious why an intermediate rust directory is needed? do you have an idea or plans for code that would fall outside of that? |
Do we need to align with this #1750? |
Cargo.toml
Outdated
"datafusion/rust/jit", | ||
"datafusion/rust/physical-expr", | ||
"datafusion/rust/cli", | ||
"datafusion/rust/proto", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rust directory here seems a bit redundant to me, do you expect us to have other language implementations here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
data-access/Cargo.toml
Outdated
@@ -16,8 +16,8 @@ | |||
# under the License. | |||
|
|||
[package] | |||
name = "datafusion-storage" | |||
description = "Storage for DataFusion query engine" | |||
name = "data-access" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data-access
feel a bit too generic to me. To be more specific, how about we name this crate and the folder object-store
? the object-store crate hasn't been claimed in crates.io yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i had the idea that within this other data access providers / crates could exist - for example maybe stream provider or database provider and thats why the name was generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @matthewmturner mentioned, the object store may be too specific and maybe will limit the extension of future data access abilities.
data-access/Cargo.toml
Outdated
name = "datafusion-storage" | ||
description = "Storage for DataFusion query engine" | ||
name = "data-access" | ||
description = "General data access layer based on object store" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 This is certainly a more accurate description :D
Thanks @matthewmturner, @houqp , I'm good with removing the rust folder. Previously I was thinking maybe there's something datafusion specified, like Ballista did. |
I think we need to be careful if we make a crate with a name like |
Agree. Will change "data-access" to be "datafusion-data-access". |
Rebase and add refinements according to the PR review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, the next datafusion release is going to be fun since we will have a lot of versions to update :D
Hi @alamb, any chance to merge this PR this week😂 |
@yahoNanJing sounds good to me! I haven't been following it closely but given @yjshen and @houqp have given it ✅ I'll plan to merge it in when the tests pass Thanks for sticking with it |
Interestingly the failure in ballista also fails for me locally: https://github.com/apache/arrow-datafusion/runs/5726864404?check_suite_focus=true Maybe something about moving the code into different directories has caused the test to start running fixed in 463de7f |
Thanks @alamb for fixing the issue. |
Thanks again @yahoNanJing and everyone else helped! |
Which issue does this PR close?
Closes #2080.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?