Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize the project folders #2081

Merged
merged 3 commits into from
Mar 29, 2022
Merged

Reorganize the project folders #2081

merged 3 commits into from
Mar 29, 2022

Conversation

yahoNanJing
Copy link
Contributor

@yahoNanJing yahoNanJing commented Mar 25, 2022

Which issue does this PR close?

Closes #2080.

Rationale for this change

What changes are included in this PR?

Screen Shot 2022-03-28 at 9 03 11 AM

Are there any user-facing changes?

This module contains an `async` API for the [DataFusion][df] to access data, either remotely or locally.

[df]: https://crates.io/crates/datafusion
This module contains an `async` API for accessing data based on object store interfaces, either remotely or locally.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If i understand the intent of this correctly, i think the data access layer could expose other data access methods than just object store (although they may not exist yet). For example a streaming provider, or database provider. If this is correct i think we would just want to make the description more generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Although currently we only support object store, we can make description not based on the object store.

@matthewmturner
Copy link
Contributor

I think its very clean / organized and i do prefer this proposed structure. But im curious why an intermediate rust directory is needed? do you have an idea or plans for code that would fall outside of that?

@liukun4515
Copy link
Contributor

Do we need to align with this #1750?

Cargo.toml Outdated
"datafusion/rust/jit",
"datafusion/rust/physical-expr",
"datafusion/rust/cli",
"datafusion/rust/proto",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rust directory here seems a bit redundant to me, do you expect us to have other language implementations here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@@ -16,8 +16,8 @@
# under the License.

[package]
name = "datafusion-storage"
description = "Storage for DataFusion query engine"
name = "data-access"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data-access feel a bit too generic to me. To be more specific, how about we name this crate and the folder object-store? the object-store crate hasn't been claimed in crates.io yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had the idea that within this other data access providers / crates could exist - for example maybe stream provider or database provider and thats why the name was generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @matthewmturner mentioned, the object store may be too specific and maybe will limit the extension of future data access abilities.

name = "datafusion-storage"
description = "Storage for DataFusion query engine"
name = "data-access"
description = "General data access layer based on object store"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This is certainly a more accurate description :D

@yahoNanJing
Copy link
Contributor Author

I think its very clean / organized and i do prefer this proposed structure. But im curious why an intermediate rust directory is needed? do you have an idea or plans for code that would fall outside of that?

Thanks @matthewmturner, @houqp , I'm good with removing the rust folder. Previously I was thinking maybe there's something datafusion specified, like Ballista did.

@yahoNanJing
Copy link
Contributor Author

Do we need to align with this #1750?

Yes, it's aligned with #1750.

@alamb
Copy link
Contributor

alamb commented Mar 25, 2022

I think we need to be careful if we make a crate with a name like data-access as to publish a new version of datafusion to crates.io we will also need to publish a version of data-access. Perhaps we can call it datafusion-data-access if we decide to go down this route

@yahoNanJing
Copy link
Contributor Author

I think we need to be careful if we make a crate with a name like data-access as to publish a new version of datafusion to crates.io we will also need to publish a version of data-access. Perhaps we can call it datafusion-data-access if we decide to go down this route

Agree. Will change "data-access" to be "datafusion-data-access".

@yahoNanJing
Copy link
Contributor Author

yahoNanJing commented Mar 26, 2022

Rebase and add refinements according to the PR review comments.

@houqp houqp added the api change Changes the API exposed to users of the crate label Mar 28, 2022
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the next datafusion release is going to be fun since we will have a lot of versions to update :D

@yahoNanJing
Copy link
Contributor Author

Hi @alamb, any chance to merge this PR this week😂

@alamb
Copy link
Contributor

alamb commented Mar 28, 2022

@yahoNanJing sounds good to me! I haven't been following it closely but given @yjshen and @houqp have given it ✅ I'll plan to merge it in when the tests pass

Thanks for sticking with it

@alamb
Copy link
Contributor

alamb commented Mar 28, 2022

Interestingly the failure in ballista also fails for me locally: https://github.com/apache/arrow-datafusion/runs/5726864404?check_suite_focus=true

Maybe something about moving the code into different directories has caused the test to start running

fixed in 463de7f

@yahoNanJing
Copy link
Contributor Author

Thanks @alamb for fixing the issue.

@yjshen yjshen merged commit 41b4e49 into apache:master Mar 29, 2022
@yjshen
Copy link
Member

yjshen commented Mar 29, 2022

Thanks again @yahoNanJing and everyone else helped!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reorganize the project folders
7 participants