Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-32335: [C++][Docs] Add design document for Acero #35320

Merged
merged 11 commits into from
May 19, 2023

Conversation

westonpace
Copy link
Member

@westonpace westonpace commented Apr 25, 2023

Rationale for this change

The documentation for Acero was incomplete. This PR refactors the existing documentation and adds several entirely new sections to form a complete design document for Acero.

What changes are included in this PR?

Some existing documentation is cleaned up. Acero documentation is moved into its own folder and broken into several pages.

Are these changes tested?

The documentation is built as part of the CI but I wouldn't say it is fully tested.

Are there any user-facing changes?

There are not code changes (other than the removal of two legacy methods) but there are many user-facing documentation changes.

@westonpace westonpace changed the title GH-32335: [C++] Add design document for Acero GH-32335: [C++][Docs] Add design document for Acero Apr 25, 2023
@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue #32335 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Apr 25, 2023
@westonpace
Copy link
Member Author

@github-actions crossbow submit preview-docs

@github-actions
Copy link

Revision: 04e8cc0

Submitted crossbow builds: ursacomputing/crossbow @ actions-22cf5a633d

Task Status
preview-docs Github Actions

@westonpace
Copy link
Member Author

@github-actions crossbow submit preview-docs

@github-actions
Copy link

Revision: 62c725d

Submitted crossbow builds: ursacomputing/crossbow @ actions-838f0ce05e

Task Status
preview-docs Github Actions

@westonpace
Copy link
Member Author

Copy link
Contributor

@edponce edponce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a few minutes to read over this great documentation and stumbled upon a few typos.

docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/user_guide.rst Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Apr 27, 2023
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great! Very nice and comprehensive docs, this is going to be a super useful resource.
I added a few minor comments (and note for myself: I read up to the Developer's guide)

docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/overview.rst Outdated Show resolved Hide resolved
docs/source/cpp/acero/expression_ast.svg Outdated Show resolved Hide resolved
docs/source/cpp/acero/layers.svg Outdated Show resolved Hide resolved
docs/source/cpp/acero/user_guide.rst Show resolved Hide resolved
docs/source/cpp/acero/user_guide.rst Show resolved Hide resolved
docs/source/cpp/acero/substrait.rst Outdated Show resolved Hide resolved
Comment on lines 81 to 84
* The Substrait spec requires that a ``filter`` be completely satisfied by a read
relation. However, Acero only uses a read filter for pushdown projection and
it may not be fully satisfied. Users should generally attach an additional
filter relation with the same filter expression after the read relation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side question, but wondering while reading this: if that's what the spec of substrait requires, shouldn't the translation of a substrait plan to Acero ExecPlan inject such an additional filter node to ensure we actually fulfill that requirement? (instead of requiring the user to do that explicitly in their Substrait plan)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a fair point. Let's address in a follow-up.

* Acero does not support the SATURATE option for overflow
* Acero does not support kernels that take more than two arguments
for the functions ``and``, ``or``, ``xor``
* Acero does not support temporal arithmetic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if it is clear enough that it is the Acero Substrait->Acero::ExecPlan conversion that doesn't support this, because Acero itself does support temporal arithmetic, as far as I know? (since we have scalar compute functions for temporal types, you should be able to use those in Acero)

Copy link
Member Author

@westonpace westonpace May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a fair point. Let's address in a follow-up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore that above comment. Replied to the wrong thing. What I meant to say was...

I think we actually might support temporal arithmetic now. Also, some of the things in this list were definitely wrong. This section has always been a bit volatile and I went ahead and removed it for now (there is already a blanket statement that new functions are added regularly and mappings may not be available instantly).

@westonpace
Copy link
Member Author

@github-actions crossbow submit preview-docs

@github-actions
Copy link

Revision: e54a546

Submitted crossbow builds: ursacomputing/crossbow @ actions-cb2af88888

Task Status
preview-docs Github Actions

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to read the user_guide.rst and part of developer_guide.rst. As a newcomer to arrow compute and acero, I find this documentation easy to understand. Thanks!

docs/source/cpp/acero/developer_guide.rst Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels May 12, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 12, 2023
@westonpace
Copy link
Member Author

@github-actions crossbow submit preview-docs

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels May 12, 2023
@github-actions
Copy link

Revision: 8707d92

Submitted crossbow builds: ursacomputing/crossbow @ actions-62d908a2ed

Task Status
preview-docs Github Actions

westonpace and others added 11 commits May 19, 2023 06:47
…d a developer's guide. Fix up the API reference docs for Acero
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 19, 2023
@westonpace
Copy link
Member Author

This has been out for a while and I think there is some other ongoing docs work I don't want to conflict with. I'm going to merge this when CI passes.

@westonpace westonpace merged commit 8a856c9 into apache:main May 19, 2023
@ursabot
Copy link

ursabot commented May 21, 2023

Benchmark runs are scheduled for baseline = 3f736ff and contender = 8a856c9. 8a856c9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.36% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.33% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.51% ⬆️0.12%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 8a856c90 ec2-t3-xlarge-us-east-2
[Finished] 8a856c90 test-mac-arm
[Finished] 8a856c90 ursa-i9-9960x
[Finished] 8a856c90 ursa-thinkcentre-m75q
[Finished] 3f736ff1 ec2-t3-xlarge-us-east-2
[Finished] 3f736ff1 test-mac-arm
[Finished] 3f736ff1 ursa-i9-9960x
[Finished] 3f736ff1 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add initial Acero design documents
6 participants