Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory limited nested-loop join #5564

Merged
merged 3 commits into from
Mar 14, 2023
Merged

Memory limited nested-loop join #5564

merged 3 commits into from
Mar 14, 2023

Conversation

korowa
Copy link
Contributor

@korowa korowa commented Mar 12, 2023

Which issue does this PR close?

Part of #5220.

Rationale for this change

Control over memory allocations in NestedLoopJoinExec

What changes are included in this PR?

First commit modifies join operator:

  • MemoryReservations added to NestedLoopJoinExec (for build-side data) and to NestedLoopJoinStream (for visited_left_side)
  • NestedLoopJoinExec now uses BuildProbeJoinMetrics and is able to expose metrics via explain analyze

Second commit is an attempt to avoid nested mutexes (discussion) -- it's solved via structs for SharedMemoryReservation & SharedOptionalMemoryReservation (instead of OperatorMemoryReservation), and TryGrow trait, used in build-side data collection in HashJoinExec (both types of reservations potentially could be passed into collect_left closure).

Are these changes tested?

There are test cases for

  • NL-join overallocation for different join types
  • SQL-level test for overallocation while joining by expression
  • Unit tests for new MemoryReservation wrappers

Are there any user-facing changes?

Nested-loop joins should respect memory pool limits

@github-actions github-actions bot added the core Core DataFusion crate label Mar 12, 2023
@korowa korowa changed the title Memory limite nested-loop join Memory limited nested-loop join Mar 12, 2023
@alamb
Copy link
Contributor

alamb commented Mar 13, 2023

Thank you @korowa -- I plan to review this tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @korowa -- I went through this PR commit by commit and I think it looks great. Breaking it into two commits made it much easier to review.

cc @liukun4515 who contributed this operator originally in #4562

/// Operator-level memory reservation for left data
reservation: OperatorMemoryReservation,
/// Execution metrics
metrics: ExecutionPlanMetricsSet,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for unifying the metrics at the same time

);
let filter = prepare_join_filter();

let join_types = vec![
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb merged commit 612eb1d into apache:main Mar 14, 2023
@ursabot
Copy link

ursabot commented Mar 14, 2023

Benchmark runs are scheduled for baseline = 26eb406 and contender = 612eb1d. 612eb1d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants