
feat: ResourceExhausted for memory limit in GroupedHashAggregateStream #4371

Merged: 3 commits merged into apache:master on Nov 28, 2022

Conversation

@crepererum (Contributor) commented Nov 25, 2022

Which issue does this PR close?

For #3940.

Update: This does NOT close the issue. I forgot the no-group version (AggregateStream). Will do that in a follow-up PR. It's a rather simple change though.

Rationale for this change

Ensure that users don't run out of memory while performing group-by operations. This is especially important for servers or multi-tenant systems.

What changes are included in this PR?

This is similar to #4202. It includes an additional type StreamType so we can double-check our test setup (namely: that the stream we request is actually the stream version we want).
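For illustration, the helper is roughly this shape (a sketch; the variant names are assumed, not copied from the PR):

```rust
// Sketch of the `StreamType` test helper described above (variant names
// assumed for illustration). Tests can match on the returned variant to
// assert which stream implementation a plan actually produced.
pub(crate) enum StreamType {
    AggregateStream(AggregateStream),
    GroupedHashAggregateStream(GroupedHashAggregateStream),
    GroupedHashAggregateStreamV2(GroupedHashAggregateStreamV2),
}
```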

Are these changes tested?

Extended test_oom. Also here are the perf results:

cargo bench -p datafusion --bench aggregate_query_sql -- --baseline issue3940d-pre
   Compiling datafusion v14.0.0 (/home/mneumann/src/arrow-datafusion/datafusion/core)
   Compiling parquet-test-utils v0.1.0 (/home/mneumann/src/arrow-datafusion/parquet-test-utils)
    Finished bench [optimized] target(s) in 5m 11s
     Running benches/aggregate_query_sql.rs (target/release/deps/aggregate_query_sql-0659981fac849434)
aggregate_query_no_group_by 15 12
                        time:   [686.71 µs 688.17 µs 689.79 µs]
                        change: [+0.9423% +1.5099% +2.1284%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

aggregate_query_no_group_by_min_max_f64
                        time:   [637.75 µs 640.91 µs 644.19 µs]
                        change: [+0.9089% +1.5396% +2.1740%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

aggregate_query_no_group_by_count_distinct_wide
                        time:   [2.5239 ms 2.5437 ms 2.5641 ms]
                        change: [+1.5365% +2.6581% +3.8220%] (p = 0.00 < 0.05)
                        Performance has regressed.

Benchmarking aggregate_query_no_group_by_count_distinct_narrow: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.7s, enable flat sampling, or reduce sample count to 50.
aggregate_query_no_group_by_count_distinct_narrow
                        time:   [1.7286 ms 1.7392 ms 1.7498 ms]
                        change: [+1.2376% +2.3389% +3.5532%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

aggregate_query_group_by
                        time:   [2.2890 ms 2.3063 ms 2.3241 ms]
                        change: [+1.7896% +2.8160% +3.7350%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

Benchmarking aggregate_query_group_by_with_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter
                        time:   [1.1419 ms 1.1444 ms 1.1472 ms]
                        change: [+1.1563% +1.6416% +2.1195%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by_u64 15 12
                        time:   [2.3083 ms 2.3237 ms 2.3394 ms]
                        change: [+1.7416% +2.7403% +3.7301%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking aggregate_query_group_by_with_filter_u64 15 12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
aggregate_query_group_by_with_filter_u64 15 12
                        time:   [1.1492 ms 1.1530 ms 1.1572 ms]
                        change: [+0.4978% +1.1567% +1.7562%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

aggregate_query_group_by_u64_multiple_keys
                        time:   [15.070 ms 15.384 ms 15.706 ms]
                        change: [-2.0263% +0.9477% +4.0997%] (p = 0.55 > 0.05)
                        No change in performance detected.

aggregate_query_approx_percentile_cont_on_u64
                        time:   [3.8600 ms 3.8963 ms 3.9341 ms]
                        change: [-0.1229% +1.2647% +2.8308%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

aggregate_query_approx_percentile_cont_on_f32
                        time:   [3.2601 ms 3.2871 ms 3.3147 ms]
                        change: [-0.5989% +0.7136% +2.0221%] (p = 0.27 > 0.05)
                        No change in performance detected.

I think the regressions (<3%) are within the safety margin for such a crucial feature (and also within what a laptop can reliably reproduce).

Are there any user-facing changes?

The V1 group-by operator can now emit a ResourceExhausted error if it runs out of memory. Note that the error is somewhat nested/wrapped due to #4172.
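For context, this is roughly how a user would hit the new error path, sketched against the DataFusion 14-era memory-manager API (`MemoryManagerConfig` was later replaced by `MemoryPool`; the table `t` is assumed to be registered beforehand):

```rust
use std::sync::Arc;
use datafusion::error::Result;
use datafusion::execution::memory_manager::MemoryManagerConfig;
use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};
use datafusion::prelude::{SessionConfig, SessionContext};

async fn grouped_query_with_tiny_budget() -> Result<()> {
    // Cap the memory manager at 1 KiB so the group-by state cannot fit.
    let runtime = RuntimeEnv::new(
        RuntimeConfig::new()
            .with_memory_manager(MemoryManagerConfig::try_new_limit(1024, 1.0)?),
    )?;
    let ctx = SessionContext::with_config_rt(SessionConfig::new(), Arc::new(runtime));

    // Assumes a table `t` was registered beforehand.
    let df = ctx.sql("SELECT c, COUNT(*) FROM t GROUP BY c").await?;
    let err = df.collect().await.unwrap_err();

    // The ResourceExhausted error arrives wrapped (see #4172), so match on
    // the rendered message rather than the top-level variant.
    assert!(err.to_string().contains("Resources exhausted"));
    Ok(())
}
```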

@github-actions github-actions bot added the core Core DataFusion crate label Nov 25, 2022
@@ -2295,7 +2295,7 @@ impl ScalarValue {
     /// Estimate size if bytes including `Self`. For values with internal containers such as `String`
     /// includes the allocated size (`capacity`) rather than the current length (`len`)
     pub fn size(&self) -> usize {
-        std::mem::size_of_val(&self)
+        std::mem::size_of_val(self)
@crepererum (author) commented:

The tests panicked due to an integer underflow. Apart from the vector/hashmap calculations being wrong, this spot was also kinda tricky: the size of &ScalarValue is 8 bytes, not 48 🤦.
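The pitfall is easy to reproduce with the standard library alone; inside `fn size(&self)`, `&self` is a `&&ScalarValue`, so `size_of_val` measured the 8-byte reference rather than the 48-byte value:

```rust
use std::mem::size_of_val;

fn main() {
    let v = String::from("hello");
    // `&v` is a `&String`: measures the String struct itself
    // (ptr + len + capacity = 24 bytes on 64-bit targets).
    assert_eq!(size_of_val(&v), 24);
    // `&&v` is a `&&String`: one `&` too many, so we measure the
    // reference instead (8 bytes on 64-bit targets).
    assert_eq!(size_of_val(&&v), 8);
}
```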

@alamb (Contributor) left a comment:

This looks great to me -- thank you @crepererum

I have some minor style suggestions, but nothing that would prevent this PR from being merged.

For other reviewers: I found the changes easier to see with a whitespace-blind diff: https://github.com/apache/arrow-datafusion/pull/4371/files?w=1

I'll plan to merge this PR tomorrow unless anyone else wants more time to review

cc @milenkovicm @yjshen @richox @tustvold

})
};

let stream = futures::stream::unfold(inner, |mut this| async move {
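The stream driver above is built with `futures::stream::unfold`. A self-contained sketch of the pattern (assuming the `futures` and `tokio` crates):

```rust
use futures::StreamExt;

#[tokio::main]
async fn main() {
    // `unfold` threads mutable state through an async closure; yielding
    // `Some((item, next_state))` emits an item, `None` ends the stream.
    let stream = futures::stream::unfold(0u32, |count| async move {
        (count < 3).then(|| (count, count + 1))
    });
    assert_eq!(stream.collect::<Vec<_>>().await, vec![0, 1, 2]);
}
```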

@@ -257,12 +313,32 @@ fn group_aggregate_batch(
                 accumulator_set,
                 indices: vec![row as u32], // 1.3
             };
+            // NOTE: do NOT include the `GroupState` struct size in here because this is captured by
+            // `group_states` (see allocation down below)
+            allocated += group_state
Reviewer (Contributor):

Figuring out how to encapsulate some of this accounting (so it isn't inlined into the code) would make it easier to maintain, I think. But I don't think that is required.

@crepererum (author) replied:

I think as long as the Allocator API in Rust isn't stable, this will remain a bit of a mess. Once it is stable, we could have very elegant memory accounting.
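To illustrate the point: on stable Rust today, the closest built-in tool is a process-wide counting `GlobalAlloc` like the sketch below (type and counter names are made up), which cannot attribute allocations to a single operator; a stable per-collection `Allocator` parameter would make that attribution precise and encapsulated.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

// Process-wide byte counter: every allocation in the program is counted,
// so per-operator accounting still has to be done by hand.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```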

@@ -326,10 +402,20 @@ fn group_aggregate_batch(
         )
     })
     .try_for_each(|(accumulator, values)| match mode {
-        AggregateMode::Partial => accumulator.update_batch(&values),
+        AggregateMode::Partial => {
+            let size_pre = accumulator.size();
Reviewer (Contributor):

👍

Reviewer (Contributor):

You might also consider pulling the size accounting to before/after the match to avoid the duplication

@crepererum (author) replied:

done
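The hoisted version looks roughly like this (a sketch of the pattern rather than the literal PR code; the `saturating_sub` guard is an assumption):

```rust
// Measure the accumulator once around the update and account only for
// growth, instead of repeating the bookkeeping in every match arm.
let size_pre = accumulator.size();
let res = match mode {
    AggregateMode::Partial => accumulator.update_batch(&values),
    AggregateMode::Final | AggregateMode::FinalPartitioned => {
        accumulator.merge_batch(&values)
    }
};
allocated += accumulator.size().saturating_sub(size_pre);
res
```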

baseline_metrics,
)?))
}
self.execute_typed(partition, context)
Reviewer (Contributor):

👍

@alamb alamb merged commit be1d376 into apache:master Nov 28, 2022
@alamb (Contributor) commented Nov 28, 2022

Thanks again @crepererum

@ursabot commented Nov 28, 2022

Benchmark runs are scheduled for baseline = fe8aee6 and contender = be1d376. be1d376 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

bors added a commit to rust-lang/rust-clippy that referenced this pull request Dec 24, 2022
Add size_of_ref lint

This addresses #9995, which likely raises a valid point about `std::mem::size_of_val()`: it is [very easy to pass a double-reference as the argument](apache/datafusion#4371 (comment)), which the function will happily accept and give back the size of _the reference_, not the size of the value _behind_ the reference. In the worst case, if the value matches the programmer's expectation, this seems to work, while in fact everything can go horribly wrong, e.g. on a different platform.

The size of a `&T` is independent of what `T` is, and people might want to use `std::mem::size_of_val()` to actually get the size of _any_ reference (e.g. via `&&()`). I would rather argue that this is always bad behavior, though ([instead](https://doc.rust-lang.org/reference/type-layout.html#pointers-and-references-layout), [and](https://doc.rust-lang.org/stable/std/primitive.usize.html#associatedconstant.BITS)). I therefore put this lint into `correctness`.

Since the problem is usually easily fixed by removing extra `&`, I went light on suggesting code.

---

changelog: New lint: [`size_of_ref`] [#10098](#10098)