Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Sorting / Merge performance #2427

Closed
2 of 6 tasks
Tracked by #1861
alamb opened this issue May 3, 2022 · 6 comments
Closed
2 of 6 tasks
Tracked by #1861

Improve Sorting / Merge performance #2427

alamb opened this issue May 3, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented May 3, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I plan to make sorting / merging faster. My reasons;

  1. I find it personally interesting
  2. It is a key piece of technology to bring DataFusion's performance to be on par with things like DuckDB
  3. It is important for my project IOx in the medium term

Describe the solution you'd like
Basically the plan is to follow the advice given by Goetz Graefe in Implementing sorting in database systems
and successfully implemented in systems like DuckDB (see blog post)`

It will likely involve some combination of a specialization of the row format and JIT comparisons

Here is my rough plan and a sketch of the kinds of things I want to work on

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@alamb
Copy link
Contributor Author

alamb commented May 29, 2022

In case anyone is interested, here are some flame graphs I gathered for the merge benchmarks:
flamegraphs.zip, and an example is below.

They were all gathered from alamb/merge_bench which is based on master at 7b7edf9

alamb@MacBook-Pro-6 arrow-datafusion % git merge-base alamb/alamb/merge_bench apache/master
7b7edf9c43383c1d3310286b69d2d037db72c967

flamegraph-merge-i64

Conclusion, unshockingly, is that SortKeyCursor takes most of the time. Conveniently, I have ideas of how to improve this

@alamb
Copy link
Contributor Author

alamb commented Jun 24, 2022

Here is a trace from IOx showing a lot of time being spent sorting batches...

22.02 s   22.7%	0 s	 	                                                           iox_query::provider::deduplicate::deduplicate::_$u7b$$u7b$closure$u7d$$u7d$::hf3d0d2543c390834
19.25 s   19.9%	0 s	 	                                                            _$LT$futures_util..stream..stream..next..Next$LT$St$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h6e92074c50cc7f69
19.25 s   19.9%	0 s	 	                                                             futures_util::stream::stream::StreamExt::poll_next_unpin::h0a7d4570974377a3
19.25 s   19.9%	0 s	 	                                                              _$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::hee2bdbc79b3020cf
19.25 s   19.9%	0 s	 	                                                               _$LT$datafusion..physical_plan..union..ObservedStream$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::hf4bd36dcc3490171
19.25 s   19.9%	0 s	 	                                                                futures_util::stream::stream::StreamExt::poll_next_unpin::h0a7d4570974377a3
19.25 s   19.9%	0 s	 	                                                                 _$LT$core..pin..Pin$LT$P$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::hee2bdbc79b3020cf
19.25 s   19.9%	0 s	 	                                                                  _$LT$datafusion..physical_plan..stream..RecordBatchStreamAdapter$LT$S$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::h2cb791419b210a88
19.25 s   19.9%	0 s	 	                                                                   _$LT$futures_util..stream..try_stream..try_flatten..TryFlatten$LT$St$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::hecbc974d417753bb
19.25 s   19.9%	0 s	 	                                                                    _$LT$S$u20$as$u20$futures_core..stream..TryStream$GT$::try_poll_next::ha20467981653b215
19.25 s   19.9%	0 s	 	                                                                     _$LT$futures_util..stream..once..Once$LT$Fut$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::h95fb74a12fffeea3
19.25 s   19.9%	0 s	 	                                                                      _$LT$futures_util..future..try_future..MapErr$LT$Fut$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h3230513ec689705a
19.25 s   19.9%	0 s	 	                                                                       _$LT$futures_util..future..future..Map$LT$Fut$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h464940ca2f951951
19.25 s   19.9%	0 s	 	                                                                        _$LT$futures_util..future..future..map..Map$LT$Fut$C$F$GT$$u20$as$u20$core..future..future..Future$GT$::poll::heb35220cf8b84857
19.25 s   19.9%	0 s	 	                                                                         _$LT$futures_util..future..try_future..into_future..IntoFuture$LT$Fut$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h55a6d3f5af84763f
19.25 s   19.9%	0 s	 	                                                                          _$LT$F$u20$as$u20$futures_core..future..TryFuture$GT$::try_poll::hf219144ea170e2c8
19.25 s   19.9%	0 s	 	                                                                           _$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h79164ac5d2444444
19.25 s   19.9%	0 s	 	                                                                            datafusion::physical_plan::sorts::sort::do_sort::_$u7b$$u7b$closure$u7d$$u7d$::h40cf3c40db84fa17
19.24 s   19.9%	0 s	 	                                                                             _$LT$core..future..from_generator..GenFuture$LT$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::hf5dad25a12c01571
19.24 s   19.9%	0 s	 	                                                                              datafusion::physical_plan::sorts::sort::ExternalSorter::insert_batch::_$u7b$$u7b$closure$u7d$$u7d$::h09bf6567368d0832
19.24 s   19.9%	0 s	 	                                                                               datafusion::physical_plan::sorts::sort::sort_batch::h7b6b5407071c4f46
18.64 s   19.2%	0 s	 	                                                                                arrow::compute::kernels::sort::lexsort_to_indices::h4e2ca9dbf2774da6
18.61 s   19.2%	0 s	 	                                                                                 arrow::compute::kernels::sort::sort_unstable_by::h492ca4aa0dad4bef
18.61 s   19.2%	0 s	 	                                                                                  core::slice::_$LT$impl$u20$$u5b$T$u5d$$GT$::sort_unstable_by::he9ab2244924b2cf3
18.61 s   19.2%	0 s	 	                                                                                   core::slice::sort::quicksort::h2837f00ef0d13950

@alamb alamb changed the title Improve Sorting / Merge performance by Apply the row format, and taking advantage of new JIT framework Improve Sorting / Merge performance Aug 11, 2022
@alamb
Copy link
Contributor Author

alamb commented Aug 11, 2022

I believe @tustvold is working on some aspects of this (specifically creating faster comparators based on JIT). Is that tracked somewhere @tustvold ?

@tustvold
Copy link
Contributor

I'm currently prototyping some stuff, hope to push a draft in the coming days for feedback.

@alamb
Copy link
Contributor Author

alamb commented Apr 6, 2023

I believe @tustvold is actively working on this (e.g #5894, #5895, and #5886)

@Dandandan
Copy link
Contributor

I think we can close this ticket #5894, #5895, and #5886 and others are merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants