
perf(swordfish): Parallel expression evaluation #3593

Open · colin-ho wants to merge 6 commits into main from colin/par-eval-expr
Conversation

@colin-ho (Contributor) commented Dec 17, 2024

Addresses: #3389. More generally, this PR optimizes for projections with many expressions, particularly memory-intensive expressions such as UDFs.

Problem:

Currently, swordfish parallelizes projections across morsels, with one CPU per morsel. However, if each projection has many memory-intensive expressions, memory usage can inflate massively because many materialized morsels are kept in memory at once.

Proposed solution:

Instead, we can parallelize the expressions within the projection (but only the expressions that require compute). This way, we still get good CPU utilization while keeping fewer materialized morsels in memory.
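
To make the idea concrete, here is a hedged, self-contained sketch (not the PR's actual operator code: `Morsel`, `ExprFn`, and the use of `tokio::task::spawn_blocking` are stand-ins for Daft's real morsel/expression types and compute runtime). The point is that all compute-heavy expressions of one projection run concurrently against the same materialized morsel, rather than materializing one morsel per CPU:

```rust
use std::sync::Arc;

// Hypothetical stand-ins for Daft's morsel and expression types.
type Morsel = Arc<Vec<u64>>;
type ExprFn = Arc<dyn Fn(&Morsel) -> Vec<u64> + Send + Sync>;

// Evaluate every expression of one projection in parallel over a single
// shared morsel; only the input morsel and the output columns are resident.
async fn eval_projection_parallel(morsel: Morsel, exprs: Vec<ExprFn>) -> Vec<Vec<u64>> {
    let handles: Vec<_> = exprs
        .into_iter()
        .map(|expr| {
            let morsel = morsel.clone();
            // One compute task per expression (e.g. a UDF), not per morsel.
            tokio::task::spawn_blocking(move || expr(&morsel))
        })
        .collect();

    let mut columns = Vec::with_capacity(handles.len());
    for handle in handles {
        columns.push(handle.await.expect("expression task panicked"));
    }
    columns
}

#[tokio::main]
async fn main() {
    let morsel: Morsel = Arc::new((0..1_000_000).collect());
    let mut exprs: Vec<ExprFn> = Vec::new();
    exprs.push(Arc::new(|m: &Morsel| -> Vec<u64> { m.iter().map(|x| x * 2).collect() }));
    exprs.push(Arc::new(|m: &Morsel| -> Vec<u64> { m.iter().map(|x| x + 1).collect() }));
    println!("evaluated {} columns", eval_projection_parallel(morsel, exprs).await.len());
}
```

In this sketch, CPU utilization stays high because each expression gets its own task, while memory stays bounded by a single in-flight morsel per operator instead of one per core.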

In the linked issue above, we see that a 128-CPU machine parallelizes morsels across the cores, each running multiple UDFs, resulting in "317GB allocations and duration 351 secs".

This PR reduces that to 7.8GB peak memory and runtime of 66 seconds.
[screenshot: memory and runtime profile, 2024-12-17]

Notes:

  • Found a bug with the loole channels where an async send to a sync receiver was not respecting capacity constraints: sends were allowed to complete even though the corresponding receive had not happened. Moved over to https://github.com/fereidani/kanal, which worked much better (see the sketch below).
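
For context, a minimal sketch of the behavior being relied on (assumed kanal API here: `bounded_async` plus a `to_sync()` conversion on the receiver; this is not code from this PR): an async send into a bounded channel should apply backpressure until the sync receiver has actually drained the channel.

```rust
// Sketch only: assumes the kanal and tokio crates are available.
#[tokio::main]
async fn main() {
    // Capacity 1: at most one undelivered item may sit in the channel.
    let (tx, rx) = kanal::bounded_async::<u32>(1);
    let sync_rx = rx.to_sync();

    // The sync receiver runs on the blocking pool, pulling items one by one.
    let consumer = tokio::task::spawn_blocking(move || {
        while let Ok(v) = sync_rx.recv() {
            println!("got {v}");
        }
    });

    for i in 0..5u32 {
        // This await should only complete once there is room in the channel,
        // i.e. the capacity constraint is respected across the async/sync boundary.
        tx.send(i).await.unwrap();
    }
    drop(tx); // close the channel so the receive loop terminates
    consumer.await.unwrap();
}
```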

Todos for next time:

  • We should also be able to parallelize expression evaluation within a single expression, since an expression is a tree. We can compute the maximum width of the tree and use that as the maximum number of parallel tasks (see the sketch below).
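
As a rough illustration of that TODO (toy types, not Daft's real `Expr`): the widest level of the expression tree bounds how many sub-expressions could ever be evaluated concurrently, so it is a natural cap on parallel tasks.

```rust
// Toy expression tree: leaves are column refs/literals, nodes are operators.
enum Expr {
    Leaf,
    Node(Vec<Expr>),
}

// Breadth-first traversal: the size of the largest level is the max number of
// sub-expressions that could be evaluated in parallel at the same time.
fn max_width(expr: &Expr) -> usize {
    let mut widest = 0;
    let mut level = vec![expr];
    while !level.is_empty() {
        widest = widest.max(level.len());
        let mut next = Vec::new();
        for node in level {
            if let Expr::Node(children) = node {
                next.extend(children.iter());
            }
        }
        level = next;
    }
    widest
}

fn main() {
    // Models (a + b) * udf(c): the widest level has three leaves,
    // so at most three parallel tasks would ever be useful here.
    let expr = Expr::Node(vec![
        Expr::Node(vec![Expr::Leaf, Expr::Leaf]),
        Expr::Node(vec![Expr::Leaf]),
    ]);
    assert_eq!(max_width(&expr), 3);
}
```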

@github-actions github-actions bot added the perf label Dec 17, 2024

codspeed-hq bot commented Dec 18, 2024

CodSpeed Performance Report

Merging #3593 will degrade performance by 15.97%

Comparing colin/par-eval-expr (a8a0fb1) with main (ca4d3f7)

Summary

❌ 1 regression
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | colin/par-eval-expr | Change |
| --- | --- | --- | --- |
| test_show[100 Small Files] | 15.5 ms | 18.4 ms | -15.97% |


codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 91.93548% with 15 lines in your changes missing coverage. Please review.

Project coverage is 77.83%. Comparing base (ca4d3f7) to head (a8a0fb1).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/daft-dsl/src/expr/mod.rs | 64.00% | 9 Missing ⚠️ |
| ...ft-local-execution/src/intermediate_ops/project.rs | 92.00% | 4 Missing ⚠️ |
| src/daft-local-execution/src/run.rs | 91.30% | 2 Missing ⚠️ |
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3593      +/-   ##
==========================================
+ Coverage   77.80%   77.83%   +0.02%     
==========================================
  Files         718      718              
  Lines       88176    88295     +119     
==========================================
+ Hits        68607    68726     +119     
  Misses      19569    19569              
| Files with missing lines | Coverage Δ |
| --- | --- |
| src/daft-connect/src/op/execute/root.rs | 96.00% <100.00%> (ø) |
| src/daft-connect/src/op/execute/write.rs | 78.31% <100.00%> (+0.26%) ⬆️ |
| src/daft-local-execution/src/channel.rs | 98.14% <100.00%> (-0.10%) ⬇️ |
| src/daft-local-execution/src/dispatcher.rs | 93.52% <100.00%> (ø) |
| ...-execution/src/intermediate_ops/intermediate_op.rs | 82.39% <100.00%> (+0.70%) ⬆️ |
| src/daft-local-execution/src/pipeline.rs | 88.86% <100.00%> (+0.08%) ⬆️ |
| src/daft-local-execution/src/runtime_stats.rs | 51.68% <100.00%> (-1.58%) ⬇️ |
| ...rc/daft-local-execution/src/sinks/blocking_sink.rs | 82.72% <100.00%> (ø) |
| ...c/daft-local-execution/src/sinks/streaming_sink.rs | 80.62% <100.00%> (ø) |
| src/daft-local-execution/src/sources/source.rs | 54.28% <100.00%> (ø) |
... and 5 more

... and 2 files with indirect coverage changes

```diff
@@ -46,7 +46,7 @@ impl Session {
     let cfg = Arc::new(DaftExecutionConfig::default());
     let native_executor = NativeExecutor::from_logical_plan_builder(&optimized_plan)?;

-    let mut result_stream = native_executor.run(&pset, cfg, None)?.into_stream();
+    let mut result_stream = pin!(native_executor.run(&pset, cfg, None)?.into_stream());
```
@colin-ho (Contributor, Author) commented:

The kanal receiver doesn't provide a way to get an owned stream, so I manually made one using `unfold`, with a `recv().await` inside. Because of this await point, I need to pin the stream.
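
For reference, a minimal sketch of that pattern (assuming kanal's `AsyncReceiver::recv` and `futures::stream::unfold`; simplified relative to the PR's channel wrapper):

```rust
use futures::{Stream, StreamExt};
use std::pin::pin;

// Turn a kanal async receiver into a Stream by threading the receiver through
// `unfold`; the stream ends once `recv` errors (channel closed and drained).
fn receiver_into_stream<T>(rx: kanal::AsyncReceiver<T>) -> impl Stream<Item = T> {
    futures::stream::unfold(rx, |rx| async move {
        match rx.recv().await {
            Ok(item) => Some((item, rx)),
            Err(_) => None,
        }
    })
}

#[tokio::main]
async fn main() {
    let (tx, rx) = kanal::bounded_async::<i32>(1);
    tokio::spawn(async move {
        for i in 0..3 {
            tx.send(i).await.unwrap();
        }
    });

    // The unfold future holds state across the `recv().await` point, so the
    // stream is !Unpin and must be pinned before `next()` can be called.
    let mut stream = pin!(receiver_into_stream(rx));
    while let Some(v) = stream.next().await {
        println!("{v}");
    }
}
```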

```diff
@@ -36,7 +32,7 @@ pub(crate) fn create_ordering_aware_receiver_channel<T: Clone>(
 ) -> (Vec<Sender<T>>, OrderingAwareReceiver<T>) {
     match ordered {
         true => {
-            let (senders, receiver) = (0..buffer_size).map(|_| create_channel::<T>(1)).unzip();
+            let (senders, receiver) = (0..buffer_size).map(|_| create_channel::<T>(0)).unzip();
```
@colin-ho (Contributor, Author) commented:
Setting channel sizes to 0 can elide heap allocation.
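
A minimal sketch of what a zero-capacity channel means here (assuming kanal supports zero-sized, i.e. rendezvous, channels, which the diff above relies on; not code from this PR): there is no buffer to allocate, and each value is handed directly from sender to receiver.

```rust
fn main() {
    // Capacity 0: a rendezvous channel with no backing buffer.
    let (tx, rx) = kanal::bounded::<String>(0);

    let sender = std::thread::spawn(move || {
        // Blocks until the receiver below is ready to take the value.
        tx.send("hello".to_string()).unwrap();
    });

    println!("{}", rx.recv().unwrap());
    sender.join().unwrap();
}
```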

@colin-ho colin-ho marked this pull request as ready for review December 19, 2024 00:38
@colin-ho colin-ho requested a review from samster25 December 19, 2024 01:45