Performance comparison to vaex #310
Thanks a lot for doing this; it's great to see that the results generally match. To get the best results with datashader, the dataframe should be loaded as a dask dataframe with multiple partitions, like this: `import dask.dataframe as dd` followed by `ddf = dd.from_pandas(df, npartitions=4)`. Using 4 partitions I narrowed the gap a fair bit, but vaex still has the edge, with a 2x speedup on counts and an 8x speedup on means.
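For reference, a minimal end-to-end sketch of that setup (the column names, canvas size, data, and partition count here are illustrative, not taken from the benchmark notebook):

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd
import datashader as ds

# Illustrative data; the real benchmark uses the dataframe from the notebook.
df = pd.DataFrame({'x': np.random.randn(1_000_000),
                   'y': np.random.randn(1_000_000),
                   'z': np.random.randn(1_000_000)})

# Partition the dataframe so datashader can aggregate the chunks in parallel.
ddf = dd.from_pandas(df, npartitions=4)

canvas = ds.Canvas(plot_width=512, plot_height=512)
counts = canvas.points(ddf, 'x', 'y', agg=ds.count())    # per-pixel counts
means  = canvas.points(ddf, 'x', 'y', agg=ds.mean('z'))  # per-pixel mean of z
```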
Thanks for the quick reply. OK, I thought that happened by default; I now get 2.6x and 6.3x, but this is an underpowered machine. I think the best comparison would be with both single-threaded. But I think the 2x speedup is best explained by using half the bandwidth of the memory bus; at least, that's my best guess.
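For anyone who wants to try the single-threaded comparison, one way to pin the dask/numba side to a single thread is sketched below; these are the standard dask and numba knobs, but this is only an assumption about how such a test might be set up, and the equivalent setting for vaex isn't shown:

```python
import os
# Must be set before numba compiles any parallel kernels.
os.environ['NUMBA_NUM_THREADS'] = '1'

import dask
# Force dask to execute graphs serially in the main thread.
dask.config.set(scheduler='synchronous')
```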
Thanks! Using half-open bins is already a to-do item: #259, which someone is going to look at soon. Hopefully that issue shows the info that you would need to change that behavior, which would be fine to do for testing purposes.
I vote for this; I think it is highly confusing to have the last bin inclusive, and I think people will realize it themselves once they notice the last value is not included. It would be great if you worked on this; then we can see whether we both agree, whatever you decide to do.
Right, I agree; everything I've done myself has always been upper-exclusive, and I'm fairly confident that once we search through the whole code for implications, that's what we'll decide here as well. But we do need to do due diligence and understand the implications first.
Benchmarking datashader exposed a numba issue affecting performance under multithreaded workloads. Once fixed, there should be a significant performance increase in many cases: numba/numba#2345
Thanks for sharing this here.
And thank you for spurring us to fix this issue by providing a clear benchmark to compare against! I'm very interested to see the updated benchmarks, since Numba can do some optimizations at a higher level than a C/C++ compiler can; should be interesting!
The latest datashader master now uses upper-exclusive bounds, so it should now be feasible to make a more detailed comparison of the results.
I also wanted to provide an update on the performance comparison, using the current dev release of numba and datashader, installed using:
I ran your benchmarking notebook and replaced the dataframe with a dask dataframe like this:
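The exact cell isn't preserved above, but it presumably mirrors the earlier suggestion in this thread; a quick equivalence check between the pandas and dask paths might look like this (npartitions and column names are assumptions):

```python
import dask.dataframe as dd
import datashader as ds

ddf = dd.from_pandas(df, npartitions=4)  # same pattern as suggested earlier

canvas = ds.Canvas(plot_width=512, plot_height=512)
agg_pandas = canvas.points(df,  'x', 'y', agg=ds.count())
agg_dask   = canvas.points(ddf, 'x', 'y', agg=ds.count())
assert (agg_pandas == agg_dask).all()    # both paths should agree exactly
```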
Based on these results, @gbrener and the numba team were able to achieve huge performance improvements. I now get the following results compared to vaex:
Thanks again for filing this issue since it gave us something concrete to compare against.
Excellent! It's always great to know what to shoot for, and this was a very effective way to test Numba. The Numba team found several ways in which the performance was being limited, and was able to achieve a very nice speedup. There's still more we could do at the datashader level, but I'll close this for now since we've achieved parity. Thanks, @maartenbreddels, @gbrener, @philippjfr, and Siu!
Thanks for the update; I'll see if I have time to benchmark it as well. It's great to see progress being made because of this. Can you say which important changes caused this, or maybe point to the PRs/commits?
@maartenbreddels I believe the important PRs in numba were: numba/numba#2346. They all deal with optimizing/disabling reference counting, I believe.
As requested by @jbednar I'd like to share my comparison of datashader to vaex (I'm the author of vaex).
Notebook: https://gist.github.com/maartenbreddels/cd1c8bcc2563813c026a0e5d599f6bf1
This was benchmarked on a MacBook Air (13", Early 2015, 8 GB RAM), where the performance difference is largest: counting values is about 4x slower, and calculating means about 22x slower (although I remember the mean being just 14x slower before).
When I benchmarked it on a 12-core machine (24 with hyperthreading, and 256 GB RAM) I saw about a 2x performance difference (if I remember correctly). This hints at datashader using more CPU cycles, and maybe a compiler/numba issue of not using the full memory bus, 256-bit vs 128-bit for instance.
Please take a careful look at whether my benchmark is fair. By the way, there seems to be a difference in how we define the limits/ranges. In vaex, the limits define the left edge of the first bin and the right edge of the last bin, and all intervals are half-open (open on the right).
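As a small illustration of the two conventions (plain numpy, with made-up bin edges and sample values; not code from either library):

```python
import numpy as np

data  = np.array([0.0, 0.5, 1.0])   # 1.0 sits exactly on the upper limit
edges = np.linspace(0, 1, 5)        # 4 bins over [0, 1]

# numpy's histogram makes the *last* bin upper-inclusive, so 1.0 is counted
# in the final bin (similar to the convention datashader used originally):
counts_inclusive, _ = np.histogram(data, bins=edges)   # -> [1 0 1 1]

# vaex-style half-open bins [left, right): 1.0 falls outside the last bin.
idx = np.floor((data - edges[0]) / (edges[-1] - edges[0]) * 4).astype(int)
counts_half_open = np.bincount(idx[(idx >= 0) & (idx < 4)], minlength=4)  # -> [1 0 1 0]
```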
It's actually nice to see that, apart from how we define the bins, we agree on the outcomes.
vaex version: '1.0.0-beta.6'
datashader version: '0.4.0'
I hope this is useful.