
Performance comparison to vaex #310

Closed
maartenbreddels opened this issue Apr 13, 2017 · 13 comments
maartenbreddels commented Apr 13, 2017

As requested by @jbednar I'd like to share my comparison of datashader to vaex (I'm the author of vaex).

Notebook: https://gist.github.com/maartenbreddels/cd1c8bcc2563813c026a0e5d599f6bf1

This was benchmarked on a MacBook Air (13" Early 2015, 8GB RAM), where the performance difference is largest: counting values is about 4x slower in datashader, and calculating means is about 22x slower (although I remember the mean being only 14x slower before).

When I benchmarked on a 12-core machine (24 threads with hyperthreading, 256GB RAM) I saw roughly a 2x performance difference (if I remember correctly). This hints at datashader using more CPU cycles, and maybe a compiler/numba issue that keeps it from using the full memory bus width (256-bit vs 128-bit, for instance).

Please take a careful look at whether my benchmark is fair. There also seems to be a difference in how we define the limits/ranges, by the way: in vaex, the limits define the left edge of the first bin and the right edge of the last bin, and all intervals are half-open (open on the right).
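To make the half-open convention concrete, here is a hypothetical numpy sketch of vaex-style binning (not vaex's actual code; `bin_index` is an illustrative name): the limits give the left edge of the first bin and the right edge of the last bin, and every bin is `[left, right)`.

```python
import numpy as np

def bin_index(x, lo, hi, nbins):
    # Map data values to bin numbers; each bin is half-open [left, right),
    # so a value exactly equal to `hi` falls past the last bin.
    return np.floor((x - lo) / (hi - lo) * nbins).astype(np.int64)

x = np.array([0.0, 0.5, 0.99, 1.0])
print(bin_index(x, 0.0, 1.0, nbins=4))  # [0 2 3 4]: 1.0 lands outside bin nbins-1
```

With an upper-inclusive last bin, the value 1.0 would instead be counted in bin 3, which is exactly the discrepancy being discussed.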

It's actually nice to see we (apart from how we define the bins) agree on the outcomes.

vaex version: '1.0.0-beta.6'
datashader version: '0.4.0'

I hope this is useful.


philippjfr commented Apr 13, 2017

Thanks a lot for doing this; it's great to see that the results generally match. To get the best performance from datashader, the dataframe should be loaded as a dask dataframe with multiple partitions, like this:

import dask.dataframe as dd
ddf = dd.from_pandas(df, npartitions=4)

Using 4 partitions I narrowed the gap a fair bit, but vaex still has the edge, with a 2x speedup on counts and an 8x speedup on means.
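The reason partitioning helps: aggregations like mean decompose into small per-partition reductions that combine cheaply, so each partition can run on its own thread. A hypothetical plain-numpy illustration of the idea (not dask's or datashader's actual machinery):

```python
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)
parts = np.array_split(x, 4)  # analogous to npartitions=4

# Each partition reduces independently to a tiny (sum, count) pair;
# in dask these reductions run in parallel across threads.
partials = [(p.sum(), p.size) for p in parts]

# The final combine step is cheap regardless of data size.
total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
assert np.isclose(total / count, x.mean())
```

A single-partition pandas dataframe gives datashader no such parallelism, which is why the default looked slower than it needed to be.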

maartenbreddels (Author) commented:
Thanks for the quick reply. OK, I thought that happened by default; with partitions I now get 2.6x and 6.3x, but this is an underpowered machine. I think the best comparison would be with both libraries single-threaded. My best guess is that the remaining 2x speedup is explained by vaex using the full bandwidth of the memory bus where datashader uses half.

About the bins: can you point me to where in the code the conversion from data values to pixel/bin values happens? I can't find it easily.


jbednar commented Apr 13, 2017

Thanks!

Using half-open bins is already a to-do item (#259), which someone is going to look at soon. Hopefully that issue contains the information you would need to change that behavior yourself, which would be fine to do for testing purposes.

maartenbreddels (Author) commented:
> For what it's worth, in the code used in HoloViews we decided many years ago that it was simpler and more consistent (both self-consistent and consistent with how arrays work in Python/NumPy) to make all upper boundaries exclusive.

I vote for this. I think it is highly confusing to have the last bin inclusive, and by noticing that the last value is not included, people will realise the convention themselves.

It would be great if you work on this, so we can check whether we agree, whatever you decide to do.


jbednar commented Apr 13, 2017

Right, I agree; everything I've done myself has always been upper-exclusive, and I'm fairly confident that once we search through the whole codebase for implications, that's what we'll decide here as well. But we do need to do due diligence and understand those implications first.


gbrener commented Apr 21, 2017

Benchmarking datashader exposed a numba issue affecting performance under multithreaded workloads. Once fixed, there should be a significant performance increase in many cases: numba/numba#2345

maartenbreddels (Author) commented:
Thanks for sharing this here.


jbednar commented Apr 21, 2017

And thank you for spurring us to fix this issue by providing a clear benchmark to compare against! I'm very interested to see the updated numbers, since Numba can do some optimizations at a higher level than a C/C++ compiler can.


jbednar commented May 9, 2017

> It's actually nice to see we (apart from how we define the bins) agree on the outcomes.

The latest datashader master should now be using upper-exclusive bounds, and so it should now be feasible to make a more detailed comparison of the results.


philippjfr commented May 9, 2017

I also wanted to provide an update on the performance comparison, using the current dev releases of numba and datashader installed with:

conda install -c numba numba
conda install -c datashader/label/dev datashader

I ran your benchmarking notebook and replaced the dataframe with a dask dataframe like this:

import dask.dataframe as dd
ddf = dd.from_pandas(df, npartitions=7).persist()

Based on these results, @gbrener and the numba team were able to achieve huge performance improvements. I now get the following results compared to vaex:

datashader takes 0.7x as much time for count
datashader takes 0.75x as much time for sum
datashader takes 0.9x as much time for mean

Thanks again for filing this issue since it gave us something concrete to compare against.
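For readers who want to reproduce such numbers, here is a hypothetical minimal timing harness of the kind used for comparisons like this (not the benchmark notebook's actual code; `best_of` is an illustrative name). Taking the best wall-clock time of a few runs, as `%timeit` does, reduces noise from other processes:

```python
import time
import numpy as np

def best_of(fn, repeats=3):
    # Run fn several times and report the best wall-clock time.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return min(times)

# A numpy histogram stands in here for a count aggregation.
x = np.random.default_rng(0).random(1_000_000)
t = best_of(lambda: np.histogram(x, bins=256, range=(0.0, 1.0)))
print(f"count aggregation (numpy baseline): {t:.4f} s")
```

Ratios such as "0.7x as much time" would then come from dividing the datashader timing by the corresponding vaex timing.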


jbednar commented May 10, 2017

Excellent! It's always great to know what to shoot for, and this was a very effective way to test Numba. The Numba team found several ways in which the performance was being limited, and was able to achieve a very nice speedup. There's still more we could do at the datashader level, but I'll close this for now since we've achieved parity. Thanks, @maartenbreddels, @gbrener, @philippjfr, and Siu!

jbednar closed this as completed May 10, 2017
maartenbreddels (Author) commented:
Thanks for the update; I'll see if I have time to benchmark it as well. It's great to see progress being made because of this issue. Can you point out the important changes that caused the improvement, or refer to the PRs/commits?


philippjfr commented May 10, 2017

@maartenbreddels I believe the important PRs in numba were:

numba/numba#2346
numba/numba#2349
numba/numba#2352

They all deal with optimizing or disabling reference counting, I believe.
