Performance comparison to vaex #310
Thanks a lot for doing this; it's great to see that the results generally match. To get the best results with datashader, the dataframe should be loaded as a dask dataframe with multiple partitions, like this: `import dask.dataframe as dd` followed by `ddf = dd.from_pandas(df, npartitions=4)`. Using 4 partitions I narrowed the gap a fair bit, but vaex still has the edge, with a 2x speedup on counts and an 8x speedup on means.
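For reference, a minimal end-to-end sketch of that setup (the column names, canvas size, data, and partition count here are illustrative, not taken from the benchmark notebook):

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd
import datashader as ds

# Illustrative data; the real benchmark uses the dataframe from the notebook.
df = pd.DataFrame({'x': np.random.randn(1_000_000),
                   'y': np.random.randn(1_000_000),
                   'z': np.random.randn(1_000_000)})

# Partition the dataframe so datashader can aggregate the chunks in parallel.
ddf = dd.from_pandas(df, npartitions=4)

canvas = ds.Canvas(plot_width=512, plot_height=512)
counts = canvas.points(ddf, 'x', 'y', agg=ds.count())    # per-pixel counts
means  = canvas.points(ddf, 'x', 'y', agg=ds.mean('z'))  # per-pixel mean of z
```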
Thanks for the quick reply. OK, I thought that happened by default; I now get 2.6x and 6.3x, but this is an underpowered machine. I think the best comparison would be with both single-threaded. But I think the 2x speedup is best explained by using half the bandwidth of the memory bus; at least, that's my best guess.
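For anyone who wants to try the single-threaded comparison, one way to pin the dask/numba side to a single thread is sketched below; these are the standard dask and numba knobs, but this is only an assumption about how such a test might be set up, and the equivalent setting for vaex isn't shown:

```python
import os
# Must be set before numba compiles any parallel kernels.
os.environ['NUMBA_NUM_THREADS'] = '1'

import dask
# Force dask to execute graphs serially in the main thread.
dask.config.set(scheduler='synchronous')
```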
Thanks! Using half-open bins is already a to-do item: #259, which someone is going to look at soon. Hopefully that issue shows the info that you would need to change that behavior, which would be fine to do for testing purposes.
I vote for this; I think it is highly confusing to have the last bin inclusive, and I think people will realize it themselves once they notice the last value is not included. It would be great if you worked on this; then we can see whether we both agree, whatever you decide to do.
Right, I agree; everything I've done myself has always been upper-exclusive, and I'm fairly confident that once we search through the whole code for implications, that's what we'll decide here as well. But we do need to do due diligence and understand the implications first.
Benchmarking datashader exposed a numba issue affecting performance under multithreaded workloads. Once fixed, there should be a significant performance increase in many cases: numba/numba#2345
Thanks for sharing this here.
And thank you for spurring us to fix this issue by providing a clear benchmark to compare against! I'm very interested to see the updated benchmarks, since Numba can do some optimizations at a higher level than a C/C++ compiler can; should be interesting!
The latest datashader master now uses upper-exclusive bounds, so it should now be feasible to make a more detailed comparison of the results.
I also wanted to provide an update on the performance comparison, using the current dev release of numba and datashader, installed using:
I ran your benchmarking notebook and replaced the dataframe with a dask dataframe like this:
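The exact cell isn't preserved above, but it presumably mirrors the earlier suggestion in this thread; a quick equivalence check between the pandas and dask paths might look like this (npartitions and column names are assumptions):

```python
import dask.dataframe as dd
import datashader as ds

ddf = dd.from_pandas(df, npartitions=4)  # same pattern as suggested earlier

canvas = ds.Canvas(plot_width=512, plot_height=512)
agg_pandas = canvas.points(df,  'x', 'y', agg=ds.count())
agg_dask   = canvas.points(ddf, 'x', 'y', agg=ds.count())
assert (agg_pandas == agg_dask).all()    # both paths should agree exactly
```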
Based on these results, @gbrener and the numba team were able to achieve huge performance improvements. I now get the following results compared to vaex:
Thanks again for filing this issue since it gave us something concrete to compare against.
Excellent! It's always great to know what to shoot for, and this was a very effective way to test Numba. The Numba team found several ways in which the performance was being limited, and was able to achieve a very nice speedup. There's still more we could do at the datashader level, but I'll close this for now since we've achieved parity. Thanks, @maartenbreddels, @gbrener, @philippjfr, and Siu!
Thanks for the update; I'll see if I have time to benchmark it as well. It's great to see progress being made because of this. Can you say which important changes caused this, or maybe point to the PRs/commits?
@maartenbreddels I believe the important PRs in numba were: numba/numba#2346. They all deal with optimizing/disabling reference counting, I believe.
As requested by @jbednar I'd like to share my comparison of datashader to vaex (I'm the author of vaex).
Notebook: https://gist.github.com/maartenbreddels/cd1c8bcc2563813c026a0e5d599f6bf1
This was benchmarked on a MacBook Air (13", Early 2015, 8 GB RAM), where the performance difference is largest: counting values is about 4x slower, and calculating means about 22x slower (although I remember the mean being just 14x slower before).
When I benchmarked it on a 12-core machine (24 with hyperthreading, and 256 GB RAM) I saw about a 2x performance difference (if I remember correctly). This hints at datashader using more CPU cycles, and maybe a compiler/numba issue of not using the full memory bus, 256-bit vs 128-bit for instance.
Please take a careful look at whether my benchmark is fair. By the way, there seems to be a difference in how we define the limits/ranges. In vaex, the limits define the left edge of the first bin and the right edge of the last bin, and all intervals are half-open (open on the right).
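As a small illustration of the two conventions (plain numpy, with made-up bin edges and sample values; not code from either library):

```python
import numpy as np

data  = np.array([0.0, 0.5, 1.0])   # 1.0 sits exactly on the upper limit
edges = np.linspace(0, 1, 5)        # 4 bins over [0, 1]

# numpy's histogram makes the *last* bin upper-inclusive, so 1.0 is counted
# in the final bin (similar to the convention datashader used originally):
counts_inclusive, _ = np.histogram(data, bins=edges)   # -> [1 0 1 1]

# vaex-style half-open bins [left, right): 1.0 falls outside the last bin.
idx = np.floor((data - edges[0]) / (edges[-1] - edges[0]) * 4).astype(int)
counts_half_open = np.bincount(idx[(idx >= 0) & (idx < 4)], minlength=4)  # -> [1 0 1 0]
```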
It's actually nice to see that, apart from how we define the bins, we agree on the outcomes.
vaex version: '1.0.0-beta.6'
datashader version: '0.4.0'
I hope this is useful.