Investigate dask locking behavior #314
Does the time spent acquiring locks scale with the dataset size, or would it become negligible for a problem 10x or 100x larger?
None of these are hit in this code (except for the one in the threadpool, which is only for running multiple schedulers at once, and should have no effect here). I suspect the locking you're seeing is at the numba level, which takes locks around compilation because compilation is not thread-safe.
Interesting. Not sure why compilation would be involved here, on a repeat run like this, but maybe it's the code that checks to see if compilation is needed?
@jbednar based on additional runs, the time spent waiting for locks increases with dataset size.
Hmm; definitely needs to be investigated then. Thanks!
Upon further investigation, this is only measuring the time dask spends waiting on results in the scheduler. The queue for managing results in the threaded scheduler uses a lock for blocking reads, which is intended. While the main (scheduler) thread is blocked, the worker threads are free to do work, so this has no effect on the actual performance. If you look at the timings you'll see that the lock time is almost the same as the run time, which matches expectations here.
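The effect described above can be demonstrated with a minimal stdlib sketch (this is an analogue, not dask's actual scheduler): the main thread blocks on `queue.get()` while worker threads do the real work, so the time attributed to the blocking read tracks the total runtime without costing any throughput.

```python
import threading
import time
from queue import Queue

def worker(results, n):
    # Stand-in for a dask task: some CPU-bound work, then push the result.
    total = sum(i * i for i in range(n))
    results.put(total)

results = Queue()
threads = [threading.Thread(target=worker, args=(results, 200_000))
           for _ in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()

blocked = 0.0
collected = []
for _ in threads:
    t0 = time.perf_counter()
    collected.append(results.get())   # blocking read: shows up as lock wait
    blocked += time.perf_counter() - t0

elapsed = time.perf_counter() - start
for t in threads:
    t.join()

# The "blocked" time is close to the total elapsed time, mirroring the
# observation that lock time ~ run time in the profiled results.
print(f"elapsed={elapsed:.4f}s blocked={blocked:.4f}s")
```

The key point is that the blocking `get()` only idles the scheduler thread, which has nothing else to do; the workers keep all cores busy regardless.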
Closing this issue, since I'm unable to find any direct evidence of thread contention beyond the results queue lock. Thanks for your help @jcrist. |
In my general performance optimization tasks with datashader, I ran cProfile on both datashader and vaex to gain insight into potential bottlenecks. The code under profile was similar to the first half of the notebook in Issue #310, except that the pandas dataframe was persisted into a dask dataframe before being passed on to datashader, and the profiled functions/methods were run for 5 iterations instead of 3. Data preparation steps were not included in the benchmarked code section:
Datashader:
Vaex:
The results seem to indicate that most of the time is being spent waiting on locks (thread contention overhead) rather than on data processing. The dask codebase uses locks in several places, notably a SerializableLock class, blocking operations around queue reads and writes, and thread-safe modifications to a thread pool.
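The profiling approach above can be sketched with the stdlib alone (a `ThreadPoolExecutor` stands in for the threaded scheduler here; the actual runs used dask + datashader). Filtering the stats output for "acquire" surfaces the lock-wait entries that dominated the reported profiles:

```python
import cProfile
import io
import pstats
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Illustrative CPU-bound work standing in for a datashader aggregation.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, [50_000] * 8))
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats("acquire")   # restrict output to lock-acquisition entries
print(stream.getvalue())
```

Because `cProfile` traces the calling thread, most of what it sees while the workers run is the main thread waiting on locks, which is exactly the pattern that can look like contention even when the workers are fully busy.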