Investigate dask locking behavior #314

Closed · gbrener opened this issue Apr 19, 2017 · 7 comments

gbrener (Contributor) commented Apr 19, 2017

While working on general performance optimizations for datashader, I ran cProfile on both datashader and vaex to get insight into potential bottlenecks. The profiled code was similar to the first half of the notebook in issue #310, except that the pandas dataframe was persisted into a dask dataframe before being passed to datashader, and the profiled functions/methods were run for 5 iterations instead of 3. Data preparation steps were not included in the benchmarked section. A rough sketch of the benchmark setup is shown below, followed by the profile output:
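A minimal sketch of the kind of benchmark being profiled; the column names, canvas size, partition count, and random data are placeholders rather than the exact #310 notebook code:

```python
import cProfile
import pstats

import numpy as np
import pandas as pd
import dask.dataframe as dd
import datashader as ds

# Placeholder data standing in for the #310 notebook's dataframe;
# data preparation happens outside the profiled section.
df = pd.DataFrame({'x': np.random.randn(1_000_000),
                   'y': np.random.randn(1_000_000)})
ddf = dd.from_pandas(df, npartitions=4).persist()

cvs = ds.Canvas(plot_width=900, plot_height=600)  # hypothetical dimensions

profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):                                # 5 iterations, as described above
    agg = cvs.points(ddf, 'x', 'y', ds.count())   # hypothetical column names
profiler.disable()

# Same view as the listings below: top 10 entries, sorted by internal time.
pstats.Stats(profiler).sort_stats('tottime').print_stats(10)
```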

Datashader:

         75463 function calls (72629 primitive calls) in 3.057 seconds

   Ordered by: internal time
   List reduced from 1411 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      540    2.978    0.006    2.978    0.006 {method 'acquire' of '_thread.lock' objects}
        2    0.007    0.004    0.007    0.004 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/executionengine.py:100(finalize_object)
        2    0.006    0.003    0.006    0.003 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/passmanagers.py:94(run)
      940    0.002    0.000    0.003    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/dask/core.py:159(get_dependencies)
2385/1459    0.002    0.000    0.005    0.000 {method 'format' of 'str' objects}
     6652    0.001    0.000    0.005    0.000 {built-in method builtins.isinstance}
   355/60    0.001    0.000    0.004    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/abc.py:194(__subclasscheck__)
       11    0.001    0.000    0.001    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/module.py:11(parse_assembly)
       37    0.001    0.000    0.001    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/passmanagers.py:123(run)
       20    0.001    0.000    2.991    0.150 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/dask/async.py:386(get_async)

Vaex:

         17574 function calls (17319 primitive calls) in 0.427 seconds

   Ordered by: internal time
   List reduced from 169 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      502    0.361    0.001    0.361    0.001 {method 'acquire' of '_thread.lock' objects}
       30    0.022    0.001    0.022    0.001 {built-in method numpy.core.multiarray.array}
      680    0.006    0.000    0.006    0.000 {method 'update' of 'dict' objects}
      350    0.005    0.000    0.016    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/vaex/events.py:25(emit)
       10    0.005    0.000    0.005    0.000 {method 'reduce' of 'numpy.ufunc' objects}
       10    0.004    0.000    0.026    0.003 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/numpy/lib/nanfunctions.py:36(_replace_nan)
      330    0.004    0.000    0.004    0.000 {method '__enter__' of '_thread.lock' objects}
      180    0.003    0.000    0.003    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/vaex/dataset.py:327(__repr__)
       10    0.001    0.000    0.001    0.000 {built-in method numpy.core.multiarray.zeros}
       10    0.001    0.000    0.385    0.039 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/vaex/multithreading.py:69(map)

The results suggest that most of the time is being spent waiting on locks (thread-contention overhead) rather than on data processing. The dask codebase uses locks in several places: notably a SerializableLock class, blocking operations around queue modifications, and thread-safe updates to a thread pool.
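For reference, a minimal sketch of how dask's SerializableLock behaves, as one of the candidates worth ruling out; the token name is hypothetical:

```python
import pickle

from dask.utils import SerializableLock

# SerializableLock works like threading.Lock, but it can be pickled:
# locks created with the same token share one underlying lock.
lock = SerializableLock('shared-io')   # hypothetical token name

with lock:                             # ordinary context-manager usage
    pass                               # e.g. a thread-unsafe read or write

roundtripped = pickle.loads(pickle.dumps(lock))
assert roundtripped.token == lock.token   # the token survives the round trip
```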

jbednar (Member) commented Apr 19, 2017

Does the time spent acquiring locks scale with the dataset size, or would it become negligible for a problem 10x or 100x larger?

jcrist (Collaborator) commented Apr 19, 2017

The dask codebase uses locks in several places: notably a SerializableLock class, blocking operations around queue modifications, and thread-safe updates to a thread pool.

None of these are hit in this code (except for the one in the thread pool, which is only for running multiple schedulers at once and should have no effect here). I suspect the locking you're seeing is at the numba level, which holds locks around compilation because compilation isn't thread-safe.
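A toy sketch (not datashader's actual kernels) of how a numba-jitted function pays the compile cost, and holds numba's compile lock, only on the first call for a given signature:

```python
import time

import numpy as np
from numba import njit

@njit
def summed(values):
    total = 0.0
    for v in values:
        total += v
    return total

data = np.random.randn(1_000_000)

t0 = time.perf_counter()
summed(data)                  # first call: compiles (holds numba's compile lock)
t1 = time.perf_counter()
summed(data)                  # second call: dispatch hits the cached machine code
t2 = time.perf_counter()

print(f"first call (compile + run): {t1 - t0:.4f}s")
print(f"second call (run only):     {t2 - t1:.4f}s")
print("compiled signatures:", summed.signatures)   # non-empty after the first call
```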

jbednar (Member) commented Apr 19, 2017

Interesting. I'm not sure why compilation would be involved here on a repeat run like this, but maybe it's the code that checks whether compilation is needed?

gbrener (Contributor, Author) commented Apr 19, 2017

@jbednar based on additional runs, the time spent waiting for locks increases with dataset size.
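One way to check this kind of scaling (sketched with the same hypothetical aggregation and random data as above; the sizes are placeholders) is to rerun the profile at several dataset sizes and sum the _thread.lock acquire time out of the pstats results:

```python
import cProfile
import pstats

import numpy as np
import pandas as pd
import dask.dataframe as dd
import datashader as ds

def lock_acquire_time(n_rows):
    """Profile one aggregation over n_rows random points and return
    (total profiled time, time spent in _thread.lock.acquire)."""
    df = pd.DataFrame({'x': np.random.randn(n_rows),
                       'y': np.random.randn(n_rows)})
    ddf = dd.from_pandas(df, npartitions=4).persist()
    cvs = ds.Canvas(plot_width=900, plot_height=600)

    profiler = cProfile.Profile()
    profiler.enable()
    cvs.points(ddf, 'x', 'y', ds.count())
    profiler.disable()

    stats = pstats.Stats(profiler)
    locked = sum(tottime
                 for (filename, lineno, name), (cc, nc, tottime, ct, callers)
                 in stats.stats.items()
                 if "'acquire' of '_thread.lock'" in name)
    return stats.total_tt, locked

for n in (10**6, 10**7, 10**8):   # hypothetical sizes
    total, locked = lock_acquire_time(n)
    print(f"{n:>10} rows: total {total:.2f}s, lock.acquire {locked:.2f}s")
```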

jbednar (Member) commented Apr 19, 2017

Hmm; definitely needs to be investigated then. Thanks!

jcrist (Collaborator) commented Apr 19, 2017

Upon further investigation, this is only measuring the time dask spends waiting on results in the scheduler. The queue used to manage results in the threaded scheduler takes a lock for blocking reads, which is intentional. While the main (scheduler) thread is blocked, the worker threads are free to do work, so this has no effect on actual performance. If you look at the timings, you'll see that the lock time is almost the same as the total run time, which matches expectations here.
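A toy illustration of this (not dask's actual scheduler code): the main thread blocks on a queue read, which cProfile records as lock-acquire time roughly equal to the wall-clock time, while the worker thread does the actual work unprofiled:

```python
import cProfile
import pstats
import queue
import threading
import time

results = queue.Queue()

def worker():
    time.sleep(1.0)            # stand-in for real computation on a worker thread
    results.put('done')

profiler = cProfile.Profile()  # profiles only the main thread
profiler.enable()
threading.Thread(target=worker).start()
results.get()                  # main thread blocks here on an internal lock
profiler.disable()

# The top entry is _thread.lock.acquire at ~1 second: the main thread's
# wait for the result, not contention that slows down the worker.
pstats.Stats(profiler).sort_stats('tottime').print_stats(5)
```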

gbrener (Contributor, Author) commented Apr 19, 2017

Closing this issue, since I'm unable to find any direct evidence of thread contention beyond the results queue lock. Thanks for your help @jcrist.

gbrener closed this as completed Apr 19, 2017