Investigate dask locking behavior #314

Closed · gbrener opened this issue Apr 19, 2017 · 7 comments

gbrener (Contributor) commented Apr 19, 2017

While working on general performance optimizations for datashader, I ran cProfile on both datashader and vaex to get insight into potential bottlenecks. The profiled code was similar to the first half of the notebook in issue #310, except that the pandas dataframe was persisted into a dask dataframe before being passed to datashader, and the profiled functions/methods were run for 5 iterations instead of 3. Data preparation steps were not included in the benchmarked section. A rough sketch of the benchmark setup is shown below, followed by the profile output:
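A minimal sketch of the kind of benchmark being profiled; the column names, canvas size, partition count, and random data are placeholders rather than the exact #310 notebook code:

```python
import cProfile
import pstats

import numpy as np
import pandas as pd
import dask.dataframe as dd
import datashader as ds

# Placeholder data standing in for the #310 notebook's dataframe;
# data preparation happens outside the profiled section.
df = pd.DataFrame({'x': np.random.randn(1_000_000),
                   'y': np.random.randn(1_000_000)})
ddf = dd.from_pandas(df, npartitions=4).persist()

cvs = ds.Canvas(plot_width=900, plot_height=600)  # hypothetical dimensions

profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):                                # 5 iterations, as described above
    agg = cvs.points(ddf, 'x', 'y', ds.count())   # hypothetical column names
profiler.disable()

# Same view as the listings below: top 10 entries, sorted by internal time.
pstats.Stats(profiler).sort_stats('tottime').print_stats(10)
```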

Datashader:

         75463 function calls (72629 primitive calls) in 3.057 seconds

   Ordered by: internal time
   List reduced from 1411 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      540    2.978    0.006    2.978    0.006 {method 'acquire' of '_thread.lock' objects}
        2    0.007    0.004    0.007    0.004 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/executionengine.py:100(finalize_object)
        2    0.006    0.003    0.006    0.003 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/passmanagers.py:94(run)
      940    0.002    0.000    0.003    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/dask/core.py:159(get_dependencies)
2385/1459    0.002    0.000    0.005    0.000 {method 'format' of 'str' objects}
     6652    0.001    0.000    0.005    0.000 {built-in method builtins.isinstance}
   355/60    0.001    0.000    0.004    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/abc.py:194(__subclasscheck__)
       11    0.001    0.000    0.001    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/module.py:11(parse_assembly)
       37    0.001    0.000    0.001    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/llvmlite/binding/passmanagers.py:123(run)
       20    0.001    0.000    2.991    0.150 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/dask/async.py:386(get_async)

Vaex:

         17574 function calls (17319 primitive calls) in 0.427 seconds

   Ordered by: internal time
   List reduced from 169 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      502    0.361    0.001    0.361    0.001 {method 'acquire' of '_thread.lock' objects}
       30    0.022    0.001    0.022    0.001 {built-in method numpy.core.multiarray.array}
      680    0.006    0.000    0.006    0.000 {method 'update' of 'dict' objects}
      350    0.005    0.000    0.016    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/vaex/events.py:25(emit)
       10    0.005    0.000    0.005    0.000 {method 'reduce' of 'numpy.ufunc' objects}
       10    0.004    0.000    0.026    0.003 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/numpy/lib/nanfunctions.py:36(_replace_nan)
      330    0.004    0.000    0.004    0.000 {method '__enter__' of '_thread.lock' objects}
      180    0.003    0.000    0.003    0.000 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/vaex/dataset.py:327(__repr__)
       10    0.001    0.000    0.001    0.000 {built-in method numpy.core.multiarray.zeros}
       10    0.001    0.000    0.385    0.039 /Users/gbrener/miniconda/envs/vaex/lib/python3.6/site-packages/vaex/multithreading.py:69(map)

The results suggest that most of the time is being spent waiting on locks (thread-contention overhead) rather than on data processing. The dask codebase uses locks in several places: notably a SerializableLock class, blocking operations around queue modifications, and thread-safe updates to a thread pool.
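For reference, a minimal sketch of how dask's SerializableLock behaves, as one of the candidates worth ruling out; the token name is hypothetical:

```python
import pickle

from dask.utils import SerializableLock

# SerializableLock works like threading.Lock, but it can be pickled:
# locks created with the same token share one underlying lock.
lock = SerializableLock('shared-io')   # hypothetical token name

with lock:                             # ordinary context-manager usage
    pass                               # e.g. a thread-unsafe read or write

roundtripped = pickle.loads(pickle.dumps(lock))
assert roundtripped.token == lock.token   # the token survives the round trip
```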

jbednar (Member) commented Apr 19, 2017

Does the time spent acquiring locks scale with the dataset size, or would it become negligible for a problem 10x or 100x larger?

jcrist (Collaborator) commented Apr 19, 2017

The dask codebase uses locks in several places: notably a SerializableLock class, blocking operations around queue modifications, and thread-safe updates to a thread pool.

None of these are hit in this code (except for the one in the thread pool, which is only for running multiple schedulers at once and should have no effect here). I suspect the locking you're seeing is at the numba level, which holds locks around compilation because compilation isn't thread-safe.
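A toy sketch (not datashader's actual kernels) of how a numba-jitted function pays the compile cost, and holds numba's compile lock, only on the first call for a given signature:

```python
import time

import numpy as np
from numba import njit

@njit
def summed(values):
    total = 0.0
    for v in values:
        total += v
    return total

data = np.random.randn(1_000_000)

t0 = time.perf_counter()
summed(data)                  # first call: compiles (holds numba's compile lock)
t1 = time.perf_counter()
summed(data)                  # second call: dispatch hits the cached machine code
t2 = time.perf_counter()

print(f"first call (compile + run): {t1 - t0:.4f}s")
print(f"second call (run only):     {t2 - t1:.4f}s")
print("compiled signatures:", summed.signatures)   # non-empty after the first call
```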

jbednar (Member) commented Apr 19, 2017

Interesting. I'm not sure why compilation would be involved here on a repeat run like this, but maybe it's the code that checks whether compilation is needed?

gbrener (Contributor, Author) commented Apr 19, 2017

@jbednar based on additional runs, the time spent waiting for locks increases with dataset size.
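One way to check this kind of scaling (sketched with the same hypothetical aggregation and random data as above; the sizes are placeholders) is to rerun the profile at several dataset sizes and sum the _thread.lock acquire time out of the pstats results:

```python
import cProfile
import pstats

import numpy as np
import pandas as pd
import dask.dataframe as dd
import datashader as ds

def lock_acquire_time(n_rows):
    """Profile one aggregation over n_rows random points and return
    (total profiled time, time spent in _thread.lock.acquire)."""
    df = pd.DataFrame({'x': np.random.randn(n_rows),
                       'y': np.random.randn(n_rows)})
    ddf = dd.from_pandas(df, npartitions=4).persist()
    cvs = ds.Canvas(plot_width=900, plot_height=600)

    profiler = cProfile.Profile()
    profiler.enable()
    cvs.points(ddf, 'x', 'y', ds.count())
    profiler.disable()

    stats = pstats.Stats(profiler)
    locked = sum(tottime
                 for (filename, lineno, name), (cc, nc, tottime, ct, callers)
                 in stats.stats.items()
                 if "'acquire' of '_thread.lock'" in name)
    return stats.total_tt, locked

for n in (10**6, 10**7, 10**8):   # hypothetical sizes
    total, locked = lock_acquire_time(n)
    print(f"{n:>10} rows: total {total:.2f}s, lock.acquire {locked:.2f}s")
```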

jbednar (Member) commented Apr 19, 2017

Hmm; definitely needs to be investigated then. Thanks!

jcrist (Collaborator) commented Apr 19, 2017

Upon further investigation, this is only measuring the time dask spends waiting on results in the scheduler. The queue used to manage results in the threaded scheduler takes a lock for blocking reads, which is intentional. While the main (scheduler) thread is blocked, the worker threads are free to do work, so this has no effect on actual performance. If you look at the timings, you'll see that the lock time is almost the same as the total run time, which matches expectations here.
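A toy illustration of this (not dask's actual scheduler code): the main thread blocks on a queue read, which cProfile records as lock-acquire time roughly equal to the wall-clock time, while the worker thread does the actual work unprofiled:

```python
import cProfile
import pstats
import queue
import threading
import time

results = queue.Queue()

def worker():
    time.sleep(1.0)            # stand-in for real computation on a worker thread
    results.put('done')

profiler = cProfile.Profile()  # profiles only the main thread
profiler.enable()
threading.Thread(target=worker).start()
results.get()                  # main thread blocks here on an internal lock
profiler.disable()

# The top entry is _thread.lock.acquire at ~1 second: the main thread's
# wait for the result, not contention that slows down the worker.
pstats.Stats(profiler).sort_stats('tottime').print_stats(5)
```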

gbrener (Contributor, Author) commented Apr 19, 2017

Closing this issue, since I'm unable to find any direct evidence of thread contention beyond the results queue lock. Thanks for your help @jcrist.

gbrener closed this as completed Apr 19, 2017