
Break out inner loop of optimize_layout so joblib can parallelize it (0.4dev version) #292

Closed

Conversation

tomwhite
Collaborator

This is the 0.4dev version of #255. Probably best to review this one.

If no random seed is set then optimize_layout will use all cores on the machine, otherwise it will use a single core for reproducibility.
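A minimal sketch of that policy, assuming joblib's `Parallel`/`delayed` API; the function and chunking below are illustrative stand-ins, not the PR's actual `optimize_layout` code:

```python
from joblib import Parallel, delayed
import numpy as np

def optimize_layout_sketch(data, random_seed=None):
    # Policy described above: a set seed forces a single worker
    # (deterministic order of work); no seed uses all cores (n_jobs=-1).
    n_jobs = 1 if random_seed is not None else -1
    chunks = np.array_split(data, 4)

    def inner(chunk):
        # Hypothetical stand-in for the per-chunk optimization step.
        return chunk.sum()

    results = Parallel(n_jobs=n_jobs)(delayed(inner)(c) for c in chunks)
    return sum(results)
```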

@tomwhite tomwhite requested a review from lmcinnes September 11, 2019 11:11
@pep8speaks

Hello @tomwhite! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 262:80: E501 line too long (86 > 79 characters)
Line 299:80: E501 line too long (116 > 79 characters)

Line 1892:80: E501 line too long (99 > 79 characters)

parallel(parallel_calls(inner_fn, n_tasks))


@numba.njit(fastmath=True, nogil=True)
Collaborator

I am not super deep into what you are doing here, so bear with me if my question makes no sense.
Why can't you use @numba.njit(parallel=True) at line 129 instead of using nogil=True and joblib?

Collaborator Author

Your question makes a lot of sense. The answer is to do with random number seeding and reproducibility: basically Numba parallel doesn't allow controlled seeding across threads so it's not possible to get a reproducible result. With this code (using joblib), if you don't set a seed it will use all cores, otherwise it will use a single core and the result will be deterministic. See discussion in #231 for why that is not achievable with Numba parallel.

Collaborator

You could just toggle the decorator.

import numba
import numpy as np

def searchsorted(a, v):
    indices = np.empty(v.shape, dtype=np.int64)
    for i in numba.prange(v.shape[0]):
        indices[i] = np.searchsorted(a[i], v[i])
    return indices

# No numba, no parallelism: prange falls back to a plain range
searchsorted(np.array([[4, 3, 2, 5]]), np.array([[1]]))

# Compiled with numba; parallel=True is needed for prange to run in parallel
parallel_searchsorted = numba.njit(parallel=True)(searchsorted)

parallel_searchsorted(np.array([[4, 3, 2, 5]]), np.array([[1]]))

Collaborator

This way you can also toggle parallel on or off programmatically by passing it to njit.
See https://github.com/numba/numba/blob/master/numba/decorators.py#L41

Collaborator Author

@sleighsoft you are right, this would be a better approach, since it's shorter and doesn't need a new dependency (joblib). I actually investigated this approach a while ago, but it got lost in the discussion in #231. I think the key point (which I missed before) was that we can use numba parallel if no seed is set (to get speed), but disable parallel if a seed is set (to get reproducibility).

I've opened #294 with the suggested approach.

@tomwhite
Collaborator Author

Closing in favour of #294

@tomwhite tomwhite closed this Sep 12, 2019