-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Break out inner loop of optimize_layout so joblib can parallelize it (0.4dev version) #292
Break out inner loop of optimize_layout so joblib can parallelize it (0.4dev version) #292
Conversation
… compatibility with 0.3.
Hello @tomwhite! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:
|
parallel(parallel_calls(inner_fn, n_tasks)) | ||
|
||
|
||
@numba.njit(fastmath=True, nogil=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super deep into what you are doing here, so bare with me if my question makes no sense.
Why can't you use @numba.njit(parallel=True)
@ line 129 instead of using nogil=True
and joblib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your question makes a lot of sense. The answer is to do with random number seeding and reproducibility: basically Numba parallel doesn't allow controlled seeding across threads so it's not possible to get a reproducible result. With this code (using joblib), if you don't set a seed it will use all cores, otherwise it will use a single core and the result will be deterministic. See discussion in #231 for why that is not achievable with Numba parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just toggle the decorator.
import numba
import numpy as np
def searchsorted(a, v):
indices = np.empty(v.shape, dtype=np.int64)
for i in numba.prange(v.shape[0]):
indices[i] = np.searchsorted(a[i], v[i])
return indices
# No numba, no parallel
searchsorted(np.array([[4,3,2,5]]), np.array([[1]]))
# Now uses the numba implementation
parallel_searchsorted = numba.njit(searchsorted)
parallel_searchsorted(np.array([[4,3,2,5]]), np.array([[1]]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way you can also programmatically just toggle parallel on or off by passing it to njit.
See https://github.com/numba/numba/blob/master/numba/decorators.py#L41
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sleighsoft you are right, this would be a better approach, since it's shorter and doesn't need a new dependency (joblib). I actually investigated this approach a while ago, but it got lost in the discussion in #231. I think the key point (which I missed before) was that we can use numba parallel if no seed is set (to get speed), but disable parallel if a seed is set (to get reproducibility).
I've opened #294 with the suggested approach.
Closing in favour of #294 |
This is the 0.4dev version of #255. Probably best to review this one.
If no random seed is set then optimize_layout will use all cores on the machine, otherwise it will use a single core for reproducibility.