Add parallel module using joblib for Spark #5924
Maybe this can still be in `map_nested`, so that the signature of `parallel_map` could be [...] and `map_nested` would call [...]. This way it will be easier to start using `parallel_map` in other places in the code, no?

If so, `_map_with_joblib` would also take `split_kwds` as input, which is arbitrarily split according to `num_proc` rather than decided by joblib.

Are there any other places where you are thinking of using `parallel_map`? I thought it was just a replacement for the multiprocessing part of `map_nested`.

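To make the concern above concrete, here is a minimal sketch of what "split according to `num_proc`" can look like: the caller cuts the input into contiguous slices up front, so the chunking is fixed before any backend sees the work. The helper name `split_into_chunks` is hypothetical and is not taken from the PR; the actual `split_kwds` construction may differ.

```python
def split_into_chunks(items, num_proc):
    """Split `items` into at most `num_proc` contiguous, near-equal slices.

    Hypothetical helper for illustration only; it is not the library's code.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be >= 1")
    div, mod = divmod(len(items), num_proc)
    chunks = []
    for index in range(num_proc):
        # The first `mod` slices each receive one extra element.
        start = div * index + min(index, mod)
        end = start + div + (1 if index < mod else 0)
        if start < end:  # skip empty slices when num_proc > len(items)
            chunks.append(items[start:end])
    return chunks


if __name__ == "__main__":
    print(split_into_chunks(list(range(10)), 3))
```

With this kind of pre-splitting, the number and size of the work units are decided by the caller, which is the behavior being contrasted with letting joblib schedule the individual items itself.
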
`n_jobs` is specified to joblib anyway, no? Not a big deal anyway.

Might leave it like this so that `n_jobs=-1` could be used when the user wants to let `joblib` decide the number of workers / processes etc.

ah good idea !
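
As a small illustration of the `n_jobs=-1` point discussed in this thread, the sketch below passes individual items to joblib and lets it choose the worker count. It assumes joblib is installed; `square` and `parallel_square` are hypothetical names used only for this example and do not come from the PR.

```python
from joblib import Parallel, delayed


def square(x):
    return x * x


def parallel_square(values, n_jobs=-1):
    # n_jobs=-1 asks joblib to use all available CPUs;
    # a positive integer caps the number of concurrent workers.
    return Parallel(n_jobs=n_jobs)(delayed(square)(v) for v in values)


if __name__ == "__main__":
    print(parallel_square(range(8)))
```

Results come back as a list in the same order as the inputs, so the caller does not need to reassemble chunks afterwards.
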