replace multi processing with joblib #477
qlib/data/updateparallel.py
Outdated
            require=None,
            maxtasksperchild=None,
            **kwargs)
        self._backend_args["maxtasksperchild"] = ["maxtasksperchild"]
self._backend_args["maxtasksperchild"] = maxtasksperchild
if isinstance(self._backend, MultiprocessingBackend):
    self._backend_args["maxtasksperchild"] = maxtasksperchild
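The two suggestions above can be combined in a small sketch. It assumes nothing about joblib itself: FakeParallel, ThreadingBackend and the local MultiprocessingBackend are hypothetical stand-ins that only mimic the _backend and _backend_args attributes mentioned in this thread, so the snippet runs without joblib installed.

```python
# Stand-in classes (hypothetical, NOT joblib's API) so the sketch is runnable.
class ThreadingBackend:
    pass


class MultiprocessingBackend:
    pass


class FakeParallel:
    """Minimal base class exposing the two attributes discussed above."""

    def __init__(self, backend):
        self._backend = backend
        self._backend_args = {}


class UpdateParallel(FakeParallel):
    def __init__(self, backend, maxtasksperchild=None):
        super().__init__(backend)
        # Store the value of the argument, not the list ["maxtasksperchild"],
        # and only for the backend that understands the option.
        if isinstance(self._backend, MultiprocessingBackend):
            self._backend_args["maxtasksperchild"] = maxtasksperchild


mp = UpdateParallel(MultiprocessingBackend(), maxtasksperchild=1)
th = UpdateParallel(ThreadingBackend(), maxtasksperchild=1)
print(mp._backend_args)  # option recorded for the multiprocessing stand-in
print(th._backend_args)  # left empty for any other backend
```

The guard matters because a backend that does not accept maxtasksperchild would otherwise receive an unexpected argument.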
qlib/data/updateparallel.py
Outdated
from joblib import Parallel


class UpdateParallel(Parallel):
UpdateParallel moves to qlib/utils/__init__.py and is renamed to ParallelExt.
https://github.com/microsoft/qlib/blob/main/qlib/utils/paral.py would be a better place.
qlib/data/updateparallel.py
Outdated
            maxtasksperchild=None,
            **kwargs
    ):
        super(UpdateParallel, self).__init__(n_jobs=n_jobs,
super(UpdateParallel, self).__init__(
    n_jobs=n_jobs,
    backend=backend,
    verbose=verbose,
    timeout=timeout,
    pre_dispatch=pre_dispatch,
    batch_size=batch_size,
    temp_folder=temp_folder,
    max_nbytes=max_nbytes,
    mmap_mode=mmap_mode,
    prefer=prefer,
    require=require,
)
qlib/data/updateparallel.py
Outdated
            backend=None,
            verbose=0,
            timeout=None,
            pre_dispatch="2 * n_jobs",
Why not use *args, **kwargs instead of explicitly listing all the arguments?
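A minimal sketch of that forwarding pattern, assuming a hypothetical parent class Base in place of joblib's Parallel (its parameter list here is illustrative only): the subclass pops its one extra option from kwargs and passes everything else through unchanged.

```python
class Base:
    """Hypothetical stand-in for a parent class with many keyword arguments."""

    def __init__(self, n_jobs=None, backend=None, verbose=0, timeout=None):
        self.n_jobs = n_jobs
        self.backend = backend
        self.verbose = verbose
        self.timeout = timeout


class UpdateParallel(Base):
    def __init__(self, *args, **kwargs):
        # Remove the one extra option before delegating, so the parent never
        # sees a keyword it does not accept.
        self.maxtasksperchild = kwargs.pop("maxtasksperchild", None)
        super().__init__(*args, **kwargs)


p = UpdateParallel(4, backend="multiprocessing", maxtasksperchild=1)
print(p.n_jobs, p.backend, p.maxtasksperchild)
```

The trade-off is that the subclass signature no longer documents the accepted arguments, but it also no longer has to be updated whenever the parent adds or renames a parameter.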
qlib/config.py
Outdated
@@ -92,6 +92,7 @@ def set_conf_from_C(self, config_c):
            "kernels": NUM_USABLE_CPU,
            # How many tasks belong to one process. Recommend 1 for high-frequency data and None for daily data.
            "maxtasksperchild": None,
            "joblib_backend" : None,
Can we set the default backend to multiprocessing if loky is very likely to OOM?
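One way to read this suggestion is as a fallback applied when joblib_backend is left unset. The sketch below is an assumption for illustration: resolve_backend and DEFAULT_BACKEND are hypothetical names, not qlib functions, and the config dict only mirrors the keys shown in the diff.

```python
# Hypothetical fallback value, as proposed in the comment above.
DEFAULT_BACKEND = "multiprocessing"

# Illustrative config fragment; 4 is a placeholder for NUM_USABLE_CPU.
config = {
    "kernels": 4,
    "maxtasksperchild": None,
    "joblib_backend": None,
}


def resolve_backend(conf):
    """Return the configured backend name, or the fallback when it is unset."""
    backend = conf.get("joblib_backend")
    return DEFAULT_BACKEND if backend is None else backend


print(resolve_backend(config))
print(resolve_backend({"joblib_backend": "loky"}))
```

An explicit setting such as "loky" still takes precedence, so tests that need a particular backend can request it.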
qlib/data/updateparallel.py
Outdated
@@ -0,0 +1,41 @@
from joblib import Parallel
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
tests/misc/test_get_multi_proc.py
Outdated
        """
        For testing if it will raise error
        """
        qlib.init(provider_uri=TestAutoData.provider_uri, expression_cache=None, dataset_cache=None)
You have to use loky to pass the test
* replace multi processing with joblib
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* update class Parallel and data.py
* Fix Parallel support for maxtasksperchild

Co-authored-by: wangw <1666490690@qq.com>
Co-authored-by: zhupr <zhu.pengrong@foxmail.com>
Description
Multiprocessing has several weaknesses that joblib does not share, so this PR replaces multiprocessing with joblib.
How Has This Been Tested?
Ran pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
Types of changes