Is your feature request related to a problem? Please describe.
Dear Coverage Team,
I am currently working on adding tests and coverage for Distributed Data Parallel (DDP) for PyTorch.
Under pytest, each DDP test launches a master subprocess, which in turn spawns child subprocesses to run the multi-GPU training.
Each test creates the master process using call_training_script:
```python
import os
import subprocess
import sys
from pathlib import Path
from subprocess import TimeoutExpired

import pytorch_lightning


def call_training_script(module_file, cli_args, method, tmpdir, timeout=60):
    file = Path(module_file.__file__).absolute()
    cli_args = cli_args.split(' ') if cli_args else []
    cli_args += ['--tmpdir', str(tmpdir)]
    cli_args += ['--trainer_method', method]
    command = [sys.executable, str(file)] + cli_args

    # need to set the PYTHONPATH in case pytorch_lightning was not installed into the environment
    env = os.environ.copy()
    env['PYTHONPATH'] = f'{pytorch_lightning.__file__}:' + env.get('PYTHONPATH', '')

    # for running in ddp mode, we need to launch its own process or pytest will get stuck
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
    try:
        std, err = p.communicate(timeout=timeout)
        err = str(err.decode("utf-8"))
        if 'Exception' in err:
            raise Exception(err)
    except TimeoutExpired:
        p.kill()
        std, err = p.communicate()
    return std, err
```
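The pattern above (spawn, capture output with a timeout, surface child exceptions) can be sketched in a self-contained way; the inline child program here is a stand-in I made up for illustration, not part of the actual test suite:

```python
import subprocess
import sys
from subprocess import TimeoutExpired


def run_child(code, timeout=10.0):
    # Same structure as call_training_script: spawn a Python child,
    # capture stdout/stderr, kill it on timeout, and raise if the
    # child's stderr mentions an exception.
    p = subprocess.Popen([sys.executable, '-c', code],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        std, err = p.communicate(timeout=timeout)
        if 'Exception' in err.decode('utf-8'):
            raise RuntimeError(err.decode('utf-8'))
    except TimeoutExpired:
        p.kill()
        std, err = p.communicate()
    return std, err


std, err = run_child("print('hello from child')")
```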
The master process then spawns its child processes:
```python
for local_rank in range(1, self.trainer.num_processes):
    env_copy = os.environ.copy()
    env_copy['LOCAL_RANK'] = f'{local_rank}'

    # remove env var if global seed not set
    if os.environ.get('PL_GLOBAL_SEED') is None and 'PL_GLOBAL_SEED' in env_copy:
        del env_copy['PL_GLOBAL_SEED']

    # start process
    # if hydra is available and initialized, make sure to set the cwd correctly
    cwd: Optional[str] = None
    if HYDRA_AVAILABLE:
        if HydraConfig.initialized():
            cwd = get_original_cwd()
    proc = subprocess.Popen(command, env=env_copy, cwd=cwd)
    self.interactive_ddp_procs.append(proc)
```
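The fan-out above can be reduced to a self-contained sketch: each child gets a copy of the parent environment with its own LOCAL_RANK. The inline child program is a hypothetical stand-in for the real training script:

```python
import os
import subprocess
import sys

# Stand-in for the training script: each child reports its assigned rank.
child_code = "import os; print(os.environ['LOCAL_RANK'])"

num_processes = 3
procs = []
for local_rank in range(1, num_processes):
    env_copy = os.environ.copy()
    env_copy['LOCAL_RANK'] = f'{local_rank}'
    procs.append(subprocess.Popen([sys.executable, '-c', child_code],
                                  stdout=subprocess.PIPE, env=env_copy))

# Collect each child's reported rank.
ranks = sorted(int(p.communicate()[0]) for p in procs)
```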
I tried following your documentation on measuring coverage in subprocesses, but couldn't make it work.
Would it be possible to get some guidance? I also think an example repository covering nested subprocesses would be great.
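For context, coverage.py's documented approach for children launched with subprocess.Popen (as above) is roughly: enable per-process data files, point each child at the config via the COVERAGE_PROCESS_START environment variable, and have the child call coverage.process_startup() at interpreter start (via a .pth file or sitecustomize.py). A configuration sketch, assuming the children inherit the parent's environment; the paths are placeholders:

```ini
# .coveragerc -- each process writes its own data file,
# merged afterwards with `coverage combine`
[run]
parallel = True
```

```python
# sitecustomize.py (or a .pth file) on the children's sys.path:
import coverage
coverage.process_startup()

# In the parent, before spawning children (path is a placeholder):
# env_copy['COVERAGE_PROCESS_START'] = '/path/to/.coveragerc'
```

This is a sketch of the general mechanism, not a verified setup for this test suite; the open question in this issue is making it work with the nested (master, then children) process layout.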
Best regards,
Thomas Chaton.
Describe the solution you'd like
Describe alternatives you've considered
Additional context