
Resolve training type plugin when passed with Accelerator #10775

Closed · kaushikb11 opened this issue Nov 26, 2021 · 4 comments
Labels: accelerator · bug (Something isn't working) · priority: 0 (High priority task)

kaushikb11 (Contributor) commented Nov 26, 2021

🐛 Bug

To Reproduce

from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import GPUAccelerator
from pytorch_lightning.plugins import DDPPlugin, PrecisionPlugin

trainer = Trainer(
    accelerator=GPUAccelerator(precision_plugin=PrecisionPlugin(), training_type_plugin=DDPPlugin()),
    gpus=4,
)

At the moment, when a training type plugin is passed inside an Accelerator, attributes such as parallel_devices, cluster_environment, and sync_batchnorm are not set on the training type plugin, which leads to errors downstream.
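For illustration, here is a minimal sketch of the propagation the fix needs (the helper name _set_distributed_attributes is hypothetical, not the actual accelerator connector API). The idea is to fill in only the attributes the user left unset, so explicit configuration on the plugin survives:

import torch

# Hypothetical helper sketching the propagation described above;
# not the actual AcceleratorConnector code.
def _set_distributed_attributes(training_type_plugin, parallel_device_ids, cluster_environment, sync_batchnorm):
    # Only fill in what the user left unset, so explicit choices are never overwritten.
    if getattr(training_type_plugin, "parallel_devices", None) is None:
        training_type_plugin.parallel_devices = [torch.device("cuda", i) for i in parallel_device_ids]
    if getattr(training_type_plugin, "cluster_environment", None) is None:
        training_type_plugin.cluster_environment = cluster_environment
    if getattr(training_type_plugin, "sync_batchnorm", None) is None:
        training_type_plugin.sync_batchnorm = sync_batchnorm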

Expected behavior

trainer = Trainer(accelerator=GPUAccelerator(precision_plugin=PrecisionPlugin(), training_type_plugin=DDPPlugin()), gpus=4)

# should result in the following attributes being set on the training type plugin
# (self.parallel_device_ids refers to the device ids resolved by the accelerator connector):

training_type_plugin.parallel_devices == [torch.device("cuda", i) for i in self.parallel_device_ids]
training_type_plugin.cluster_environment == LightningEnvironment()
training_type_plugin.sync_batchnorm == False
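Put differently, passing a bare DDPPlugin inside the accelerator should behave the same as constructing the plugin explicitly. A sketch of that explicit construction, assuming the 1.5.x constructor signatures (DDPPlugin accepting parallel_devices and cluster_environment; sync_batchnorm would then be set by the connector):

import torch
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import GPUAccelerator
from pytorch_lightning.plugins import DDPPlugin, PrecisionPlugin
from pytorch_lightning.plugins.environments import LightningEnvironment

trainer = Trainer(
    accelerator=GPUAccelerator(
        precision_plugin=PrecisionPlugin(),
        training_type_plugin=DDPPlugin(
            parallel_devices=[torch.device("cuda", i) for i in range(4)],
            cluster_environment=LightningEnvironment(),
        ),
    ),
    gpus=4,
)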
kaushikb11 added the bug, priority: 0 and accelerator labels on Nov 26, 2021
kaushikb11 self-assigned this on Nov 26, 2021
awaelchli (Contributor) commented Nov 26, 2021

Thanks @kaushikb11.
This fix would only go into 1.5.x and apply only partially to master, because #10416 changes the requirements slightly here as things get inverted.

kaushikb11 (Contributor, Author) commented Nov 26, 2021

Yup @awaelchli, I did check #10648 out before going ahead with the PR. Thanks!

ananthsub (Contributor) commented

@four4fish @kaushikb11 is this solved by the accelerator connector rewrite?

four4fish (Contributor) commented

@ananthsub yes, it's solved by the rewrite and the follow-ups. This is the last one: #12105
