Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClusterEnvironment interface and improvements #6303

Closed
7 of 8 tasks
awaelchli opened this issue Mar 2, 2021 · 8 comments
Closed
7 of 8 tasks

ClusterEnvironment interface and improvements #6303

awaelchli opened this issue Mar 2, 2021 · 8 comments
Labels
environment good first issue Good for newcomers priority: 1 Medium priority task refactor
Milestone

Comments

@awaelchli
Copy link
Contributor

awaelchli commented Mar 2, 2021

🚀 Feature

In #5915 we started discussing what a good interface for the cluster environment could be and where it is headed in the future.
We should iterate on these ideas.

Some ideas / issues / refactors to discuss:

# returns true if we can detect the cluster environment.
def detect():
    if ...
        return True
@awaelchli awaelchli added feature Is an improvement or enhancement help wanted Open to be worked on docs Documentation related discussion In a discussion stage refactor labels Mar 2, 2021
@awaelchli awaelchli added this to the 1.3 milestone Mar 2, 2021
@awaelchli awaelchli changed the title Standardize environment variable access across different ClusterEnvironments Cluster Environment interface and improvements Mar 2, 2021
@awaelchli awaelchli changed the title Cluster Environment interface and improvements ClusterEnvironment interface and improvements Mar 2, 2021
@awaelchli awaelchli self-assigned this Apr 13, 2021
@awaelchli awaelchli added the priority: 1 Medium priority task label Apr 18, 2021
@awaelchli awaelchli modified the milestones: v1.3, v1.4 Apr 18, 2021
@leezu
Copy link
Contributor

leezu commented May 6, 2021

Trainer has a SLURM specific property. Should it be deprecated as part of the ClusterEnvironment interface improvements?

https://github.com/PyTorchLightning/pytorch-lightning/blob/8c0ea92af237542f5b36ae684543f871da829379/pytorch_lightning/trainer/properties.py#L176-L189

@awaelchli
Copy link
Contributor Author

Hi, good observation. I would say the logic inside of it can and should definitely move into the cluster plugin. I would still keep the property as an easy way to access the cluster environment IF that is something users actually access (I have no idea).

@ananthsub
Copy link
Contributor

Consider renaming "creates_children". It is not clear if it means the environment within Lightning or the actual cluster. cc @ananthsub

Since it came up in #9389 - we need a consistent value here during the course of the job. relying on the mutable os.environ for instance can lead to surprising side effects. a naming nit would be to to rename this as created_children to be ultra clear that this either happened or didn't, as opposed to a continuous/ongoing behavior.

@awaelchli
Copy link
Contributor Author

@ananthsub

a naming nit would be to to rename this as created_children to be ultra clear that this either happened or didn't, as opposed to a continuous/ongoing behavior.

I propose "creates_processes_externally" as a clearer name. #10106
Suggestions for better names welcome :)

@awaelchli awaelchli modified the milestones: v1.5, v1.6 Nov 4, 2021
@carmocca carmocca added environment and removed help wanted Open to be worked on docs Documentation related discussion In a discussion stage feature Is an improvement or enhancement labels Feb 1, 2022
@carmocca
Copy link
Contributor

@awaelchli is this done? can we close?

@awaelchli
Copy link
Contributor Author

@carmocca there is one item open

Convert methods to proper setters/getters

It came up in reviews that people were wondering why the methods are not properties. They are not properties because the plugins are meant to be customized by overriding them and in Lightning we have the strict rule to avoid decorators when it comes to overriding hooks or similar. We can attempt to make a decision or just close the issue until someone brings this up again. The reason why nobody has done it so far is probably because integrating backward compatibility is absolutely insane on this one.

@carmocca
Copy link
Contributor

If you don't mind, can you open a separate issue for just that so we close this one?

because integrating backward compatibility is absolutely insane on this one.

In the new issue, can you broadly describe what would be the pattern used to keep backwards compatibility?

@awaelchli
Copy link
Contributor Author

Follow up #12025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
environment good first issue Good for newcomers priority: 1 Medium priority task refactor
Projects
None yet
Development

No branches or pull requests

5 participants