-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ClusterEnvironment interface and improvements #6303
Comments
Trainer has a SLURM specific property. Should it be deprecated as part of the ClusterEnvironment interface improvements? |
Hi, good observation. I would say the logic inside of it can and should definitely move into the cluster plugin. I would still keep the property as an easy way to access the cluster environment IF that is something users actually access (I have no idea). |
Since it came up in #9389 - we need a consistent value here during the course of the job. relying on the mutable os.environ for instance can lead to surprising side effects. a naming nit would be to to rename this as |
I propose "creates_processes_externally" as a clearer name. #10106 |
@awaelchli is this done? can we close? |
@carmocca there is one item open
It came up in reviews that people were wondering why the methods are not properties. They are not properties because the plugins are meant to be customized by overriding them and in Lightning we have the strict rule to avoid decorators when it comes to overriding hooks or similar. We can attempt to make a decision or just close the issue until someone brings this up again. The reason why nobody has done it so far is probably because integrating backward compatibility is absolutely insane on this one. |
If you don't mind, can you open a separate issue for just that so we close this one?
In the new issue, can you broadly describe what would be the pattern used to keep backwards compatibility? |
Follow up #12025 |
🚀 Feature
In #5915 we started discussing what a good interface for the cluster environment could be and where it is headed in the future.
We should iterate on these ideas.
Some ideas / issues / refactors to discuss:
there is a small amount of code duplication.
Review Add LSF support #5102 and determine if interface needs changes
Add docs for all environments
Teardown logic: cluster environments that set environment variables (globally) should also be responsible for cleaning them up ([fix] Add a cluster environment teardown to clean up environment state #6942)
Convert methods to proper setters/getters (Rename "master" methods to "main" in ClusterEnvironment plugins #10103)
Consider renaming "creates_children". It is not clear if it means the environment within Lightning or the actual cluster. cc @Borda @justusschock @awaelchli @akihironitta @rohitgr7 @tchaton @ananthsub Remove deprecated method
ClusterEnvironment.creates_children
#10339Move left over slurm logic in slurm connector to the slurm environment (refactor slurm_job_id #10622 Deprecate
trainer.slurm_job_id
#10615)A
detect
method:The text was updated successfully, but these errors were encountered: