-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cascade SIGTERM to subprocesses #16525
Conversation
@awaelchli I ran a manual test and this is necessary. The subprocesses do not get the signal automatically and the patch works as expected. Do you prefer that I add a test with mocks, or a complete test to https://github.com/Lightning-AI/lightning/blob/c854e4d1e230133b80e92c5dd088f2289219772f/tests/tests_pytorch/run_standalone_tasks.sh? |
@carmocca This looks great. A simple test through unitest.mock would be great! |
ed1befb
to
f977a2b
Compare
⚡ Required checks status: All passing 🟢Groups summary🟢 pytorch_lightning: Tests workflowThese checks are required after the changes to 🟢 pytorch_lightning: Azure GPU
These checks are required after the changes to 🟢 pytorch_lightning: Azure HPU
These checks are required after the changes to 🟢 pytorch_lightning: Azure IPU
These checks are required after the changes to 🟢 pytorch_lightning: Docs
These checks are required after the changes to 🟢 lightning_fabric: CPU workflow
These checks are required after the changes to 🟢 lightning_fabric: Azure GPU
These checks are required after the changes to 🟢 mypy
These checks are required after the changes to 🟢 installThese checks are required after the changes to 🟢 link-check
These checks are required after the changes to Thank you for your contribution! 💜
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really neat !
class _Launcher(_FabricLauncher, ABC): | ||
@abstractmethod | ||
def kill(self, signum: _SIGNUM) -> None: | ||
"""Kill existing alive processes.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carmocca You could also push this to fabric and raise NotImplementedError. To avoid having to define a new base class in PL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do you see us implementing signal handling for fabric in the future? Otherwise I wouldn't do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at the core, fabric would expose these primitives yes. You could call fabric.strategy.launcher.kill()
for example if that's useful to you. The launcher implementation is also very PL-independent. The only reason why they are subclassed in PL is because of some extra state management and hydra support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not do this as it would be unused code in Fabric. In the future, we can do that and add docs for it, or try to merge the classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolving. We can do this as a follow-up if we decide
What does this PR do?
Follow-up to #16501
Part of #16410
Does your PR introduce any breaking changes? If yes, please list them.
None
cc @Borda @carmocca @justusschock @awaelchli