-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Move early checkpoint removal into mixin #621
Conversation
else: | ||
callback_kwargs = None | ||
if callback_kwargs is not None: | ||
# General case: Scheduler implements ``trials_checkpoints_can_be_removed`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code (2nd PR) will become a special case here. This is why a factory is needed.
|
||
def params_early_checkpoint_removal(self) -> Optional[Dict[str, Any]]: | ||
""" | ||
Supports special cases, in which :meth:`trials_checkpoints_can_be_removed` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method will only be needed in the 2nd part of the PR
@@ -101,3 +101,17 @@ def on_tuning_sleep(self, sleep_time: float): | |||
:param sleep_time: Time (in secs) for which tuner has just slept | |||
""" | |||
pass | |||
|
|||
def on_start_trial(self, trial: Trial): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods were originally missing. They are also needed in the 2nd part of the PR
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #621 +/- ##
==========================================
+ Coverage 65.13% 65.29% +0.16%
==========================================
Files 364 367 +3
Lines 26542 26680 +138
==========================================
+ Hits 17287 17420 +133
- Misses 9255 9260 +5
... and 7 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a very good change. I have added some comments about refactoring for simplicity but I do not have the full picture of the changes to come so happy to discuss.
def params_early_checkpoint_removal(self) -> Optional[Dict[str, Any]]: | ||
""" | ||
Supports special cases, in which :meth:`trials_checkpoints_can_be_removed` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current implmentation this function does two things 1/ provides parameters for checkpoint removal factory and 2/ acts as a flag for whether trials_checkpoints_can_be_removed
is implemented by the scheduler. We should split it into two functions to make the interface more explicit and therefore clear.
In practice, we can move the logic of if params_early_checkpoint_removal() is None: Scheduler implements trials_checkpoints_can_be_removed
that is currently found in L43 of the factory to the mixin itself. This way no new logic is introduced by this change. In case we retain the params_early_checkpoint_removal function this cna be done as:
def scheduler_implements_checkpoint_removal():
return self.params_early_checkpoint_removal() is None
Otherwise it can just return the scheduler_implements_checkpoint_removal
attribute of the mixin (getter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I introduce a method does_checkpoint_removal
def early_checkpoint_removal_factory( | ||
scheduler: TrialScheduler, | ||
stop_criterion: Callable[[TuningStatus], bool], | ||
) -> Optional[TunerCallback]: | ||
""" | ||
Early checkpoint removal is implemented by callbacks, which depend on which | ||
scheduler is being used. For many schedulers, early checkpoint removal is | ||
not supported. | ||
|
||
:param scheduler: Scheduler for which early checkpoint removal is requested | ||
:param stop_criterion: Stop criterion as passed to :class:`~syne_tune.Tuner` | ||
:return: Callback for early checkpoint removal, or ``None`` if this is not | ||
supported for the scheduler | ||
""" | ||
callback = None | ||
if isinstance(scheduler, RemoveCheckpointsSchedulerMixin): | ||
callback_kwargs = scheduler.params_early_checkpoint_removal() | ||
else: | ||
callback_kwargs = None | ||
if callback_kwargs is not None: | ||
# General case: Scheduler implements ``trials_checkpoints_can_be_removed`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added two comments on RemoveCheckpointsSchedulerMixin 1/ adding an explicit function that decides whether a Scheduler implements checkpoint removal and 2/ moving callback initialization to RemoveCheckpointsSchedulerMixin. With those two applied this function would look like (draft code)
def early_checkpoint_removal_factory(
scheduler: TrialScheduler,
stop_criterion: Callable[[TuningStatus], bool],
) -> Optional[TunerCallback]:
"""
Early checkpoint removal is implemented by callbacks, which depend on which
scheduler is being used. For many schedulers, early checkpoint removal is
not supported.
:param scheduler: Scheduler for which early checkpoint removal is requested
:param stop_criterion: Stop criterion as passed to :class:`~syne_tune.Tuner`
:return: Callback for early checkpoint removal, or ``None`` if this is not
supported for the scheduler
"""
if isinstance(scheduler, RemoveCheckpointsSchedulerMixin):
return scheduler.checkpoint_removal_callback(stop_criterion)
else:
return None
Which now keeps the callback generation logic togeter rather than split between two modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, you suggest to move the creation of the callback into the mixin, so in the end, the scheduler is responsible. This would get rid of this params_early_checkpoint_removal
.
This is a good idea. Somehow, I wanted the factory to be decoupled from the scheduler, but now I don't really see a very good reason why.
from typing import List, Optional, Dict, Any | ||
|
||
|
||
class RemoveCheckpointsSchedulerMixin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we can move the checkpoint-removal-callback initialization to this class, I believe it should be easier to parse compared to the current setup where the class just returns the params and initialization happens inside a different module. In the proposed logic, we retain early_checkpoint_removal_factory.py
which would handle the default cases of schedulers that are not concerned with checkpoints removal. The function for callback factory inside the mixin could look like
def checkpoint_removal_callback(self, stop_criterion: Callable[[TuningStatus], bool]) -> Optional[TunerCallback]:
# Get parameters for callback. This ca be implemented here or
# as previously via: callback_kwargs = scheduler.params_early_checkpoint_removal()
if self.scheduler_implements_checkpoint_removal():
# General case: Scheduler implements ``trials_checkpoints_can_be_removed``
# method, and we can use the generic ``RemoveCheckpointsCallback``
# callback
assert len(callback_kwargs) == 0, (
"params_early_checkpoint_removal of your scheduler returns "
"arguments, which are not used in RemoveCheckpointsCallback"
)
callback = RemoveCheckpointsCallback()
else:
# Special case
return callback
This way we avoid splitting the 1/ parametrization and 2/ initialization between two functions. We keep all logic related to generating the callback here thus making the code easier to parse. This also gives us more flexibility since if scheduler needs a very specific checkpoint-removal-callback, it can override the checkpoint_removal_callback()
method and return a completely custom callback (instead of getting creative with parameters passing).
See comment on the early_checkpoint_removal_factory
for usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point, thanks. I am doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll implement the general case as a subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! This is what I thought would be the best state byt wasnt sure if its possible given the changes to come.
This is the first part of the speculative early CP removal PR. It shows how we can avoid new methods in
TrialScheduler
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.