Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Move early checkpoint removal into mixin #621

Merged
merged 3 commits into from
Apr 17, 2023
Merged

Conversation

mseeger
Copy link
Collaborator

@mseeger mseeger commented Apr 11, 2023

This is the first part of the speculative early CP removal PR. It shows how we can avoid new methods in TrialScheduler


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mseeger mseeger requested a review from jgolebiowski April 11, 2023 18:15
else:
callback_kwargs = None
if callback_kwargs is not None:
# General case: Scheduler implements ``trials_checkpoints_can_be_removed``
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code (2nd PR) will become a special case here. This is why a factory is needed.


def params_early_checkpoint_removal(self) -> Optional[Dict[str, Any]]:
"""
Supports special cases, in which :meth:`trials_checkpoints_can_be_removed`
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method will only be needed in the 2nd part of the PR

@@ -101,3 +101,17 @@ def on_tuning_sleep(self, sleep_time: float):
:param sleep_time: Time (in secs) for which tuner has just slept
"""
pass

def on_start_trial(self, trial: Trial):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods were originally missing. They are also needed in the 2nd part of the PR

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Patch coverage: 79.06% and project coverage change: +0.16 🎉

Comparison is base (7df7640) 65.13% compared to head (63e78f6) 65.29%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #621      +/-   ##
==========================================
+ Coverage   65.13%   65.29%   +0.16%     
==========================================
  Files         364      367       +3     
  Lines       26542    26680     +138     
==========================================
+ Hits        17287    17420     +133     
- Misses       9255     9260       +5     
Impacted Files Coverage Δ
benchmarking/commons/hpo_main_local.py 0.00% <0.00%> (ø)
benchmarking/commons/launch_remote_local.py 0.00% <ø> (ø)
syne_tune/backend/trial_backend.py 82.14% <ø> (ø)
syne_tune/experiments.py 0.00% <0.00%> (ø)
syne_tune/optimizer/scheduler.py 92.95% <ø> (-0.20%) ⬇️
syne_tune/results_callback.py 89.47% <ø> (ø)
syne_tune/callbacks/remove_checkpoints_callback.py 61.11% <66.66%> (-38.89%) ⬇️
syne_tune/tuner.py 84.49% <81.25%> (-0.77%) ⬇️
...ne_tune/optimizer/schedulers/remove_checkpoints.py 83.33% <83.33%> (ø)
benchmarking/commons/hpo_main_common.py 47.22% <100.00%> (ø)
... and 4 more

... and 7 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@jgolebiowski jgolebiowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very good change. I have added some comments about refactoring for simplicity but I do not have the full picture of the changes to come so happy to discuss.

Comment on lines 64 to 66
def params_early_checkpoint_removal(self) -> Optional[Dict[str, Any]]:
"""
Supports special cases, in which :meth:`trials_checkpoints_can_be_removed`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implmentation this function does two things 1/ provides parameters for checkpoint removal factory and 2/ acts as a flag for whether trials_checkpoints_can_be_removed is implemented by the scheduler. We should split it into two functions to make the interface more explicit and therefore clear.

In practice, we can move the logic of if params_early_checkpoint_removal() is None: Scheduler implements trials_checkpoints_can_be_removed that is currently found in L43 of the factory to the mixin itself. This way no new logic is introduced by this change. In case we retain the params_early_checkpoint_removal function this cna be done as:

def scheduler_implements_checkpoint_removal():
    return self.params_early_checkpoint_removal() is None

Otherwise it can just return the scheduler_implements_checkpoint_removal attribute of the mixin (getter)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I introduce a method does_checkpoint_removal

Comment on lines 24 to 44
def early_checkpoint_removal_factory(
scheduler: TrialScheduler,
stop_criterion: Callable[[TuningStatus], bool],
) -> Optional[TunerCallback]:
"""
Early checkpoint removal is implemented by callbacks, which depend on which
scheduler is being used. For many schedulers, early checkpoint removal is
not supported.

:param scheduler: Scheduler for which early checkpoint removal is requested
:param stop_criterion: Stop criterion as passed to :class:`~syne_tune.Tuner`
:return: Callback for early checkpoint removal, or ``None`` if this is not
supported for the scheduler
"""
callback = None
if isinstance(scheduler, RemoveCheckpointsSchedulerMixin):
callback_kwargs = scheduler.params_early_checkpoint_removal()
else:
callback_kwargs = None
if callback_kwargs is not None:
# General case: Scheduler implements ``trials_checkpoints_can_be_removed``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added two comments on RemoveCheckpointsSchedulerMixin 1/ adding an explicit function that decides whether a Scheduler implements checkpoint removal and 2/ moving callback initialization to RemoveCheckpointsSchedulerMixin. With those two applied this function would look like (draft code)

def early_checkpoint_removal_factory(
    scheduler: TrialScheduler,
    stop_criterion: Callable[[TuningStatus], bool],
) -> Optional[TunerCallback]:
    """
    Early checkpoint removal is implemented by callbacks, which depend on which
    scheduler is being used. For many schedulers, early checkpoint removal is
    not supported.
    :param scheduler: Scheduler for which early checkpoint removal is requested
    :param stop_criterion: Stop criterion as passed to :class:`~syne_tune.Tuner`
    :return: Callback for early checkpoint removal, or ``None`` if this is not
        supported for the scheduler
    """
    if isinstance(scheduler, RemoveCheckpointsSchedulerMixin):
        return scheduler.checkpoint_removal_callback(stop_criterion)
    else:
        return None

Which now keeps the callback generation logic togeter rather than split between two modules

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you suggest to move the creation of the callback into the mixin, so in the end, the scheduler is responsible. This would get rid of this params_early_checkpoint_removal.

This is a good idea. Somehow, I wanted the factory to be decoupled from the scheduler, but now I don't really see a very good reason why.

from typing import List, Optional, Dict, Any


class RemoveCheckpointsSchedulerMixin:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we can move the checkpoint-removal-callback initialization to this class, I believe it should be easier to parse compared to the current setup where the class just returns the params and initialization happens inside a different module. In the proposed logic, we retain early_checkpoint_removal_factory.py which would handle the default cases of schedulers that are not concerned with checkpoints removal. The function for callback factory inside the mixin could look like

def checkpoint_removal_callback(self, stop_criterion: Callable[[TuningStatus], bool]) -> Optional[TunerCallback]:
    # Get parameters for callback. This ca be implemented here or 
    # as previously via: callback_kwargs = scheduler.params_early_checkpoint_removal()
    
    if self.scheduler_implements_checkpoint_removal():
        # General case: Scheduler implements ``trials_checkpoints_can_be_removed``
        # method, and we can use the generic ``RemoveCheckpointsCallback``
        # callback
        assert len(callback_kwargs) == 0, (
            "params_early_checkpoint_removal of your scheduler returns "
            "arguments, which are not used in RemoveCheckpointsCallback"
        )
        callback = RemoveCheckpointsCallback()
    else:
        # Special case
    return callback

This way we avoid splitting the 1/ parametrization and 2/ initialization between two functions. We keep all logic related to generating the callback here thus making the code easier to parse. This also gives us more flexibility since if scheduler needs a very specific checkpoint-removal-callback, it can override the checkpoint_removal_callback() method and return a completely custom callback (instead of getting creative with parameters passing).

See comment on the early_checkpoint_removal_factory for usage.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, thanks. I am doing this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll implement the general case as a subclass.

Copy link
Collaborator

@jgolebiowski jgolebiowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! This is what I thought would be the best state byt wasnt sure if its possible given the changes to come.

@mseeger mseeger merged commit e28b555 into main Apr 17, 2023
@mseeger mseeger deleted the refactor_cp_removal branch April 17, 2023 10:05
@wesk wesk added the refactor label Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants