lr scheduler defined in config cannot be overwritten by lr scheduler defined in code and passed to deepspeed.initialize [BUG] #5726

Closed
xiyang-aads-lilly opened this issue Jul 5, 2024 · 3 comments
Labels: bug, training

xiyang-aads-lilly commented Jul 5, 2024

Describe the issue
Note: this is not a bug but an inconsistent design.

In DeepSpeed's current implementation of the learning rate (LR) scheduler configuration, the LR scheduler is always initialized from the configuration file if it is defined there, regardless of whether a scheduler is passed programmatically. This is inconsistent with the optimizer, which can be overridden programmatically even when it is defined in the configuration file.
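
For illustration, here is a minimal sketch of the inconsistency (the model, optimizer, scheduler, and config values are placeholders, not a real training setup):

    import torch
    import deepspeed

    model = torch.nn.Linear(10, 10)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    client_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

    ds_config = {
        "train_batch_size": 8,
        # Because this section exists, the client scheduler below is ignored.
        "scheduler": {
            "type": "WarmupLR",
            "params": {"warmup_num_steps": 100},
        },
    }

    engine, ds_optimizer, _, ds_scheduler = deepspeed.initialize(
        model=model,
        optimizer=optimizer,            # overrides any config-defined optimizer
        lr_scheduler=client_scheduler,  # silently loses to the config "scheduler"
        config=ds_config,
    )
    # ds_scheduler is the WarmupLR built from the config, not CosineAnnealingLR.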

Here is the current implementation of the LR scheduler configuration:

    def _configure_lr_scheduler(self, client_lr_scheduler):
        # First check for scheduler in json configuration
        lr_scheduler = self._scheduler_from_config(self.optimizer)
        if lr_scheduler:
            log_dist(f"DeepSpeed using configured LR scheduler = {self.scheduler_name()}", ranks=[0])
            self.lr_scheduler = lr_scheduler
        else:
            if isinstance(client_lr_scheduler, Callable):
                log_dist('DeepSpeed using client callable to create LR scheduler', ranks=[0])
                self.lr_scheduler = client_lr_scheduler(self.basic_optimizer)
            else:
                log_dist('DeepSpeed using client LR scheduler', ranks=[0])
                self.lr_scheduler = client_lr_scheduler

        log_dist(f'DeepSpeed LR Scheduler = {self.lr_scheduler}', ranks=[0])

Expected behavior
We should be able to override the LR scheduler defined in the config. Ideally, something like:

    def _configure_lr_scheduler(self, client_lr_scheduler):
        # Prefer a client-provided scheduler; fall back to the json configuration
        if client_lr_scheduler:
            if isinstance(client_lr_scheduler, Callable):
                log_dist('DeepSpeed using client callable to create LR scheduler', ranks=[0])
                self.lr_scheduler = client_lr_scheduler(self.basic_optimizer)
            else:
                log_dist('DeepSpeed using client LR scheduler', ranks=[0])
                self.lr_scheduler = client_lr_scheduler
        else:
            lr_scheduler = self._scheduler_from_config(self.optimizer)
            log_dist(f"DeepSpeed using configured LR scheduler = {self.scheduler_name()}", ranks=[0])
            self.lr_scheduler = lr_scheduler

        log_dist(f'DeepSpeed LR Scheduler = {self.lr_scheduler}', ranks=[0])
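
With this precedence, the deepspeed.initialize call sketched earlier would return the client's CosineAnnealingLR instead of the config's WarmupLR, mirroring how the optimizer argument already behaves.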

Why does the current design enforce the initialization of the LR scheduler from the configuration file if it is defined there, while allowing the optimizer to be overwritten programmatically?

xiyang-aads-lilly added the bug and training labels Jul 5, 2024

tjruwase commented Aug 3, 2024

@xiyang-aads-lilly, this is a good catch. Your proposed solution looks reasonable. Are you able to provide a PR? Thanks!

xiyang-aads-lilly (Author) commented

> @xiyang-aads-lilly, this is a good catch. Your proposed solution looks reasonable. Are you able to provide a PR? Thanks!

Sure. I will create PR next week.

github-merge-queue bot pushed a commit that referenced this issue Aug 14, 2024
This PR is based on #5726.

The current LR scheduler initialization always prioritizes the config over a scheduler defined manually in code. The optimizer initialization, however, prioritizes a manually defined optimizer over the config. This PR makes the initialization behavior consistent for both the optimizer and the LR scheduler: if an LR scheduler is defined in code, it overwrites the config.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

tohtana commented Sep 12, 2024

Closing as this issue was resolved by #5846. Please feel free to reopen when needed.

tohtana closed this as completed Sep 12, 2024