Add docs for optimizers and LR schedulers
yqzhishen committed Aug 24, 2023
1 parent 4d1d332 commit 57087ca
Showing 2 changed files with 35 additions and 172 deletions.
39 changes: 35 additions & 4 deletions docs/BestPractices.md
@@ -279,11 +279,13 @@ accumulate_grad_batches: 4 # the actual batch size will be 4x.

Please note that enabling gradient accumulation will slow down training because the losses must be calculated several times before the weights are updated (1 update to the weights = 1 actual training step).

## Optimizers and learning rate schedulers

The optimizer and the learning rate scheduler can play an important role in the training process. DiffSinger uses a flexible configuration logic for these two modules.

### Basic configurations

The optimizer and learning rate scheduler used during training can be configured by their full class name and keyword arguments in the configuration file. Take the following as an example for the optimizer:

```yaml
optimizer_args:
  optimizer_cls: torch.optim.AdamW  # class name of optimizer
  lr: 0.0004
  beta1: 0.9
  beta2: 0.98
  weight_decay: 0
```

and for the learning rate scheduler:

```yaml
lr_scheduler_args:
  scheduler_cls: torch.optim.lr_scheduler.StepLR  # class name of learning rate scheduler
  warmup_steps: 2000
  step_size: 50000
  gamma: 0.5
```

Also, note that the LR scheduler performs scheduling on the granularity of steps.

A special case applies when a tuple is needed by `__init__`: `beta1` and `beta2` are configured separately and combined into a tuple in the code. You could also try passing in an array instead (as an experiment, AdamW does accept `[beta1, beta2]`). If another special treatment is required, please submit an issue.
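
To make this mechanism concrete, the following is a minimal sketch (not the repository's actual implementation) of how an `optimizer_args` dictionary could be turned into an optimizer instance. The helper names are hypothetical, and the `beta1`/`beta2` merging mirrors the special case just described.

```python
import importlib

import torch


def build_object_from_class_name(cls_path: str, *args, **kwargs):
    # Hypothetical helper: resolve a full class path such as
    # 'torch.optim.AdamW' into the class object and instantiate it.
    module_name, cls_name = cls_path.rsplit('.', 1)
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(*args, **kwargs)


def build_optimizer(model: torch.nn.Module, optimizer_args: dict):
    kwargs = {k: v for k, v in optimizer_args.items() if k != 'optimizer_cls'}
    # Special case: beta1/beta2 are configured separately, but AdamW
    # expects a single 'betas' tuple (a list also works in practice).
    if 'beta1' in kwargs and 'beta2' in kwargs:
        kwargs['betas'] = (kwargs.pop('beta1'), kwargs.pop('beta2'))
    return build_object_from_class_name(
        optimizer_args['optimizer_cls'], model.parameters(), **kwargs
    )


# Usage with the example configuration above:
optimizer_args = {
    'optimizer_cls': 'torch.optim.AdamW',
    'lr': 0.0004,
    'beta1': 0.9,
    'beta2': 0.98,
    'weight_decay': 0,
}
model = torch.nn.Linear(4, 4)
optimizer = build_optimizer(model, optimizer_args)
```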

For PyTorch built-in optimizers and LR schedulers, see the official [documentation](https://pytorch.org/docs/stable/optim.html) of the `torch.optim` package. If you find other optimizers or learning rate schedulers useful, you can raise a topic in [Discussions](https://github.com/openvpi/DiffSinger/discussions), open an [Issue](https://github.com/openvpi/DiffSinger/issues), or submit a [PR](https://github.com/openvpi/DiffSinger/pulls) if it introduces new code or dependencies.

### Composite LR schedulers

Some LR schedulers, such as `SequentialLR` and `ChainedScheduler`, take other schedulers as arguments. Besides values of built-in types, there is a special design for configuring these nested scheduler objects. See the following example.

```yaml
lr_scheduler_args:
  scheduler_cls: torch.optim.lr_scheduler.SequentialLR
  schedulers:
    - cls: torch.optim.lr_scheduler.ExponentialLR
      gamma: 0.5
    - cls: torch.optim.lr_scheduler.LinearLR
    - cls: torch.optim.lr_scheduler.MultiStepLR
      milestones:
        - 10
        - 20
  milestones:
    - 10
    - 20
```

Scheduler objects will be constructed recursively: wherever `cls` is present in the sub-arguments, that sub-argument is built into a scheduler object. Please note that `cls` must be a scheduler class, because this is a special design.
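
To illustrate the recursive rule, here is a minimal sketch (not the repository's actual implementation) of how such nested arguments could be resolved. The helper names are hypothetical, and `warmup_steps` is assumed to be consumed by a separate warmup mechanism rather than passed to the scheduler class.

```python
import importlib

import torch


def _resolve(cls_path: str):
    # Hypothetical helper: turn 'torch.optim.lr_scheduler.StepLR'
    # into the class object.
    module_name, cls_name = cls_path.rsplit('.', 1)
    return getattr(importlib.import_module(module_name), cls_name)


def build_lr_scheduler(optimizer, lr_scheduler_args: dict):
    # Any dict containing a 'cls' key (and any list of such dicts) is
    # recursively built into a scheduler object bound to the same optimizer
    # before the outer scheduler is constructed.
    def build(value):
        if isinstance(value, dict) and 'cls' in value:
            kwargs = {k: build(v) for k, v in value.items() if k != 'cls'}
            return _resolve(value['cls'])(optimizer, **kwargs)
        if isinstance(value, list):
            return [build(v) for v in value]
        return value

    # 'warmup_steps' is assumed to be handled elsewhere, not by the scheduler.
    kwargs = {
        k: build(v) for k, v in lr_scheduler_args.items()
        if k not in ('scheduler_cls', 'warmup_steps')
    }
    return _resolve(lr_scheduler_args['scheduler_cls'])(optimizer, **kwargs)


# Usage with the composite configuration above:
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004)
scheduler = build_lr_scheduler(optimizer, {
    'scheduler_cls': 'torch.optim.lr_scheduler.SequentialLR',
    'schedulers': [
        {'cls': 'torch.optim.lr_scheduler.ExponentialLR', 'gamma': 0.5},
        {'cls': 'torch.optim.lr_scheduler.LinearLR'},
        {'cls': 'torch.optim.lr_scheduler.MultiStepLR', 'milestones': [10, 20]},
    ],
    'milestones': [10, 20],
})
```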

**WARNING:** Nesting `SequentialLR` and `ChainedScheduler` leads to unexpected behavior. **DO NOT** nest them. Also, make sure a scheduler is _chainable_ before using it in `ChainedScheduler`.
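
For reference, chainable schedulers compose in plain PyTorch as shown below (a usage sketch independent of this repository's code): both component schedulers are chainable, so `ChainedScheduler` can apply them together at every step.

```python
import torch
from torch.optim.lr_scheduler import ChainedScheduler, ConstantLR, ExponentialLR

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ConstantLR and ExponentialLR are chainable, so their effects combine
# at each step; a non-chainable scheduler would not compose correctly here.
scheduler = ChainedScheduler([
    ConstantLR(optimizer, factor=0.5, total_iters=4),
    ExponentialLR(optimizer, gamma=0.99),
])

for _ in range(10):
    optimizer.step()
    scheduler.step()
```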

## Fine-tuning and parameter freezing

Expand Down
168 changes: 0 additions & 168 deletions docs/ConfigurationSchemas.md
@@ -1710,30 +1710,6 @@ int

100

### lr_scheduler_args.gamma

Learning rate decay ratio of `torch.optim.lr_scheduler.StepLR`.

#### visibility

all

#### scope

training

#### customizability

recommended

#### type

float

#### default

0.5

### lr_scheduler_args

Arguments of learning rate scheduler. Keys will be used as keyword arguments of the `__init__()` method of [lr_scheduler_args.scheduler_cls](#lr_scheduler_args.scheduler_cls).
@@ -1766,54 +1742,6 @@ str

torch.optim.lr_scheduler.StepLR

### lr_scheduler_args.step_size

Number of training steps between each decay of the learning rate.

#### visibility

all

#### scope

training

#### customizability

recommended

#### type

int

#### default

50000

### lr_scheduler_args.warmup_steps

Number of warmup steps of the learning rate scheduler.

#### visibility

all

#### scope

training

#### customizability

normal

#### type

int

#### default

2000

### max_batch_frames

Maximum number of data frames in each training batch. Used to dynamically control the batch size.
@@ -2184,78 +2112,6 @@ Arguments of optimizer. Keys will be used as keyword arguments of the `__init__`

dict

### optimizer_args.beta1

Parameter of the `torch.optim.AdamW` optimizer.

#### visibility

all

#### scope

training

#### customizability

not recommended

#### type

float

#### default

0.9

### optimizer_args.beta2

Parameter of the `torch.optim.AdamW` optimizer.

#### visibility

all

#### scope

training

#### customizability

not recommended

#### type

float

#### default

0.98

### optimizer_args.lr

Initial learning rate of the optimizer.

#### visibility

all

#### scope

training

#### customizability

recommended

#### type

float

#### default

0.0004

### optimizer_args.optimizer_cls

Optimizer class name
@@ -2280,30 +2136,6 @@ str

torch.optim.AdamW

### optimizer_args.weight_decay

Weight decay ratio of optimizer.

#### visibility

all

#### scope

training

#### customizability

not recommended

#### type

float

#### default

0

### pe

Pitch extractor type.
