Merge branch 'master' into Acce_refactor
four4fish authored Nov 16, 2021
2 parents 550d250 + 247f5aa commit 3be2ac6
Showing 45 changed files with 371 additions and 298 deletions.
8 changes: 3 additions & 5 deletions .github/workflows/probot-auto-cc.yml
@@ -2,16 +2,14 @@ name: Probot

on:
issues:
types:
- labeled
types: [labeled]
pull_request:
types:
- labeled
types: [labeled, ready_for_review]

jobs:
auto-cc:
if: ${{ github.repository_owner == 'PyTorchLightning' }}
runs-on: ubuntu-latest
if: github.event_name == "issue" || github.event.pull_request.draft == false
steps:
- uses: carmocca/probot@v1
env:
26 changes: 23 additions & 3 deletions CHANGELOG.md
@@ -31,10 +31,13 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- The `monitor` argument in the `EarlyStopping` callback is no longer optional ([#10328](https://github.com/PyTorchLightning/pytorch-lightning/pull/10328))


- Moved `precision_plugin` into `Training_type_plugin` and updated reference ([#10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570))
- Do not fail if batch size could not be inferred for logging when using DeepSpeed ([#10438](https://github.com/PyTorchLightning/pytorch-lightning/issues/10438))


-
- Raise `MisconfigurationException` when `enable_progress_bar=False` and a progress bar instance has been passed in the callback list ([#10520](https://github.com/PyTorchLightning/pytorch-lightning/issues/10520))


- Moved `precision_plugin` into `Training_type_plugin` and updated reference ([#10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570))


-
@@ -127,9 +130,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Removed deprecated `Trainer.train_loop` property in favor of `Trainer.fit_loop` ([#10482](https://github.com/PyTorchLightning/pytorch-lightning/pull/10482))


- Removed deprecated `disable_validation` property from Trainer ([#10450](https://github.com/PyTorchLightning/pytorch-lightning/pull/10450))


- Removed deprecated `CheckpointConnector.hpc_load` property in favor of `CheckpointConnector.restore` ([#10525](https://github.com/PyTorchLightning/pytorch-lightning/pull/10525))


- Removed deprecated `reload_dataloaders_every_epoch` from `Trainer` in favour of `reload_dataloaders_every_n_epochs` ([#10481](https://github.com/PyTorchLightning/pytorch-lightning/pull/10481))


- Removed `precision_plugin` from `Accelerator` in favor of `precision_plugin` in `training_type_plugin` ([#10570](https://github.com/PyTorchLightning/pytorch-lightning/pull/10570))


@@ -141,15 +150,26 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed `CombinedLoader` and `max_size_cycle` didn't receive a `DistributedSampler` ([#10374](https://github.com/PyTorchLightning/pytorch-lightning/issues/10374))


- Fixed scripting causing false positive deprecation warnings ([#10470](https://github.com/PyTorchLightning/pytorch-lightning/pull/10470), [#10555](https://github.com/PyTorchLightning/pytorch-lightning/pull/10555))


- Fixed `isinstance` not working with `init_meta_context`, materialized model not being moved to the device ([#10493](https://github.com/PyTorchLightning/metrics/pull/10493))


- Fixed an issue that prevented the Trainer to shutdown workers when execution is interrupted due to failure([#10463](https://github.com/PyTorchLightning/pytorch-lightning/issues/10463))


- Squeeze the early stopping monitor to remove empty tensor dimensions ([#10461](https://github.com/PyTorchLightning/pytorch-lightning/issues/10461))


-
- Fixed sampler replacement logic with `overfit_batches` to only replace the sample when `SequentialSampler` is not used ([#10486](https://github.com/PyTorchLightning/pytorch-lightning/issues/10486))


- Fixed propagation of device and dtype information to submodules of LightningLite when they inherit from `DeviceDtypeModuleMixin` ([#10559](https://github.com/PyTorchLightning/pytorch-lightning/issues/10559))


-

## [1.5.1] - 2021-11-09

### Fixed
2 changes: 1 addition & 1 deletion docs/source/_templates/layout.html
@@ -4,7 +4,7 @@
{% block footer %}
{{ super() }}
<script script type="text/javascript">
var collapsedSections = ['Best practices', 'Lightning API', 'Optional extensions', 'Tutorials', 'API References', 'Bolts', 'Examples', 'Partner Domain Frameworks', 'Community'];
var collapsedSections = ['Best practices', 'Optional extensions', 'Tutorials', 'API References', 'Bolts', 'Examples', 'Partner Domain Frameworks', 'Community'];
</script>

{% endblock %}
2 changes: 1 addition & 1 deletion pl_examples/loop_examples/kfold.py
@@ -205,7 +205,7 @@ def on_run_end(self) -> None:
voting_model = EnsembleVotingModel(type(self.trainer.lightning_module), checkpoint_paths)
voting_model.trainer = self.trainer
# This requires to connect the new model and move it the right device.
self.trainer.accelerator.connect(voting_model)
self.trainer.training_type_plugin.connect(voting_model)
self.trainer.training_type_plugin.model_to_device()
self.trainer.test_loop.run()

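Note: the rename above reflects the broader `Acce_refactor` theme of this merge, where call sites go through the training type plugin rather than the accelerator. A minimal sketch of the new call pattern; the `BoringModel` below is illustrative, and only the two `training_type_plugin` calls are taken from the diff.

```python
from torch import nn
from pytorch_lightning import LightningModule, Trainer


class BoringModel(LightningModule):
    """Illustrative module, not part of the diff."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(2, 2)

    def forward(self, x):
        return self.layer(x)


trainer = Trainer(max_epochs=1)
model = BoringModel()

# The training type plugin, not the accelerator, now owns model connection
# and device placement (assumed to behave the same outside a custom loop).
trainer.training_type_plugin.connect(model)
trainer.training_type_plugin.model_to_device()
```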
2 changes: 1 addition & 1 deletion pl_examples/loop_examples/yielding_training_step.py
@@ -86,7 +86,7 @@ def _training_step(self, generator):
# Here, instead of calling `lightning_module.training_step()`
# we call next() on the generator!
training_step_output = next(generator)
self.trainer.accelerator.post_training_step()
self.trainer.training_type_plugin.post_training_step()

training_step_output = self.trainer.call_hook("training_step_end", training_step_output)

17 changes: 12 additions & 5 deletions pytorch_lightning/core/lightning.py
@@ -115,6 +115,8 @@ def __init__(self, *args: Any, **kwargs: Any) -> None:
self._param_requires_grad_state = {}
self._metric_attributes: Optional[Dict[int, str]] = None
self._should_prevent_trainer_and_dataloaders_deepcopy: bool = False
# TODO: remove after the 1.6 release
self._running_torchscript = False

self._register_sharded_tensor_state_dict_hooks_if_available()

Expand Down Expand Up @@ -1893,6 +1895,8 @@ def to_torchscript(
"""
mode = self.training

self._running_torchscript = True

if method == "script":
torchscript_module = torch.jit.script(self.eval(), **kwargs)
elif method == "trace":
@@ -1918,6 +1922,8 @@ def to_torchscript(
with fs.open(file_path, "wb") as f:
torch.jit.save(torchscript_module, f)

self._running_torchscript = False

return torchscript_module

@property
@@ -1927,11 +1933,12 @@ def model_size(self) -> float:
Note:
This property will not return correct value for Deepspeed (stage 3) and fully-sharded training.
"""
rank_zero_deprecation(
"The `LightningModule.model_size` property was deprecated in v1.5 and will be removed in v1.7."
" Please use the `pytorch_lightning.utilities.memory.get_model_size_mb`.",
stacklevel=5,
)
if not self._running_torchscript: # remove with the deprecation removal
rank_zero_deprecation(
"The `LightningModule.model_size` property was deprecated in v1.5 and will be removed in v1.7."
" Please use the `pytorch_lightning.utilities.memory.get_model_size_mb`.",
stacklevel=5,
)
return get_model_size_mb(self)

def add_to_queue(self, queue: torch.multiprocessing.SimpleQueue) -> None:
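Note: a short sketch of the intent of the `_running_torchscript` guard above: exporting with `to_torchscript` should no longer surface the `model_size` deprecation warning as a false positive, while direct property access keeps warning. The `Model` class is illustrative.

```python
import torch
from torch import nn
from pytorch_lightning import LightningModule


class Model(LightningModule):
    """Illustrative module, not part of the diff."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(2, 2)

    def forward(self, x):
        return self.layer(x)


model = Model()

# While the flag is set inside to_torchscript(), the deprecation warning in
# `model_size` is skipped, so scripting stays warning-free.
scripted = model.to_torchscript(method="script")
assert isinstance(scripted, torch.jit.ScriptModule)

# Accessing the property directly still emits the deprecation warning.
_ = model.model_size
```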
6 changes: 5 additions & 1 deletion pytorch_lightning/core/mixins/device_dtype_mixin.py
@@ -17,6 +17,8 @@
import torch
from torch.nn import Module

import pytorch_lightning as pl


class DeviceDtypeModuleMixin(Module):
__jit_unused_properties__ = ["device", "dtype"]
@@ -177,7 +179,9 @@ def __update_properties(
self, device: Optional[torch.device] = None, dtype: Optional[Union[str, torch.dtype]] = None
) -> None:
def apply_fn(module: Union["DeviceDtypeModuleMixin", Module]) -> None:
if not isinstance(module, DeviceDtypeModuleMixin):
# TODO: Find why `isinstance(module, DeviceDtypeModuleMixin)` doesn't
# work when using `init_meta_context`.
if not isinstance(module, (DeviceDtypeModuleMixin, pl.LightningModule)):
return
if device is not None:
module._device = device
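Note: a minimal sketch of the propagation behaviour this mixin provides (and which the `isinstance` workaround above preserves under `init_meta_context`): moving the parent keeps the `device`/`dtype` attributes of mixin submodules in sync. The classes below are illustrative.

```python
import torch
from torch import nn
from pytorch_lightning.core.mixins import DeviceDtypeModuleMixin


class Child(DeviceDtypeModuleMixin):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(2, 2)


class Parent(DeviceDtypeModuleMixin):
    def __init__(self):
        super().__init__()
        self.child = Child()


parent = Parent()
parent.to(torch.double)

# __update_properties applies to all submodules, so the child tracks the change.
assert parent.dtype == torch.double
assert parent.child.dtype == torch.double
```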
3 changes: 2 additions & 1 deletion pytorch_lightning/lite/wrappers.py
@@ -24,6 +24,7 @@
from torch.utils.data import DataLoader

from pytorch_lightning.accelerators import Accelerator
from pytorch_lightning.core.mixins import DeviceDtypeModuleMixin
from pytorch_lightning.plugins import PrecisionPlugin
from pytorch_lightning.utilities.apply_func import apply_to_collection, move_data_to_device

@@ -64,7 +65,7 @@ def step(self, closure: Optional[Callable] = None) -> None:
)


class _LiteModule(nn.Module):
class _LiteModule(DeviceDtypeModuleMixin):
def __init__(self, module: nn.Module, precision_plugin: PrecisionPlugin) -> None:
"""The LiteModule is a thin wrapper around the :class:`torch.nn.Module` and handles precision / autocast
automatically for the forward pass.
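Note: with `_LiteModule` now inheriting from `DeviceDtypeModuleMixin`, the wrapper returned by `LightningLite.setup` is expected to report the device and dtype of the wrapped module. A hedged usage sketch:

```python
import torch
from torch import nn
from pytorch_lightning.lite import LightningLite


class Lite(LightningLite):
    def run(self):
        model = nn.Linear(2, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        # setup() returns a _LiteModule wrapper around `model`.
        model, optimizer = self.setup(model, optimizer)
        # The mixin exposes where the wrapped weights live.
        print(model.device, model.dtype)


Lite(accelerator="cpu").run()
```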
2 changes: 2 additions & 0 deletions pytorch_lightning/loggers/tensorboard.py
@@ -240,7 +240,9 @@ def log_graph(self, model: "pl.LightningModule", input_array=None):

if input_array is not None:
input_array = model._apply_batch_transfer_handler(input_array)
model._running_torchscript = True
self.experiment.add_graph(model, input_array)
model._running_torchscript = False
else:
rank_zero_warn(
"Could not log computational graph since the"
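Note: the code path above only runs when graph logging is enabled on the logger; a sketch of the triggering configuration (the module must also define `example_input_array`, otherwise the warning below is emitted instead):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

# log_graph=True makes the logger call `self.experiment.add_graph(...)`, which
# now runs with `_running_torchscript` set to avoid spurious deprecation warnings.
logger = TensorBoardLogger(save_dir="lightning_logs", log_graph=True)
trainer = Trainer(logger=logger, max_epochs=1)
```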
21 changes: 13 additions & 8 deletions pytorch_lightning/plugins/training_type/deepspeed.py
@@ -620,11 +620,6 @@ def _format_batch_size_and_grad_accum_config(self):
)
self.config["gradient_accumulation_steps"] = self.lightning_module.trainer.accumulate_grad_batches
if "train_micro_batch_size_per_gpu" not in self.config:
rank_zero_warn(
"Inferring the batch size for internal deepspeed logging from the `train_dataloader()`. "
"If you require skipping this, please pass "
"`Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size))`"
)
batch_size = self._auto_select_batch_size()
self.config["train_micro_batch_size_per_gpu"] = batch_size
if "gradient_clipping" not in self.config:
@@ -636,9 +631,19 @@ def _auto_select_batch_size(self):
batch_size = 1
train_dl_source = self.lightning_module.trainer._data_connector._train_dataloader_source
if train_dl_source.is_defined():
train_dataloader = train_dl_source.dataloader()
if hasattr(train_dataloader, "batch_sampler"):
batch_size = train_dataloader.batch_sampler.batch_size
try:
train_dataloader = train_dl_source.dataloader()
if hasattr(train_dataloader, "batch_sampler"):
batch_size = train_dataloader.batch_sampler.batch_size
# broad exception on purpose as `source.dataloader()` will fail if the dataloader requires `setup`
# to have been called before
except Exception:
if self.global_rank == 0:
deepspeed.utils.logging.logger.warning(
"Tried to infer the batch size for internal deepspeed logging from the `train_dataloader()`. "
"To ensure DeepSpeed logging remains correct, please manually pass the plugin with the "
"batch size, `Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size))`."
)
return batch_size

def _format_precision_config(self):
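Note: when the batch size cannot be inferred (for example because the dataloader requires `setup()` to have run), internal DeepSpeed logging falls back to a batch size of 1 and the warning suggests passing it explicitly. A sketch of that, using the invocation from the warning text; the other `Trainer` arguments are illustrative.

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DeepSpeedPlugin

# Passing the batch size explicitly keeps DeepSpeed's logged throughput correct
# even when it cannot be read from `train_dataloader()`.
trainer = Trainer(
    gpus=2,
    precision=16,
    strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=32),
)
```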
20 changes: 12 additions & 8 deletions pytorch_lightning/plugins/training_type/ipu.py
@@ -238,21 +238,25 @@ def to_tensor(x):
args = apply_to_collection(args, dtype=(int, float), function=to_tensor)
return args

def training_step(self, *args, **kwargs):
def _step(self, stage: RunningStage, *args: Any, **kwargs: Any):
args = self._prepare_input(args)
return self.poptorch_models[RunningStage.TRAINING](*args, **kwargs)
poptorch_model = self.poptorch_models[stage]
self.lightning_module._running_torchscript = True
out = poptorch_model(*args, **kwargs)
self.lightning_module._running_torchscript = False
return out

def training_step(self, *args, **kwargs):
return self._step(RunningStage.TRAINING, *args, **kwargs)

def validation_step(self, *args, **kwargs):
args = self._prepare_input(args)
return self.poptorch_models[RunningStage.VALIDATING](*args, **kwargs)
return self._step(RunningStage.VALIDATING, *args, **kwargs)

def test_step(self, *args, **kwargs):
args = self._prepare_input(args)
return self.poptorch_models[RunningStage.TESTING](*args, **kwargs)
return self._step(RunningStage.TESTING, *args, **kwargs)

def predict_step(self, *args, **kwargs):
args = self._prepare_input(args)
return self.poptorch_models[RunningStage.PREDICTING](*args, **kwargs)
return self._step(RunningStage.PREDICTING, *args, **kwargs)

def teardown(self) -> None:
# undo dataloader patching
20 changes: 12 additions & 8 deletions pytorch_lightning/trainer/connectors/callback_connector.py
@@ -94,12 +94,9 @@ def on_trainer_init(
" bar pass `enable_progress_bar = False` to the Trainer."
)

if enable_progress_bar:
self.trainer._progress_bar_callback = self.configure_progress_bar(
progress_bar_refresh_rate, process_position
)
else:
self.trainer._progress_bar_callback = None
self.trainer._progress_bar_callback = self.configure_progress_bar(
progress_bar_refresh_rate, process_position, enable_progress_bar
)

# configure the ModelSummary callback
self._configure_model_summary_callback(enable_model_summary, weights_summary)
@@ -215,7 +212,9 @@ def _configure_swa_callbacks(self):
if not existing_swa:
self.trainer.callbacks = [StochasticWeightAveraging()] + self.trainer.callbacks

def configure_progress_bar(self, refresh_rate=None, process_position=0):
def configure_progress_bar(
self, refresh_rate: Optional[int] = None, process_position: int = 0, enable_progress_bar: bool = True
) -> Optional[ProgressBarBase]:
if os.getenv("COLAB_GPU") and refresh_rate is None:
# smaller refresh rate on colab causes crashes, choose a higher value
refresh_rate = 20
@@ -229,7 +228,12 @@ def configure_progress_bar(self, refresh_rate=None, process_position=0):
)
if len(progress_bars) == 1:
progress_bar_callback = progress_bars[0]
elif refresh_rate > 0:
if not enable_progress_bar:
raise MisconfigurationException(
"Trainer was configured with `enable_progress_bar=False`"
f" but found `{progress_bar_callback.__class__.__name__}` in callbacks list."
)
elif refresh_rate > 0 and enable_progress_bar:
progress_bar_callback = TQDMProgressBar(refresh_rate=refresh_rate, process_position=process_position)
self.trainer.callbacks.append(progress_bar_callback)
else:
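Note: the new check makes the following combination fail fast at `Trainer` construction; a minimal sketch of the rejected configuration:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import TQDMProgressBar
from pytorch_lightning.utilities.exceptions import MisconfigurationException

try:
    Trainer(
        enable_progress_bar=False,
        callbacks=[TQDMProgressBar(refresh_rate=10)],
    )
except MisconfigurationException as err:
    # "Trainer was configured with `enable_progress_bar=False` but found
    #  `TQDMProgressBar` in callbacks list."
    print(err)
```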
8 changes: 0 additions & 8 deletions pytorch_lightning/trainer/connectors/data_connector.py
@@ -64,7 +64,6 @@ def on_trainer_init(
self,
check_val_every_n_epoch: int,
reload_dataloaders_every_n_epochs: int,
reload_dataloaders_every_epoch: bool,
prepare_data_per_node: Optional[bool] = None,
) -> None:
self.trainer.datamodule = None
@@ -83,13 +82,6 @@ def on_trainer_init(

self.trainer.check_val_every_n_epoch = check_val_every_n_epoch

if reload_dataloaders_every_epoch:
reload_dataloaders_every_n_epochs = int(reload_dataloaders_every_epoch)
rank_zero_deprecation(
"`reload_dataloaders_every_epoch` is deprecated in v1.4 and will be removed in v1.6."
" Please use `reload_dataloaders_every_n_epochs` in Trainer."
)

if not isinstance(reload_dataloaders_every_n_epochs, int) or (reload_dataloaders_every_n_epochs < 0):
raise MisconfigurationException(
f"`reload_dataloaders_every_n_epochs` should be an int >= 0, got {reload_dataloaders_every_n_epochs}."
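Note: with the deprecated `reload_dataloaders_every_epoch` flag removed, the migration is the one its deprecation message pointed to (sketch):

```python
from pytorch_lightning import Trainer

# Before (removed here): Trainer(reload_dataloaders_every_epoch=True)
# After: an integer-valued flag; 1 reloads the train dataloader every epoch.
trainer = Trainer(reload_dataloaders_every_n_epochs=1)
```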
18 changes: 9 additions & 9 deletions pytorch_lightning/trainer/data_loading.py
@@ -438,8 +438,7 @@ def _reset_eval_dataloader(
for loader_i in range(len(dataloaders)):
loader = dataloaders[loader_i]

if hasattr(loader, "sampler") and isinstance(loader.sampler, RandomSampler):

if hasattr(loader, "sampler") and not isinstance(loader.sampler, SequentialSampler):
# when overfitting, the dataloader should not have sampler
if self.overfit_batches > 0 and mode.evaluating:
rank_zero_warn(
@@ -591,16 +590,17 @@ def _add_sampler_metadata_collate(dataloader: DataLoader) -> None:

@staticmethod
def _resolve_overfit_batches(dataloader: Collection[DataLoader]) -> Collection[DataLoader]:
has_random_sampler = False
all_have_sequential_sampler = True

def resolve_had_random_sampler(dataloader: DataLoader):
nonlocal has_random_sampler
if not has_random_sampler:
has_random_sampler = isinstance(dataloader.sampler, RandomSampler)
def resolve_has_no_sequential_sampler(dataloader: DataLoader):
nonlocal all_have_sequential_sampler
all_have_sequential_sampler = all_have_sequential_sampler & isinstance(
dataloader.sampler, SequentialSampler
)

apply_to_collection(dataloader, DataLoader, resolve_had_random_sampler)
apply_to_collection(dataloader, DataLoader, resolve_has_no_sequential_sampler)

if has_random_sampler:
if not all_have_sequential_sampler:
rank_zero_warn(
"You requested to overfit but enabled training dataloader shuffling."
" We are turning off the training dataloader shuffling for you."
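Note: a small sketch of the behaviour difference from the stricter sampler check: when overfitting, only dataloaders whose sampler is not a `SequentialSampler` are resampled (and warned about); a sequential loader is now left untouched. The dataset below is illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning import Trainer

dataset = TensorDataset(torch.randn(64, 2), torch.randint(0, 2, (64,)))

# shuffle=False -> SequentialSampler: kept as-is by _resolve_overfit_batches.
plain_loader = DataLoader(dataset, batch_size=8, shuffle=False)

# shuffle=True -> RandomSampler: replaced, with the shuffling warning above.
shuffled_loader = DataLoader(dataset, batch_size=8, shuffle=True)

trainer = Trainer(overfit_batches=4, max_epochs=1)
```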