
rename Model steps #1051

Merged · 37 commits · Mar 5, 2020
Changes from 20 commits

Commits (37)
4972608
training_end renamed to training_step_end
williamFalcon Mar 5, 2020
738692c
training_end renamed to training_step_end
williamFalcon Mar 5, 2020
84560fd
training_end renamed to training_step_end
williamFalcon Mar 5, 2020
a3dbe56
training_end renamed to training_step_end
williamFalcon Mar 5, 2020
165ca51
training_end to training_step_end
williamFalcon Mar 5, 2020
890364e
training_end to training_step_end
williamFalcon Mar 5, 2020
f76fdb8
training_end to training_step_end
williamFalcon Mar 5, 2020
39b66c3
training_end to training_step_end
williamFalcon Mar 5, 2020
e7e1ce9
fix lost model reference
williamFalcon Mar 5, 2020
9db4d1f
training_end to training_step_end
williamFalcon Mar 5, 2020
e13beac
training_end to training_step_end
williamFalcon Mar 5, 2020
684fc47
training_end to training_step_end
williamFalcon Mar 5, 2020
c508dfb
training_end to training_step_end
williamFalcon Mar 5, 2020
8b8679c
training_end to training_step_end
williamFalcon Mar 5, 2020
2afc704
training_end to training_step_end
williamFalcon Mar 5, 2020
697cb5d
training_end to training_step_end
williamFalcon Mar 5, 2020
4938b81
training_end to training_step_end
williamFalcon Mar 5, 2020
78e2435
training_end to training_step_end
williamFalcon Mar 5, 2020
8d980a9
training_end to training_step_end
williamFalcon Mar 5, 2020
bfa3fdd
training_end to training_step_end
williamFalcon Mar 5, 2020
77baa64
training_end to training_step_end
williamFalcon Mar 5, 2020
3f7d5e0
training_end to training_step_end
williamFalcon Mar 5, 2020
b13b348
training_end to training_step_end
williamFalcon Mar 5, 2020
e3ac274
training_end to training_step_end
williamFalcon Mar 5, 2020
0a27d95
training_end to training_step_end
williamFalcon Mar 5, 2020
a106c47
training_end to training_step_end
williamFalcon Mar 5, 2020
bc4db9f
training_end to training_step_end
williamFalcon Mar 5, 2020
8199a73
training_end to training_step_end
williamFalcon Mar 5, 2020
aa97340
training_end to training_step_end
williamFalcon Mar 5, 2020
8568674
training_end to training_step_end
williamFalcon Mar 5, 2020
b964b70
training_end to training_step_end
williamFalcon Mar 5, 2020
5a9e405
training_end to training_step_end
williamFalcon Mar 5, 2020
2184e52
training_end to training_step_end
williamFalcon Mar 5, 2020
c56b09c
training_end to training_step_end
williamFalcon Mar 5, 2020
f687043
training_end to training_step_end
williamFalcon Mar 5, 2020
a401841
training_end to training_step_end
williamFalcon Mar 5, 2020
7268839
training_end to training_step_end
williamFalcon Mar 5, 2020
6 changes: 3 additions & 3 deletions docs/source/experiment_reporting.rst
@@ -34,11 +34,11 @@ Log metrics

To plot metrics into whatever logger you passed in (tensorboard, comet, neptune, etc...)

1. Training_end, validation_end, test_end will all log anything in the "log" key of the return dict.
1. training_step_end, validation_end, test_end will all log anything in the "log" key of the return dict.

Review comment (Member):

Perhaps these should say validation_epoch_end or validation_step_end?

.. code-block:: python

def training_end(self, outputs):
def training_step_end(self, outputs):
loss = some_loss()
...

@@ -62,7 +62,7 @@ To plot metrics into whatever logger you passed in (tensorboard, comet, neptune,
results = {'log': logs}
return results

2. Most of the time, you only need training_step and not training_end. You can also return logs from here:
2. Most of the time, you only need training_step and not training_step_end. You can also return logs from here:

.. code-block:: python

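The bodies of these code blocks are partially collapsed in the diff view above. As a rough, self-contained sketch of the logging pattern the two items describe (not the PR's actual code; the metric name ``train_loss`` is illustrative), returning metrics under the ``log`` key from ``training_step`` could look like this:

.. code-block:: python

    # inside a LightningModule (F is torch.nn.functional)
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = F.cross_entropy(y_hat, y)

        # anything placed under the 'log' key is sent to the attached logger
        return {'loss': loss, 'log': {'train_loss': loss}}
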
2 changes: 1 addition & 1 deletion docs/source/hooks.rst
@@ -26,7 +26,7 @@ Training loop
- on_batch_start
- tbptt_split_batch
- training_step
- training_end (optional)
- training_step_end (optional)
- backward
- on_after_backward
- optimizer.step()
45 changes: 40 additions & 5 deletions docs/source/multi_gpu.rst
@@ -165,12 +165,13 @@ you will only be operating on one of those pieces.
y_0 = batch

For most metrics, this doesn't really matter. However, if you want
full batch statistics or want to use the outputs of the training_step
to do something like a softmax, you can use the `training_end` step.
to add something to your computational graph (like softmax)
using all batch parts, you can use the `training_step_end` step.

.. code-block:: python

def training_end(self, outputs):
def training_step_end(self, outputs):
# only use when on dp
outputs = torch.cat(outputs, dim=1)
softmax = torch.softmax(outputs, dim=1)
out = softmax.mean()
@@ -195,9 +196,43 @@ In pseudocode, the full sequence is:
out = gpu_model(batch_split)
all_results.append(out)

# calculate statistics for all parts of the batch
full out = model.training_end(all_results)
# use the full batch for something like softmax
full_out = model.training_step_end(all_results)

To illustrate why this is needed, let's look at DataParallel:

.. code-block:: python

def training_step(self, batch, batch_idx):
x, y = batch
y_hat = self.forward(batch)

# on dp or ddp2 if we did softmax now it would be wrong
# because batch is actually a piece of the full batch
return y_hat

def training_step_end(self, batch_parts_outputs):
# batch_parts_outputs has outputs of each part of the batch

# do softmax here
outputs = torch.cat(batch_parts_outputs, dim=1)
softmax = torch.softmax(outputs, dim=1)
out = softmax.mean()

return out

If `training_step_end` is defined, it will be called regardless of TPU, dp, ddp, etc., which means
it will behave the same no matter the backend.

Validation and test steps also have the same option when using dp:

.. code-block:: python

def validation_step_end(self, batch_parts_outputs):
...

def test_step_end(self, batch_parts_outputs):
...
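
As a minimal sketch of how these might be filled in (a hedged illustration, assuming the per-part outputs arrive gathered into a single dict of stacked values; the ``val_loss`` key is illustrative, not from the PR):

.. code-block:: python

    # inside a LightningModule (F is torch.nn.functional)
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        # on dp/ddp2 this loss covers only one piece of the batch
        return {'val_loss': F.cross_entropy(y_hat, y)}

    def validation_step_end(self, batch_parts_outputs):
        # 'val_loss' now holds one loss per batch piece; average them
        return {'val_loss': batch_parts_outputs['val_loss'].mean()}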

Implement Your Own Distributed (DDP) training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
193 changes: 169 additions & 24 deletions pytorch_lightning/core/__init__.py
@@ -1,16 +1,73 @@
"""
A LightningModule is a strict superclass of torch.nn.Module but provides an interface to standardize
the "ingredients" for a research or production system.
A LightningModule organizes your PyTorch code into the following sections:

- The model/system definition (__init__)
- The model/system computations (forward)
- What happens in the training loop (training_step, training_end)
- What happens in the validation loop (validation_step, validation_end)
- What happens in the test loop (test_step, test_end)
- What optimizers to use (configure_optimizers)
- What data to use (train_dataloader, val_dataloader, test_dataloader)
.. figure:: /_images/lightning_module/pt_to_pl.png
:alt: Convert from PyTorch to Lightning

Most methods are optional. Here's a minimal example.

Notice a few things.

1. It's the SAME code.

2. The PyTorch code IS NOT abstracted - just organized.

3. All the other code that didn't go in the LightningModule has been automated
for you by the trainer.

.. code-block:: python

net = Net()
trainer = Trainer()
trainer.fit(net)

4. There are no .cuda() or .to() calls... Lightning does these for you.

.. code-block:: python

# don't do in lightning
x = torch.Tensor(2, 3)
x = x.cuda()
x = x.to(device)

# do this instead
x = x # leave it alone!

# or to init a new tensor
new_x = torch.Tensor(2, 3)
new_x = new_x.type_as(x)


5. There are no samplers for distributed training; Lightning also does this for you.

.. code-block:: python

# Don't do in Lightning...
data = MNIST(...)
sampler = DistributedSampler(data)
DataLoader(data, sampler=sampler)

# do this instead
data = MNIST(...)
DataLoader(data)


6. A LightningModule is a torch.nn.Module but with added functionality. Use it as such!

.. code-block:: python

net = Net.load_from_checkpoint(PATH)
net.freeze()
out = net(x)

Thus, to use Lightning, you just need to organize your code, which takes about 30 minutes
(and let's be real, you probably should do that anyway).

------------

Minimal Example
---------------

Here are the only required methods.

.. code-block:: python

@@ -37,13 +94,13 @@ def training_step(self, batch, batch_idx):
y_hat = self.forward(x)
return {'loss': F.cross_entropy(y_hat, y)}

def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)

def train_dataloader(self):
return DataLoader(MNIST(os.getcwd(), train=True, download=True,
transform=transforms.ToTensor()), batch_size=32)

def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)

Which you can train by doing:

.. code-block:: python
@@ -53,7 +110,35 @@ def train_dataloader(self):

trainer.fit(model)

If you wanted to add a validation loop
----------

Training loop structure
-----------------------

The general pattern is that each loop (training, validation, test loop)
has 2 methods:

- ``___step``
- ``___epoch_end``

To show how Lightning calls these, let's use the validation loop as an example:

.. code-block:: python

val_outs = []
for val_batch in val_data:
# do something with each batch
out = validation_step(val_batch)
val_outs.append(out)

# do something with the outputs for all batches
# like calculate validation set accuracy or loss
validation_epoch_end(val_outs)

Add validation loop
^^^^^^^^^^^^^^^^^^^

Thus, if you wanted to add a validation loop, you would add this to your LightningModule:

.. code-block:: python

@@ -63,36 +148,96 @@ def validation_step(self, batch, batch_idx):
y_hat = self.forward(x)
return {'val_loss': F.cross_entropy(y_hat, y)}

def validation_end(self, outputs):
def validation_epoch_end(self, outputs):
val_loss_mean = torch.stack([x['val_loss'] for x in outputs]).mean()
return {'val_loss': val_loss_mean}

def val_dataloader(self):
# can also return a list of val dataloaders
return DataLoader(MNIST(os.getcwd(), train=True, download=True,
transform=transforms.ToTensor()), batch_size=32)
return DataLoader(...)

Or add a test loop
Add test loop
^^^^^^^^^^^^^

.. code_block:: python
.. code-block:: python

class CoolModel(pl.LightningModule):

def test_step(self, batch, batch_idx):
x, y = batch
y_hat = self.forward(x)
return {'test_loss': F.cross_entropy(y_hat, y)}

def test_end(self, outputs):
def test_epoch_end(self, outputs):
test_loss_mean = torch.stack([x['test_loss'] for x in outputs]).mean()
return {'test_loss': test_loss_mean}

def test_dataloader(self):
# OPTIONAL
# can also return a list of test dataloaders
return DataLoader(MNIST(os.getcwd(), train=False, download=True,
transform=transforms.ToTensor()), batch_size=32)
return DataLoader(...)

However, the test loop won't ever be called automatically, to make sure you
don't run your test data by accident. Instead, you have to call it explicitly:

.. code-block:: python

# call after training
trainer = Trainer()
trainer.fit(model)
trainer.test()

# or call with pretrained model
model = MyLightningModule.load_from_checkpoint(PATH)
trainer = Trainer()
trainer.test(model)

Training_step_end method
------------------------
When using DataParallel or DistributedDataParallel2 (ddp2), the training_step
will be operating on a portion of the batch. This is normally fine, but in special
cases, such as calculating NCE loss using negative samples, we might want to
perform a softmax across all samples in the batch.

For these types of situations, each loop has an additional ``___step_end`` method,
which allows you to operate on all the pieces of the batch; a concrete sketch follows the pseudocode below.

.. code-block:: python

training_outs = []
for train_batch in train_data:
# dp, ddp2 splits the batch
sub_batches = split_batches_for_dp(train_batch)

# run training_step on each piece of the batch
batch_parts_outputs = [training_step(sub_batch) for sub_batch in sub_batches]

# do softmax with all pieces
out = training_step_end(batch_parts_outputs)
training_outs.append(out)

# do something with the outputs for all batches
# like calculate training set accuracy or loss
training_epoch_end(training_outs)
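
A hedged sketch of what this looks like from inside a LightningModule, mirroring the dp softmax example from the multi-GPU docs changed in this same PR (the shapes and the softmax itself are illustrative):

.. code-block:: python

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        # on dp/ddp2 this is only a piece of the full batch,
        # so don't do the full-batch softmax here
        return y_hat

    def training_step_end(self, batch_parts_outputs):
        # recombine the outputs of every batch piece,
        # then do the softmax over the full batch
        outputs = torch.cat(batch_parts_outputs, dim=1)
        softmax = torch.softmax(outputs, dim=1)
        return softmax.mean()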

Remove cuda calls
-----------------
In a LightningModule, all calls to ``.cuda()``
and ``.to(device)`` should be removed. Lightning will do these
automatically. This will allow your code to work on CPUs, TPUs and GPUs.

When you init a new tensor in your code, just use ``type_as``:

.. code-block:: python

def training_step(self, batch, batch_idx):
x, y = batch

# put the z on the appropriate gpu or tpu core
z = sample_noise()
z = z.type_as(x)

Live demo
---------
Check out how this live demo
Check out this
`COLAB <https://colab.research.google.com/drive/1F_RNcHzTfFuQf-LeKvSlud6x7jXYkG31#scrollTo=HOk9c4_35FKg>`_
for a live demo.