
Val sanity steps affect seed reproducibility of train shuffling #6988

Closed
addisonklinke opened this issue Apr 13, 2021 · 0 comments · Fixed by #7014
Labels
bug Something isn't working help wanted Open to be worked on priority: 0 High priority task

Comments

@addisonklinke

🐛 Bug

Despite using a separate val dataset, the default sanity val steps affect the indices returned by the train dataloader when shuffling is enabled. The training still executes fine, but this makes it impossible to replicate the exact results of a vanilla PyTorch training (which likely doesn't do a sanity val check) using the same random seed and default trainer settings.

Currently, you can work around this by setting trainer.num_sanity_val_steps=0, but I think it's bad practice to encourage that. You might not care about reproducing exact results; however, I find it is a crucial step when refactoring from vanilla PyTorch to Lightning. Otherwise, you don't know whether a performance difference comes from your random seeds or from something more serious in the optimizer(s), loss, scheduler(s), data loaders, etc.
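The mechanism can be sketched with the stdlib random module (a simplified stand-in for the actual PyTorch RNG, not Lightning's code): any draw from the global RNG during the sanity val loop advances its state, so the first training shuffle under the same seed yields different indices.

```python
import random

def shuffled_indices(n):
    """Simulate a shuffling dataloader: the order depends on global RNG state."""
    idx = list(range(n))
    random.shuffle(idx)
    return idx

# Run 1: seed, then shuffle immediately (no sanity check).
random.seed(42)
run_without_sanity = shuffled_indices(10)

# Run 2: same seed, but a "sanity check" consumes RNG state first,
# so the subsequent training shuffle differs.
random.seed(42)
_ = random.random()  # stand-in for any RNG use during sanity validation
run_with_sanity = shuffled_indices(10)
```

Both runs visit the same ten indices, just in different orders, which is exactly why training still "works" while exact reproduction against vanilla PyTorch fails.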

To Reproduce

Run this Colab notebook with the BoringModel

Expected behavior

The shuffling of train batches should not be affected by whether we've run sanity steps with the val dataloader.
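One way to achieve this expected behavior (a sketch with the stdlib random module and a hypothetical context manager, not Lightning's actual fix) is to snapshot the RNG state before the sanity loop and restore it afterwards:

```python
import random
from contextlib import contextmanager

@contextmanager
def preserved_rng_state():
    """Restore the global RNG state after the wrapped block, so any draws
    inside it (e.g. a sanity val loop) leave later shuffling unaffected."""
    state = random.getstate()
    try:
        yield
    finally:
        random.setstate(state)

random.seed(42)
with preserved_rng_state():
    random.random()  # stand-in for RNG use during the sanity check
order = list(range(10))
random.shuffle(order)  # same order as if the sanity check never ran
```

With this guard in place, the first training shuffle matches a run that skipped the sanity check entirely.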

Environment

Automated output provided by Colab notebook

  • CUDA:
    • GPU: Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0+cu101
    • pytorch-lightning: 0.10.0
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture: 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

This is my first time working with Colab, so let me know if there are any permissions or other issues that need to be resolved

@addisonklinke addisonklinke added bug Something isn't working help wanted Open to be worked on labels Apr 13, 2021
@awaelchli awaelchli self-assigned this Apr 13, 2021
@tchaton tchaton added the priority: 0 High priority task label Apr 14, 2021