
Val sanity steps affect seed reproducibility of train shuffling #6988

Closed
addisonklinke opened this issue Apr 13, 2021 · 0 comments · Fixed by #7014
Labels
bug Something isn't working help wanted Open to be worked on priority: 0 High priority task

Comments

@addisonklinke

🐛 Bug

Despite using a separate val dataset, the default sanity val steps affect the indices returned by the train dataloader when shuffling is enabled. The training still executes fine, but this makes it impossible to replicate the exact results of a vanilla PyTorch training (which likely doesn't do a sanity val check) using the same random seed and default trainer settings.

Currently, you can work around this by setting trainer.num_sanity_val_steps=0, but I think it's bad practice to encourage that. You might not care about reproducing exact results; however, I find it is a crucial step when refactoring from vanilla PyTorch to Lightning. Otherwise, you don't know whether a performance difference comes from your random seeds or from something more serious in the optimizer(s), loss, scheduler(s), data loaders, etc.
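The mechanism can be sketched with the stdlib random module (a simplified stand-in for the actual PyTorch RNG, not Lightning's code): any draw from the global RNG during the sanity val loop advances its state, so the first training shuffle under the same seed yields different indices.

```python
import random

def shuffled_indices(n):
    """Simulate a shuffling dataloader: the order depends on global RNG state."""
    idx = list(range(n))
    random.shuffle(idx)
    return idx

# Run 1: seed, then shuffle immediately (no sanity check).
random.seed(42)
run_without_sanity = shuffled_indices(10)

# Run 2: same seed, but a "sanity check" consumes RNG state first,
# so the subsequent training shuffle differs.
random.seed(42)
_ = random.random()  # stand-in for any RNG use during sanity validation
run_with_sanity = shuffled_indices(10)
```

Both runs visit the same ten indices, just in different orders, which is exactly why training still "works" while exact reproduction against vanilla PyTorch fails.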

To Reproduce

Run this Colab notebook with the BoringModel

Expected behavior

The shuffling of train batches should not be affected by whether we've run sanity steps with the val dataloader.
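One way to achieve this expected behavior (a sketch with the stdlib random module and a hypothetical context manager, not Lightning's actual fix) is to snapshot the RNG state before the sanity loop and restore it afterwards:

```python
import random
from contextlib import contextmanager

@contextmanager
def preserved_rng_state():
    """Restore the global RNG state after the wrapped block, so any draws
    inside it (e.g. a sanity val loop) leave later shuffling unaffected."""
    state = random.getstate()
    try:
        yield
    finally:
        random.setstate(state)

random.seed(42)
with preserved_rng_state():
    random.random()  # stand-in for RNG use during the sanity check
order = list(range(10))
random.shuffle(order)  # same order as if the sanity check never ran
```

With this guard in place, the first training shuffle matches a run that skipped the sanity check entirely.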

Environment

Automated output provided by Colab notebook

  • CUDA:
    • GPU: Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.18.5
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0+cu101
    • pytorch-lightning: 0.10.0
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture: 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

This is my first time working with Colab, so let me know if there are any permissions or other issues that need to be resolved

@addisonklinke addisonklinke added bug Something isn't working help wanted Open to be worked on labels Apr 13, 2021
@awaelchli awaelchli self-assigned this Apr 13, 2021
@tchaton tchaton added the priority: 0 High priority task label Apr 14, 2021