Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
1. Use a distributed computing setup
2. Ensure cache==True for make_dataset
3. See error
Expected behavior
No error. If the dataset is not already cached, it should be created then cached under the unique hash. If it exists, the hash should be recognised and the dataset loaded.
Environment (please complete the following information):
OS: Ubuntu
Version: 0.27.0
Python Version: 3.11
Packages: pytorch==2.1.2
Pale-Blue-Dot-97 changed the title from "Tries to cache dataset to existing file in distributed computing" to "Bug in caching dataset with existing file error in distributed computing" on Jan 23, 2024
What appears to be happening is that each process in a distributed process group sees that the requested dataset does not exist, so each one independently tries to create and cache it. Because they all write to the same location, a conflict arises when the slower processes try to cache to a file that now exists.
The solution is to ensure that only process 0 attempts to create the dataset. All other processes should then wait until 0 is finished, then they can load the dataset from the new cache.
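The proposed fix could look roughly like the sketch below. It is not the library's actual implementation: `load_or_create_cached`, `cache_path`, `build` and the pickle-based cache format are all hypothetical names and choices made for illustration. The `rank` and `barrier` arguments are passed in so the pattern stays framework-agnostic; in a PyTorch job with an initialised process group they would typically be `torch.distributed.get_rank()` and `torch.distributed.barrier`.

```python
import pickle
import threading
from pathlib import Path
from typing import Any, Callable


def load_or_create_cached(
    cache_path: Path,
    build: Callable[[], Any],
    rank: int,
    barrier: Callable[[], None],
) -> Any:
    """Rank 0 builds and caches the dataset; every rank then loads it from the cache.

    Hypothetical helper: the names and the pickle format are assumptions.
    """
    if rank == 0 and not cache_path.exists():
        dataset = build()
        # Write to a temporary file and rename it afterwards, so that a
        # partially written cache file is never visible under the final name.
        tmp_path = cache_path.with_suffix(".tmp")
        with open(tmp_path, "wb") as f:
            pickle.dump(dataset, f)
        tmp_path.replace(cache_path)

    # Every rank meets here. Ranks other than 0 cannot continue until
    # rank 0 has reached this point, i.e. until the cache file exists.
    barrier()

    with open(cache_path, "rb") as f:
        return pickle.load(f)
```

Because only rank 0 ever writes, the "file already exists" conflict cannot occur, and the barrier guarantees the other ranks never try to read a cache that has not been written yet. Note that this assumes all ranks see the same filesystem; on multi-node setups without shared storage, one process per node would need to build the cache instead.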