Hi,
When I use datasets with 600GB of data, the data loading time increases significantly.
I am experimenting with two datasets: one is about 60GB and the other 600GB.
Simply speaking, my code calls the datasets.set_format("torch") function and lets pytorch-lightning handle DDP training.
When looking at the pytorch-lightning profiler output of the two runs, I see that fetching a batch (get_train_batch) consumes an unreasonable amount of time when the data is large. What could be the cause?
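For reference, here is a minimal sketch of the kind of setup described, assuming an Arrow dataset already saved to disk; the dataset path and loader parameters are placeholders, not the reporter's actual code:

```python
# Illustrative sketch of the reported setup, not the reporter's actual code.
# Assumes a dataset previously saved with `Dataset.save_to_disk`.
from datasets import load_from_disk
from torch.utils.data import DataLoader

dataset = load_from_disk("path/to/train")   # memory-mapped Arrow dataset
dataset.set_format("torch")                 # items are returned as torch tensors

loader = DataLoader(dataset, batch_size=32, num_workers=4)

# pytorch-lightning's DDP training loop consumes `loader`; the batch-fetching
# step profiled as `get_train_batch` is essentially this iteration:
for batch in loader:
    ...
```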
Hi! Yes, this is an issue with datasets<=1.5.0.
This issue has been fixed by #2122; we'll do a new release soon :)
For now, you can test it on the master branch.
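(If you want to try the fix before the release, installing from the master branch is typically done with the standard pip-from-git invocation, e.g. `pip install git+https://github.com/huggingface/datasets.git@master` — assumed here, not quoted from the thread.)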