This repository has been archived by the owner on Sep 9, 2024. It is now read-only.

Parallelize data loading #32

Open
2 of 3 tasks
nshaud opened this issue Nov 25, 2020 · 0 comments
nshaud commented Nov 25, 2020

Currently, the torch DataLoader loads data synchronously (blocking). Although loading itself is very fast (the NumPy arrays are stored in memory), transfer to the GPU and data augmentation (which is done on the CPU) can slow things down.

Setting num_workers > 0 would make data loading asynchronous, and num_workers > 1 could further increase throughput.

TODO:

  • Benchmark speed gain using asynchronous data loading
  • Implement asynchronous data loading for all DataLoader objects
  • Add a user-input option to define the number of jobs
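A minimal sketch of what the change might look like. The tensors here are placeholders standing in for the project's in-memory NumPy arrays, and the worker count is hard-coded where the proposed user-input option would go:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the project's in-memory NumPy arrays.
images = torch.randn(256, 3, 32, 32)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)

# num_workers > 0 loads batches in background worker processes, so CPU-side
# work (loading, augmentation) overlaps with GPU compute; pin_memory=True
# speeds up host-to-GPU transfers. The value 2 here would be replaced by the
# proposed user-configurable option.
loader = DataLoader(dataset, batch_size=32, num_workers=2, pin_memory=True)

if __name__ == "__main__":
    # Worker processes are spawned lazily, when iteration begins.
    for batch_images, batch_labels in loader:
        pass  # the training step would go here
```

Benchmarking (the first TODO item) would then amount to timing an epoch of this loop at num_workers = 0, 1, 2, ... and comparing.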
@nshaud nshaud self-assigned this Nov 25, 2020
@nshaud nshaud added the enhancement New feature or request label Nov 25, 2020
@nshaud nshaud added this to the 0.1.0 milestone Nov 25, 2020
@nshaud nshaud mentioned this issue Nov 27, 2020