Parallelize Meg CUDA Kernel build system #174

stas00 · 2021-10-29T15:35:50Z

Building the Meg CUDA kernels takes forever, some 5 minutes, because the build runs sequentially and doesn't take advantage of multiple cores. And every time one changes the number of GPUs the kernels get rebuilt, which is very unproductive and also makes the CI really slow.

We need to rewrite the build so that it runs in parallel.
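One possible direction, as a rough sketch only: if the kernels are independent extensions, their builds could be submitted to a thread pool instead of being run one after another. Everything named below (`build_all`, `fake_build`, the `kernel_*` job names and source files) is hypothetical and made up for illustration. In the real build, `build_fn` would presumably be a thin wrapper around `torch.utils.cpp_extension.load`; whether concurrent calls to it are safe is an assumption that has not been tested here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_all(jobs, build_fn, max_workers=None):
    """Run build_fn(**job) for every job concurrently and return {name: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every build up front so they can overlap.
        futures = {job["name"]: pool.submit(build_fn, **job) for job in jobs}
        # .result() blocks until that build finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


def fake_build(name, sources):
    """Stand-in for a real compile step (hypothetical); it only sleeps."""
    time.sleep(0.2)
    return f"{name}: built {len(sources)} file(s)"


# Hypothetical job list; the real kernel names and source files differ.
JOBS = [
    {"name": "kernel_a", "sources": ["kernel_a.cpp", "kernel_a_cuda.cu"]},
    {"name": "kernel_b", "sources": ["kernel_b.cpp", "kernel_b_cuda.cu"]},
    {"name": "kernel_c", "sources": ["kernel_c.cu"]},
]

if __name__ == "__main__":
    start = time.perf_counter()
    for line in build_all(JOBS, fake_build).values():
        print(line)
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Threads should be enough for this because the compile work happens in external compiler processes, so the Python side mostly waits. Each job would still need its own extension name so that the builds don't write into the same output files.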

Side note: apex and deepspeed have this problem too, but deepspeed supports make -j.

Ideally the solution should come from pytorch: if we solve it generically, we could perhaps upstream the solution to pytorch core.

stas00 added the "Good First Issue" (Good for newcomers) and "Good Difficult Issue" (For complex tasks) labels on Oct 29, 2021

stas00 mentioned this issue Nov 4, 2021

Support skip iteration flag #177

Merged

5 tasks
