Building the Megatron CUDA kernels takes about 5 minutes, because the build runs sequentially and doesn't take advantage of multiple cores. Every time one changes the number of GPUs the kernels get rebuilt, which is very unproductive and also makes the CI really slow.
Need to rewrite the build to parallelize it.
Side note: apex and deepspeed have the same problem, but deepspeed supports make -j
Ideally the solution should come from pytorch. If we solve it generically, perhaps we could upstream the solution to pytorch core.
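To make the idea concrete, here is a minimal sketch of what a parallel build driver could look like. It is not the existing build code. `build_in_parallel` and the kernel names are hypothetical, and it assumes each kernel build is independent of the others and spends most of its time in external compiler processes, so plain threads are enough to keep several cores busy.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def build_in_parallel(build_fns, max_workers=None):
    """Run independent build steps concurrently.

    build_fns: dict mapping a kernel name to a zero-argument callable that
    builds that kernel (hypothetical; in practice it would wrap whatever
    call currently compiles a single kernel).
    Returns a dict mapping each name to the callable's return value.
    """
    if max_workers is None:
        # One worker per build step, capped at the number of CPU cores.
        max_workers = min(len(build_fns), os.cpu_count() or 1) or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every build first so they all start before we wait on any.
        futures = {name: pool.submit(fn) for name, fn in build_fns.items()}
        # result() re-raises an exception from a failed build, so a broken
        # kernel is not silently skipped.
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Placeholder build steps standing in for real kernel builds.
    steps = {
        "kernel_a": lambda: "built kernel_a",
        "kernel_b": lambda: "built kernel_b",
    }
    print(build_in_parallel(steps))
```

Whether threads are sufficient depends on how the real build steps behave. If they hold shared locks or do heavy work in Python itself, separate processes would be the safer choice.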