GPU utilization becomes zero after long-term training #19
Comments
Problem when running python2 tools/train_net.py: after training for a long time, GPU utilization drops to zero, while the CPU is still using a lot of resources. |
terminate called after throwing an instance of 'caffe2::EnforceNotMet' |
same problem... |
@gaopeng-eugene, @terrychenism: can you try switching to the NCCL implementation of AllReduce to see if that resolves the problem? Instructions for building Caffe2 with NCCL support and enabling NCCL in Detectron can be found in #32. |
Fixed it by using NCCL.
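For anyone landing here later, below is a rough sketch of how the NCCL switch suggested above might be passed on the command line. It assumes that Detectron exposes a `USE_NCCL` config option and that `tools/train_net.py` accepts `--cfg` followed by trailing `KEY VALUE` overrides; both are assumptions that should be checked against #32 and your checkout. The helper name `build_train_command` and the config path are hypothetical, and Caffe2 must already be built with NCCL support.

```python
import shlex


def build_train_command(cfg_path, use_nccl=True):
    """Build the argument list for a Detectron training run.

    Hypothetical helper: USE_NCCL and the trailing KEY VALUE override
    syntax are assumptions, not confirmed by this thread.
    """
    cmd = ["python2", "tools/train_net.py", "--cfg", cfg_path]
    # Assumed config override that selects the NCCL AllReduce implementation.
    cmd += ["USE_NCCL", "True" if use_nccl else "False"]
    return cmd


if __name__ == "__main__":
    # Placeholder path; substitute the config file you actually train with.
    command = build_train_command("path/to/config.yaml")
    print(" ".join(shlex.quote(part) for part in command))
```

The script only prints the command so it can be inspected before being run by hand.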