Hello Dr. Hie, I ran into a similar problem with extremely long training times while trying to retrain your model. I had two NVIDIA Tesla A100 graphics cards (40 GB memory each), an AMD EPYC 7763 CPU, and 450 GB of allocated RAM. I used one GPU for the PBS job submission and left the other for tracking GPU usage. Initially, I ran the code without changing anything, and it seemed like the GPU memory was fully in use. I'm not sure where the problem is. Any suggestions would be very helpful.
Dear Dr. Hie, It turns out that what I needed was the most up-to-date version of tensorflow-gpu (2.8.0). After I upgraded the necessary packages instead of using the specific versions, the training time fell into an acceptable range. Hopefully this helps others with the same GPU issue. Best regards
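For anyone hitting the same thing, a quick check along these lines may help confirm which TensorFlow version is installed and whether it can see a GPU at all. This is only a sketch and not part of the repository; `gpu_report` is a hypothetical helper name. An empty GPU list would suggest that training is running on the CPU, which is one possible cause of very long training times.

```python
import importlib.util


def gpu_report():
    """Return the installed TensorFlow version and the GPUs it can see.

    Hypothetical helper for illustration; not part of the model's codebase.
    """
    # If TensorFlow is not installed at all, report that instead of crashing.
    if importlib.util.find_spec("tensorflow") is None:
        return {"tf_version": None, "built_with_cuda": None, "gpus": []}

    import tensorflow as tf

    # list_physical_devices("GPU") returns an empty list when no GPU is usable.
    gpus = tf.config.list_physical_devices("GPU")
    return {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    print(gpu_report())
```

Running this inside the same PBS job environment as the training script is probably the most useful, since the GPUs visible there can differ from those on the login node.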