ZeRO1: Add bucketing logic to control the size of tensors for all-gather/reduce-scatter #6540
| Job | Run time |
|---|---|
| | 1h 2m 39s |
| | 0s |
| | 11m 39s |
| | 9m 53s |
| | 16m 46s |
| | 14m 21s |
| | 25m 13s |
| | 16m 44s |
| | 1h 13m 54s |
| | 19m 14s |
| | 9m 40s |
| | 48m 13s |
| | 11m 47s |
| | 53m 45s |
| | 2h 4m 47s |
| | 25m 0s |
| Total | 8h 43m 35s |