Distributed optimizer reduces GPT embedding grads in FP32 #841

Annotations

2 warnings

The logs for this run have expired and are no longer available.