Commit
[air] Increase global batch size for air_benchmark_tensorflow_mnist_gpu_4x4 (#31402)

The benchmark currently times out because a single run takes over 16 minutes. This is a regression compared to, e.g., the 2.0.0 release, where a run took only 4 minutes.

Upon closer investigation, this seems to be related to severe underutilization of the GPUs. With a small batch size, we are bound by compute/data iteration, which seems to have regressed in later TensorFlow versions.

To achieve shorter training times, we increase the global batch size from 64 (which is tiny) to 1024. This speeds up training considerably, even though the GPUs are still underutilized at <10% utilization.

Signed-off-by: Kai Fricke <kai@anyscale.com>
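To make the effect of the change concrete, here is a rough sketch of the batch-size arithmetic. It is not taken from the benchmark code. It assumes that "4x4" means 4 workers with 4 GPUs each (16 GPUs in total) and that the global batch is split evenly across them; the helper names are hypothetical. The 60,000 figure is the size of the standard MNIST training split.

```python
import math

MNIST_TRAIN_SAMPLES = 60_000  # standard MNIST training split
NUM_GPUS = 4 * 4  # assumption: "4x4" = 4 workers with 4 GPUs each


def per_gpu_batch_size(global_batch_size: int, num_gpus: int = NUM_GPUS) -> int:
    """Samples each GPU processes per step if the global batch is split evenly."""
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the number of GPUs")
    return global_batch_size // num_gpus


def steps_per_epoch(global_batch_size: int, num_samples: int = MNIST_TRAIN_SAMPLES) -> int:
    """Training steps needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / global_batch_size)


if __name__ == "__main__":
    for global_batch_size in (64, 1024):
        print(
            f"global={global_batch_size:5d}  "
            f"per_gpu={per_gpu_batch_size(global_batch_size):3d}  "
            f"steps_per_epoch={steps_per_epoch(global_batch_size):4d}"
        )
```

Under these assumptions, a global batch size of 64 gives each GPU only 4 samples per step and requires 938 steps per epoch, while 1024 gives each GPU 64 samples per step and requires 59 steps. Fewer, larger steps mean the fixed per-step cost is paid far less often, which is consistent with the speedup described in the commit message.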