
Significant Performance Variability Across Nodes in Spark Cluster with Version 0.5.0 #552

Open
ewan0x79 opened this issue Jul 22, 2024 · 1 comment


@ewan0x79

I've been using version 0.5.0 and have observed performance inconsistencies across the nodes in my Spark cluster. Some nodes execute tasks significantly faster than others; on certain nodes, tasks run anywhere from tens to thousands of times slower.
Are any CPU-specific optimizations applied when this library is compiled? For instance, are there optimizations that favor Intel CPUs over AMD CPUs, which might explain the observed disparity?
Any insights or suggestions on this matter would be greatly appreciated.

@Craigacp
Collaborator

Craigacp commented Jul 22, 2024

TensorFlow optimizes operations based on the CPU instructions available, so if you have Intel Xeons with AVX-512 and older AMD EPYCs without AVX-512, you'll get much faster matrix multiplies and convolutions on the Intel CPUs. I think we compile against AVX 1, but the build pulls in MKL for matrix operations, and MKL has fast paths for more advanced vector instruction sets. As MKL is made by Intel, it might also favour Intel CPUs in other ways, but we don't have much control over that.
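To check which of these instruction sets each node actually exposes, something like the following could be run on every node (for example, once per Spark executor). It's a minimal sketch that assumes Linux hosts on x86-64, because it reads `/proc/cpuinfo` and relies on the flag names the Linux kernel reports there (`avx`, `avx2`, `fma`, `avx512f`). The class and method names are made up for illustration; nothing here is part of the TensorFlow Java API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic: reports the CPU vendor and SIMD support on a Linux x86-64 host.
public class CpuFlagCheck {
    // Instruction-set flags (as named by the Linux kernel) that matter for vectorized math.
    static final List<String> SIMD_FLAGS = List.of("avx", "avx2", "fma", "avx512f");

    // Returns the value of the first "vendor_id" line, e.g. GenuineIntel or AuthenticAMD.
    static String parseVendor(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            if (line.startsWith("vendor_id")) {
                return line.substring(line.indexOf(':') + 1).trim();
            }
        }
        return "unknown";
    }

    // Returns the space-separated entries of the first "flags" line as a set.
    static Set<String> parseFlags(String cpuinfo) {
        for (String line : cpuinfo.split("\n")) {
            if (line.startsWith("flags")) {
                String value = line.substring(line.indexOf(':') + 1).trim();
                return new TreeSet<>(Arrays.asList(value.split("\\s+")));
            }
        }
        return new TreeSet<>(); // no x86 "flags" line, e.g. on ARM hosts
    }

    // Maps each flag of interest to whether this CPU reports it.
    static Map<String, Boolean> simdSupport(Set<String> flags) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String flag : SIMD_FLAGS) {
            result.put(flag, flags.contains(flag));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String cpuinfo = Files.readString(Path.of("/proc/cpuinfo"));
        System.out.println("vendor: " + parseVendor(cpuinfo));
        System.out.println("simd:   " + simdSupport(parseFlags(cpuinfo)));
    }
}
```

If the fast and slow nodes differ only in whether `avx512f` is present, that alone would typically account for a modest gap, not one of several orders of magnitude.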

Tens to thousands of times slower doesn't sound right, though; typically I'd expect AVX-512 to give at most a 2x speedup over AVX2. Are there other differences between these nodes beyond the CPU?
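One rough way to look for differences beyond the CPU is to print the same basic JVM and host properties on every node and time a fixed, TensorFlow-independent workload. This is a minimal sketch under stated assumptions: the naive matrix multiply is only a hypothetical stand-in benchmark, it does not exercise MKL or TensorFlow, and a single timed run is noisy. If this plain-JVM timing is similar across nodes while TensorFlow tasks differ by orders of magnitude, the cause is more likely specific to the TensorFlow setup on the slow nodes. If this timing also differs hugely, the cause is more likely node-level (for example, CPU or memory contention).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: host/JVM fingerprint plus a fixed plain-Java workload.
public class NodeFingerprint {
    // Collects host/JVM properties that commonly differ between cluster nodes.
    static Map<String, String> describe() {
        Runtime rt = Runtime.getRuntime();
        Map<String, String> info = new LinkedHashMap<>();
        info.put("os.name", System.getProperty("os.name"));
        info.put("os.arch", System.getProperty("os.arch"));
        info.put("java.version", System.getProperty("java.version"));
        info.put("availableProcessors", Integer.toString(rt.availableProcessors()));
        info.put("maxMemoryMiB", Long.toString(rt.maxMemory() / (1024 * 1024)));
        return info;
    }

    // Naive n x n matrix multiply; returns a checksum so the JIT cannot discard the work.
    static double matmulChecksum(int n) {
        double[][] a = new double[n][n], b = new double[n][n], c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = (i + j) % 7;
                b[i][j] = (i * j) % 5;
            }
        }
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        double sum = 0;
        for (double[] row : c) {
            for (double v : row) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        describe().forEach((key, value) -> System.out.println(key + ": " + value));
        matmulChecksum(128); // warm-up run so the JIT has a chance to compile the loops
        long start = System.nanoTime();
        double checksum = matmulChecksum(256);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("matmul(256) ms: " + elapsedMs + " (checksum " + checksum + ")");
    }
}
```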
