pythonPackages.tensorflow: fix for bazel settings for intel mkl, dnnl #69454
Conversation
This is blocked on #68014, which has already been reviewed by people with commit access but for some reason not merged. I don't have time to review that right now. This PR looks good to me; as long as mkl and dnnl are open source, this should be a clear improvement :)
Although it would be nice if you could point out where exactly those guidelines are. You just linked to the benchmarks folder?
@timokau It looks like Google pulled that page since releasing Tensorflow 2.0 and now redirects it. It's possible to use the Wayback Machine to view the historical page, but here is Intel's page on the same topic. And, FWIW, it seems Anaconda now defaults to using MKL. Having these libraries should generally improve performance for those using CPUs.

While MKL-DNN (>= 1.x) would generally qualify under most definitions of free, the core MKL is shipped as a binary, so it is not available in Nix by default and has to be enabled per the documentation. For many people, and for Hydra, this change will be a no-op. But for the performance-focused, it will be easily available in a similar way to PyTorch and other libraries once they enable MKL in their site configuration.

It looks like you have 1.15 and 2.0 PR updates in progress, so I'm fine with this being rolled into those (I just don't want it to get lost). Tensorflow 2.0 now enables Cuda by default, and I'm unsure of the interaction between MKL and the Cuda piece, so it may be that MKL should only be enabled when Cuda is not. How you want to handle this depends on how you're planning to approach the "enabled by default" question for Tensorflow 2.0 in Nix.
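(For context, opting in via site configuration might look roughly like the sketch below. The override points shown, numpy's blas argument and the packageOverrides hook, are assumptions for illustration rather than code from this PR; since MKL is unfree, it also has to be allowed explicitly.)

```nix
# ~/.config/nixpkgs/config.nix, sketch only: the attribute names here are
# assumptions, not code from this PR. MKL ships as unfree binaries, so it
# must be allowed explicitly.
{
  allowUnfree = true;
  packageOverrides = pkgs: {
    python3 = pkgs.python3.override {
      packageOverrides = self: super: {
        # Hypothetical: build numpy (and its dependents) against MKL.
        numpy = super.numpy.override { blas = pkgs.mkl; };
      };
    };
  };
}
```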
Ah right, I was in a bit of a hurry and didn't realize that users have to opt in for mkl. Then the only downside I see here is that people opting in to mkl will have to rebuild tensorflow from source, which can take quite some time. It's probably a reasonable assumption that people who care enough about performance to enable mkl are willing to do this, but we should still give people who only want mkl for numpy (which doesn't take quite as long to rebuild) a way out. So I suggest that instead of directly depending on […].

Once that is done and the dnnl PR is merged, I see no reason to hold this back. It doesn't have to be merged with the 1.15 update. That update could be blocked for as long as upstream needs to adjust tensorflow for bazel 1.0. It's a mystery to me why bazel would make its first major release without making sure that at least its biggest public consumers don't break, but that is a different question.

I think there should be no bad interaction with cuda, although if you have the hardware it would be nice if you could test that. The "cuda enabled by default" question is a difficult one, which I'll postpone to the actual 2.0 update (keeping it disabled for now).
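(As a rough sketch of how such an opt-in could be wired into the derivation: the `mklSupport` parameter name is an assumption, though `--config=mkl` is the switch TensorFlow's own build documentation describes.)

```nix
# Sketch of an opt-in MKL flag; `mklSupport` is an assumed parameter name,
# not necessarily what this PR uses.
{ buildBazelPackage, lib, mkl, mklSupport ? false, ... }:

buildBazelPackage {
  # ... name, src, fetchAttrs, etc. elided ...
  buildInputs = lib.optional mklSupport mkl;
  # TensorFlow's documented way to enable MKL is its `mkl` bazel config.
  bazelFlags = [ "--config=opt" ] ++ lib.optional mklSupport "--config=mkl";
}
```

Users would then opt in with something like `python3Packages.tensorflow.override { mklSupport = true; }`, again assuming that attribute name.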
I suppose tensorflow cannot work with MKL if numpy is built without? And why would you build without MKL if numpy is built with already? Note one has to perform a rebuild anyway, because the hash of numpy changes when enabling MKL.
I'm not sure.
Right, good point. So in that case this PR should be fine as-is, as soon as the dependency is merged.
You probably don't want to mix math libraries in your Python stack. If MKL is used, it should be applied consistently.
I attempted to test the […] You reference this in the WIP PRs, but I think it affects even the master branch.
Right, we'll have to wait for upstream on this. Looks like someone is on it. Of course if you want to take a shot at fixing this yourself, feel free to :)
Note there is a PR for Bazel 1.1.
Bazel has promised to adhere to semver (and leave at least 3 months between breaking changes), so that shouldn't make the situation worse than it is. I would've preferred blocking the 1.0 update on tensorflow compatibility, but now I hope that the upstream support will come soon.
I'm also a bit surprised tensorflow was taken by surprise on that. Good that a fix is in sight. On the other hand, I also want to note bazel was merged to […]
Is this still needed? Note there's a merge conflict.
Yes, this is still desirable, and it looks like the prerequisites should now be in place to allow forward progress. If you have time to review, I'll prioritize resolving the PR conflict over the coming week.
(force-pushed from a8fc881 to 0c1e158)
(force-pushed from ce3211a to 5006c71)
@FRidh Merge conflict resolved. Retested for cpu, cpu-mkl, and cuda-mkl on Linux. This is ready for review.
I will apply this to both versions of tensorflow after the upgrade.
(force-pushed from 5006c71 to 47f4533)
(force-pushed from 47f4533 to ac6e8b2)
(force-pushed from ac6e8b2 to 30bda2a)
Thanks for the rebase. Did you ever try […]
No, I followed the instructions from here: https://www.tensorflow.org/install/source. I believe […]
(force-pushed from 30bda2a to 4a4b448)
I dunno why ofborg didn't run; this is definitely fine though. I changed the assert to use the […]
@Ericson2314 IIRC mkl has an unfree license.
Also, ofborg has recently been very slow (10+ hrs) at eval'ing PRs.
It's indeed been accumulating jobs: https://monitoring.nix.ci/d/000000004/evaluation-jobs-over-last-7-days?orgId=1&from=1585505457363&to=1585678257363
Motivation for this change
Nix's Tensorflow does not support optionally using the Intel-provided optimizations (Intel MKL), which must be enabled through additional compile flags.
On modern Intel processors, this will generally provide improved performance and is recommended in Tensorflow's performance guide.
NOTE: This change depends on the open MKL-DNN PR found here: #68014
I've tested the changes on Linux with the above-mentioned PR applied, both with and without Intel MKL enabled, to confirm it works in both cases. I do not have access to test on darwin, but I believe that platform redirects to the binary versions. I would appreciate it if someone could test with cuda, as MKL may need to be disabled when cudaSupport is enabled (I can make the change, but cannot test it myself).
With MKL enabled, `tensorflow.python.framework.test_util.IsMklEnabled()` returns `True`.
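(To make that check concrete, here is a hedged `shell.nix` sketch; the `mklSupport` override name is an assumption, while `IsMklEnabled` is the function quoted above.)

```nix
# shell.nix sketch; `mklSupport` is an assumed override name, not
# necessarily what this PR introduces.
with import <nixpkgs> { config.allowUnfree = true; };
mkShell {
  buildInputs = [
    (python3.withPackages (ps: [
      (ps.tensorflow.override { mklSupport = true; })
    ]))
  ];
  # Inside the shell, verify with:
  #   python -c 'from tensorflow.python.framework import test_util; print(test_util.IsMklEnabled())'
}
```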
Things done

- Tested using sandboxing (`sandbox` in `nix.conf` on non-NixOS)
- Tested compilation of all pkgs that depend on this change using `nix-shell -p nix-review --run "nix-review wip"`
- Tested execution of all binary files (usually in `./result/bin/`)
- Determined the impact on package closure size (by running `nix path-info -S` before and after)

Notify maintainers
cc @jyp @abbradar