This is Awesome #50
Comments
Hi Lee, I'm glad you found these binaries helpful. As for AVX512F, I'm not sure all the major CPUs support those instructions, so I don't know whether making them a default would be a good idea. The same goes for MKL. Let me know what problem you are facing when compiling locally and we can try to figure it out. If that doesn't work, I can give AVX512F and MKL a shot on my machine.
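(Editorial aside, not from the original thread: one quick way to check whether a particular Intel Mac's CPU actually supports these instructions is to inspect the leaf-7 feature flags.)

```bash
# On an Intel Mac, AVX-512 flags (AVX512F, AVX512DQ, ...) appear here if the CPU supports them
sysctl machdep.cpu.leaf7_features
```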
I appreciate the response! I am working on a Mac with an 18-core Xeon W processor, so I'm hopeful it would support AVX512F and MKL. But compiling from source has been extremely difficult. It would get through all the downloads, but then fail to create the tmp folder to put the .whl binary in. I tried multiple guides, but this one comes closest to the approach I implemented (I modified the bash script to include MKL, but it failed with and without it all the same). The error messages were very difficult to interpret, but included some such as:
and
Thanks again for putting together such a fantastic set of binaries.
Okay, I'll give it a shot locally. Which Python are you using, 2 or 3?
I use Python 3, but can switch if helpful.
Cool, I'll post the binaries if I'm able to build them.
Have you tried
Did you give this a try? I have been trying to build with MKL but haven't had any success :(
I was able to compile it! I can send it over if you'd like a copy. The parameters that worked are as follows:
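(The exact parameters were not preserved in the thread; as a rough sketch, a Xeon-targeted TensorFlow 1.x configure step typically involves settings along these lines. The specific values are illustrative assumptions, not the poster's verbatim parameters.)

```bash
# Illustrative, non-verbatim: optimization flags one might feed to TensorFlow's ./configure
export CC_OPT_FLAGS="-march=native -mavx512f"  # -march=native picks up AVX512F on a Xeon W
export TF_NEED_MKL=1                           # note: exact env-var names vary across TF 1.x versions
```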
Next, I made this .sh script, which I moved to the TensorFlow directory:
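(The script itself was not captured; below is a minimal sketch of what such a build script usually looks like, assuming the standard TensorFlow 1.x pip-package workflow. The paths and flags are assumptions, not the poster's exact script.)

```bash
#!/bin/bash
# Sketch of a TensorFlow source-build script (assumed workflow, not the poster's exact script)
set -e
./configure                                   # interactive prompts: MKL, optimization flags, etc.
bazel build --config=opt --config=mkl \
  --copt=-march=native \
  //tensorflow/tools/pip_package:build_pip_package
# Emit the .whl into /tmp/tensorflow_pkg (the step that failed in the earlier attempts)
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
```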
I then run it with two lines:
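(The two lines were not preserved either; presumably something like the following, with a placeholder script name.)

```bash
chmod +x build_tf.sh   # build_tf.sh is a placeholder name for the script above
./build_tf.sh
```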
That said, the .whl I got from this page, and `conda install tensorflow-mkl -c defaults`, still lead to faster models. I need to tweak the parameters, but I'm close. Once I've got a shell script that works, I can send over a .whl for folks with Xeon processors.
That would be great. Thanks!
In a head-to-head comparison, though, your macOS Mojave 3.6.0 build (without AVX512F) is still twice as fast as the file I compiled, on the same Xeon processor (word embedding takes 3 minutes per epoch with the file on this page, but 6 minutes with my version, which should have more CPU features enabled). I am not sure why the build on your page is so much better despite missing a key optimization. Even more confusing: the 3 minutes per epoch with your build comes at CPU utilization < 50%, yet the 6 minutes for the same epoch with my build uses > 95%. So the one on your page is far better on all counts.
I've been doing some final tests to evaluate the effects these different builds have on the MNIST dataset. Here I am using R to evaluate, but it's really just a front end.
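(The benchmark code isn't shown above; here is a minimal sketch of an MNIST timing run with the keras R package. The model shape and epoch count are assumptions.)

```r
library(keras)  # keras R package, a front end to whichever TensorFlow build is installed

mnist <- dataset_mnist()
x_train <- array_reshape(mnist$train$x / 255, c(60000, 784))
y_train <- to_categorical(mnist$train$y, 10)

model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = 784) %>%
  layer_dense(units = 10, activation = "softmax")

model %>% compile(optimizer = "adam", loss = "categorical_crossentropy",
                  metrics = "accuracy")

# Wall-clock training time is the quantity being compared across builds
system.time(model %>% fit(x_train, y_train, epochs = 5, batch_size = 128))
```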
And finally, I tested PlaidML with a Radeon Vega 64 card over Metal on the same machine. To do this, install PlaidML and use this code snippet in R:
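(The snippet wasn't preserved; with the keras R package, selecting the PlaidML backend is typically done like this, assuming PlaidML has already been configured with `plaidml-setup`.)

```r
library(keras)
# Route keras through PlaidML (which drives the Radeon Vega 64 via Metal)
use_backend("plaidml")
```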
If you prefer Python, use this code snippet instead:
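(Again a sketch, showing the standard PlaidML backend selection in Python; the environment variable must be set before keras is imported.)

```python
import os
# Select the PlaidML backend before importing keras
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import keras  # now backed by PlaidML / Metal instead of TensorFlow
```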
People aren't kidding when they say graphics cards are the future of deep learning. But CPU optimization certainly helps. I've learned, though, that PlaidML cannot be relied on for natural language processing, as the model loss NaNs out after a couple of epochs. For whatever reason, CPU builds are more likely to train successfully despite their lower speed (thus far).
This is fantastic! I spent two days trying to compile TensorFlow on my own to no avail; these precompiled files were a lifesaver!
I wanted to ask about a couple of things (for Mac):
1. Any plans to add AVX512F?
2. Can MKL support be added too (for all builds)?
Thanks,