Tensorflow easyconfig #4101

Closed
thiell opened this issue Feb 3, 2017 · 11 comments

Comments

@thiell
Contributor

thiell commented Feb 3, 2017

When I saw the announcement that Tensorflow is available in the latest EasyBuild, I was very excited. Our users are so eager for the latest TF version that we have spent hours patching and fixing issues with the installation.

Unfortunately, the provided easyconfig is based on a wheel package, which is platform-specific and doesn't work with the old glibc shipped with RHEL/CentOS 6, for instance:

tensorflow-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl is not a supported wheel on this platform.
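The rejected filename follows the wheel naming convention from PEP 427, `{name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`, and pip only installs a wheel whose tags all match ones it supports for the running interpreter. A minimal sketch that splits the name into those tags and reports the host glibc; the helper names are hypothetical and not part of pip or EasyBuild:

```python
import platform
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        name, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return WheelTags(name, version, build, py_tag, abi_tag, plat_tag)


def host_glibc_version() -> str:
    """Return the glibc version the interpreter runs against, or '' if unknown."""
    lib, version = platform.libc_ver()
    return version if lib == "glibc" else ""


if __name__ == "__main__":
    tags = parse_wheel_filename("tensorflow-0.12.1-cp27-cp27mu-manylinux1_x86_64.whl")
    print(tags.python_tag, tags.abi_tag, tags.platform_tag)
    print("host glibc:", host_glibc_version() or "unknown")
```

Here `cp27mu` names the CPython 2.7 ABI of a wide-Unicode (UCS-4) build and `manylinux1_x86_64` is the PEP 513 platform tag; comparing those against the cluster's Python and glibc is a quick first check before choosing between the binary wheel and a source build.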

EasyBuild helped us resolve the required dependencies for Tensorflow, but because the packaging changes with almost every release (and we have supported TF on our clusters since v0.6, I think), the process is very hard to automate. The first challenge is compiling Bazel. Then we had to edit third_party/gpus/crosstool/CROSSTOOL.tpl and crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc, because the environment is ignored and all paths are hardcoded.
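The hardcoded-path problem can be handled by rewriting the affected files before Bazel runs. A minimal sketch, assuming the literal paths to replace are known in advance; the example mapping below is hypothetical, and the actual strings in these files differ between TensorFlow versions. Inside an easyblock, EasyBuild's `apply_regex_substitutions` helper (in `easybuild.tools.filetools`) serves a similar purpose.

```python
import os
from typing import Mapping


def replace_hardcoded_paths(text: str, substitutions: Mapping[str, str]) -> str:
    """Replace each literal path in `substitutions` with its new value.

    Substitutions are applied in insertion order, so list longer paths
    before any path that is a prefix of them.
    """
    for old, new in substitutions.items():
        text = text.replace(old, new)
    return text


def patch_file(path: str, substitutions: Mapping[str, str]) -> None:
    """Rewrite a file in place using replace_hardcoded_paths()."""
    with open(path) as handle:
        original = handle.read()
    with open(path, "w") as handle:
        handle.write(replace_hardcoded_paths(original, substitutions))


if __name__ == "__main__":
    # Hypothetical mapping: point a hardcoded compiler at the one from the
    # loaded toolchain module (falls back to "gcc" if $CC is not set).
    subs = {"/usr/bin/gcc": os.environ.get("CC", "gcc")}
    print(replace_hardcoded_paths('compiler: "/usr/bin/gcc"', subs))
```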

Also, it should be built against CUDA, since using Tensorflow really only makes sense with GPUs.

I think you should make it clear that this is a platform-specific binary version of Tensorflow (.whl).

@boegel boegel added this to the 3.2.0 milestone Feb 6, 2017
@boegel
Member

boegel commented Feb 6, 2017

@thiell Points well taken, we plan to improve on this. It's clear that Tensorflow is going to require a custom easyblock, for example. The current easyconfig is a stopgap; we can definitely do a lot better.

Hopefully the build procedure is going to stabilise now that TF v1.0 is in sight.

@gppezzi also has some experience with getting TF installed in an optimised way. We hope to discuss this further during the EasyBuild User Meeting this week, and maybe even get some work done on a first draft of an easyblock for TF.

If you have any more detailed instructions to share, please do.

@rjeschmi
Contributor

It looks a bit like we need to provide all the various sources to the build in an archive and then let Bazel do its thing.

I think we can provide the source locations for all the dependencies so they are built under the same target install path. In the wheel you can see a lot of non-Python files that are carried along in the .whl file.

I think I can do a custom build now for a basic example, but I don't have CUDA and such to test those types of builds.

@rjeschmi
Contributor

Also the configure script can be automated by setting environment variables.

Some of those are:

TF_NEED_OPENCL
TF_NEED_CUDA
GCC_HOST_COMPILER_PATH
TF_CUDA_VERSION
CUDA_TOOLKIT_PATH
TF_CUDNN_VERSION
CUDNN_INSTALL_PATH
TF_CUDA_COMPUTE_CAPABILITIES
HOST_CXX_COMPILER
HOST_C_COMPILER
COMPUTECPP_TOOLKIT_PATH

So the easyblock would run the configure script with these environment variables set.
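A minimal sketch of that step, assuming the easyblock (or a wrapper script) already knows the CUDA and cuDNN locations from the loaded modules. Only variables from the list above are set; the `"1"`/`"0"` yes/no values and the comma-separated compute capability list are assumptions about what the configure script of that era expects, and any question not covered by these variables would still need an answer. The function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def configure_env(
    cuda_root: Optional[str] = None,
    cudnn_root: Optional[str] = None,
    cuda_version: str = "",
    cudnn_version: str = "",
    compute_capabilities: str = "",
    gcc_path: Optional[str] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the environment used to answer TensorFlow's configure prompts."""
    env = dict(os.environ if base is None else base)
    env["TF_NEED_OPENCL"] = "0"
    if cuda_root:
        env.update({
            "TF_NEED_CUDA": "1",
            "CUDA_TOOLKIT_PATH": cuda_root,
            "TF_CUDA_VERSION": cuda_version,
            # Fall back to the CUDA tree when no separate cuDNN root is given.
            "CUDNN_INSTALL_PATH": cudnn_root or cuda_root,
            "TF_CUDNN_VERSION": cudnn_version,
            "TF_CUDA_COMPUTE_CAPABILITIES": compute_capabilities,
        })
    else:
        env["TF_NEED_CUDA"] = "0"
    if gcc_path:
        env["GCC_HOST_COMPILER_PATH"] = gcc_path
    return env


def run_configure(src_dir: str, env: Mapping[str, str]) -> None:
    """Run ./configure non-interactively; raises CalledProcessError on failure."""
    # stdin is closed so the script gets EOF instead of waiting on a terminal.
    subprocess.run(["./configure"], cwd=src_dir, env=dict(env),
                   stdin=subprocess.DEVNULL, check=True)
```

For example, `run_configure(src, configure_env(cuda_root=..., cuda_version=..., compute_capabilities="3.5,6.0"))`, with the real values taken from the CUDA and cuDNN modules.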

configure also runs bazel_clean_and_fetch (which will fetch the needed external archive files), but hopefully we can provide them before that runs.

bazel build -c opt //tensorflow/tools/pip_package:build_pip_package
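In the TensorFlow source-install instructions of that period, this target only builds a packaging script: running `bazel-bin/tensorflow/tools/pip_package/build_pip_package <dir>` then writes the .whl into `<dir>`, and GPU builds add `--config=cuda` to the Bazel command. A minimal sketch that assembles and runs those two steps; the function names are hypothetical and the output directory is a placeholder:

```python
import subprocess
from typing import List, Mapping, Optional

PIP_TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def build_commands(wheel_dir: str, cuda: bool = False) -> List[List[str]]:
    """Return the bazel build step and the wheel packaging step."""
    bazel = ["bazel", "build", "-c", "opt"]
    if cuda:
        bazel.append("--config=cuda")
    bazel.append(PIP_TARGET)
    package = ["bazel-bin/tensorflow/tools/pip_package/build_pip_package", wheel_dir]
    return [bazel, package]


def run_build(src_dir: str, wheel_dir: str, cuda: bool = False,
              env: Optional[Mapping[str, str]] = None) -> None:
    """Run each step in the TensorFlow source tree, stopping at the first failure."""
    for cmd in build_commands(wheel_dir, cuda):
        subprocess.run(cmd, cwd=src_dir,
                       env=None if env is None else dict(env), check=True)
```

The wheel left in `wheel_dir` would then be installed with pip into the module's installation prefix.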

It is also possible that the CI build scripts will be more straightforward (deviating from the published docs, but avoiding building the wheel and then immediately installing it): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/ci_parameterized_build.sh

The wheel should be relocatable, which might be an interesting side effect.

Some other info on wrapping the configure script is here (they wrap it for their Docker builds, I guess): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/builds/configured

@rjeschmi
Contributor

Some more info on getting the deps satisfied locally

tensorflow/tensorflow#5428 (comment)

We will need to patch the workspace.bzl
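One way to patch it is to point each download URL in `workspace.bzl` at a pre-fetched local copy, so Bazel does not go to the network. A minimal sketch, assuming the archives have already been downloaded and that the Bazel version in use accepts `file://` URLs; the mapping and paths are hypothetical. Leaving the existing `sha256` entries untouched means Bazel still verifies that each local file matches the expected archive.

```python
import re
from typing import Mapping

# Matches any double-quoted http(s) URL in a .bzl file.
URL_RE = re.compile(r'"(https?://[^"]+)"')


def redirect_urls(bzl_text: str, local_files: Mapping[str, str]) -> str:
    """Replace known download URLs with file:// URLs; leave others unchanged."""
    def substitute(match: "re.Match[str]") -> str:
        local_path = local_files.get(match.group(1))
        if local_path is None:
            return match.group(0)
        return '"file://' + local_path + '"'
    return URL_RE.sub(substitute, bzl_text)
```

An explicit URL-to-file mapping is used instead of deriving local names from URL basenames, because many GitHub archive URLs end in generic names such as a tag or commit hash that could collide.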

@verdurin
Member

We've also had a request for TF with CUDA.

@korkinof

Hi guys,
Yeah, this would be super useful for me too. Has anyone made any progress?
Best,
D.

@boegel
Member

boegel commented Mar 27, 2017

@korkinof Latest progress I know about is what was discussed on the EasyBuild mailing list recently, see https://lists.ugent.be/wws/arc/easybuild/2017-03/msg00140.html .

Maybe @ysagon has more info?

@gppezzi
Contributor

gppezzi commented Mar 27, 2017

I'm not particularly proud of these easyconfig files (using the CmdCp easyblock), but we have TF 1.0 (both CPU and GPU versions) working on Piz Daint:

@gppezzi
Contributor

gppezzi commented Mar 29, 2017

Now I have a version that should work with foss (see PR #4412).

Reproducibility could be an issue, since it builds its own dependencies inside the Bazel workspace (if I just add protobuf as a dependency, it seems to be ignored).

Any feedback is welcome.

@boegel boegel modified the milestones: 3.2.0, 3.3.0 May 2, 2017
@boegel boegel modified the milestones: 3.3.0, 3.4.0 Jun 25, 2017
@boegel boegel modified the milestones: 3.5.0, 3.4.0 Sep 6, 2017
@boegel boegel modified the milestones: 3.5.0, next release Dec 6, 2017
@boegel boegel modified the milestones: 3.5.1, 3.6.0 Jan 11, 2018
@boegel boegel modified the milestones: 3.5.2, 3.x Feb 24, 2018
@Flamefire
Contributor

@boegel Isn't this fixed already?

@boegel
Member

boegel commented Nov 21, 2019

Yes, we have had easyconfigs for building TensorFlow from source for a while now, so closing...

@boegel boegel closed this as completed Nov 21, 2019

7 participants