Tensorflow 2.3 #24

Open
marcelotrevisani opened this issue Oct 16, 2020 · 16 comments

Comments

@marcelotrevisani

Hello folks,

Is there any news regarding tensorflow 2.3, or an estimate of when it might be available on the main channel?

@marcelotrevisani
Author

Friendly ping to @katietz, as he was the last person to modify the recipe :)

@roebel

roebel commented Nov 25, 2020

The CPU-only packages have been available for quite a while. Is there a problem with the
GPU packages? Are they still to come, or has GPU support been dropped?

Thanks

tensorflow                     2.3.0 eigen_py37h189e6a2_0  pkgs/main           
tensorflow                     2.3.0 eigen_py38h71ff20e_0  pkgs/main           
tensorflow                     2.3.0 mkl_py37h0481017_0  pkgs/main           
tensorflow                     2.3.0 mkl_py38hd53216f_0  pkgs/main   

@npanpaliya

Please have a look at this ContinuumIO/anaconda-issues#11967 (comment).

@0x1997

0x1997 commented Dec 23, 2020

Any updates on tensorflow 2.4? Is it also blocked on ContinuumIO/anaconda-issues#11967?

@npanpaliya

@0x1997 The project we are working on (Open-CE, as mentioned in one of the related threads by @jayfurmanek) is about to publish another release that includes a conda recipe for TF 2.4 (both GPU and CPU). For TF's conda recipe, you can refer to https://github.com/open-ce/tensorflow-feedstock.

@katietz
Contributor

katietz commented Mar 1, 2021

I updated to tensorflow 2.4.1 for linux-64. The rc binaries can be found in my private channel 'ktietz' for testing. I will continue with the Windows and macOS builds soon too.

@katietz
Contributor

katietz commented Mar 1, 2021

As a side note, the new version supports the eigen, mkl, and gpu variants for linux-64.

@roebel

roebel commented Mar 1, 2021

I installed from your channel and this seems to work for me with python 3.7. I just loaded tensorflow for the moment and had it report the visible devices. That worked fine. I will put it into regular use over the following days and let you know if I find anything.

Many thanks for the update!
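
A minimal sketch of the kind of smoke test described above: import TensorFlow and list the devices it can see. `tf.config.list_physical_devices` is the TensorFlow 2.x call for this; the helper name `visible_devices` is only illustrative, and the exact check used in the comment is not shown in the thread.

```python
def visible_devices():
    """Return the device names TensorFlow reports, or None if it cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None
    # Each entry is a PhysicalDevice with .name and .device_type attributes.
    return [device.name for device in tf.config.list_physical_devices()]


devices = visible_devices()
if devices is None:
    print("TensorFlow is not installed")
else:
    print("Visible devices:", devices)
```

On a working GPU build the list would be expected to contain at least one GPU entry in addition to the CPU.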

@roebel

roebel commented Mar 8, 2021

It mostly works fine, but there is one issue:

WARNING:tensorflow:AutoGraph could not transform <bound method PulseWaveTable._linear_lookup of <tensorflow.python.eager.function.TfMethodTarget object at 0x7f1f4d18c610>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'

I found this thread, serge-sans-paille/gast#53, which explains that the problem comes from using gast=0.4.0 while tensorflow requires gast=0.3.3.

Indeed, the gast dependency for tensorflow 2.4.1 is still 0.3.3:
https://libraries.io/pypi/tensorflow/2.4.1

while the package appears to pin it to 0.4.0:

tensorflow-base 2.4.1 gpu_py39h29c2da4_0
----------------------------------------
file name   : tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda
name        : tensorflow-base
version     : 2.4.1
build       : gpu_py39h29c2da4_0
build number: 0
size        : 195.2 MB
license     : Apache 2.0
subdir      : linux-64
url         : https://repo.anaconda.com/pkgs/main/linux-64/tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda
md5         : aec0b7780731b25ecff1e146c646b518
timestamp   : 2021-03-01 09:39:26 UTC
dependencies: 
...
  - gast >=0.4.0,<0.4.1.0a0
...
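
One way to confirm which gast actually ended up in an environment is to ask the installed package metadata directly. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, and the required version 0.3.3 is simply taken from the PyPI listing linked above.

```python
from importlib import metadata

REQUIRED_GAST = "0.3.3"  # assumption: the version cited from PyPI in this thread


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_required(installed, required=REQUIRED_GAST):
    """True only when an installed version string equals the required one."""
    return installed is not None and installed == required


found = installed_version("gast")
if matches_required(found):
    print(f"gast {found} matches the required {REQUIRED_GAST}")
else:
    print(f"gast is {found!r}; TensorFlow 2.4.1 expects {REQUIRED_GAST}")
```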

@jayfurmanek

jayfurmanek commented Mar 11, 2021

Another problem here, which is likely more of a problem with the Anaconda cudatoolkit package, is that XLA doesn't work with the gpu version.

A good test for this can be found here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/tutorials/jit_compile.ipynb

When running that, I get:

2021-03-11 21:50:04.976184: W tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:592] Internal: ptxas exited with non-zero error code 256, output: 
Relying on driver to perform ptx compilation. 
Setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda  or modifying $PATH can be used to set the location of ptxas
This message will only be logged once.
2021-03-11 21:50:06.579105: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:70] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-03-11 21:50:06.579157: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:71] Searched for CUDA in the following directories:
2021-03-11 21:50:06.579168: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74]   ./cuda_sdk_lib
2021-03-11 21:50:06.579176: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74]   /usr/local/cuda-10.1
2021-03-11 21:50:06.579183: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74]   .
2021-03-11 21:50:06.579191: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:76] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2021-03-11 21:50:06.582894: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:324] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2021-03-11 21:50:06.583354: I tensorflow/compiler/jit/xla_compilation_cache.cc:333] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2021-03-11 21:50:06.583775: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at xla_ops.cc:238 : Internal: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "jit_compile.py", line 42, in <module>
    train_mnist(images, labels)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_train_mnist_204]

There are two changes that could be made to the cudatoolkit package to fix this:

  • The libdevice.10.bc file is in the wrong location. It is shipped in $CONDA_HOME/lib when it should probably be in $CONDA_HOME/lib64, $CONDA_HOME/nvvm/libdevice/, or $CONDA_HOME/nvvmx/libdevice/ (or all three).
  • The ptxas binary, which is used by the XLA compiler, is not included in the package at all and could be dropped into $CONDA_HOME/bin.

I could put up a PR against your cudatoolkit feedstock with these changes if it would be considered.
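
Until the package layout changes, a user-side workaround along these lines might help. This is a sketch, not a tested fix: it copies libdevice.10.bc into the `nvvm/libdevice` layout that the log above says XLA searches, and builds the `XLA_FLAGS` value quoted in the same log. The function name, the staging directory, and the use of `CONDA_PREFIX` as the environment root are assumptions for illustration. It does not address the missing ptxas binary, for which the log reports a fallback to the driver.

```python
import shutil
from pathlib import Path


def stage_libdevice(conda_prefix, staging_dir):
    """Copy libdevice.10.bc into <staging_dir>/nvvm/libdevice and return an XLA_FLAGS value."""
    source = Path(conda_prefix) / "lib" / "libdevice.10.bc"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")
    target_dir = Path(staging_dir) / "nvvm" / "libdevice"
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target_dir / source.name)
    # The flag name comes from the XLA warning quoted in this thread.
    return f"--xla_gpu_cuda_data_dir={Path(staging_dir)}"


# Hypothetical usage; set the variable before importing TensorFlow:
#   import os
#   os.environ["XLA_FLAGS"] = stage_libdevice(os.environ["CONDA_PREFIX"],
#                                             os.path.expanduser("~/.xla_cuda"))
#   import tensorflow as tf
```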

@katietz
Contributor

katietz commented Mar 12, 2021

Sure, a PR would be welcome!

About the gast version: I added a hotfix for it, so that all tensorflow 2.4.1 builds will have gast 0.3.3 as a dependency. The hotfix just needs to be reviewed internally.

@andrewsali

@katietz any update on the gast 0.3.3 issue? It still seems that 0.4.0 is the dependency for TF 2.4.1

@katietz
Contributor

katietz commented Apr 9, 2021

I made a hotpatch for it, so gast should now resolve to 0.3.3. The recipe itself hasn't been touched for now.

@andrewsali

Thanks @katietz, is there anything that needs to be done on the client (install) side to consume this repodata hotpatch?

Currently, when I try to install tensorflow==2.4.1 and gast==0.3.3 together, I get an error:

Package gast conflicts for:
gast==0.3.3
tensorflow==2.4.1 -> tensorflow-base==2.4.1=gpu_py37h29c2da4_0 -> gast[version='>=0.4.0,<0.4.1.0a0']
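
To see whether a hotpatch has reached the channel index, one option is to inspect the channel's repodata.json and list the gast constraint recorded for each tensorflow-base 2.4.1 build. The sketch below assumes the usual repodata layout (`packages` and `packages.conda` sections whose records carry `name`, `version`, and `depends`). The function name and the sample record are illustrative, not copied from the real index.

```python
def dependency_pins(repodata, package, version, dependency):
    """Map each matching build's file name to its constraints on `dependency`."""
    pins = {}
    for section in ("packages", "packages.conda"):
        for filename, record in repodata.get(section, {}).items():
            if record.get("name") != package or record.get("version") != version:
                continue
            # Entries in "depends" look like "gast >=0.4.0,<0.4.1.0a0".
            pins[filename] = [
                spec for spec in record.get("depends", [])
                if spec.split(" ", 1)[0] == dependency
            ]
    return pins


# Illustrative record shaped like the conda output quoted in this thread.
sample = {
    "packages.conda": {
        "tensorflow-base-2.4.1-gpu_py39h29c2da4_0.conda": {
            "name": "tensorflow-base",
            "version": "2.4.1",
            "depends": ["gast >=0.4.0,<0.4.1.0a0", "python >=3.9,<3.10.0a0"],
        }
    }
}
print(dependency_pins(sample, "tensorflow-base", "2.4.1", "gast"))
```

For a real check, the dictionary would be loaded with `json.load` from the channel's repodata.json for the relevant subdir. conda also keeps a local copy of the index, so clearing it with `conda clean --index-cache` may be needed before a changed pin becomes visible.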

@roebel

roebel commented Jun 22, 2021

@katietz I don't quite know what to make of this. It still does not install correctly. I think the only way to handle this currently is to install tf2.4 and then install gast 0.3.3 afterwards with pip. Is this the intended procedure?

@roebel

roebel commented Jun 22, 2021

@katietz I don't quite know what to make of this. It still does not install correctly. I think the only way to handle this currently is to install tf2.4 and then install gast 0.3.3 afterwards with pip and the --user flag. Is this the intended procedure?
