Openspiel+Bazel+Tensorflow build failure #172
Hi @lancelot-ch, yes, I think this is being caused by a recent bug in abseil. We got around it by fixing the checkout to a specific commit number for now, so if you pull from master (+ get rid of the abseil-cpp directory that There is also a very quick fix to a single file in abseil that you can apply, and abseil should also be fixed by tomorrow. For details, see: abseil/abseil-cpp#640 |
Thanks, @lanctot! I tried the fix, but the problem isn't solved yet. I thought that could be the cause, given that I downloaded Abseil-cpp on March 9. However, even after deleting it and switching to the March 7 version of Abseil-cpp as you suggested, the problem is still there. One thought: I am using Bazel to build, so a change to a CMake file might not help. Also, I remember hitting this problem in a previous build, and I could work around it with the command below. bazel build --copt=-std=c++17 open_spiel/games/go_test So might the current problem be caused by Tensorflow, since it also pulls in another version of Abseil? I am confused. |
Good point, agreed: it cannot be due to a change in CMake files in your case. But I guess it must be related because those are the exact same errors I was getting (but in different files). |
Might be. I will point the team to this issue, maybe someone will have an idea. You could also try following up on the abseil-cpp thread, they responded pretty quickly and they might have an idea (even if it may not be directly related to absl). |
When you say you put open_spiel in a TensorFlow folder, what exactly do you mean? I think the problem is likely that you are accidentally doing a mixed-mode compile somehow. I think TensorFlow by default uses C++14 and it looks like open_spiel uses C++17. Linking will fail in this case because This is probably the single most common issue that trips people up. |
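To illustrate the mixed-mode problem described above, here is a minimal sketch of how a library can end up with two incompatible signatures for the same function depending on the language mode. The `demo` namespace, `own_string_view`, `Length` and `UsesOwnStringView` are hypothetical names chosen for illustration and are not Abseil's actual code; the sketch only assumes a compiler that sets `__cplusplus` according to `-std=`.

```cpp
#include <cstddef>
#include <type_traits>
#if __cplusplus >= 201703L
#include <string_view>
#endif

namespace demo {  // hypothetical stand-in for a library such as Abseil

// The library's own view type, used when std::string_view is unavailable.
struct own_string_view {
  const char* ptr;
  std::size_t len;
  constexpr std::size_t size() const { return len; }
};

#if __cplusplus >= 201703L
using string_view = std::string_view;  // translation unit built as C++17+
#else
using string_view = own_string_view;   // translation unit built as C++14-
#endif

// The parameter type is part of the mangled symbol name. A caller built as
// C++17 asks the linker for Length(std::string_view), while a library built
// as C++14 only exports Length(demo::own_string_view): undefined reference.
constexpr std::size_t Length(string_view text) { return text.size(); }

// Reports which alias this translation unit selected.
constexpr bool UsesOwnStringView() {
  return std::is_same<string_view, own_string_view>::value;
}

}  // namespace demo
```

Compiling this once with `-std=c++14` and once with `-std=c++17` gives different answers from `demo::UsesOwnStringView()`, which mirrors the situation where one part of a build sees `std::basic_string_view` in a signature and another part does not.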
Thanks, @lanctot @derekmauro. By "put open_spiel in a TensorFlow folder", I mean that I cloned the Tensorflow source code and put open_spiel inside the source tree, e.g. "tensorflow/tensorflow/open_spiel/spiel.cc". I kept the WORKSPACE file from the Tensorflow root folder. I cloned Abseil-cpp into the open_spiel folder, i.e. "tensorflow/tensorflow/open_spiel/Abseil-cpp". I used "bazel build --copt=-std=c++17 open_spiel/games/go_test" to build. My goal is to use Openspiel in a C++ program for AlphaZero self-play computation. For network inference, I am thinking of using the Tensorflow C++ API, which has to be built from source, and that leads to the current situation. Meanwhile, I will experiment with the ABI options and see if I can work around this issue. If there are other options to achieve my goal, please let me know. Thanks. |
I'll also point you to https://abseil.io/blog/201901115-options and https://github.com/abseil/abseil-cpp/blob/master/absl/base/options.h. You should be able to make this work with a little tweaking. |
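As a sketch of the kind of tweak those links describe, the excerpt below pins the type-selection options in a local copy of `absl/base/options.h` instead of leaving them on auto-detect. The macro names are the ones defined in that header; choosing `0` for all of them is an assumption made for illustration, and it only helps if every target that links against Abseil (here, both TensorFlow and OpenSpiel) is rebuilt against the same edited copy.

```cpp
// Excerpt of an edited absl/base/options.h (illustrative, not the full file).
//   0 = always use Abseil's own implementation
//   1 = always alias the std:: type (needs a C++17 standard library)
//   2 = auto-detect from the compiler's language mode (the upstream default)
#define ABSL_OPTION_USE_STD_ANY 0
#define ABSL_OPTION_USE_STD_OPTIONAL 0
#define ABSL_OPTION_USE_STD_STRING_VIEW 0
#define ABSL_OPTION_USE_STD_VARIANT 0
```

With the options pinned, `absl::string_view` should resolve to the same type in C++14 and C++17 translation units, so the signatures seen by callers and by the compiled library no longer depend on `-std=`.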
Thanks a lot, @derekmauro. I will give it a try, though I am not sure I can handle it well. Tensorflow will pull in an older version of Abseil as well, so I might also have to work with that. Besides, Tensorflow defaults to C++14 and OpenSpiel requires C++17. I had not expected such a difficult scenario... |
@lanctot @lancelot-ch |
We are about to release some code in the next few weeks that would require TF inference (note, btw in We might spend a bit of effort trying to see how to get this to work because it would be nice to support the TF-compiled code. It's unfortunately not high priority, but we can at least try. I agree the C++14 / C++17 mix is awkward. One option is to go back down to C++14 until TF allows C++17 (I'm not sure how many C++17 features we truly rely on); this would require some work on our part, probably not much. But, it feels like the wrong direction. We will soon need to support TF 2.2 because Ubuntu comes with Python 3.8 ( see #166 for details ), and it would be really nice if TF 2.2 supported C++17, because then we would not have to mix C++ standards and would not have to go back to an old standard just to support compiling with TF. |
Thanks, @lanctot. I didn't notice there is already a TF example; I will read it through. From a quick look, I think I am on a similar route. I also read the link recommended in the example's note. It mentions two options: build with Bazel, or link against the TF API in a .so file. I took the first path with Bazel and wrote the code below, adapted from code and instructions in other links. TF inference in C++ works on its own, but I cannot get it to work when combined with OpenSpiel, which is what prompted this thread. I also tried the second path, linking against a compiled TF .so file, but I know little about linking and failed several times. I probably need to wait until Openspiel makes more progress on this. I am still experimenting with the methods Derek suggested earlier. Every time I change the settings in Abseil's options.h, I have to compile all of TF again, so trying every possible solution takes a very long time. So far I still cannot get it to work: I can only ever satisfy one of Openspiel or TF inference.
Bazel+TF inference solution example:
hello.cc
#include "tensorflow/core/public/session.h"
//using std;
int main(void)
// Load model from SavedModel
Tensor input(tensorflow::DT_FLOAT, tensorflow::TensorShape({1,2,2,2}));
// Prediction
return 0;
BUILD
load("//tensorflow:tensorflow.bzl", "tf_cc_binary")
package(
tf_cc_binary(
)
train.py
board_height=2
conv1 = tf.layers.conv2d(inputs=input_state,
action_conv = tf.layers.conv2d(inputs=conv3, filters=4,
action_conv_flat = tf.reshape(
prediction = tf.layers.dense(inputs=action_conv_flat,
evaluation_fc2 = tf.layers.dense(inputs=evaluation_fc1,
test_data = np.array([[[[1,2],[3,4]],[[5,6],[7,8]]]], dtype='float32')
with tf.Session() as sess:
|
Interesting, I didn't realize this was an option. Can you point me to where you found out about it or how to do it? If we could get it to work without having to compile TF from scratch, I'd far prefer that option. I wonder if the pip packages come with everything you need, though (we would need the headers in addition to the .so). After a few quick searches, I found that TF actually has a CMake build: https://github.com/tensorflow/tensorflow/tree/9590c4c32dd4346ea5c35673336f5912c6072bf2/tensorflow/contrib/cmake . This is great, it might make it easier to integrate with OpenSpiel. I'll keep you posted if I look at this further and make any progress. |
@lanctot The note in Openspiel's tf_trajectories.h file links to https://tebesu.github.io/posts/Training-a-TensorFlow-graph-in-C++-API , which says "There are two ways to compile this: one is bazel and the other is linking against the tensorflow library. I prefer the latter." That link explains how to link against the libtensorflow_cc.so file, but I just cannot reproduce it on my system. I still think this is a promising way to avoid having to unify the compiler settings of TF and Openspiel's source code. It seems such an elegant solution: we could use a Tensorflow C++ API just like the Tensorflow Python API and Tensorflow C API from TF. |
@lanctot Please also refer to the following two links. One uses Bazel and the other uses CMake. There are some parts I cannot fully follow, so I am still studying them. |
Quick heads-up that I've been playing around with tensorflow_cc; for now just trying to get it to compile with TF2.2 and Ubuntu 20.04 (we need to change OpenSpiel to support these soon so I'm taking the opportunity to try it in this environment). I've run into some trouble:
It's great that TF can be compiled with CMake.. it would make supporting compiling with it externally within OpenSpiel a lot easier. |
Ok, I've managed to compile Tensorflow via CMake using tensorflow_cc (and a very new version as well, TF2.2rc2 on Ubuntu 20.04!) Thanks for pointing us to that @lancelot-ch, seems like a great project. This is the first step to getting OpenSpiel + TF compiling externally together. I can't promise any timelines, but it means we're just a few steps from getting them to work together. |
@lanctot Thanks for your wonderful contribution. I tried a few times but with no luck. Recently I have been occupied with other work, but I look forward to your next milestones! |
Hi, However, I'm getting the following error from
Do you have any ideas? |
I also tried on an Ubuntu 18.04 machine with
|
P.S: The TensorflowCC (Tensorflow v1.15.2) is working without problem in both environments. |
Hi @mrdaliri , yes I think you're not linking to the alpha_zero library that you're defining in You're defining a library when you do this:
But then the executables are not including them because they are not being bundled into OPEN_SPIEL_OBJECTS. You will also need to add a line somewhere around here: https://github.com/deepmind/open_spiel/blob/695fad0ac25383e7f66cb0bb30fa8a4ea07d6bb9/open_spiel/CMakeLists.txt#L154 |
Oh I assumed uncommenting was enough. I changed that block to the following:
Now I'm getting a great number of errors on Ubuntu, all complaining about Tensorflow stuff. It seems it is not yet correctly linked:
|
@lanctot: I've added TensorflowCC example as per your suggestion, and disabled all other alpha_zero targets ( So, perhaps the issue is in one of the alpha_zero files and TensorflowCC (by itself) is working just fine. My guess is some sort of function name overlapping between external modules (Eigen, Tensorflow and Abseil). |
Ok thanks for doing that.. it will help move us along at least. I unfortunately have a lot of reviewing to do tonight but maybe tomorrow I can clone your fork and take a look. @tewalds, any ideas? I'm reluctant to blame Eigen or Abseil only because all the undefined references are coming from Tensorflow itself. So possibly TensorflowCC is not exposing everything we need from TF (doubtful) or it's providing several link targets and we're not using the ones we need. @FloopCZ, do these errors look familiar to you.. or do you have any ideas on what we could be doing wrong? |
Hey @mrdaliri I also noticed TensorflowCC is using TF2.2. I will have my PR that lets us upgrade to TF2.2 (#249) ready to go. I have already imported it, there's just a bunch of work to do on it which I plan to get done tomorrow. So it will likely be in on Friday morning. If we don't have this solved by then, I'd like to see what happens if we try this with TF2.2 instead of TF1.15. |
Yes, by default it is using TF 2.2. Just to clarify, I modified the TensorflowCC config file and changed its TF version to 1.15.2, so all tests were run against TF 1.15.2. |
I'm trying it out. |
Great let's Google this one function or link error and find out when it was added and/or how to include it in the .so. Almost there!!! |
At this point we can also try posting on TF github. Since it's just down to one error, with luck they might get back to us today. |
Do you mean his
What I understand is that Protobuf is NOT included in tensorflow_cc, so we have to install it externally, and then link it manually. ( |
TF uses protobufs, so they might have a subset of protobufs within the TF code (likely an older version). Ok, could I ask you to post a brief message on TF github pointing to the most relevant comment in this thread and quoting the final link error? I will also post on our internal sites and contact the TF devs, but it would be great if I could include a link to your post. |
Something has happened to that |
Great news! Protobuf 3.10.1 did the trick! I was able to run
|
@lanctot is it the expected output? EDIT: I just realized that it is an error. Similar error occurred when I tried |
Yeah I saw, could you try @tewalds , do you recognize that error? |
@lanctot
I changed line 41 of its python file:https://github.com/deepmind/open_spiel/blob/549e48010a81c023902a39c41319ed08769d3f26/open_spiel/contrib/python/export_graph.py#L41 to |
Found this issue: tensorflow/tensorflow#38393. It seems that the OP in that issue was using TF_CC with TF 2.2-rc2 (like me). Maybe it got solved in the 2.2.0 final version. I have to build TF_CC again, which takes some time. In the meantime, I'll try the TF_CC docker container, which uses the 2.2.0 final version. Side note: my machine is GPU-enabled with CUDA installed. Since the error looks related to XLA, there might be a mismatch between the installed CUDA libraries and what TF 2.2 requires. |
I will finally fix this today. But basically every instance of |
I think I've fixed it now (see my tf_trajectories_cpp branch). It also failed with the same XLA-related runtime errors. |
Same errors with TF_CC Docker (TF2.2). I'm now re-building TF_CC with |
It doesn't look related to XLA. The issue is with the For tf_trajectories_example, if I comment out that line, it runs without issues (no errors but no output). However, with vpnet_test, commenting out
|
Is there a better way to set the device, so that you can load the same graph onto each device on a system with multiple GPUs? |
New updates: I re-compiled TF_CC with a modified version of Tensorflow 2.2 (here). I added It also eliminates all errors with
|
Hello all, I am on Debian 10, and I was able to get the tensorflow_cc_test from mrdaliri's comment (commit 9ebbf6b) building and running on my machine. (Fresh install of cuda 10.2 with cudnn 7.6.5.32, tensorflow_cc v2.2.0, libprotobuf-dev and protobuf-compiler 3.6.1.3-2) However, when I update to the latest az_cpp_cmake branch (commit aba6e59), vpnet_test builds, but then fails on run with
EDIT: Just saw your new comment from today, will try out the modified version of tensorflow. |
Hi @jeremysalwen, You need to modify
|
Hi @mrdaliri I was able to build and install tensorflow_cc using your modified version of tensorflow. I was then able to build open_spiel, but running the vpnet_test fails with
I tried uninstalling my libprotobuf-dev Debian package, but then openspiel refuses to compile at all (cmake complains). Do you have further modifications to open_spiel to address this? |
@mrdaliri could you add a section to the AlphaZero README.md describing the steps necessary to compile and run TF within OpenSpiel? Actually it could even be better as a separate independent doc (because maybe it will come up again in different contexts) and for now we can link from the AlphaZero doc and in the header of tf_trajectories. |
Hi @lanctot, |
Yes basically something tidy that people can read and follow to reproduce your success in getting this to work. |
Sure. I'll make a pull request so you can try it out before adding it to master. |
Hi @jeremysalwen, if you have built and installed the modified version of TF, you don't need external Protobuf anymore. Please check out my latest commit on P.S. I don't have |
Hi @mrdaliri, when do you think your fixes will be pushed to master in OpenSpiel? Will there also be documentation explaining how to properly compile an AlphaZero instance with the fixes? Thanks! |
Hi @alextrudeau, |
Thanks @mrdaliri , this PR will be merged tomorrow. |
Hello,
I tried to use Bazel to build Openspiel, and it succeeded. But when I tried to use Bazel to build Openspiel inside a Tensorflow folder, it failed.
I put the open_spiel folder in the Tensorflow folder and used Tensorflow's WORKSPACE. It gives the following errors (it seems that compiling passed, while linking failed with errors on absl). When I removed the .bazelrc file from the Tensorflow source folder, the Bazel build passed again. Could anyone help? Thanks a lot.
Environment: WSL Ubuntu 18.04; Openspiel latest version; Bazel 1.2.1; Tensorflow: using source codes; Python 3.6.
Linking of rule '//tensorflow/open_spiel/games:go_test' failed (Exit 1)
bazel-out/k8-opt/bin/tensorflow/open_spiel/tests/_objs/basic_tests/basic_tests.o:basic_tests.cc:function std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > absl::StrCat<char [25], int, char [17], int, char [2], std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(absl::AlphaNum const&, absl::AlphaNum const&, absl::AlphaNum const&, absl::AlphaNum const&, absl::AlphaNum const&, char const (&) [25], int const&, char const (&) [17], int const&, char const (&) [2], std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&): error: undefined reference to 'absl::strings_internal::CatPieces[abi:cxx11](std::initializer_list<std::basic_string_view<char, std::char_traits<char> > >)'
bazel-out/k8-opt/bin/tensorflow/open_spiel/_objs/spiel/spiel.o:spiel.cc:function open_spiel::SampleAction(std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > const&, absl::BitGenRef): error: undefined reference to 'absl::strings_internal::CatPieces[abi:cxx11](std::initializer_list<std::basic_string_view<char, std::char_traits<char> > >)'
bazel-out/k8-opt/bin/tensorflow/open_spiel/_objs/spiel/spiel.o:spiel.cc:function open_spiel::Game::DeserializeState(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const: error: undefined reference to 'absl::ByChar::Find(std::basic_string_view<char, std::char_traits<char> >, unsigned long) const'
bazel-out/k8-opt/bin/tensorflow/open_spiel/_objs/spiel/spiel.o:spiel.cc:function open_spiel::Game::DeserializeState(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const: error: undefined reference to 'absl::ByChar::Find(std::basic_string_view<char, std::char_traits<char> >, unsigned long) const'
bazel-out/k8-opt/bin/tensorflow/open_spiel/_objs/spiel/spiel.o:spiel.cc:function open_spiel::GameRegisterer::CreateByName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, open_spiel::GameParameter, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, open_spiel::GameParameter> > > const&): error: undefined reference to 'absl::strings_internal::CatPieces[abi:cxx11](std::initializer_list<std::basic_string_view<char, std::char_traits<char> > >)'
bazel-out/k8-opt/bin/tensorflow/open_spiel/_objs/spiel/spiel.o:spiel.cc:function open_spiel::DeserializeGameAndState(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&): error: undefined reference to 'absl::ByChar::Find(std::basic_string_view<char, std::char_traits<char> >, unsigned long) const'
bazel-out/k8-opt/bin/tensorflow/open_spiel/_objs/spiel/spiel.o:spiel.cc:function open_spiel::DeserializeGameAndState(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&): error: undefined reference to 'absl::ByChar::Find(std::basic_string_view<char, std::char_traits<char> >, unsigned long) const'
Best regards,
Lancelot