-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GPU backend option for TensorFlow session #40551
Conversation
test parameters:
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40551/33781
|
A new Pull Request was created by @valsdav (Davide Valsecchi) for master. It involves the following packages:
@cmsbuild, @mandrenguyen, @clacaputo can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild , please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-39a9f4/30044/summary.html Comparison SummaryThere are some workflows for which there are errors in the baseline: Summary:
GPU Comparison SummarySummary:
|
@valsdav @tvami , there are few unit tests which failed on ppc64le ( where we have GPUs available). See all the testTF logs here. Can you please makethe same change in https://github.com/cms-sw/cmssw/tree/39394a2c91eba54f396c5e454c87403b89797d52/PhysicsTools/TensorFlow/test |
also for |
Hi @smuzaffar I checked and the tests are using the same options already. The problem seems a CUDA driver version mismatch:
|
@fwyzard , on our Power PC nodes, scram hook to find the compatibility between cuda driver and runtime [a] shows that driver and runtime are compatible so it add $CUDA_BASE/driver but tensorflow disagrees . No idea why TF is still checking for cuda device which is is explicitly set to not use GPU . Any idea how to avoid it? [a]
|
In a couple of tests the |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40551/33788
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40551/34102
|
Pull request #40551 was updated. @cmsbuild, @mandrenguyen, @clacaputo can you please check and sign again. |
please test |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-39a9f4/30487/summary.html The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here: Comparison SummarySummary:
GPU Comparison SummarySummary:
|
We still see problems in the tests run on I think it is a problem with some module still not properly setup through the new |
Tests on |
I am planning to introduce some improvements in the organization of the TF options we are exposing to the users, but I think that can be done in a separate PR. The changes in the PR allow all the tests to pass (with Do you think we can merge this one? Thanks! |
+1 |
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
+1 |
@rappoccio just a kind reminder that this PR should be merged along with cms-sw/cmsdist#7648 |
PR description:
This PR introduces a GPU backend option for TensorFlow sessions. (The interface is similar to the ONNX GPU support one #36963)
The GPU session can be activated by user code by requesting
tensorflow::Backend::cuda
in the session creation.This is needed to be able to compile TF with GPU support in the cmssw-dist (see cms-sw/cmsdist#7648).
By default the CPU backend is used for all the standard workflows.
Moreover, TensorFlow has been setup to avoid occupying all the cuda memory of a GPU, but just the necessary one (
allow_growth=true
option).Tests have been modified to run both a CPU and a GPU version, if a device is available.
PR validation
We are working to provide a test with the GPU backend active in a reconstruction sequence.
@yongbinfeng @riga @tvami @smuzaffar