Unable to choose current device because CUDAService is disabled. #32428
A new Issue was created by @silviodonato Silvio Donato. @Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign heterogeneous, core |
New categories assigned: heterogeneous,core @Dr15Jones,@smuzaffar,@makortel,@makortel,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks |
The workflows How would you like them to behave when there are no GPUs ? |
Here are some options:
|
I would guess that
|
I think Therefore I think we should look into If solving @smuzaffar, what do you think? |
A possible straightforward way to implement |
+1 for |
assign pdmv |
New categories assigned: pdmv @chayanit,@wajidalikhan,@jordan-martins you have been requested to review this Pull request/Issue and eventually sign? Thanks |
Hi, just for my education on how the IB works: why don't we see e.g. 10824.512 (ECAL-gpu only) failing in the same IB? |
Because until #31719 gets merged, there is no ECAL-only gpu workflow - |
@fwyzard @makortel @smuzaffar I've made #32547 , I hope this is what you asked for |
@makortel @smuzaffar with #32547 I created
I think we need to update jenkins to run specific matrix (eg. |
@silviodonato , I am working on improving GPU PR tests. Currently when we enable GPU tests then bot runs two jobs
cms-sw/cms-bot#1459 should allow running GPU tests as an additional test within the standard PR test. This will avoid the compilation of externals and cmssw on GPU machines. Once cms-bot changes are merged then I can include About IBs tests, I will add an extra GPU relval test which will run runTheMatrix with |
@silviodonato , for IB tests, the easiest solution is to create a new IB queue, e.g. GPU_X, and run its tests on a GPU node. This will use the existing build and reporting system, and results will be available via the usual IB pages. Last night I ran a test GPU_X IB and you can already see the results here https://cmssdt.cern.ch/SDT/html/cmssdt-ib/#/ib/CMSSW_11_3_X If this looks good then I would suggest keeping this special GPU IB alive |
Thanks @smuzaffar, it looks perfect to me. Can I remove |
Looks like all the tests under https://github.com/cms-sw/cms-bot/blob/master/cmssw-pr-test-config#L2 do not belong to
Before dropping gpu from standard, we need to understand which wf should go in the GPU PR tests |
@fwyzard , currently the PR tests for GPU run the workflows listed at https://github.com/cms-sw/cms-bot/blob/master/cmssw-pr-test-config#L2, which are a combination of |
@smuzaffar , the GPU workflows are only those from the So I think we can keep the .5?1 workflows in the standard tests, and run only the .5?2 in the GPU tests. |
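The `.5?1` / `.5?2` split proposed above can be read as a glob on the workflow ID. The following is a minimal sketch of that selection, not the cms-bot implementation. The function name `split_workflows` is hypothetical, and the `.5?1` IDs in the sample list are assumed here for illustration (only 10824.512, 10824.522 and 11634.522 are quoted in this thread).

```python
from fnmatch import fnmatchcase

# IDs ending in .512/.522 are quoted in this thread; the .511/.521
# counterparts are assumed here purely for illustration.
workflows = [
    "10824.511", "10824.512",
    "10824.521", "10824.522",
    "11634.521", "11634.522",
]


def split_workflows(ids, gpu_pattern="*.5?2"):
    """Split workflow IDs into (standard, gpu) lists.

    IDs matching gpu_pattern go to the GPU tests; everything else
    stays in the standard tests. '?' matches exactly one character.
    """
    gpu = [w for w in ids if fnmatchcase(w, gpu_pattern)]
    standard = [w for w in ids if not fnmatchcase(w, gpu_pattern)]
    return standard, gpu


standard, gpu = split_workflows(workflows)
print("standard:", ",".join(standard))
print("gpu:", ",".join(gpu))
```

The comma-joined lists have the same shape as the workflow lists passed to the matrix; how they would actually be wired into the bot configuration is left open here.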
@fwyzard , cms-sw/cms-bot#1463 should run |
Eventually we should have a single GPU workflow, and a single CPU-equivalent workflow, and I think we should run both of them for all PRs that potentially affect those workflows (i.e. probably all PRs that affect ECAL, HCAL, Pixel, Tracking, PF - so a good fraction of those that require a RECO signature). O&C has decided that changes to the GPU reconstruction are signed only by @cms-sw/reconstruction-l2 , not by @cms-sw/heterogeneous-l2 , so I guess it's really up to them to speak up on how they prefer the tests to be organised. |
IIRC the workflow definitions are not signed by reco either, it's in PDMV hands. For the short matrix it's probably more practical to add a data workflow which does not also rerun the HLT, since none of the GPU-related reco relies on it; this is mainly for time/cost optimization of the workflow. |
Issue solved by #32650. |
Now that #31719 has been merged, we should add the I'm kind of lost about the technical details: do we add it to the matrix, or to the bot configuration ? |
+heterogeneous I believe this was fixed a long time ago for the CUDA workflows. |
+core
I agree. I think the root cause has been addressed in several ways by now. |
In CMSSW_11_3_X_2020-12-08-2300, we are getting the error in the title in wf 136.885522, 136.888522, 10824.522, 11634.522.
It seems related to #31720 . It looks like an expected error, since the IB test machines have no GPU.
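Since the error comes from running GPU workflows on machines without a GPU, one option is for the test driver to check for a device before scheduling them. The following is a minimal sketch under that assumption, not what cms-bot or CMSSW actually does. The helper name `gpu_available` is hypothetical; it only relies on the standard `nvidia-smi -L` listing command.

```python
import shutil
import subprocess


def gpu_available(which=shutil.which, run=subprocess.run):
    """Return True if nvidia-smi is on PATH and lists at least one GPU.

    'which' and 'run' are injectable so the check can be exercised
    on machines without the NVIDIA tools installed.
    """
    exe = which("nvidia-smi")
    if exe is None:
        return False
    try:
        # 'nvidia-smi -L' prints one line per device, e.g. "GPU 0: ...".
        result = run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


if __name__ == "__main__":
    if gpu_available():
        print("GPU found: GPU workflows can be scheduled")
    else:
        print("no GPU found: skipping GPU workflows")
```

Whether the GPU workflows should be skipped, fail, or fall back to the CPU in that case is exactly the question discussed in this issue; the sketch only covers the detection step.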