Update ECAL and HCAL reconstruction to run on multiple GPUs #502

Merged


@vkhristenko vkhristenko commented Jul 8, 2020

PR description:

supersedes #498

hcal and ecal are done together here because of the change CUDADataFormats/HcalCommon -> CUDADataFormats/CaloCommon, which both hcal and ecal now depend on. this avoids duplication...

this is to allow hcal and ecal to run on a node with multiple gpus.
all the modules have been updated for that, and now basically no protection for the cuda service is needed.
note: Ecal RecHit was only updated but not validated (fillDescriptions is not self-sufficient) @amassiro

the only open item in this pr is that the newly added conditions' Records should eventually be moved to CondFormats/DataRecord.

PR validation:

using standalone execs

@vkhristenko vkhristenko mentioned this pull request Jul 8, 2020

// input cpu data
ecal::raw::InputDataCPU inputCPU = {
cms::cuda::make_host_unique<unsigned char[]>(

comment for the future (after CMSSW 11.1.0 / CUDA 11.0 / c++17): would it make sense to use std::byte instead of unsigned char ?

Author

I would say we should be consistent with what sits inside of the FEDRawData... although we do make a copy before sending to the device...
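To make the trade-off in this exchange concrete, here is a minimal C++17 sketch; it is not code from this PR. The source data stays `unsigned char` (matching what the reply says sits inside FEDRawData), and only the staging copy is typed as `std::byte`. `copyToByteBuffer` and `byteValue` are hypothetical helpers, and a plain `std::vector` stands in for the host buffer that the PR obtains from `cms::cuda::make_host_unique`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of CMSSW): copy raw data stored as
// unsigned char into a staging buffer typed as std::byte.
// std::vector is an assumption standing in for the host buffer used in the PR.
std::vector<std::byte> copyToByteBuffer(const unsigned char* data, std::size_t size) {
  assert(data != nullptr || size == 0);
  std::vector<std::byte> staging(size);
  if (size != 0) {
    // Both element types are one byte wide and trivially copyable,
    // so a plain memcpy is well defined here.
    std::memcpy(staging.data(), data, size);
  }
  return staging;
}

// std::byte is not an arithmetic type, so reading a value back needs an
// explicit conversion.
unsigned int byteValue(std::byte b) { return std::to_integer<unsigned int>(b); }
```

Since a copy is made before the transfer to the device anyway, the change of element type could be confined to that copy, leaving the input side consistent with the raw data.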

@fwyzard fwyzard added the ECAL (ECAL-related developments) and HCAL (HCAL-related developments) labels Jul 8, 2020
@fwyzard

fwyzard commented Jul 8, 2020

Sorry for taking so long to go through this. My plan is to:

  • finish reviewing the changes
  • check that it works fine with 2 GPUs

I am also using this PR as an excuse to add the HCAL-only workflows to the validation, but if that takes too long I'll just go ahead and merge it.

@vkhristenko
Author

@fwyzard np. i tested this on cmg-gpu1080...

@fwyzard fwyzard changed the title from "Multigpu hcal ecal" to "Update ECAL and HCAL reconstruction to run on multiple GPUs" Jul 8, 2020
@fwyzard

fwyzard commented Jul 8, 2020

I haven't gotten to the HCAL workflow yet - but this PR already breaks the ECAL-only workflow, and the HLT customisation.

Looks like the reasons are:

  • the old parameters are still left in the HLT workflow
  • the new ESProducers are missing from the HLT and offline workflows

I can try to add them ...

@fwyzard

fwyzard commented Jul 8, 2020

What is the configuration for EcalRecHitParametersGPUESProducer supposed to look like ?

The autogenerated cfi is

import FWCore.ParameterSet.Config as cms

ecalRecHitParametersGPUESProducer = cms.ESSource('EcalRecHitParametersGPUESProducer',
  ChannelStatusToBeExcluded = cms.VPSet(
    cms.PSet()
  ),
  appendToDataLabel = cms.string('')
)

but using it fails with

----- Begin Fatal Exception 08-Jul-2020 19:06:29 CEST-----------------------
An exception of category 'Configuration' occurred while
[0] Constructing the EventProcessor
[1] Validating configuration of ESProducer or ESSource of type EcalRecHitParametersGPUESProducer with label: 'ecalRecHitParametersGPUESProducer'
Exception Message:
Missing required parameter. It should have label "kDAC" and have type "tracked string".
The description has no default. The parameter must be defined in the configuration
----- End Fatal Exception -------------------------------------------------
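
The exception message states the rule that is being violated: `kDAC` is described without a default, so it must be set explicitly in the configuration. The following C++ sketch is a toy model of that rule only. It does not use the FWCore ParameterSet API; `Description`, `Config` and `validate` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Toy model, NOT the FWCore API: each described parameter has an optional default.
using Description = std::map<std::string, std::optional<std::string>>;
using Config = std::map<std::string, std::string>;

// Fill in defaults, and reject a configuration that omits a parameter
// whose description has no default.
Config validate(Config config, const Description& description) {
  for (const auto& [label, defaultValue] : description) {
    if (config.count(label) != 0) {
      continue;  // explicitly set in the configuration
    }
    if (!defaultValue) {
      throw std::runtime_error("Missing required parameter \"" + label + "\"");
    }
    config[label] = *defaultValue;
  }
  // Postcondition: every described parameter is now present.
  assert(config.size() >= description.size());
  return config;
}
```

In this model, a configuration that only carries the defaulted entries fails in the same way as the autogenerated cfi above, and succeeds once the missing parameter is supplied.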

@vkhristenko
Author

vkhristenko commented Jul 8, 2020 via email

@fwyzard

fwyzard commented Jul 8, 2020

I see - I hadn't understood that the comment was about the EDProducer/ESProducer split, thanks for the clarification.

@vkhristenko
Author

> I see - I hadn't understood that the comment was about the EDProducer/ESProducer split, thanks for the clarification.

ok, I think that you need to use this guy https://github.com/cms-patatrack/cmssw/blob/CMSSW_11_1_X_Patatrack/RecoLocalCalo/EcalRecProducers/python/ecalRecHitGPU_cfi.py

but again the EDProducer itself does not really have self-sufficient fillDescriptions. And when i moved to ES..., I moved only what was present in the EDProducer. Therefore, I think somewhere in the customizations there should be loading of this cfi... @amassiro

@amassiro

> I see - I hadn't understood that the comment was about the EDProducer/ESProducer split, thanks for the clarification.
>
> ok, I think that you need to use this guy https://github.com/cms-patatrack/cmssw/blob/CMSSW_11_1_X_Patatrack/RecoLocalCalo/EcalRecProducers/python/ecalRecHitGPU_cfi.py
>
> but again the EDProducer itself does not really have self-sufficient fillDescriptions. And when i moved to ES..., I moved only what was present in the EDProducer. Therefore, I think somewhere in the customizations there should be loading of this cfi... @amassiro

Thanks @vkhristenko !

I think that with #504 the configuration producer should be ok.

I tested with the usual test configuration under RecoLocalCalo/EcalRecProducers/test/testEcalRechitProducer_cfg.py

@fwyzard
Copy link

fwyzard commented Jul 12, 2020

Validated together with #504, #507 and #508.

@fwyzard fwyzard merged commit e7ffb2c into cms-patatrack:CMSSW_11_1_X_Patatrack Jul 12, 2020
fwyzard pushed a commit that referenced this pull request Oct 7, 2020
Use caching allocators for host and device CUDA memory.
Use dedicated ESProducers to make part of the modules' configuration available on all GPUs.
Rename hcal and hcal::common namespaces to calo::common.
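
As an illustration of what "caching allocator" means in the commit message above, here is a toy, host-only C++ sketch. It is not the allocator used in CMSSW: the class name, the power-of-two bins and the use of `std::malloc` are all assumptions of this example. The idea it shows is that freed blocks are kept and handed out again for later requests of the same bin size, instead of going back to the system each time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <vector>

// Toy host-only caching allocator (hypothetical, for illustration only).
class CachingAllocator {
public:
  CachingAllocator() = default;
  CachingAllocator(const CachingAllocator&) = delete;
  CachingAllocator& operator=(const CachingAllocator&) = delete;

  // Release the cached blocks; blocks still held by callers are not touched.
  ~CachingAllocator() {
    for (auto& entry : cache_) {
      for (void* p : entry.second) std::free(p);
    }
  }

  // Round the request up to a power-of-two bin and reuse a cached block if any.
  void* allocate(std::size_t bytes) {
    const std::size_t bin = binSize(bytes);
    auto& blocks = cache_[bin];
    void* p = nullptr;
    if (!blocks.empty()) {
      p = blocks.back();
      blocks.pop_back();
      ++hits_;
    } else {
      p = std::malloc(bin);
      if (p == nullptr) throw std::bad_alloc();
    }
    live_[p] = bin;
    return p;
  }

  // Return the block to the cache rather than to the system.
  void deallocate(void* p) {
    auto it = live_.find(p);
    assert(it != live_.end());
    cache_[it->second].push_back(p);
    live_.erase(it);
  }

  std::size_t cacheHits() const { return hits_; }

  // Smallest power-of-two bin (at least 64 bytes) that fits the request.
  static std::size_t binSize(std::size_t bytes) {
    std::size_t bin = 64;
    while (bin < bytes) bin *= 2;
    return bin;
  }

private:
  std::map<std::size_t, std::vector<void*>> cache_;  // free blocks per bin size
  std::map<void*, std::size_t> live_;                // bin size of blocks in use
  std::size_t hits_ = 0;
};
```

Allocating 100 bytes, releasing the block, and then asking for 120 bytes returns the same pointer in this sketch, because both requests fall in the 128-byte bin.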
Labels
ECAL (ECAL-related developments), HCAL (HCAL-related developments)

3 participants