Hcal Raw Decoding + Reco (method 0, mahi) on GPU #374

@vkhristenko vkhristenko commented Jul 19, 2019

PR description:

This is the replacement of #367, made on top of #373.

PR validation:

An executable is provided to validate against the CPU version.
Note that the validation branch should be used to match exactly.

method 0 changes:

  • containment correction removed
  • hbm legacy correction removed (to be removed in run 3 completely)

mahi changes:

  • do not use prefit. The logic is simple: prefit exists to avoid running mahi in a particular case and so cut down runtime. If we are able to use mahi in all situations and still improve runtime (speed-up), it is better to run full mahi.
  • fixes for 2 bugs found in cpu version and reported

@mariadalfonso

general note:
you need to factorize the code

MahiGPU.cu should contain only MAHI, neither the conditions nor nnls()

Can you put in conditions to validate the raw-to-digi part on CPU, and then only MAHI on GPU?

@vkhristenko
Author

vkhristenko commented Oct 9, 2019

so,

  • MahiGPU: at this point ecal and hcal are almost identical, except for the covariance matrix treatment. In there I separated fnnls from the other computations. The goal is, once merged, to reuse the code and completely remove the various computation functions from either the ecal or hcal packages.
  • for validation of raw to digi, please carefully check what sits in the EventFilter/HcalRawToDigi package, in particular the standalone executables that were added.
  • both raw to digi and digi to rec hit are written and validated separately.

// move index to the right part of the vector
w_max_idx += npassive;

Eigen::numext::swap(pulseOffsets.coeffRef(npassive),

Q: how did you avoid swapping all the vectors (and do it only for the fitted amplitude)?
https://github.com/cms-sw/cmssw/blob/master/RecoLocalCalo/HcalRecAlgos/src/MahiFit.cc#L476-L484
If I do that in the CPU version, all the quantities are wrong.

Author

Check how pulseOffsets are used. You do not need to swap (on CPU it's probably better, also you will not be able to use Eigen in that case). But for GPU it's better not to swap, especially for small matrices. You just need to keep track of the active/passive set, which is what pulseOffsets does, and load things accordingly.
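
A minimal sketch of the idea described above, assuming a fixed-size problem and plain arrays instead of Eigen. The names `IndexSet`, `order`, `makePassive` and `permuted` are hypothetical and are not taken from the PR; `order` only plays the role that pulseOffsets is described as playing. Moving an element into the passive set swaps two indices, and every matrix access goes through the index map, so the matrix itself is never permuted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

constexpr std::size_t N = 4;  // illustrative size, not the real number of pulses

// Hypothetical bookkeeping: order[i] maps a logical position to a physical
// row/column of the untouched matrix. Logical positions [0, npassive) form
// the passive set; the remaining positions form the active set.
struct IndexSet {
  std::array<std::size_t, N> order{{0, 1, 2, 3}};
  std::size_t npassive = 0;

  // Move the entry at logical position `pos` (currently active) into the
  // passive set by swapping two indices; no matrix or vector data moves.
  void makePassive(std::size_t pos) {
    assert(pos >= npassive && pos < N);
    std::swap(order[npassive], order[pos]);
    ++npassive;
  }
};

// Read element (i, j) of the logically permuted matrix from original storage.
inline double permuted(const std::array<std::array<double, N>, N>& A,
                       const IndexSet& s, std::size_t i, std::size_t j) {
  return A[s.order[i]][s.order[j]];
}
```

With this layout, promoting one element costs a single index swap regardless of the matrix size; the price is one extra indirection on every load.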

@fwyzard

fwyzard commented Oct 22, 2019

@vkhristenko could you remove the file RecoLocalCalo/HcalRecAlgos/src/.MahiGPU.cu.swk from the PR ?

@vkhristenko
Author

> @vkhristenko could you remove the file RecoLocalCalo/HcalRecAlgos/src/.MahiGPU.cu.swk from the PR ?

done

@fwyzard

fwyzard commented Oct 22, 2019

Hi @vkhristenko,
I am trying to run your HCAL code, but I am not able to build it in a general working area.

If I check out only the minimal set of packages, with

cmsrel CMSSW_10_6_3_Patatrack
cd CMSSW_10_6_3_Patatrack/src
cmsenv
git cms-init -x cms-patatrack
git cms-fetch-pr cms-patatrack:374
git merge pull/374
git cms-addpkg CUDADataFormats/HcalCommon  CUDADataFormats/HcalDigi  CUDADataFormats/HcalRecHitSoA  EventFilter/HcalRawToDigi  HeterogeneousCore/CUDACore  RecoLocalCalo/HcalRecAlgos  RecoLocalCalo/HcalRecProducers

# edit HeterogeneousCore/CUDACore/interface/CUDAESProduct.h and add
#include <cassert> 

# edit RecoLocalCalo/HcalRecAlgos/src/HcalSiPMCharacteristicsGPU.cc and add
#include "FWCore/Utilities/interface/Exception.h"

scram b -j

it works.
Note that it seems a couple of files are missing some #includes.

However, if I add a couple more (unmodified) packages:

git cms-addpkg RecoLocalCalo/EcalRecAlgos RecoLocalCalo/EcalRecProducers
scram b -j

the build fails with a linker error:

>> Building shared library tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/libRecoLocalCaloHcalRecAlgos.so
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_79_tmpxft_0002b32b_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c':
link.stub:(.text+0x121): undefined reference to `__fatbinwrap_79_tmpxft_0002b32b_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_75_tmpxft_0002b336_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056':
link.stub:(.text+0x201): undefined reference to `__fatbinwrap_75_tmpxft_0002b336_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_68_tmpxft_0002b341_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9':
link.stub:(.text+0x2e1): undefined reference to `__fatbinwrap_68_tmpxft_0002b341_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_83_tmpxft_0002b34c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b':
link.stub:(.text+0x3c1): undefined reference to `__fatbinwrap_83_tmpxft_0002b34c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_59_tmpxft_0002b357_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c':
link.stub:(.text+0x4a1): undefined reference to `__fatbinwrap_59_tmpxft_0002b357_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_59_tmpxft_0002b362_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574':
link.stub:(.text+0x581): undefined reference to `__fatbinwrap_59_tmpxft_0002b362_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574'
collect2: error: ld returned 1 exit status

Have you seen this error before ?

@vkhristenko
Author

vkhristenko commented Oct 23, 2019

@fwyzard

first off, the errors are coming from ecal... and not hcal
and what you checked out is part of the release already, therefore should've been compilable... just thinking out loud.
i've seen this kind of thing before, but do not remember what i did to remove those. these undefined refs are to cuda stuff... not user stuff...

sorry... would need to check out myself. i also never did minimalistic checkouts, meaning i would always get the head of the patatrack branch and do things on top... but here the problem is from what is already part of the release and what has been compiled before... maybe try scram b clean; scram b -j 8

just to add

>> Building shared library tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/libRecoLocalCaloHcalRecAlgos.so

but next

undefined ref to __fatbinwrap_79_tmpxft_0002b32b_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c

this guy, AmplitudeComputationCommonKernels, is an ecal guy, not hcal
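
The attribution above can be made mechanical. The sketch below pulls the source-file name out of a `__fatbinwrap_*` symbol so that the names can be compared against the package contents. The symbol layout is inferred only from the linker messages quoted in this thread, so the pattern is an assumption and may not hold for other CUDA versions; `translationUnitOf` is a hypothetical helper, not part of CMSSW or CUDA.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: extract the source-file stem embedded in a
// __fatbinwrap_* symbol. The layout (length, tmpxft id, two numeric fields,
// name, compute capability, hash) is assumed from the log lines in this
// thread. Returns an empty string when the line holds no such symbol.
std::string translationUnitOf(const std::string& line) {
  static const std::regex pattern(
      "__fatbinwrap_[0-9]+_tmpxft_[0-9a-f]+_[0-9]+_[0-9]+_"
      "([A-Za-z0-9_]+)_compute_[0-9]+_cpp1_ii_[0-9a-f]+");
  std::smatch match;
  if (std::regex_search(line, match, pattern)) {
    return match[1].str();  // the captured file stem, underscores included
  }
  return std::string();
}
```

Applied to the first undefined-reference message in the log, this yields `AmplitudeComputationCommonKernels`; lines without such a symbol give an empty string.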

@fwyzard

fwyzard commented Oct 23, 2019

I think you can also reproduce it working directly in your branch.

Yes, the errors come from the ECAL packages, but are caused by the changes in the HCAL packages, because of the dependency among them.

I think we may have seen similar errors quite some time ago, and IIRC they were caused by multiple shared libraries (or plugins) trying to load the same CUDA kernels (for example, because they were linking a shared library that included those kernels).

@fwyzard

fwyzard commented Oct 25, 2019

@vkhristenko did you have a look at the link problems ?

@vkhristenko
Author

vkhristenko commented Oct 25, 2019

@fwyzard not yet, give me time until monday. i need that branch working on top of 10 6 3 patatrack for opendata in any case before then...

@vkhristenko
Author

vkhristenko commented Oct 28, 2019

@fwyzard

I'm not able to reproduce it. Note that I've added the include of the cms::Exception header directly into my branch. The cassert include is still missing in HeterogeneousCore/CUDACore.

cmsrel CMSSW_10_6_3_Patatrack
cd CMSSW_10_6_3_Patatrack/src/
cmsenv
git cms-init -x cms-patatrack
git branch CMSSW_10_6_X_Patatrack --track cms-patatrack/CMSSW_10_6_X_Patatrack
git checkout CMSSW_10_6_X_Patatrack
git checkout -b test_merging
git cms-merge-topic vkhristenko:hcal_mahi_patatrack
ls
USER_CUDA_FLAGS="-Xptxas -v" scram b -v -j 20
git cms-merge-topic vkhristenko:hcal_mahi_patatrack
USER_CUDA_FLAGS="-Xptxas -v" scram b -v -j 20
ls
git cms-addpkg HeterogeneousCore/CUDACore
cd HeterogeneousCore/CUDACore/in
ls
cd HeterogeneousCore/CUDACore/interface/
ls
vim CUDAESProduct.h
cd ../../..
ls
USER_CUDA_FLAGS="-Xptxas -v" scram b -v -j 20
git cms-addpkg RecoLocalCalo/EcalRecAlgos RecoLocalCalo/EcalRecProducers
ls
USER_CUDA_FLAGS="-Xptxas -v" scram b -v -j 20

compiles w/o any linking errors.

@fwyzard

fwyzard commented Oct 28, 2019

Hi @vkhristenko ,
it does fail for me if I check out the dependencies:

# create a working area for CMSSW_10_6_3_Patatrack
scram list
cmsrel CMSSW_10_6_3_Patatrack
cd CMSSW_10_6_3_Patatrack/src
cmsenv
git cms-init -x cms-patatrack
git branch CMSSW_10_6_X_Patatrack --track cms-patatrack/CMSSW_10_6_X_Patatrack
git checkout CMSSW_10_6_X_Patatrack

# merge vkhristenko:hcal_mahi_patatrack and checkout the modified packages
git checkout -b test_merging
git cms-merge-topic vkhristenko:hcal_mahi_patatrack
git diff --name-only $CMSSW_VERSION | cut -d/ -f-2 | uniq | xargs git cms-addpkg

# fix CUDAESProduct.h to add #include <cassert>
git cms-addpkg HeterogeneousCore/CUDACore
vim HeterogeneousCore/CUDACore/interface/CUDAESProduct.h 

# checkout the ECAL packages, and all dependencies 
git cms-addpkg RecoLocalCalo/EcalRecAlgos RecoLocalCalo/EcalRecProducers
git cms-checkdeps -a

# build
scram b -v -j

results in

/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_79_tmpxft_0002b32b_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c':
link.stub:(.text+0x121): undefined reference to `__fatbinwrap_79_tmpxft_0002b32b_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_75_tmpxft_0002b336_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056':
link.stub:(.text+0x201): undefined reference to `__fatbinwrap_75_tmpxft_0002b336_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_68_tmpxft_0002b341_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9':
link.stub:(.text+0x2e1): undefined reference to `__fatbinwrap_68_tmpxft_0002b341_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_83_tmpxft_0002b34c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b':
link.stub:(.text+0x3c1): undefined reference to `__fatbinwrap_83_tmpxft_0002b34c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_59_tmpxft_0002b357_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c':
link.stub:(.text+0x4a1): undefined reference to `__fatbinwrap_59_tmpxft_0002b357_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c'
/data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-pafccj/bin/../lib/gcc/x86_64-unknown-linux-gnu/8.3.1/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc820/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: in function `__cudaRegisterLinkedBinary_59_tmpxft_0002b362_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574':
link.stub:(.text+0x581): undefined reference to `__fatbinwrap_59_tmpxft_0002b362_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574'
collect2: error: ld returned 1 exit status

@vkhristenko
Author

@fwyzard

just added dependencies as well... after checking ecal stuff... still no errors

@fwyzard

fwyzard commented Oct 28, 2019

I am not sure how else to allow you to reproduce it...
where are you running ?
what architecture ?

@vkhristenko
Author

vkhristenko commented Oct 28, 2019

scram arch: slc7_amd64_gcc700 (note you have 820 gcc)
cmg-gpu1080:/data/patatrack/vkhriste/cmssw_releases/hcal/mahi/CMSSW_10_6_3_Patatrack/src

@fwyzard

fwyzard commented Oct 28, 2019

I've made a copy of your area, and I'll try to rebuild that...

@fwyzard

fwyzard commented Oct 28, 2019

...I did just

scram b clean
scram b -j

and I got the same problem

--- Registered EDM Plugin: RecoLocalCaloEcalRecProducersPlugins
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_79_tmpxft_0000e2c4_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c':
link.stub:(.text+0x121): undefined reference to `__fatbinwrap_79_tmpxft_0000e2c4_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_75_tmpxft_0000e2e3_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056':
link.stub:(.text+0x201): undefined reference to `__fatbinwrap_75_tmpxft_0000e2e3_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_68_tmpxft_0000e2fd_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9':
link.stub:(.text+0x2e1): undefined reference to `__fatbinwrap_68_tmpxft_0000e2fd_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_83_tmpxft_0000e31c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b':
link.stub:(.text+0x3c1): undefined reference to `__fatbinwrap_83_tmpxft_0000e31c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_59_tmpxft_0000e337_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c':
link.stub:(.text+0x4a1): undefined reference to `__fatbinwrap_59_tmpxft_0000e337_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_59_tmpxft_0000e365_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574':
link.stub:(.text+0x581): undefined reference to `__fatbinwrap_59_tmpxft_0000e365_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574'
>> Leaving Package RecoLocalCalo/EcalRecProducers
>> Package RecoLocalCalo/EcalRecProducers built
collect2: error: ld returned 1 exit status
gmake: *** [config/SCRAM/GMake/Makefile.rules:1733: tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/libRecoLocalCaloHcalRecAlgos.so] Error 1
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

Could you try just that ?

@vkhristenko
Author

yep, i did scram b clean; scram b and got the same

--- Registered EDM Plugin: RecoLocalCaloEcalRecProducersPlugins
>> Leaving Package RecoLocalCalo/EcalRecProducers
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_79_tmpxft_0000e2c4_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c':
link.stub:(.text+0x121): undefined reference to `__fatbinwrap_79_tmpxft_0000e2c4_00000000_7_AmplitudeComputationCommonKernels_compute_70_cpp1_ii_9f04962c'
>> Package RecoLocalCalo/EcalRecProducers built
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_75_tmpxft_0000e2e3_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056':
link.stub:(.text+0x201): undefined reference to `__fatbinwrap_75_tmpxft_0000e2e3_00000000_7_AmplitudeComputationKernelsV1_compute_70_cpp1_ii_443c6056'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_68_tmpxft_0000e2fd_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9':
link.stub:(.text+0x2e1): undefined reference to `__fatbinwrap_68_tmpxft_0000e2fd_00000000_7_TimeComputationKernels_compute_70_cpp1_ii_7bf553a9'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_83_tmpxft_0000e31c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b':
link.stub:(.text+0x3c1): undefined reference to `__fatbinwrap_83_tmpxft_0000e31c_00000000_7_EcalUncalibRecHitMultiFitAlgo_gpu_new_compute_70_cpp1_ii_27e3a93b'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_59_tmpxft_0000e337_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c':
link.stub:(.text+0x4a1): undefined reference to `__fatbinwrap_59_tmpxft_0000e337_00000000_7_inplace_fnnls_compute_70_cpp1_ii_a006763c'
tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o: In function `__cudaRegisterLinkedBinary_59_tmpxft_0000e365_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574':
link.stub:(.text+0x581): undefined reference to `__fatbinwrap_59_tmpxft_0000e365_00000000_7_KernelHelpers_compute_70_cpp1_ii_5fcd2574'
collect2: error: ld returned 1 exit status
gmake: *** [config/SCRAM/GMake/Makefile.rules:1733: tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/libRecoLocalCaloHcalRecAlgos.so] Error 1
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 

@fwyzard

fwyzard commented Oct 28, 2019

OK, I suspect you didn't get it before because of the order of the compilation and link commands :-(

Can you look into what is causing it ?

@vkhristenko
Author

yep yep

@vkhristenko
Author

Indeed, this is a dependency of the hcal code on ecal...

So, below is the command for the final-stage device-side linking of RecoLocalCaloHcalRecAlgos_cudadlink.o

This command links in device-side code from RecoLocalCalo/EcalRecAlgos (-lRecoLocalCaloEcalRecAlgos_nv). I guess this goes back to cms-sw/cmssw-config#65

After removing this linkage (of -lRecoLocalCaloEcalRecAlgos_nv) and proceeding to the overall shared object linkage, everything works fine.

@fwyzard, as far as I remember, you mentioned at some point that although cms-sw/cmssw-config#65 got merged, things still did not work. I think what I see here is the result of that not working 100%...

Moreover, here there is no device-side dependency, only a host-side one, but the build rules are still generated... this is actually something we should not add!

/data/patatrack/cmssw/slc7_amd64_gcc700/external/cuda/10.1.243/bin/nvcc -dlink -L/data/patatrack/vkhriste/cmssw_releases/hcal/mahi/CMSSW_10_6_3_Patatrack/static/slc7_amd64_gcc700 -L/data/patatrack/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_3_Patatrack/static/slc7_amd64_gcc700 -lRecoLocalCaloEcalRecAlgos_nv -L/data/patatrack/vkhriste/cmssw_releases/hcal/mahi/CMSSW_10_6_3_Patatrack/lib/slc7_amd64_gcc700 -L/data/patatrack/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_3_Patatrack/lib/slc7_amd64_gcc700 -L/data/patatrack/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_6_3_Patatrack/external/slc7_amd64_gcc700/lib -L/data/patatrack/cmssw/slc7_amd64_gcc700/external/cuda/10.1.243/lib64/stubs -lcudadevrt -gencode arch=compute_60,code=sm_60 -gencode arch=compute_70,code=sm_70 -O3 -std=c++14 --expt-relaxed-constexpr --expt-extended-lambda --generate-line-info --source-in-ptx --cudart=shared --compiler-options '-O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -ftree-vectorize -Wstrict-overflow -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -Wno-error=unused-variable -Wno-error=unused-variable -DCUDA_ENABLE_DEPRECATED -DCUB_STDERR -DBOOST_DISABLE_ASSERTS -std=c++14  -fPIC 
  ' tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/MahiGPU.cu.o -o tmp/slc7_amd64_gcc700/src/RecoLocalCalo/HcalRecAlgos/src/RecoLocalCaloHcalRecAlgos/RecoLocalCaloHcalRecAlgos_cudadlink.o
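
To spot this kind of dependency without reading the whole command, one option is to list the `-l..._nv` arguments of the device-link line. This is only a sketch: `deviceLibraries` is a hypothetical helper, it splits on whitespace and ignores shell quoting, and treating the `_nv` suffix as the marker of a device-side library is an assumption based on the command above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the library names given as -l<name>_nv in a
// device-link command line. Tokens are split on whitespace only, so quoted
// arguments such as --compiler-options '...' are scanned word by word.
std::vector<std::string> deviceLibraries(const std::string& command) {
  std::vector<std::string> libs;
  std::istringstream tokens(command);
  std::string token;
  const std::string suffix = "_nv";  // assumed marker of device-side libraries
  while (tokens >> token) {
    const bool isLib = token.compare(0, 2, "-l") == 0;
    const bool hasSuffix =
        token.size() > 2 + suffix.size() &&
        token.compare(token.size() - suffix.size(), suffix.size(), suffix) == 0;
    if (isLib && hasSuffix) {
      libs.push_back(token.substr(2));  // drop the leading "-l"
    }
  }
  return libs;
}
```

For the command in this comment it should report only `RecoLocalCaloEcalRecAlgos_nv`, since `-lcudadevrt` lacks the suffix and the `-L` search paths do not start with a lowercase `-l`.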

@fwyzard fwyzard added this to the CMSSW_11_1_0_Patatrack milestone Jan 22, 2020
@fwyzard

fwyzard commented May 31, 2020

Replaced by #431 and #468.

Labels
HCAL HCAL-related developments
3 participants