Update Triton to v2.11 #7142

Merged on Jul 19, 2021 (8 commits)

Contributor

@kpedro88 kpedro88 commented Jul 15, 2021

This PR updates Triton to its latest release. The Triton repository has been refactored into separate components, so the external is renamed to triton-inference-client.

In the process of refactoring, access to some helper functions was lost, so these are installed by hand in the spec file. This should be fixed in an upcoming version of Triton. (Hopefully the various CMake issues currently solved in the spec file will also be addressed.)

The Triton updates require a newer version of CMake, so that is done here as well.

This PR requires an associated CMSSW PR (to be submitted simultaneously) to compile: cms-sw/cmssw#34508

@cmsbuild
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for branch IB/CMSSW_12_0_X/master.

@cmsbuild, @smuzaffar, @mrodozov, @iarspider can you please review it and eventually sign? Thanks.
@silviodonato, @dpiparo, @qliphy, @perrotta you are the release manager for this.
cms-bot commands are listed here

@kpedro88
Contributor Author

please test

fi

# extracted from https://github.com/triton-inference-server/server/blob/v2.11.0/src/core/model_config.h
cat << 'EOF' > ${PROJ_DIR}/library/model_config.h
Contributor

@kpedro88, instead of generating these files here, could you please add them as cmsdist/triton-inference-client/model_config.[h,cc].file and then add the following sources:

Source1: triton-inference-client/model_config.h
Source2: triton-inference-client/model_config.cc

and then use

cp %{_sourcedir}/model_config.h  ${PROJ_DIR}/library/
cp %{_sourcedir}/model_config.cc ${PROJ_DIR}/library/
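
A minimal shell sketch of the copy step suggested above. `%{_sourcedir}` is an RPM macro that is expanded before the spec's shell code runs, so the sketch substitutes a plain `SOURCEDIR` variable. The `install_helpers` function, the temporary directories, and the placeholder file contents are all assumptions made only so that the snippet runs on its own; they are not part of the actual spec file.

```shell
# install_helpers SRC_DIR DEST_DIR (hypothetical helper):
# copy the two helper sources, returning non-zero if either one is missing.
install_helpers() {
  src_dir=$1
  dest_dir=$2
  for f in model_config.h model_config.cc; do
    if [ ! -f "$src_dir/$f" ]; then
      echo "missing source: $src_dir/$f" >&2
      return 1
    fi
    cp "$src_dir/$f" "$dest_dir/" || return 1
  done
}

# Hypothetical stand-ins for %{_sourcedir} and ${PROJ_DIR}.
WORK=$(mktemp -d)
SOURCEDIR="$WORK/SOURCES"
PROJ_DIR="$WORK/client"
mkdir -p "$SOURCEDIR" "$PROJ_DIR/library"

# Placeholder contents; the real files are extracted from the Triton server repository.
echo '// placeholder for model_config.h' > "$SOURCEDIR/model_config.h"
echo '// placeholder for model_config.cc' > "$SOURCEDIR/model_config.cc"

install_helpers "$SOURCEDIR" "$PROJ_DIR/library"
```

Checking for each file before copying makes a missing source fail with a named path rather than a bare `cp` error, which may be easier to spot in a long build log.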

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16869/summary.html
COMMIT: 196d001
CMSSW: CMSSW_12_0_X_2021-07-15-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7142/16869/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation warning when building. See details on the summary page.

@kpedro88
Contributor Author

@smuzaffar what is the right way to deal with the renaming of the tool? The error reported was:

ERROR! File not found: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cmsdist/triton-inference-server-toolfile.spec

which is intentional...

@smuzaffar
Contributor

smuzaffar commented Jul 15, 2021

@kpedro88, you need to clean up cmssw-tool-conf.spec and replace triton-inference-server-toolfile with triton-inference-client-toolfile
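
The rename described above could be sketched as follows. The spec excerpt is hypothetical (the real cmssw-tool-conf.spec lists many more tools, and the two neighbouring `Requires:` lines are placeholders), and the snippet works on a temporary copy rather than the real file.

```shell
# Hypothetical excerpt standing in for cmssw-tool-conf.spec.
SPEC=$(mktemp)
cat > "$SPEC" << 'EOF'
Requires: tensorflow-toolfile
Requires: triton-inference-server-toolfile
Requires: xz-toolfile
EOF

# Replace the old toolfile name with the new one. Writing to a second file and
# moving it into place avoids relying on sed's non-portable in-place option.
sed 's/triton-inference-server-toolfile/triton-inference-client-toolfile/g' "$SPEC" > "$SPEC.new"
mv "$SPEC.new" "$SPEC"

# Report if any reference to the old name survived the substitution.
if grep -q 'triton-inference-server-toolfile' "$SPEC"; then
  echo "old toolfile name still present in $SPEC" >&2
fi
```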

@cmsbuild
Contributor

Pull request #7142 was updated.

@kpedro88
Contributor Author

please test

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16872/summary.html
COMMIT: 43e05c3
CMSSW: CMSSW_12_0_X_2021-07-15-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7142/16872/install.sh to create a dev area with all the needed externals and cmssw changes.

Build

I found compilation errors when building:

TritonData.cc:(.text+0x138): undefined reference to `nvidia::inferenceserver::client::InferInput::Reset()'
/cvmfs/cms-ib.cern.ch/nweek-02689/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc900/src/HeterogeneousCore/SonicTriton/src/HeterogeneousCoreSonicTriton/TritonData.cc.o: in function `void TritonData<nvidia::inferenceserver::client::InferInput>::toServer<float>(std::shared_ptr<std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > > >)':
TritonData.cc:(.text._ZN10TritonDataIN6nvidia15inferenceserver6client10InferInputEE8toServerIfEEvSt10shared_ptrISt6vectorIS7_IT_SaIS8_EESaISA_EEE[_ZN10TritonDataIN6nvidia15inferenceserver6client10InferInputEE8toServerIfEEvSt10shared_ptrISt6vectorIS7_IT_SaIS8_EESaISA_EEE]+0x50): undefined reference to `nvidia::inferenceserver::client::InferInput::SetShape(std::vector<long, std::allocator<long> > const&)'
/cvmfs/cms-ib.cern.ch/nweek-02689/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc900/src/HeterogeneousCore/SonicTriton/src/HeterogeneousCoreSonicTriton/TritonData.cc.o: in function `void TritonData<nvidia::inferenceserver::client::InferInput>::toServer<long>(std::shared_ptr<std::vector<std::vector<long, std::allocator<long> >, std::allocator<std::vector<long, std::allocator<long> > > > >)':
TritonData.cc:(.text._ZN10TritonDataIN6nvidia15inferenceserver6client10InferInputEE8toServerIlEEvSt10shared_ptrISt6vectorIS7_IT_SaIS8_EESaISA_EEE[_ZN10TritonDataIN6nvidia15inferenceserver6client10InferInputEE8toServerIlEEvSt10shared_ptrISt6vectorIS7_IT_SaIS8_EESaISA_EEE]+0x50): undefined reference to `nvidia::inferenceserver::client::InferInput::SetShape(std::vector<long, std::allocator<long> > const&)'
collect2: error: ld returned 1 exit status
gmake: *** [tmp/slc7_amd64_gcc900/src/HeterogeneousCore/SonicTriton/src/HeterogeneousCoreSonicTriton/libHeterogeneousCoreSonicTriton.so] Error 1
Leaving library rule at HeterogeneousCore/SonicTriton
------- copying files from src/HeterogeneousCore/SonicTriton/scripts -------
>> copied cmsTriton
Entering library rule at src/HeterogeneousCore/SonicTriton/test
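
Undefined references like the ones in this log generally mean that no library on the link line defines the requested symbols. One way to investigate, sketched here under assumptions, is to search the dynamic symbol tables of the installed shared libraries. The `find_symbol` helper is hypothetical, it relies on GNU binutils `nm`, and the directory to search depends on where the external is installed.

```shell
# find_symbol DIR PATTERN (hypothetical helper):
# print each shared library in DIR whose dynamic symbol table defines a symbol
# matching PATTERN (matched against demangled names); return 0 if any was found.
find_symbol() {
  dir=$1
  pattern=$2
  status=1
  for lib in "$dir"/*.so; do
    # Skip the unexpanded glob when DIR contains no shared libraries.
    [ -e "$lib" ] || continue
    if nm -D -C --defined-only "$lib" 2>/dev/null | grep -q -- "$pattern"; then
      echo "$lib"
      status=0
    fi
  done
  return $status
}

# Example call; the library directory below is a placeholder, not a real path.
# find_symbol /path/to/triton-inference-client/lib 'InferInput::SetShape'
```

If no library is printed, the symbol is not exported by any shared library in that directory, which would point at the external's build or install step rather than at the CMSSW code.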


@kpedro88
Contributor Author

@smuzaffar this PR is included in the tests for cms-sw/cmssw#34508, which are running now. Does it need to be tested here also?

@smuzaffar
Contributor

@kpedro88, no need to test it here.

@kpedro88
Contributor Author

@smuzaffar cms-sw/cmssw#34508 tests passed

@smuzaffar
Contributor

test parameters:

@smuzaffar
Contributor

please test for slc7_amd64_gcc10

@smuzaffar
Contributor

please test for cc8_amd64_gcc9

@smuzaffar
Contributor

please test for slc7_aarch64_gcc9

@smuzaffar
Contributor

please test for CMSSW_12_0_X/slc7_ppc64le_gcc9

@smuzaffar
Contributor

+externals
new external builds fine for all archs

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_0_X/master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy, @perrotta (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16972/summary.html
COMMIT: 43e05c3
CMSSW: CMSSW_12_0_X_2021-07-18-2300/slc7_ppc64le_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7142/16972/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16972/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16972/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test test_PrepareInputDb had ERRORS
---> test test_MpsWorkFlow had ERRORS
---> test TestDQMServicesDemo had ERRORS
---> test TestHeterogeneousCoreSonicTritonProducerGPU had ERRORS
and more ...

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16976/summary.html
COMMIT: 43e05c3
CMSSW: CMSSW_12_0_X_2021-07-18-2300/slc7_aarch64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7142/16976/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16976/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16976/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test TestDQMServicesDemo had ERRORS
---> test testFWCoreUtilities had ERRORS
---> test TestFWCoreServicesDriver had ERRORS
---> test testUploadConditions had ERRORS
and more ...

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16970/summary.html
COMMIT: 43e05c3
CMSSW: CMSSW_12_0_X_2021-07-18-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7142/16970/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16970/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16970/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test TestDQMServicesDemo had ERRORS

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-c3860b/11634.912_TTbar_14TeV+2021_DD4hepDB+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 48194 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2996268
  • DQMHistoTests: Total failures: 205806
  • DQMHistoTests: Total nulls: 13
  • DQMHistoTests: Total successes: 2790427
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.446 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.915 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 250202.181 ): -0.117 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): -0.352 KiB SiStrip/MechanicalView
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: found differences in 12 / 38 workflows

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16971/summary.html
COMMIT: 43e05c3
CMSSW: CMSSW_12_0_X_2021-07-18-2300/cc8_amd64_gcc9
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/7142/16971/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16971/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3860b/16971/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test TestDQMServicesDemo had ERRORS

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-c3860b/11634.912_TTbar_14TeV+2021_DD4hepDB+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 53316 differences found in the comparisons
  • DQMHistoTests: Total files compared: 39
  • DQMHistoTests: Total histograms compared: 2996268
  • DQMHistoTests: Total failures: 285032
  • DQMHistoTests: Total nulls: 253
  • DQMHistoTests: Total successes: 2710961
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -3.634 KiB( 38 files compared)
  • DQMHistoSizes: changed ( 10224.0 ): 0.786 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 136.874 ): -0.023 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 250202.181 ): -0.240 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 25202.0 ): -0.777 KiB SiStrip/MechanicalView
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • DQMHistoSizes: changed ( 4.53 ): 0.004 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed ( 7.3 ): -3.380 KiB SiStrip/MechanicalView
  • Checked 165 log files, 37 edm output root files, 39 DQM output files
  • TriggerResults: found differences in 14 / 38 workflows
