Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit test for Run 3 Scouting formats #41834

Merged
merged 3 commits into from
Jun 4, 2023

Conversation

wddgit
Copy link
Contributor

@wddgit wddgit commented May 31, 2023

PR description:

Add a new unit test for the Run 3 Scouting data formats. This generates a data file containing Run 3 Scouting objects with known content. Then it reads it. It verifies that when we read it we obtain values that match the known written values for all the data fields in the objects. In particular all containers have content so all contained types are also read.

It also reads the old files in the DataFormats-Scouting data repository which are listed in the shell script. The plan is that each time a Run 3 Scouting data format is modified an additional file will added. The data files required for this PR were generated with releases 12_0_4 (2022 data) and 13_0_3 (2023 data).

There are 10 events with content and vector sizes varying from event to event.

PR validation:

This only adds a unit test in DataFormats/Scouting which passes. It shouldn't affect anything else.

Note that the test will only pass if the cms-data/DataFormats-Scouting#1 pull request is merged and that external update is used. Waiting to merge this PR until a couple days after the data PR is merged might avoid some syncing difficulties in IBs and PR tests.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41834/35730

  • This PR adds an extra 28KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @wddgit (W. David Dagenhart) for master.

It involves the following packages:

  • DataFormats/Scouting (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@missirol, @rovere this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@wddgit
Copy link
Contributor Author

wddgit commented May 31, 2023

please test with cms-data/DataFormats-Scouting#1

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 1, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a96aee/32912/summary.html
COMMIT: 084a0b8
CMSSW: CMSSW_13_2_X_2023-05-31-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41834/32912/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a96aee/32912/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a96aee/32912/git-merge-result

Comparison Summary

Summary:

  • You potentially added 11 lines to the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3221457
  • DQMHistoTests: Total failures: 10
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3221425
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@wddgit
Copy link
Contributor Author

wddgit commented Jun 1, 2023

Continuing the conversation from #41631 (comment). We were discussing why it was necessary to add the call to clear the vector in the iorules for Run3ScoutingElectron when its format was changed.

I did a couple experiments with the new test in this PR. If I temporarily removed the call to clear() from the iorules for Run3ScoutingElectron and run the new test it still completes successfully. In this case it is reading with a module in cmsRun. It is not included in this PR, but I also did a separate test where I read the older format from 12_4_0 with FWLite in a script using the C++-like interface to FWLite. Again, the test passed even if I removed clear(). I didn't notice any difference, neither cmsRun nor FWLite needed to clear in my experiments...

I'm not worried about calling clear. Clearing an already empty vector shouldn't cause any problems. And I expect the performance cost would be negligible. I'm just a little worried why it was needed and I don't like it that we don't understand it... Maybe my test doesn't meet all the requirements to trigger the problem or maybe it is because the test where the problem was observed used the python interface or something is wrong in ROOT or maybe there is some other reason.

That is as far as I got with this. If you want me to pursue this further, let me know. Without taking a deep dive into ROOT or the python interface, I'm not sure what else I can do. Maybe someone else is already looking at this and I should just move on...

Note the new test in this PR does use 10 events with varying content (which was the suggestion that started this discussion).

* a file written by the same release
* files written by (some) earlier releases

If the persistent format of any Scouting data format gets changed in the future, please adjust the `TestReadRun3Scouting` and `TestWriteRun3Scouting` modules accordingly. It is important that every member container has some content in this test. Please also add a new file to the [https://github.com/cms-data/DataFormats-Scouting/](https://github.com/cms-data/DataFormats-Scouting/) repository, and update the `TestRun3ScoutingDataFormats` unit test to read the newly created file. The file name should contain the release or pre-release with which it was written. There will probably be analogous tests added in the future for run 2 and future runs which will need similar maintenance, although it is unlikely the run 2 formats will change anymore.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd update the text with the outcome of the discussion in cms-data/DataFormats-Scouting#1 (comment) . I'd also add here a comment along "if the latest file of Run 3 scouting before the update has not been used in data taking, the file can be deleted".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 32 to 33
# 12_4_0 was chosen because that release was used for 2022 data.
# 13_0_3 is being used for 2023 data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something along

Suggested change
# 12_4_0 was chosen because that release was used for 2022 data.
# 13_0_3 is being used for 2023 data.
# 12_4_0 was chosen because the 12_4_X release cycle was used for 2022 data, and the scouting data formats were not updated after 12_4_0.
# 13_0_3 was chosen because the 13_0_X release cycle was used for 2023 data, and the scouting data formats and ROOT were not updated after 13_0_3.

(in order to record the history a bit more accurately)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// to check that values read from persistent storage match the values
// we know were written.

std::vector<double> expectedCaloJetsValues_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like all of these could be const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made all the data members const in both modules.

Comment on lines 169 to 171
if (expectedCaloJetsValues_.size() != 16) {
throwWithMessage("analyzeCaloJets, test configuration error, expectedCaloJetsValues must have size 16");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this check (also elsewhere where the pattern is used) could be moved to constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved them to the constructor. Thanks.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 1, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41834/35750

  • This PR adds an extra 28KB to repository

  • Found files with invalid states:

    • DataFormats/Scouting/test/create_Run3Scouting_test_file.py:

@wddgit
Copy link
Contributor Author

wddgit commented Jun 2, 2023

I just force pushed a few more changes. It only affected file TestWriteRun3Scouting.cc. There was an error in the exception message. I fixed that and also simplified it by adding a function to throw the exception which removed a little duplication. And I squashed it into the last commit.

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 2, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41834/35758

  • This PR adds an extra 28KB to repository

  • Found files with invalid states:

    • DataFormats/Scouting/test/create_Run3Scouting_test_file.py:

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 2, 2023

Pull request #41834 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@wddgit
Copy link
Contributor Author

wddgit commented Jun 2, 2023

please test with cms-data/DataFormats-Scouting#1

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 2, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a96aee/32953/summary.html
COMMIT: 2844104
CMSSW: CMSSW_13_2_X_2023-06-02-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41834/32953/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a96aee/32953/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a96aee/32953/git-merge-result

Comparison Summary

Summary:

  • You potentially added 16 lines to the logs
  • Reco comparison results: 6 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3221457
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3221429
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor

makortel commented Jun 3, 2023

+core

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 3, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

* a file written by the same release
* files written by (some) earlier releases

If the persistent format of any Scouting data format gets changed in the future, please adjust the `TestReadRun3Scouting` and `TestWriteRun3Scouting` modules accordingly. It is important that every member container has some content in this test. Please also add a new file to the [https://github.com/cms-data/DataFormats-Scouting/](https://github.com/cms-data/DataFormats-Scouting/) repository, and update the `TestRun3ScoutingDataFormats` unit test to read the newly created file. The file name should contain the version numbers of the data format classes (from classes_def.xml) in alphabetical order and the release or pre-release with which it was written. If the latest file of Run 3 scouting before the update has not been used in data taking, the file can be deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't be bad to also rearrange classes_def.xml in order to have the classes listed there in alphabetical order, in order to avoid some head hache to who's trying to associate the versions in the name of the data file to the classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @perrotta for the suggestion, I agree a matching order in the classes_def.xml would help. @wddgit Could you take care of it (either as part of the Run2 scouting PR, or in a separate PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll include it in the Run 2 Scouting PR which I am currently implementing. A good idea.

@perrotta
Copy link
Contributor

perrotta commented Jun 4, 2023

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants