Schema evolution failures because of missing StreamerInfo #41348

Closed
makortel opened this issue Apr 14, 2023 · 18 comments

@makortel
Contributor

This issue follows up on #41246 (comment). Its purpose is to collect the situations in which files lack StreamerInfo for classes that should have it, and to document what will be, or was, done to mitigate the problem.

@cmsbuild
Contributor

cmsbuild commented Apr 14, 2023

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel changed the title from "Missing StreamerInfo" to "Schema evolution failures because of missing StreamerInfo" on Apr 14, 2023
@makortel
Contributor Author

assign core

@cmsbuild
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor Author

The missing StreamerInfo was first noticed on files written in 13_0_0_pre3 (first pre-release with ROOT 6.26) as described in #41246 (comment) .

I checked the file /store/data/Run2023A/HLTPhysics/RAW/v1/000/365/775/00000/d4285cef-88e4-4bb8-a08d-82ae9ab440aa.root from the /HLTPhysics/Run2023A-v1/RAW dataset. The file was repacked with 13_0_0. The scanfile.C script (#41246 (comment)) reports that the following classes in the file do not have StreamerInfo:

Missing: edm::ThinnedAssociationBranches
Missing: edm::HLTPathStatus
Missing: trigger::TriggerObject
Missing: edm::IndexIntoFile::RunOrLumiEntry
Missing: edm::StoredMergeableRunProductMetadata::SingleRunEntry
Missing: edm::StoredMergeableRunProductMetadata::SingleRunEntryAndProcess
Missing: edm::StoredProductProvenance

Of these, the following classes were used in non-empty containers (if a class is used only in empty containers, its StreamerInfo can legitimately be missing):

Missing: edm::HLTPathStatus
Missing: trigger::TriggerObject
Missing: edm::IndexIntoFile::RunOrLumiEntry
Missing: edm::StoredProductProvenance
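
The step from the first list to the second is a simple filter: keep only the classes that appear in at least one non-empty container. A minimal sketch of that filtering follows. It does not reproduce scanfile.C; the function name `genuinely_missing` and the hard-coded set of non-empty classes are assumptions made for illustration, with the class names copied from the two lists above.

```python
# Classes the scan reported as lacking StreamerInfo (copied from the first list).
reported_missing = [
    "edm::ThinnedAssociationBranches",
    "edm::HLTPathStatus",
    "trigger::TriggerObject",
    "edm::IndexIntoFile::RunOrLumiEntry",
    "edm::StoredMergeableRunProductMetadata::SingleRunEntry",
    "edm::StoredMergeableRunProductMetadata::SingleRunEntryAndProcess",
    "edm::StoredProductProvenance",
]

# Classes found in at least one non-empty container in the file.
# Assumption: in practice this would come from inspecting the file;
# here it is hard-coded from the second list.
used_in_non_empty_container = {
    "edm::HLTPathStatus",
    "trigger::TriggerObject",
    "edm::IndexIntoFile::RunOrLumiEntry",
    "edm::StoredProductProvenance",
}


def genuinely_missing(reported, non_empty):
    """Keep only classes whose missing StreamerInfo is a real problem."""
    return [name for name in reported if name in non_empty]


problematic = genuinely_missing(reported_missing, used_in_non_empty_container)
for name in problematic:
    print("Missing:", name)
```

The three classes that drop out are the ones used only in empty containers, for which the missing StreamerInfo is expected.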

I also tested adding a new member variable to edm::HLTPathStatus, and reading the file indeed resulted in a read error, confirming that schema evolution did not work in this case.
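
To illustrate why a missing description of the on-file layout breaks reading after a class gains a member, here is a toy model in Python. It is only a sketch and not ROOT's actual file format or API: the `LAYOUTS` table stands in for StreamerInfo, and the field names (`status`, `index`, `extra`) are hypothetical and are not the members of edm::HLTPathStatus.

```python
import struct

# Toy stand-in for StreamerInfo: class version -> on-file layout.
# Field names and type codes are hypothetical.
LAYOUTS = {
    1: [("status", "B"), ("index", "H")],
    2: [("status", "B"), ("index", "H"), ("extra", "I")],  # new member added
}
CURRENT_VERSION = 2


def _fmt(layout):
    # "<" selects little-endian packing without padding.
    return "<" + "".join(code for _, code in layout)


def write_object(values, version):
    """Serialize values using the layout of the given class version."""
    layout = LAYOUTS[version]
    return struct.pack(_fmt(layout), *(values[name] for name, _ in layout))


def read_object(payload, stored_layout=None):
    """Decode payload; fall back to the current layout if none was stored."""
    layout = stored_layout if stored_layout is not None else LAYOUTS[CURRENT_VERSION]
    needed = struct.calcsize(_fmt(layout))
    if needed != len(payload):
        raise ValueError(f"layout needs {needed} bytes but object has {len(payload)}")
    names = [name for name, _ in layout]
    decoded = dict(zip(names, struct.unpack(_fmt(layout), payload)))
    # Members absent from the on-file layout get a default value.
    return {name: decoded.get(name, 0) for name, _ in LAYOUTS[CURRENT_VERSION]}


old_bytes = write_object({"status": 1, "index": 42}, version=1)

# With the on-file layout available, the old object is readable.
print(read_object(old_bytes, stored_layout=LAYOUTS[1]))

# Without it, the reader assumes the current layout and fails.
try:
    read_object(old_bytes)
except ValueError as err:
    print("read error:", err)
```

In this toy, the stored layout plays the role that the StreamerInfo plays for the real file: without it, the reader has no way to know that the bytes were written with the older layout.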

We should develop the "workaround b" mentioned in #41246 (comment) (storing the StreamerInfo for all of the classes in the second list, and maybe even all of those in the first list to play it safe) in case any of these classes needs to be changed in the future.

@makortel
Contributor Author

makortel commented Apr 14, 2023

I also checked a RAW file from 2022 data taking, /store/data/Run2022A/JetHT/RAW/v1/000/353/015/00000/39e2a323-e421-45af-98a3-b16ab080ed8e.root from /JetHT/Run2022A-v1/RAW (repacked with 12_3_4_patch3, which has ROOT 6.24/07), and got the same list of missing StreamerInfos. This observation implies that all 2022 RAW data is affected.

Going back to 2018 data, I checked /store/data/Run2018D/JetHT/RAW/v1/000/320/822/00000/028479AC-8697-E811-99E1-FA163EA21B5C.root from /JetHT/Run2018D-v1/RAW (repacked with CMSSW_10_1_9_patch1, which has ROOT 6.10/09), and that did not report any genuinely missing StreamerInfos (the edm::ThinnedAssociationBranches was reported as missing, but its container has 0 size, so the StreamerInfo being missing is expected). So Run 2 data seems to be ok.

@makortel
Contributor Author

makortel commented Apr 14, 2023

Next I checked a scouting RAW file from 2022, file /store/data/Run2022G/ScoutingPFRun3/RAW/v1/000/362/353/00000/3297a9a1-ac45-4a56-ad2f-a6e85582569d.root from /ScoutingPFRun3/Run2022G-v1/RAW (repacked with CMSSW_12_4_10_patch3, which has ROOT 6.24/07). I did the same check on /store/data/Run2022B/ScoutingPFRun3/RAW/v1/000/355/207/00000/fce17d2b-27fe-4d4f-9c85-44fded6488f1.root from /ScoutingPFRun3/Run2022B-v1/RAW (repacked with CMSSW_12_3_6).

I was a bit surprised that these files did not report additional classes with missing StreamerInfo, because checking e.g. the file /store/relval/CMSSW_12_4_0_pre4/RelValZEE_14/GEN-SIM-RECO/PU_124X_mcRun3_2021_realistic_v1-v1/2580000/4a1ae43b-f4b3-4ad9-b86e-a7d9f6fc5c40.root from dataset /RelValZEE_14/CMSSW_12_4_0_pre4-PU_124X_mcRun3_2021_realistic_v1-v1/GEN-SIM-RECO reports the following StreamerInfo as missing:

Missing: Run3ScoutingParticle
Missing: Run3ScoutingTrack
Missing: Run3ScoutingVertex

(these classes have not evolved since 12_4_0_pre4, so reading them works)

@makortel
Contributor Author

Finally, checking a RAW file repacked with 13_0_3 (which includes the fix mentioned in #41246 (comment)), file /store/data/Run2023A/HLTPhysics/RAW/v1/000/366/050/00000/454cd775-4024-4cf5-be09-dc0c6a740b94.root from /HLTPhysics/Run2023A-v1/RAW, shows no classes with missing StreamerInfo. The fix seems to be effective.

@makortel
Contributor Author

For future reference, we had observed schema evolution problems with the Run3ScoutingParticle in #36908 (comment) and #37013, but unfortunately did not dig deep enough back then.

@wddgit
Contributor

wddgit commented May 1, 2023

@smuzaffar Could you add data repositories for

    DataFormats-FEDRawData
    DataFormats-L1TGlobal
    DataFormats-HLTReco

I am implementing unit tests similar to the tests I implemented for TriggerResults in DataFormats/Common.

@smuzaffar
Contributor

done @wddgit

@wddgit
Contributor

wddgit commented May 30, 2023

@smuzaffar One more. Could you also add a data repository for

DataFormats-Scouting

@smuzaffar
Contributor

@wddgit
Contributor

wddgit commented Sep 20, 2023

@smuzaffar Hi, could you add a couple more repositories to cms-data for more raw data format unit tests?

DataFormats-SiStripCluster
DataFormats-DetId

Thanks

@smuzaffar
Contributor

@wddgit , both of these data repos are available now

@dan131riley

Following #43744 we're again having problems with Run3ScoutingVertex schema evolution?

----- Begin Fatal Exception 31-Jan-2024 05:07:03 CET-----------------------
An exception of category 'FileReadError' occurred while
   [0] Processing  Event run: 1 lumi: 141 event: 14003 stream: 1
   [1] Running path 'NANOAODoutput_step'
   [2] Prefetching for module NanoAODOutputModule/'NANOAODoutput'
   [3] Prefetching for module SimpleRun3ScoutingVertexFlatTableProducer/'primaryvertexScoutingTable'
   [4] While reading from source std::vector<Run3ScoutingVertex> hltScoutingPrimaryVertexPacker 'primaryVtx' HLT
   [5] Reading branch Run3ScoutingVertexs_hltScoutingPrimaryVertexPacker_primaryVtx_HLT.
   Additional Info:
      [a] Fatal Root Error: @SUB=TBufferFile::CheckByteCount
object of class vector<Run3ScoutingVertex> read too many bytes: 1086 instead of 822

----- End Fatal Exception -------------------------------------------------
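
When scanning many job logs for this failure, the class name and the byte counts can be pulled out of the message with a small parser. The sketch below assumes only the message wording shown in the exception above; the helper name `parse_byte_count_error` is hypothetical, and other wordings of the byte-count message are not handled.

```python
import re

# Pattern for the message text shown in the exception above.
BYTE_COUNT_RE = re.compile(
    r"object of class (?P<cls>.+?) read too many bytes: "
    r"(?P<read>\d+) instead of (?P<expected>\d+)"
)


def parse_byte_count_error(log_text):
    """Return (class name, bytes read, bytes expected), or None if absent."""
    match = BYTE_COUNT_RE.search(log_text)
    if match is None:
        return None
    return match["cls"], int(match["read"]), int(match["expected"])


log = (
    "[a] Fatal Root Error: @SUB=TBufferFile::CheckByteCount\n"
    "object of class vector<Run3ScoutingVertex> read too many bytes: "
    "1086 instead of 822\n"
)
cls, n_read, n_expected = parse_byte_count_error(log)
print(cls, n_read - n_expected)
```

For the message above this reports `vector<Run3ScoutingVertex>` and an excess of 264 bytes.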

@mmusich
Contributor

mmusich commented Jan 31, 2024

> Following #43744 we're again having problems with Run3ScoutingVertex schema evolution?

no, it comes from #43758, see #43758 (comment)

@makortel
Contributor Author

#43828 adds a customize function (for cmsDriver.py) and uses it to fix the NanoAOD workflow. I think after this PR gets merged we could close this issue.

@makortel
Contributor Author

makortel commented Feb 5, 2024

+core
