Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calibration update 80 x #14198

Merged
merged 4 commits into from
Apr 28, 2016
Merged

Conversation

dimattia
Copy link
Contributor

Greetings,

this PR contains the fix for the memory issue reported here:
https://hypernews.cern.ch/HyperNews/CMS/get/calibrations/2291/2/1/1/1.html

Functional tests are ok. Profile plots will come tomorrow.

Cheers,
Alessandro

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @dimattia for CMSSW_8_1_X.

It involves the following packages:

CalibTracker/Configuration
CalibTracker/SiStripChannelGain
CalibTracker/SiStripCommon

@cmsbuild, @mmusich, @franzoni, @cerminar, @davidlange6 can you please review it and eventually sign? Thanks.
@ghellwig, @mverzett, @gbenelli, @tocheng, @jlagram this is something you requested to watch as well.
@slava77, @Degano, @smuzaffar you are the release manager for this.

cms-bot commands are list here #13028

@mmusich
Copy link
Contributor

mmusich commented Apr 21, 2016

please test

@cmsbuild
Copy link
Contributor

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12567/console

@cmsbuild
Copy link
Contributor

@mmusich
Copy link
Contributor

mmusich commented Apr 21, 2016

Hi @dimattia, running wf 1001.0 I get now:

------------------------------------------------------------------------------------------
Detailed ressource usage information FOR ALL ME
------------------------------------------------------------------------------------------
subsystem/folder                          histograms       bins        Empty bins     Empty/Total      bins per       MB         kB per
                                           (total)        (total)        (total)                      histogram     (total)    histogram  
------------------------------------------------------------------------------------------
AlCaReco
 -> DtCalibSynch                            2060        566490        303210         0.535           275          2.16          1.07
 -> SiStrip                                14878       9522460       9153064         0.961           640          36.4          2.51
 -> SiStripGains                               9     177290000         92561      0.000522      1.97e+07           676      7.69e+04
 -> SiStripGainsAfterAbortGap                  9     177290000         92561      0.000522      1.97e+07           676      7.69e+04
 -> TkAlMinBias                              268        124031         21397         0.173           463         0.532          2.03
    SUBSYSTEM TOTAL                        17224     364792981       9662793        0.0265      2.12e+04      1.43e+03          82.7
..........................................................................................

when it used to be:

------------------------------------------------------------------------------------------
Detailed ressource usage information FOR ALL ME
------------------------------------------------------------------------------------------
subsystem/folder             histograms       bins        Empty bins     Empty/Total      bins per       MB         kB per
                                 (total)     (total)        (total)                       histogram     (total)    histogram  
------------------------------------------------------------------------------------------
AlCaReco
 -> DtCalibSynch                   2060        566490        303210         0.535           275          2.16          1.07
 -> SiStrip                       14878       9522460       9153064         0.961           640          36.4          2.51
 -> SiStripGains                     18     354580000        185170      0.000522      1.97e+07      1.35e+03      7.69e+04
 -> SiStripGainsAfterAbortGap        18     354580000        185170      0.000522      1.97e+07      1.35e+03      7.69e+04
 -> TkAlMinBias                     265        123382         21308         0.173           466          0.53          2.05
    SUBSYSTEM TOTAL               17239     719372332       9847922        0.0137      4.17e+04      2.81e+03           163
..........................................................................................

@cmsbuild
Copy link
Contributor

@mmusich
Copy link
Contributor

mmusich commented Apr 22, 2016

@dimattia we need also a backport in 80X once finished with the checks

@mmusich
Copy link
Contributor

mmusich commented Apr 22, 2016

memory

some crude profiling of RSS vs time in WF 1001.0 shows we should be almost back to 803.

@mmusich
Copy link
Contributor

mmusich commented Apr 22, 2016

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @Degano, @smuzaffar

@davidlange6
Copy link
Contributor

A ~1.5G increase isn't really "close" to 803... but definitely better than 804. will have a look in the next days

@davidlange6
Copy link
Contributor

@dimattia @mmusich
so I begin to realize that this is memory in a single threaded job...as its all an end-of-job blow up, this is a reminder as to why the dqm does not use something as simple as TFileService in production to store big histograms. We'll need to resolve that.

However, as a shorter term way to improving things, I have two questions regarding the big histograms:

  1. why not TH1S? Are there either 30k entries in a bin?
  2. was 2000 bins determined? Eg, Is the rms of the Landau you are fitting something like 3-5 bins?

@davidlange6
Copy link
Contributor

Hi @dimattia - any news on the questions from yesterday's ORP?

@dimattia
Copy link
Contributor Author

Hi,

@davidlange6 I checked the maximum number of entries using histograms that contains the “gold plated” statistics (e.g. 2x10^9 clusters) for a calibration run. Indeed there are no more than 3000 entries per bin for the maximum split of statistics (e.g. split per APV readout chip). We may have troubles if we merge the statistics per readout link or per module (e.g. going coarser than a single APV chip). Anyway this usually happens when the statistics is not enough to calibrate a single APV, hence I believe the statistics per single bin will never exceed 32000.

I will implement the switch TH2F -> TH2S in the next update of this pull request. Likely within the next hour.

Cheers,
Alessandro.

On 27 Apr 2016, at 12:01, David Lange notifications@github.com wrote:

Hi @dimattia https://github.com/dimattia - any news on the questions from yesterday's ORP?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub #14198 (comment)

@cmsbuild
Copy link
Contributor

Pull request #14198 was updated. @cmsbuild, @mmusich, @franzoni, @cerminar, @davidlange6 can you please check and sign again.

@mmusich
Copy link
Contributor

mmusich commented Apr 27, 2016

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 27, 2016

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/12675/console

@dimattia
Copy link
Contributor Author

Hi,

@davidlange6 the change TH2F - > TH2F has been implemented in the PR update. Sorry it took a bit more because I need to perform also the test for the gain validation.

Cheers,
Alessandro.

On 27 Apr 2016, at 18:48, cmsbuild notifications@github.com wrote:

The tests are being triggered in jenkins.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub #14198 (comment)

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

@mmusich
Copy link
Contributor

mmusich commented Apr 27, 2016

pr_14918

peak RSS measured in wf 1001.0 merging 9e47c92 is somewhere in 3.5 - 4 GB

@mmusich
Copy link
Contributor

mmusich commented Apr 27, 2016

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next CMSSW_8_1_X IBs (tests are also fine). This pull request requires discussion in the ORP meeting before it's merged. @slava77, @davidlange6, @Degano, @smuzaffar

@davidlange6 davidlange6 merged commit 8de78fe into cms-sw:CMSSW_8_1_X Apr 28, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants