
Reduce memory footprint of parsed datacard #791

Open · wants to merge 3 commits into main
Conversation

nsmith- (Collaborator) commented Sep 1, 2022

Previously we were storing nbins*nproc*nsyst entries, mostly zeros, for the nuisance parameter effect info (errline). Python floats also take 24 bytes each, due to object boxing:

```python
>>> sys.getsizeof(0.0)
24
```

With this change, the parsed datacard for STXSStage1p2full now takes 90 MB, vs. 2.7 GB previously, to store the errlines.
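To put rough numbers on the boxing overhead, here is a self-contained illustration of the dense-vs-sparse trade-off described above. The dimensions and the `effect` helper are made up for the sketch, not the actual STXSStage1p2full sizes or Combine's API: a dense mapping stores one boxed `0.0` per (process, nuisance) pair, while a sparse one keeps only non-zero effects and reads absent entries back with `.get()`.

```python
import sys

# CPython boxes every float: each 0.0 is a full object,
# typically 24 bytes on 64-bit builds.
print(sys.getsizeof(0.0))

# Dense storage: one entry per (process, nuisance) pair, mostly zeros.
# These sizes are illustrative only.
nproc, nsyst = 100, 1000
dense = {(p, s): 0.0 for p in range(nproc) for s in range(nsyst)}

# Sparse storage: keep only the non-zero effects; absent keys mean 0.0.
sparse = {(0, 0): 1.5, (1, 2): 0.9}

def effect(errline, proc, syst):
    """Read a nuisance effect, defaulting absent entries to zero."""
    return errline.get((proc, syst), 0.0)

assert effect(sparse, 0, 0) == 1.5
assert effect(sparse, 5, 5) == 0.0

# The dense dict's container alone dwarfs the sparse one,
# before even counting the ~100k boxed float values it references.
assert sys.getsizeof(dense) > 100 * sys.getsizeof(sparse)
```

The saving scales with sparsity: only the handful of non-zero effects pay for storage, and every zero costs nothing.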

nsmith- (Collaborator, Author) commented Sep 1, 2022

Other significant memory usage in t2w comes from extracting the shapes from input files. In particular:

```python
if (file, wname) not in self.wspnames:
    self.wspnames[(file, wname)] = file.Get(wname)
self.wsp = self.wspnames[(file, wname)]
```

caches all input workspaces (for good reason: they may be expensive to re-open repeatedly), from which various objects are then read, and
```python
ret = file.Get(objname)
if not ret:
    if allowNoSyst:
        return None
    raise RuntimeError("Failed to find %s in file %s (from pattern %s, %s)" % (objname, finalNames[0], names[1], names[0]))
ret.SetName("shape%s_%s_%s%s" % (postFix, process, channel, "_" + syst if syst else ""))
if self.options.verbose > 2:
    print("import (%s,%s) -> %s\n" % (finalNames[0], objname, ret.GetName()))
_cache[(channel, process, syst)] = ret
```

caches all histograms extracted from the input. This latter cache, I think, is optional: these histograms are likely read only once, and their data is in any case copied into the RooFit/Combine object that goes into the output workspace.
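Making the histogram cache opt-out could be sketched as below, using a plain dict and a callable loader in place of ROOT file access. All names here (`ShapeSource`, `cache_shapes`, `loads`) are hypothetical, not Combine's actual API; the point is only the trade-off between bounded memory and repeated reads.

```python
# Hypothetical sketch of an opt-out shape cache; not Combine's real API.
class ShapeSource:
    def __init__(self, loader, cache_shapes=True):
        self.loader = loader          # callable: key -> shape object
        self.cache_shapes = cache_shapes
        self._cache = {}
        self.loads = 0                # count underlying reads, for the demo

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.loads += 1
        obj = self.loader(key)
        if self.cache_shapes:
            # Caching trades memory for speed; shapes that are read
            # exactly once gain nothing from being kept around.
            self._cache[key] = obj
        return obj

# Without caching, each access re-reads from the source.
src = ShapeSource(lambda k: [0.0] * 10, cache_shapes=False)
src.get(("ch", "proc", None))
src.get(("ch", "proc", None))
assert src.loads == 2
```

If each histogram really is read exactly once during text2workspace, disabling this cache costs no extra I/O while freeing the memory the cached copies would occupy.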

nsmith- (Collaborator, Author) commented Sep 2, 2022

Ok, the initial implementation is apocalyptically slow. Trying a new one.

nsmith- changed the base branch from 112x to main on January 18, 2023
kcormi (Collaborator) commented May 15, 2023

Hi Nick,

Did you manage to speed up the implementation? Or is this still a work in progress?
If you've got something that saves a lot of memory or speeds things up significantly for large models, it would be good to get it merged.

nsmith- (Collaborator, Author) commented May 15, 2023

I haven't had a chance to revisit, but where I left off I found it very challenging to maintain the current performance while migrating away from dictionaries, even to just a defaultdict or something else that doesn't store millions of 0.0 entries. It turns out it is very hard to beat dictionary performance in Python! Perhaps if we switched everywhere to `errline.get(key, default)` instead of unchecked access, it may stay performant. On the other hand, a general datacard parser rewrite is long overdue.

kcormi (Collaborator) commented May 16, 2023

Yes, maybe using dict.get with a default could work. We are trying to balance making smaller piece-by-piece improvements against larger overall changes and rewrites. If you think it's worth trying a couple more things to make this work (e.g. dict.get), that would be great. But it's also fine if you think this is better suited to a larger rewrite (though that unfortunately means it will take longer to be implemented).
