read chunks of lhe compressed files in LHEReader #10274

davidlange6 · 2015-07-19T13:42:15Z

Now considerably faster: the per-file overhead is now purely XML parsing.

cmsbuild · 2015-07-19T13:51:14Z

A new Pull Request was created by @davidlange6 (David Lange) for CMSSW_7_1_X.

read chunks of lhe compressed files in LHEReader

It involves the following packages:

GeneratorInterface/LHEInterface

@vciulli, @cmsbuild, @covarell, @bendavid, @thuer can you please review it and eventually sign? Thanks.
@mkirsano this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are a L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

bendavid · 2015-07-19T17:03:13Z

Does this only affect the compressed case, or does it also speed up uncompressed reading?

davidlange6 · 2015-07-19T19:13:33Z

I implemented only the compressed case.


bbockelm · 2015-07-20T14:08:53Z

Something similar can be done in the uncompressed case. I would expect speedups, but perhaps not quite as impressive - I can't tell from reading the code what kind of buffering Xerces might do internally.

@davidlange6 - two thoughts:

1. You can avoid keeping the separate uncompressed buffer and instead read directly into the one provided by Xerces. This would save a significant amount of memory and avoid the memcpy.
2. Another alternative I just thought of is to add the IOFlags::OpenWrap flag when opening the file (https://github.com/davidlange6/cmssw/blob/lhereader/GeneratorInterface/LHEInterface/src/LHEReader.cc#L56); this will turn on lazy-download if possible. That should benefit both the compressed and uncompressed cases.
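
For readers following the first suggestion, here is a minimal sketch of the difference between staging decoded bytes in a private buffer and writing them straight into the buffer the parser hands over. Everything in it is hypothetical: `FakeDecoder` is a stand-in for a real decompressor, and `StagedStream`, `DirectStream`, and `drain` are illustrative names, not classes from CMSSW or Xerces. Only the shape of `readBytes(toFill, maxToRead)` is modelled on the Xerces input-stream interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decompressor: it hands out bytes from an
// in-memory payload. A real reader would call a decompression library here.
class FakeDecoder {
public:
  explicit FakeDecoder(std::string payload) : data_(std::move(payload)) {}

  // Write up to maxBytes "decoded" bytes into out; return the count written.
  std::size_t decodeInto(unsigned char* out, std::size_t maxBytes) {
    const std::size_t n = std::min(maxBytes, data_.size() - pos_);
    if (n > 0) std::memcpy(out, data_.data() + pos_, n);
    pos_ += n;
    return n;
  }

private:
  std::string data_;
  std::size_t pos_ = 0;
};

// Variant A: decode into a private staging buffer, then copy to the caller.
// This costs one extra buffer and one extra memcpy per call.
class StagedStream {
public:
  StagedStream(std::string payload, std::size_t stagingSize)
      : decoder_(std::move(payload)), staging_(stagingSize) {}

  std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
    const std::size_t want = std::min(maxToRead, staging_.size());
    const std::size_t got = decoder_.decodeInto(staging_.data(), want);
    if (got > 0) std::memcpy(toFill, staging_.data(), got);  // the extra copy
    return got;
  }

private:
  FakeDecoder decoder_;
  std::vector<unsigned char> staging_;
};

// Variant B: decode straight into the caller's buffer, with no staging.
class DirectStream {
public:
  explicit DirectStream(std::string payload) : decoder_(std::move(payload)) {}

  std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
    return decoder_.decodeInto(toFill, maxToRead);
  }

private:
  FakeDecoder decoder_;
};

// Pull everything out of a stream through a caller-owned buffer,
// the way a parser would repeatedly ask for more bytes.
template <typename Stream>
std::string drain(Stream& stream, std::size_t bufSize) {
  std::vector<unsigned char> buf(bufSize);
  std::string out;
  while (true) {
    const std::size_t n = stream.readBytes(buf.data(), buf.size());
    if (n == 0) break;  // end of input
    out.append(reinterpret_cast<const char*>(buf.data()), n);
  }
  return out;
}
```

Both variants deliver the same bytes; the direct one simply drops the staging vector and the second copy, which is the saving described above.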

bendavid · 2015-07-20T14:10:05Z

If lazy download uses (and does not free up as it goes) a significant amount of disk space per file, then that would kill the current workflow by filling up the worker node disk.

bbockelm · 2015-07-20T14:12:50Z

Well, lazy-download cleans up at file close. It also will automatically turn itself off if less than N GB is available on the local disk (IIRC, N=4).

What's the size of the individual pLHE files?
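
As a rough illustration of the free-space guard described above, the check could look like the sketch below. The function name `lazyDownloadAllowed` and the constant are hypothetical, the 4 GB default only mirrors the "N=4" recalled in the comment, and treating a GB as 1024^3 bytes is an assumption; none of this is taken from the CMSSW source.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 1 GB is treated as 1024^3 bytes for this sketch.
constexpr std::uint64_t kBytesPerGB = 1024ULL * 1024ULL * 1024ULL;

// Hypothetical policy check: allow lazy-download only when at least
// minFreeGB of local disk is available. The default of 4 follows the
// value recalled in the thread and is not verified against the code.
bool lazyDownloadAllowed(std::uint64_t freeBytes, std::uint64_t minFreeGB = 4) {
  return freeBytes >= minFreeGB * kBytesPerGB;
}
```

In C++17 the free-byte count could come from `std::filesystem::space(path).available` for the scratch directory.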

bbockelm · 2015-07-20T16:32:18Z

@davidlange6 - what pset did you use to test this?

bbockelm · 2015-07-20T17:15:08Z

David sent me a sample pset and LHE file off-list.

The good news for the uncompressed case is that Xerces appears to use a 48KB buffer. While I would still prefer a larger one (MB-sized), this is much better than the compressed case (8KB with random reads).

I'm going to tweak the patch to take into account my suggestions above.
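
To put the buffer sizes above in perspective, the number of read calls needed to stream a file is roughly the file size divided by the buffer size, rounded up. A small sketch, where `readCalls` is a hypothetical helper and the 1 GiB file size is only an example value, not a measured pLHE file size:

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size reads needed to cover fileBytes (ceiling division).
// Assumes bufferBytes > 0.
std::uint64_t readCalls(std::uint64_t fileBytes, std::uint64_t bufferBytes) {
  return (fileBytes + bufferBytes - 1) / bufferBytes;
}
```

For a 1 GiB file this gives 131072 reads with an 8KB buffer, 21846 with 48KB, and 1024 with a 1MB buffer, which is why MB-sized buffers are attractive when each read has noticeable overhead.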

bbockelm · 2015-07-20T20:15:34Z

FYI - my revised PR is in #10287.

read chunks of lhe compressed files

e91b37a

cmsbuild added this to the Next CMSSW_7_1_X milestone Jul 19, 2015

cmsbuild added comparison-pending generators-pending orp-pending pending-signatures tests-pending labels Jul 19, 2015

davidlange6 closed this Jul 21, 2015
