read chunks of lhe compressed files in LHEReader #10274

Closed; wants to merge 1 commit.


@davidlange6
Contributor

Reading is now considerably faster; the per-file overhead is now purely XML parsing.
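As described, the change reads the compressed input in large chunks so that the many small requests from the XML parser are served from memory rather than each triggering its own read. The sketch below illustrates only that buffering idea, using a plain `std::istream` and no decompression; the class `ChunkedReader`, its `readBytes` method, and the chunk sizes are hypothetical and are not the actual LHEReader code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: serves small reads from a large internal buffer,
// so the underlying stream is touched once per chunk, not once per request.
class ChunkedReader {
public:
  ChunkedReader(std::istream& in, std::size_t chunkSize)
      : in_(in), buf_(chunkSize) {
    assert(chunkSize > 0);
  }

  // Copies up to maxToRead bytes into dst; returns 0 at end of input.
  std::size_t readBytes(char* dst, std::size_t maxToRead) {
    if (pos_ == end_) {
      // Refill with one large read from the underlying stream.
      in_.read(buf_.data(), static_cast<std::streamsize>(buf_.size()));
      end_ = static_cast<std::size_t>(in_.gcount());
      pos_ = 0;
      if (end_ == 0) return 0;  // nothing left to read
      ++underlyingReads_;
    }
    const std::size_t n = std::min(maxToRead, end_ - pos_);
    std::memcpy(dst, buf_.data() + pos_, n);
    pos_ += n;
    return n;
  }

  // Number of non-empty reads issued against the underlying stream.
  std::size_t underlyingReads() const { return underlyingReads_; }

private:
  std::istream& in_;
  std::vector<char> buf_;
  std::size_t pos_ = 0;
  std::size_t end_ = 0;
  std::size_t underlyingReads_ = 0;
};
```

With a 1 MiB chunk, a 100,000-byte input consumed in 8 KiB requests needs a single underlying read; with an 8 KiB chunk it needs 13. Note that this version still copies out of its own buffer, which is the memcpy that later comments in this thread propose to avoid.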

@cmsbuild
Contributor

A new Pull Request was created by @davidlange6 (David Lange) for CMSSW_7_1_X.

read chunks of lhe compressed files in LHEReader

It involves the following packages:

GeneratorInterface/LHEInterface

@vciulli, @cmsbuild, @covarell, @bendavid, @thuer can you please review it and eventually sign? Thanks.
@mkirsano this is something you requested to watch as well.
You can sign-off by replying to this message having '+1' in the first line of your reply.
You can reject by replying to this message having '-1' in the first line of your reply.
If you are an L2 or a release manager you can ask for tests by saying 'please test' in the first line of a comment.
@Degano you are the release manager for this.
You can merge this pull request by typing 'merge' in the first line of your comment.

@bendavid
Contributor

Does this only affect the compressed case, or does it also speed up uncompressed reading?

@davidlange6
Contributor Author

I implemented only the compressed case.

(Replying by email to Josh Bendavid's question of Jul 19, 2015, 7:03 PM.)

@bbockelm
Contributor

Something similar can be done in the uncompressed case. I would expect speedups there too, though perhaps less dramatic ones; I can't tell from reading the code what kind of buffering Xerces does internally.

@davidlange6, two thoughts:

  1. You can avoid keeping a separate uncompressed buffer by reading directly into the buffer provided by Xerces. This would save a significant amount of memory and avoid the memcpy.
  2. Another alternative I just thought of is to add the IOFlags::OpenWrap flag when opening the file (https://github.com/davidlange6/cmssw/blob/lhereader/GeneratorInterface/LHEInterface/src/LHEReader.cc#L56); this will turn on lazy-download where possible. That should benefit both the compressed and uncompressed cases.
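Suggestion 1 amounts to letting the parser own the destination buffer and having the input source fill it in place. The sketch below shows that shape with a hypothetical `ByteSource` interface loosely modeled on Xerces-C++'s `BinInputStream::readBytes(toFill, maxToRead)`; it is not the Xerces class itself, and it reads an uncompressed `std::istream` for simplicity. In a zlib-based decompressor the equivalent move would be pointing `z_stream::next_out` and `avail_out` at the caller's buffer instead of at a private one.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical interface, loosely modeled on xercesc::BinInputStream::readBytes:
// the parser owns the destination buffer and asks the source to fill it.
class ByteSource {
public:
  virtual ~ByteSource() = default;
  virtual std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) = 0;
};

// Reads straight into the parser's buffer: no intermediate buffer, no memcpy.
class DirectStreamSource : public ByteSource {
public:
  explicit DirectStreamSource(std::istream& in) : in_(in) {}

  // Returns the number of bytes written into toFill; 0 means end of input.
  std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) override {
    assert(toFill != nullptr || maxToRead == 0);
    in_.read(reinterpret_cast<char*>(toFill),
             static_cast<std::streamsize>(maxToRead));
    return static_cast<std::size_t>(in_.gcount());
  }

private:
  std::istream& in_;
};
```

The trade-off is that the read size is now dictated by the parser's buffer rather than chosen by the source, which is why the parser's own buffer size matters for the uncompressed case.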

@bendavid
Contributor

If lazy-download uses a significant amount of disk space per file (and does not free it as it goes), it would break the current workflow by filling up the worker-node disk.

@bbockelm
Contributor

Well, lazy-download cleans up when the file is closed. It also turns itself off automatically if less than N GB is available on the local disk (IIRC, N = 4).

What's the size of the individual pLHE files?
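The disk-space safeguard mentioned above (lazy-download disables itself when free local space drops below roughly N GB) can be pictured as a simple threshold check. This is only an illustrative sketch, not CMSSW's implementation: the function names are hypothetical, the 4 GB value is the one recalled "IIRC" in the thread, and treating GB as 2^30 bytes is an assumption. It uses the standard C++17 `std::filesystem::space` call.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical threshold: 4 GB as recalled in the thread, taken as 4 * 2^30 bytes.
constexpr std::uintmax_t kMinFreeBytes = std::uintmax_t{4} << 30;

// Pure policy: allow lazy-download only when enough local disk space is free.
bool lazyDownloadAllowed(std::uintmax_t availableBytes,
                         std::uintmax_t minFreeBytes = kMinFreeBytes) {
  return availableBytes >= minFreeBytes;
}

// Queries the filesystem holding `dir`; on any error, falls back to "not allowed".
bool lazyDownloadAllowedFor(const std::filesystem::path& dir,
                            std::uintmax_t minFreeBytes = kMinFreeBytes) {
  assert(!dir.empty() && "directory path must not be empty");
  std::error_code ec;
  const std::filesystem::space_info info = std::filesystem::space(dir, ec);
  if (ec) return false;
  return lazyDownloadAllowed(info.available, minFreeBytes);
}
```

Separating the pure comparison from the filesystem query keeps the policy easy to test without depending on the free space of the machine running the test.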

@bbockelm
Contributor

@davidlange6 - what pset did you use to test this?

@bbockelm
Contributor

David sent me a sample pset and LHE file off-list.

The good news for the uncompressed case is that Xerces appears to use a 48 KB buffer. I would still prefer something larger (MB-sized), but this is much better than the compressed case (8 KB reads, with random access).
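To put the buffer sizes above in perspective, the number of read calls needed to consume a file is the ceiling of the file size divided by the buffer size. The helper below is a hypothetical illustration, and the 100 MiB file size used in the usage note is an assumed example, not a measured pLHE size. It counts calls only and ignores seek cost, which makes the random 8 KB reads in the compressed case worse than the count alone suggests.

```cpp
#include <cassert>
#include <cstdint>

// Read calls needed to consume fileBytes with a fixed buffer (ceiling division).
std::uint64_t readCalls(std::uint64_t fileBytes, std::uint64_t bufferBytes) {
  assert(bufferBytes > 0);
  return (fileBytes + bufferBytes - 1) / bufferBytes;
}
```

For an illustrative 100 MiB file this gives 12,800 calls with an 8 KiB buffer, 2,134 with 48 KiB, and 100 with 1 MiB.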

I'm going to tweak the patch to take into account my suggestions above.

@bbockelm
Contributor

FYI: my revised PR is #10287.
