read chunks of lhe compressed files in LHEReader #10274
A new Pull Request was created by @davidlange6 (David Lange) for CMSSW_7_1_X: "read chunks of lhe compressed files in LHEReader". It involves the following packages: GeneratorInterface/LHEInterface. @vciulli, @cmsbuild, @covarell, @bendavid, @thuer can you please review it and eventually sign? Thanks.
Does this only affect the compressed case, or does it also speed up uncompressed reading?
I implemented only the compressed case.
Something similar can be done in the uncompressed case. I would expect speedups, but perhaps not quite as impressive - I can't tell from reading the code what kind of buffering Xerces might do internally. @davidlange6 - two thoughts:
If lazy download uses a significant amount of disk space per file, and does not free it up as it goes, then that would kill the current workflow by filling up the worker node disk.
Well, lazy-download cleans up at file close. It will also automatically turn itself off if less than N GB is available on the local disk (IIRC, N=4). What's the size of the individual pLHE files?
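For illustration only, the "turn itself off when the local disk is low" behaviour described above could be sketched roughly as follows. This is not the CMSSW implementation: the function name `should_use_lazy_download` and its parameters are hypothetical, and the 4 GB default simply mirrors the "IIRC, N=4" recollection in the comment.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def should_use_lazy_download(cache_dir, min_free_gb=4):
    """Return True if `cache_dir` has at least `min_free_gb` GiB free.

    Hypothetical helper: the threshold and the name are assumptions made
    for this sketch, not taken from the actual lazy-download code.
    """
    free_bytes = shutil.disk_usage(cache_dir).free
    return free_bytes >= min_free_gb * GIB


# Example: decide whether to stage a file locally in the current directory.
use_lazy = should_use_lazy_download(".")
```

A guard of this kind only protects against starting a download on a nearly full disk; whether the space is released in time still depends on the cleanup at file close mentioned above.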
@davidlange6 - what pset did you use to test this?
David sent me a sample pset and LHE file off-list. The good news for the uncompressed case is that it appears Xerces uses a 48KB buffer. While I still prefer larger (MB-sized), this is much better than the compressed case (8KB with random reads). I'm going to tweak the patch to take into account my suggestions above.
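To make the buffer-size point concrete, here is a minimal Python sketch (not the LHEReader code, which is C++) that decompresses the same stream with 8KB reads and with 1MB reads and counts how many reads hit the underlying file. The xz format, the helper name `read_compressed_in_chunks`, and the synthetic payload are all assumptions made for the example.

```python
import io
import lzma
import random


def read_compressed_in_chunks(raw, chunk_size):
    """Decompress an xz stream from `raw`, reading `chunk_size` bytes per call.

    Returns (decompressed_bytes, number_of_reads_issued_to_raw).
    """
    decomp = lzma.LZMADecompressor()
    pieces = []
    n_reads = 0
    while not decomp.eof:
        chunk = raw.read(chunk_size)
        if not chunk:
            break  # truncated input: stop instead of looping forever
        n_reads += 1
        pieces.append(decomp.decompress(chunk))
    return b"".join(pieces), n_reads


# Hypothetical stand-in for a compressed LHE file. Random bytes barely
# compress, so the compressed stream stays large enough to need many reads.
rng = random.Random(0)
payload = rng.getrandbits(8 * 200_000).to_bytes(200_000, "little")
compressed = lzma.compress(payload)

small_data, small_reads = read_compressed_in_chunks(io.BytesIO(compressed), 8 * 1024)
large_data, large_reads = read_compressed_in_chunks(io.BytesIO(compressed), 1024 * 1024)
```

Both calls return the same data; only the number of reads against the source differs. For a file on remote storage each of those reads can be a round trip, which is why MB-sized chunks are attractive.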
FYI - my revised PR is in #10287.
Now considerably faster; the per-file overhead is now purely XML parsing.