Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reaction generation using multiprocessing #1459

Merged
merged 28 commits into from
Jun 3, 2019
Merged

Reaction generation using multiprocessing #1459

merged 28 commits into from
Jun 3, 2019

Conversation

ajocher
Copy link
Contributor

@ajocher ajocher commented Aug 30, 2018

Motivation or Problem

Scoop has been found to increase the memory consumption until the OS's out of memory killer kills processes and the reaction generation fails. Consequently, python's multiprocessing module is implemented for MAC and Linux and a deprecation warning is added to scoop references.

Description of Changes

Multiprocessing is implemented for reaction generation in rmgpy/rmg/react.py. The processes are spawned and closed on a single node within the function 'react()'. The number of processes is determined based on the ratio of currently available RAM and currently used RAM. The user can input the maximum number of allowed processes from the command line. For each reaction generation the number of processes will be the minimum value either being the number of allowed processes due to user input or the value obtained by the RAM ratio. The RAM limitation is employed, because multiprocessing is forking the base process and the memory limit (SWAP + RAM) might be exceeded when using too many processes for a base process large in memory.

Multiprocessing is employed from the command line using the -n command and the maximum number of processes the user want to use, here 4. The default is 1 process, only an integer value smaller than the number of available cpu is allowed.
python rmg -n 4 input.py

Testing

The implementation has been tested for the superminimal, minimal and PDD cases on both MAC and RMG-server, ranging from 1 to 24 processes on a single node. An example submission script for the RMG-server is attached here:
submit.txt

Reviewer Tips

Try your own cases and document the changes in run time and memory consumption.

@codecov
Copy link

codecov bot commented Aug 30, 2018

Codecov Report

Merging #1459 into master will increase coverage by 0.07%.
The diff coverage is 57.48%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #1459      +/-   ##
=========================================
+ Coverage   41.53%   41.6%   +0.07%     
=========================================
  Files         176     176              
  Lines       29111   29142      +31     
  Branches     5975    5990      +15     
=========================================
+ Hits        12090   12125      +35     
+ Misses      16189   16173      -16     
- Partials      832     844      +12
Impacted Files Coverage Δ
rmgpy/species.py 0% <0%> (ø) ⬆️
rmgpy/thermo/thermoengine.py 82.89% <100%> (-0.44%) ⬇️
rmgpy/data/kinetics/database.py 49.37% <100%> (+2.72%) ⬆️
rmgpy/data/kinetics/family.py 52.95% <28.57%> (+0.29%) ⬆️
rmgpy/scoop_framework/util.py 65.51% <33.33%> (-0.64%) ⬇️
rmgpy/qm/main.py 65.32% <41.37%> (-7.31%) ⬇️
rmgpy/rmg/model.py 38.46% <48%> (-0.43%) ⬇️
rmgpy/rmg/pdep.py 16.12% <50%> (ø) ⬆️
rmgpy/rmg/main.py 22.72% <56%> (+0.56%) ⬆️
rmgpy/rmg/react.py 87.03% <86.36%> (+8.6%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ce334d...f749f49. Read the comment docs.

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I made some comments and suggestions. It would be nice to see some profiling results before this is merged.

Regarding trailing whitespaces and other Codacy issues, you generally only need to address issues in lines that you changed. Removing all trailing whitespaces in a file is not necessary.

rmg.py Outdated
@@ -82,6 +82,10 @@ def parse_command_line_arguments(command_line_args=None):
parser.add_argument('-t', '--walltime', type=str, nargs=1, default='00:00:00:00',
metavar='DD:HH:MM:SS', help='set the maximum execution time')

# Add option to select max number of processes for reaction generation
parser.add_argument('-n', '--maxProc', type=int, nargs=1, default=1,
help='used by multiprocessing in react.py')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be the most helpful info, since users may not know what react.py is. Perhaps max number of processors to use during reaction generation or something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pass

if maxProc > psutil.cpu_count():
raise ValueError('Invalid format for user defined maximum number of procesors {0}; should be an integer and smaller or equal to your available number of cpu {1}'.format(maxProc, psutil.cpu_count()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like you're checking the format here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 'Invalid input ...'

spcs.extend(rxn.reactants)
spcs.extend(rxn.products)

ensure_independent_atom_ids(spcs, resonance=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to ensure_independent_atom_ids here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise Species.getResonanceHybrid() fails due to not having valid atom ids. We call ensure_independent_atom_ids in generate_reactions_from_families, however, this is not working for multiprocessing. Using ensure_independent_atom_ids here fixes that issue.

tmp = divmod(user_mem, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
tmp2 = min(maxProc, tmp[0])
procNum = max(1, tmp2)
print 'For reaction generation {0} processes are used.'.format(procNum)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We like to avoid print statements in favor of using the logging module. So you would just replace print with logging.info() or maybe logging.debug depending on whether you want this to be printed normally.

Something you should consider is how many times this will be printed (seems like it will be a lot) and whether it's useful info for a typical user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It will be printed for each time we are spawning a pool of worker. It might be of interest as the number of processes/worker might differ from the user input, depending on the available RAM and the memory consumption of the RMG base process before spawning the workers.

rmg.py Show resolved Hide resolved
@@ -160,6 +172,8 @@ def submit_(func, *args, **kwargs):
returns the return value of the called function, or
when SCOOP is loaded, the future object.
"""
warnings.warn("The option scoop is no longer supported"\
" and may be removed in Version: 2.3 ", DeprecationWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these warnings, could you indent the second line to line up with the " in the first line? Also, the slashes shouldn't be necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Got the template from here:
https://github.com/ReactionMechanismGenerator/RMG-Py/wiki/RMG-Contributor-Guidelines
I guess it should be changed accordingly?

tmp = divmod(usermem, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

# Get available RAM (GB)and procnum dependent on OS
if platform == "linux" or platform == "linux2":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use platform.startswith('linux') here to check for both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Done.

# OS X
memoryavailable = psutil.virtual_memory().available/(1000.0 ** 3)
memoryuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/(1000.0 ** 3)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should add handling for Windows as well. It doesn't have to be equivalent handling, but you could simply add an else for any other platform and perhaps only use one processor in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but couldn't test it. Is someone with a Windows OS volunteering?

@ajocher
Copy link
Contributor Author

ajocher commented Sep 11, 2018

presentation1209.pdf

@mliu49
Copy link
Contributor

mliu49 commented Mar 25, 2019

Hmm, something seems to have gone wrong with your rebase. This branch now includes a ton of commits from master.

@ajocher ajocher force-pushed the parallelRMG_RXNgen branch 4 times, most recently from ee97295 to 915910b Compare March 29, 2019 18:52
@mliu49
Copy link
Contributor

mliu49 commented Mar 29, 2019

I added a commit refactoring the code for family splitting. Let me know what you think, and whether it matches your original design.

rmgpy/rmg/model.py Outdated Show resolved Hide resolved
rmgpy/rmg/model.py Outdated Show resolved Hide resolved
# This method chops the iterable into a number of chunks which it
# submits to the process pool as separate tasks.
p = Pool(processes=procnum)
p.map(calculate_thermo_parallel,spcs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple concerns about the overall implementation for parallel QM. Normally, thermo is calculated for one species at a time in self.processNewReactions() below. Here, you determine all species which would be calculated using QM and pre-calculate their thermo in parallel, then in processNewReactions, it sees the thermo and does not re-calculate it.

  1. The calculate_thermo_parallel function seems to skip a lot of processing done in ThermoDatabase.getThermoData and thermoEngine.processThermoData, which can have a significant effect on the final values.
  2. It appears that you're using SMILES to determine whether species are unique? This is not safe because different resonance structures will have different SMILES. Also, generating SMILES for every new molecule will be relatively time consuming.

Minor suggestions, which may or may not be relevant after resolving the above issues:

  • Since the procnum determination based on RAM is done both here and in react.py, it might be useful to turn it into a function somewhere and import it here.
  • I would also split this section off as a new method, since enlarge is already a really big method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thought behind the implemented procedure is that the QM files should be generated in parallel and then looked up for one species at a time in self.processNewReactions() below. Testing showed that calculating QM thermo for one species at a time is more time consuming than in a bulk up here.

I use SMILES for comparison and then generate resonance structures within calculate_thermo_parallel() to generate QM files for each resonance structure. What would be a better way to ensure that the same QM file is not generated twice/overlapping during the parallel generation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SMILES comparison is now substituted with isIsomorphic comparison.

The procnum determination is now a function and imported for QMTP parallel and reaction generation in parallel.

And the parallel QM files writing section is split off outside of enlarge.

@ajocher ajocher force-pushed the parallelRMG_RXNgen branch from 8e3e27b to 55e1fc6 Compare March 29, 2019 22:06
@ajocher ajocher force-pushed the parallelRMG_RXNgen branch 3 times, most recently from e82fe61 to d7a7f84 Compare April 10, 2019 20:57
@mliu49 mliu49 force-pushed the parallelRMG_RXNgen branch from d7a7f84 to 63e2976 Compare April 11, 2019 19:00
@ajocher ajocher force-pushed the parallelRMG_RXNgen branch 3 times, most recently from 6ed0fc6 to d27647b Compare April 21, 2019 03:24
@mliu49 mliu49 force-pushed the parallelRMG_RXNgen branch from e69745b to 7318b37 Compare May 30, 2019 22:46
@mliu49
Copy link
Contributor

mliu49 commented May 30, 2019

I changed the species re-labeling to only be done for new species created from reaction generation. Turns out that thermo generation for input species was happening earlier than we thought.

I also removed the whitespace changes to clean up the PR a bit.

@mliu49 mliu49 force-pushed the parallelRMG_RXNgen branch from ab4afeb to f362c9d Compare May 31, 2019 20:56
Reduce number of families being tested
Make separate tests for serial and parallel processing
@mliu49 mliu49 force-pushed the parallelRMG_RXNgen branch from f362c9d to f749f49 Compare May 31, 2019 22:07
@mliu49
Copy link
Contributor

mliu49 commented Jun 3, 2019

I think this is ready code-wise. Are there any additional tests that you think should be run?

@ajocher
Copy link
Contributor Author

ajocher commented Jun 3, 2019

I think this is ready code-wise. Are there any additional tests that you think should be run?

Just re-tested, QMTP, pdep, and thermo filtering in serial and parallel and seems to work fine. Ready to go from my side.

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray!

@mliu49 mliu49 merged commit f90526f into master Jun 3, 2019
@mliu49 mliu49 deleted the parallelRMG_RXNgen branch June 3, 2019 19:37
@alongd
Copy link
Member

alongd commented Jul 5, 2019

Should we update the documentation re parallelization?
http://reactionmechanismgenerator.github.io/RMG-Py/users/rmg/running.html?highlight=parallel

@ajocher
Copy link
Contributor Author

ajocher commented Jul 5, 2019

It is updated. Where do you think it needs more update?

mliu49 added a commit that referenced this pull request Jul 17, 2019
Motivation:
- processNewReactions requires a newSpecies argument which was not
being properly identified after multiprocessing changes in #1459

Background:
- The newSpecies argument is retained from early RMG when reactions
were generated right after adding a new species to the core.
- Hence, newSpecies referred to the newly added core species which
was used to generate the new reactions.
- The overall model generation algorithm has since changed, so
identifying newSpecies is not as straightforward.
- The multiprocessing PR changed newSpecies to be any core species,
missing the original purpose of identifying the species from which
the reaction was generated.

Changes:
- Use the species tuples created during reaction generation to
keep track of which species were used to generate the reaction
- Update unit tests which are affected by the changes in return values
from react and react_all
mliu49 added a commit that referenced this pull request Jul 17, 2019
Motivation:
- processNewReactions requires a newSpecies argument which was not
being properly identified after multiprocessing changes in #1459

Background:
- The newSpecies argument is retained from early RMG when reactions
were generated right after adding a new species to the core.
- Hence, newSpecies referred to the newly added core species which
was used to generate the new reactions.
- The overall model generation algorithm has since changed, so
identifying newSpecies is not as straightforward.
- The multiprocessing PR changed newSpecies to be any core species,
missing the original purpose of identifying the species from which
the reaction was generated.

Changes:
- Use the species tuples created during reaction generation to
keep track of which species were used to generate the reaction
- Update unit tests which are affected by the changes in return values
from react and react_all
mliu49 added a commit that referenced this pull request Jul 18, 2019
Motivation:
- processNewReactions requires a newSpecies argument which was not
being properly identified after multiprocessing changes in #1459

Background:
- The newSpecies argument is retained from early RMG when reactions
were generated right after adding a new species to the core.
- Hence, newSpecies referred to the newly added core species which
was used to generate the new reactions.
- The overall model generation algorithm has since changed, so
identifying newSpecies is not as straightforward.
- The multiprocessing PR changed newSpecies to be any core species,
missing the original purpose of identifying the species from which
the reaction was generated.

Changes:
- Use the species tuples created during reaction generation to
keep track of which species were used to generate the reaction
- Update unit tests which are affected by the changes in return values
from react and react_all
mliu49 added a commit that referenced this pull request Jul 19, 2019
Motivation:
- processNewReactions requires a newSpecies argument which was not
being properly identified after multiprocessing changes in #1459

Background:
- The newSpecies argument is retained from early RMG when reactions
were generated right after adding a new species to the core.
- Hence, newSpecies referred to the newly added core species which
was used to generate the new reactions.
- The overall model generation algorithm has since changed, so
identifying newSpecies is not as straightforward.
- The multiprocessing PR changed newSpecies to be any core species,
missing the original purpose of identifying the species from which
the reaction was generated.

Changes:
- Use the species tuples created during reaction generation to
keep track of which species were used to generate the reaction
- Update unit tests which are affected by the changes in return values
from react and react_all
mliu49 added a commit that referenced this pull request Jul 20, 2019
Motivation:
- processNewReactions requires a newSpecies argument which was not
being properly identified after multiprocessing changes in #1459

Background:
- The newSpecies argument is retained from early RMG when reactions
were generated right after adding a new species to the core.
- Hence, newSpecies referred to the newly added core species which
was used to generate the new reactions.
- The overall model generation algorithm has since changed, so
identifying newSpecies is not as straightforward.
- The multiprocessing PR changed newSpecies to be any core species,
missing the original purpose of identifying the species from which
the reaction was generated.

Changes:
- Use the species tuples created during reaction generation to
keep track of which species were used to generate the reaction
- Update unit tests which are affected by the changes in return values
from react and react_all
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants