Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rxn.reactants.sort() without providing sorting key #388

Closed
KEHANG opened this issue May 22, 2015 · 5 comments
Closed

rxn.reactants.sort() without providing sorting key #388

KEHANG opened this issue May 22, 2015 · 5 comments

Comments

@KEHANG
Copy link
Member

KEHANG commented May 22, 2015

In checkForExistingReaction(self, rxn), rxn.reactant.sort() is firstly implemented to created unique order of multiple reactants. But this sort() doesn't provide any sorting key. After some testing, I guess sort() method if no sorting key provided, is sorting based on object's memory address (it's only a hypothesis, anyone who is familiar with it please let us know.), which makes the order of reactants in a same reaction non-deterministic if you run the same RMG job multiple times (I've already observed this phonomenon).

But the reason why currently RMG still can produce same final mechanism is that we only allow one species instance to exist in memory for one species.

The potential problem of this sort() implementation is when unpickling restart files, the memory address of loaded reactants will change so the order of reactants in reactions should change accoordingly, if not, some existing reactions cannot be recognized further leading to multiple DUPLICATE reactions in rmg.reactionModel.reactionDict.

I suggestion this sort() should be provided by some appropriate sorting key to make it more deterministic. I've tried species.index since every reactive species will have a unique index at this time point (when implementing checkForExistingReaction(self, rxn)), which can now solve the non-deterministic problem. But more thoughts about sorting key are welcome.

@nickvandewiele
Copy link
Contributor

The example you describe in the third paragraph would be an excellent unit test.

Would the following example sometimes fail because the memory addresses are allocated in a random way?

import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction

a = Species(...)
b = Species(...)

r = Reaction()
r.reactants = [a,b]

r.reactants.sort()

r_pickle = cPickle.loads(cPickle.dumps(r))
r_pickle.reactants.sort()

for spc, spc_pickle in zip(r.reactants, r_pickle.reactants):
    assert.assertTrue(spc.isIsomorphic(spc_pickle)

@KEHANG
Copy link
Member Author

KEHANG commented May 26, 2015

The example you provided will always return True. The non-deterministic feature isn't because the order of memory allocation (unpickling) for pickled objects is random. The unpickling order is actually fixed but would be complicated enough when you are unpickling a large hierachical structure like self.reactionModel and the unpickling order is not same as the order you create those objects previously. The following example (it will fail) can demonstrate what I just said:

import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction

a = Species(index=1, label='CO')
b = Species(index=2, label = 'CO2')
c = Species(index=3, label='C2O')

r1 = Reaction()
r1.reactants = [a,b]
r2 = Reaction()
r2.reactants = [a,c]
r3 = Reaction()
r3.reactants = [b,c]

r1.reactants.sort()
r2.reactants.sort()
r3.reactants.sort()

complicatedStructureExample = [r2,r1,r3]

[r2_pickle,r1_pickle,r3_pickle] = cPickle.loads(cPickle.dumps(complicatedStructureExample))
r3_pickle.reactants.sort()

for spc, spc_pickle in zip(r3.reactants, r3_pickle.reactants):
    print (spc.index == spc_pickle.index)

@KEHANG
Copy link
Member Author

KEHANG commented May 26, 2015

After made some modification, the test will be always successful.

import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction

a = Species(index=1, label='CO')
b = Species(index=2, label = 'CO2')
c = Species(index=3, label='C2O')

r1 = Reaction()
r1.reactants = [a,b]
r2 = Reaction()
r2.reactants = [a,c]
r3 = Reaction()
r3.reactants = [b,c]

r1.reactants.sort(key = lambda reactant: reactant.index)
r2.reactants.sort(key = lambda reactant: reactant.index)
r3.reactants.sort(key = lambda reactant: reactant.index)

complicatedStructureExample = [r2,r1,r3]

[r2_pickle,r1_pickle,r3_pickle] = cPickle.loads(cPickle.dumps(complicatedStructureExample))
r3_pickle.reactants.sort(key = lambda reactant: reactant.index)

for spc, spc_pickle in zip(r3.reactants, r3_pickle.reactants):
    print (spc.index == spc_pickle.index)

@rwest
Copy link
Member

rwest commented May 26, 2015

Sounds reasonable. It may need to be fast (does this get done a lot?), but I think the key approach is better than the cmp approach for this.

And yes you're right, from https://docs.python.org/2/library/stdtypes.html#comparisons:

CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address.

This is worth reading too: https://docs.python.org/2/reference/datamodel.html#object.__hash__
As long as it doesn't happen before we check the edge for uniqueness, we may be able to use our species index (the order it was added to the edge) as a hash and in comparitors. Would that help with @nickvandewiele's other dictionary/set efforts?

@mliu49
Copy link
Contributor

mliu49 commented Dec 4, 2019

Species sorting now uses comparison methods (implemented in 0c0c7ed), which use the new sorting_key property (implemented in c90c407). This will provide deterministic sorting, although the current implementation depends on the species label and index, which can change depending on thermo libraries and the order that species are generated.

@mliu49 mliu49 closed this as completed Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants