rxn.reactants.sort() without providing sorting key #388

KEHANG · 2015-05-22T21:59:26Z

In checkForExistingReaction(self, rxn), rxn.reactant.sort() is firstly implemented to created unique order of multiple reactants. But this sort() doesn't provide any sorting key. After some testing, I guess sort() method if no sorting key provided, is sorting based on object's memory address (it's only a hypothesis, anyone who is familiar with it please let us know.), which makes the order of reactants in a same reaction non-deterministic if you run the same RMG job multiple times (I've already observed this phonomenon).

But the reason why currently RMG still can produce same final mechanism is that we only allow one species instance to exist in memory for one species.

The potential problem of this sort() implementation is when unpickling restart files, the memory address of loaded reactants will change so the order of reactants in reactions should change accoordingly, if not, some existing reactions cannot be recognized further leading to multiple DUPLICATE reactions in rmg.reactionModel.reactionDict.

I suggestion this sort() should be provided by some appropriate sorting key to make it more deterministic. I've tried species.index since every reactive species will have a unique index at this time point (when implementing checkForExistingReaction(self, rxn)), which can now solve the non-deterministic problem. But more thoughts about sorting key are welcome.

The text was updated successfully, but these errors were encountered:

nickvandewiele · 2015-05-26T14:00:58Z

The example you describe in the third paragraph would be an excellent unit test.

Would the following example sometimes fail because the memory addresses are allocated in a random way?

import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction

a = Species(...)
b = Species(...)

r = Reaction()
r.reactants = [a,b]

r.reactants.sort()

r_pickle = cPickle.loads(cPickle.dumps(r))
r_pickle.reactants.sort()

for spc, spc_pickle in zip(r.reactants, r_pickle.reactants):
    assert.assertTrue(spc.isIsomorphic(spc_pickle)

KEHANG · 2015-05-26T15:17:26Z

The example you provided will always return True. The non-deterministic feature isn't because the order of memory allocation (unpickling) for pickled objects is random. The unpickling order is actually fixed but would be complicated enough when you are unpickling a large hierachical structure like self.reactionModel and the unpickling order is not same as the order you create those objects previously. The following example (it will fail) can demonstrate what I just said:

import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction

a = Species(index=1, label='CO')
b = Species(index=2, label = 'CO2')
c = Species(index=3, label='C2O')

r1 = Reaction()
r1.reactants = [a,b]
r2 = Reaction()
r2.reactants = [a,c]
r3 = Reaction()
r3.reactants = [b,c]

r1.reactants.sort()
r2.reactants.sort()
r3.reactants.sort()

complicatedStructureExample = [r2,r1,r3]

[r2_pickle,r1_pickle,r3_pickle] = cPickle.loads(cPickle.dumps(complicatedStructureExample))
r3_pickle.reactants.sort()

for spc, spc_pickle in zip(r3.reactants, r3_pickle.reactants):
    print (spc.index == spc_pickle.index)

KEHANG · 2015-05-26T15:19:59Z

After made some modification, the test will be always successful.

import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction

a = Species(index=1, label='CO')
b = Species(index=2, label = 'CO2')
c = Species(index=3, label='C2O')

r1 = Reaction()
r1.reactants = [a,b]
r2 = Reaction()
r2.reactants = [a,c]
r3 = Reaction()
r3.reactants = [b,c]

r1.reactants.sort(key = lambda reactant: reactant.index)
r2.reactants.sort(key = lambda reactant: reactant.index)
r3.reactants.sort(key = lambda reactant: reactant.index)

complicatedStructureExample = [r2,r1,r3]

[r2_pickle,r1_pickle,r3_pickle] = cPickle.loads(cPickle.dumps(complicatedStructureExample))
r3_pickle.reactants.sort(key = lambda reactant: reactant.index)

for spc, spc_pickle in zip(r3.reactants, r3_pickle.reactants):
    print (spc.index == spc_pickle.index)

rwest · 2015-05-26T23:52:00Z

Sounds reasonable. It may need to be fast (does this get done a lot?), but I think the key approach is better than the cmp approach for this.

And yes you're right, from https://docs.python.org/2/library/stdtypes.html#comparisons:

CPython implementation detail: Objects of different types except numbers are ordered by their type names; objects of the same types that don’t support proper comparison are ordered by their address.

This is worth reading too: https://docs.python.org/2/reference/datamodel.html#object.__hash__
As long as it doesn't happen before we check the edge for uniqueness, we may be able to use our species index (the order it was added to the edge) as a hash and in comparitors. Would that help with @nickvandewiele's other dictionary/set efforts?

mliu49 · 2019-12-04T20:18:03Z

Species sorting now uses comparison methods (implemented in 0c0c7ed), which use the new sorting_key property (implemented in c90c407). This will provide deterministic sorting, although the current implementation depends on the species label and index, which can change depending on thermo libraries and the order that species are generated.

rwest mentioned this issue Jul 8, 2015

Ordering of species in reaction is nondeterministic #409

Closed

mliu49 mentioned this issue Apr 4, 2019

Chemical identity comparison for Molecules and Species #1329

Merged

mliu49 mentioned this issue Jun 6, 2019

Report reactants and products in a consistent deterministic order #1616

Closed

mliu49 closed this as completed Dec 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rxn.reactants.sort() without providing sorting key #388

rxn.reactants.sort() without providing sorting key #388

KEHANG commented May 22, 2015

nickvandewiele commented May 26, 2015

KEHANG commented May 26, 2015

KEHANG commented May 26, 2015

rwest commented May 26, 2015

mliu49 commented Dec 4, 2019

rxn.reactants.sort() without providing sorting key #388

rxn.reactants.sort() without providing sorting key #388

Comments

KEHANG commented May 22, 2015

nickvandewiele commented May 26, 2015

KEHANG commented May 26, 2015

KEHANG commented May 26, 2015

rwest commented May 26, 2015

mliu49 commented Dec 4, 2019