-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rxn.reactants.sort() without providing sorting key #388
Comments
The example you describe in the third paragraph would be an excellent unit test. Would the following example sometimes fail because the memory addresses are allocated in a random way? import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction
a = Species(...)
b = Species(...)
r = Reaction()
r.reactants = [a,b]
r.reactants.sort()
r_pickle = cPickle.loads(cPickle.dumps(r))
r_pickle.reactants.sort()
for spc, spc_pickle in zip(r.reactants, r_pickle.reactants):
assert.assertTrue(spc.isIsomorphic(spc_pickle) |
The example you provided will always return True. The non-deterministic feature isn't because the order of memory allocation (unpickling) for pickled objects is random. The unpickling order is actually fixed but would be complicated enough when you are unpickling a large hierachical structure like import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction
a = Species(index=1, label='CO')
b = Species(index=2, label = 'CO2')
c = Species(index=3, label='C2O')
r1 = Reaction()
r1.reactants = [a,b]
r2 = Reaction()
r2.reactants = [a,c]
r3 = Reaction()
r3.reactants = [b,c]
r1.reactants.sort()
r2.reactants.sort()
r3.reactants.sort()
complicatedStructureExample = [r2,r1,r3]
[r2_pickle,r1_pickle,r3_pickle] = cPickle.loads(cPickle.dumps(complicatedStructureExample))
r3_pickle.reactants.sort()
for spc, spc_pickle in zip(r3.reactants, r3_pickle.reactants):
print (spc.index == spc_pickle.index) |
After made some modification, the test will be always successful. import cPickle
from rmgpy.species import Species
from rmgpy.reaction import Reaction
a = Species(index=1, label='CO')
b = Species(index=2, label = 'CO2')
c = Species(index=3, label='C2O')
r1 = Reaction()
r1.reactants = [a,b]
r2 = Reaction()
r2.reactants = [a,c]
r3 = Reaction()
r3.reactants = [b,c]
r1.reactants.sort(key = lambda reactant: reactant.index)
r2.reactants.sort(key = lambda reactant: reactant.index)
r3.reactants.sort(key = lambda reactant: reactant.index)
complicatedStructureExample = [r2,r1,r3]
[r2_pickle,r1_pickle,r3_pickle] = cPickle.loads(cPickle.dumps(complicatedStructureExample))
r3_pickle.reactants.sort(key = lambda reactant: reactant.index)
for spc, spc_pickle in zip(r3.reactants, r3_pickle.reactants):
print (spc.index == spc_pickle.index) |
Sounds reasonable. It may need to be fast (does this get done a lot?), but I think the key approach is better than the cmp approach for this. And yes you're right, from https://docs.python.org/2/library/stdtypes.html#comparisons:
This is worth reading too: https://docs.python.org/2/reference/datamodel.html#object.__hash__ |
Species sorting now uses comparison methods (implemented in 0c0c7ed), which use the new |
In
checkForExistingReaction(self, rxn)
,rxn.reactant.sort()
is firstly implemented to created unique order of multiple reactants. But this sort() doesn't provide any sorting key. After some testing, I guesssort()
method if no sorting key provided, is sorting based on object's memory address (it's only a hypothesis, anyone who is familiar with it please let us know.), which makes the order of reactants in a same reaction non-deterministic if you run the same RMG job multiple times (I've already observed this phonomenon).But the reason why currently RMG still can produce same final mechanism is that we only allow one species instance to exist in memory for one species.
The potential problem of this
sort()
implementation is when unpickling restart files, the memory address of loaded reactants will change so the order of reactants in reactions should change accoordingly, if not, some existing reactions cannot be recognized further leading to multiple DUPLICATE reactions inrmg.reactionModel.reactionDict
.I suggestion this
sort()
should be provided by some appropriate sorting key to make it more deterministic. I've tried species.index since every reactive species will have a unique index at this time point (when implementingcheckForExistingReaction(self, rxn)
), which can now solve the non-deterministic problem. But more thoughts about sorting key are welcome.The text was updated successfully, but these errors were encountered: