-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching: NodeCaching._get_objects_to_hash
return type to dict
#6323
Caching: NodeCaching._get_objects_to_hash
return type to dict
#6323
Conversation
@sphuber if this function is so useful, I wonder if it should be promoted to a public interface (drop the leading underscore?). Otherwise the changes looked reasonable to me. |
It is mostly a private one to discourage plugins from overriding it in |
2bbcd6b
to
dfd973a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sphuber Since the function is now public, it would be nice to have some tests for it, but up to you. (I was a bit surprised that the changes in this PR did not require any tests to be updated, but I didn't make a fuss since it was private and I assume it is tested on a higher level?)
When debugging the caching functionality, it is often useful to look at the objects that are used to compute the hash, which is returned by the `NodeCaching._get_objects_to_hash`. It returning a list, makes it difficult to identify what each object represents. By changing the return type to a dictionary, the key for each object allows to give a useful hint as to what it represents, making debugging easier.
dfd973a
to
3f39c48
Compare
This method is useful when debugging caching behavior so it makes sense to make it public.
3f39c48
to
2dfa3db
Compare
if entry.link_label not in self._hash_ignored_inputs | ||
}, | ||
] | ||
objects = super().get_objects_to_hash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, there is one functional change here: class attribute is added from the parent class? I am assuming that is desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the class
key with str(self._node.__class__)
that I added in another PR yesterday? If so, yes, that is desired. The only exception for the CalcJobNodeCaching
should be the removal of the repository hash, as added in #5998
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's what I meant, all good then.
I go through the changes and all looks good to me, since @danielhollas is review as well so I'll let he approve. |
For the benefit of future git historians: In the review of this PR I seem not to have realized that this change was changing the actual node hashes, it was not just an API change. I think this should have been noted in the PR message. It wouldn't matter much in practice since we were making changes to hashing anyway in 2.6 release. But I still think it's weird that we did not have to change any tests in this PR. It might be useful to have tests to explicitly verify that the hashes of some common AiiDA objects did not change, to make sure we don't introduce accidental changes in the future. |
@@ -44,7 +45,7 @@ def _get_hash(self, ignore_errors: bool = True, **kwargs: t.Any) -> str | None: | |||
:param ignore_errors: return ``None`` on ``aiida.common.exceptions.HashingError`` (logging the exception) | |||
""" | |||
try: | |||
return make_hash(self._get_objects_to_hash(), **kwargs) | |||
return make_hash(self.get_objects_to_hash(), **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielhollas thanks for dig this out. If I understand it correctly, initial scope was not changing the hashing. So for this line, I think it needs convert the dict into the original format (which is the list of values of the dict) that passing to make_hash
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not a mistake. We intentionally changed this now, because there were anyway a couple of other fixes that changed hashes, and these were all released together in 2.6.0. See also the entry in the CHANGELOG that explains this and tells users that nodes should be rehashed. So I wouldn't revert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. But what you expect to be tested? Something that the hashing should changed after the change, then I think we need to hard code the hashing to the test case. Makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have tests that have hardcoded hash values, so that we're always aware that the hashes have changed. There was for example a cases were there was a new attribute added on the calcjob class and they were by mistake not included in the list of ignored attributes (I'll find the PR later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, @sphuber is right, we can't revert now even if this wasn't intentional, since then we'd have another hash invalidation.
When debugging the caching functionality, it is often useful to look at the objects that are used to compute the hash, which is returned by the
NodeCaching._get_objects_to_hash
. It returning a list, makes it difficult to identify what each object represents. By changing the return type to a dictionary, the key for each object allows to give a useful hint as to what it represents, making debugging easier.