Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching: NodeCaching._get_objects_to_hash return type to dict #6323

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Mar 17, 2024

When debugging the caching functionality, it is often useful to look at the objects that are used to compute the hash, which is returned by the NodeCaching._get_objects_to_hash. It returning a list, makes it difficult to identify what each object represents. By changing the return type to a dictionary, the key for each object allows to give a useful hint as to what it represents, making debugging easier.

@danielhollas
Copy link
Collaborator

@sphuber if this function is so useful, I wonder if it should be promoted to a public interface (drop the leading underscore?). Otherwise the changes looked reasonable to me.

@sphuber
Copy link
Contributor Author

sphuber commented Mar 18, 2024

@sphuber if this function is so useful, I wonder if it should be promoted to a public interface (drop the leading underscore?). Otherwise the changes looked reasonable to me.

It is mostly a private one to discourage plugins from overriding it in Data plugins as it can significantly change the caching behavior. But I guess even that can be a valid use case in some cases. Guess we can make it public

@sphuber sphuber force-pushed the fix/get-objects-to-hash-dictionary branch from 2bbcd6b to dfd973a Compare March 18, 2024 12:12
@sphuber sphuber requested review from danielhollas and unkcpz March 18, 2024 12:13
Copy link
Collaborator

@danielhollas danielhollas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sphuber Since the function is now public, it would be nice to have some tests for it, but up to you. (I was a bit surprised that the changes in this PR did not require any tests to be updated, but I didn't make a fuss since it was private and I assume it is tested on a higher level?)

When debugging the caching functionality, it is often useful to look at
the objects that are used to compute the hash, which is returned by the
`NodeCaching._get_objects_to_hash`. It returning a list, makes it
difficult to identify what each object represents. By changing the
return type to a dictionary, the key for each object allows to give a
useful hint as to what it represents, making debugging easier.
@sphuber sphuber force-pushed the fix/get-objects-to-hash-dictionary branch from dfd973a to 3f39c48 Compare March 18, 2024 13:45
This method is useful when debugging caching behavior so it makes sense
to make it public.
@sphuber sphuber force-pushed the fix/get-objects-to-hash-dictionary branch from 3f39c48 to 2dfa3db Compare March 18, 2024 14:23
if entry.link_label not in self._hash_ignored_inputs
},
]
objects = super().get_objects_to_hash()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, there is one functional change here: class attribute is added from the parent class? I am assuming that is desired?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the class key with str(self._node.__class__) that I added in another PR yesterday? If so, yes, that is desired. The only exception for the CalcJobNodeCaching should be the removal of the repository hash, as added in #5998

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's what I meant, all good then.

@sphuber sphuber requested a review from danielhollas March 18, 2024 14:59
@unkcpz
Copy link
Member

unkcpz commented Mar 18, 2024

I go through the changes and all looks good to me, since @danielhollas is review as well so I'll let he approve.

@sphuber sphuber merged commit e330004 into aiidateam:main Mar 18, 2024
20 checks passed
@sphuber sphuber deleted the fix/get-objects-to-hash-dictionary branch March 18, 2024 15:14
@danielhollas
Copy link
Collaborator

For the benefit of future git historians: In the review of this PR I seem not to have realized that this change was changing the actual node hashes, it was not just an API change. I think this should have been noted in the PR message. It wouldn't matter much in practice since we were making changes to hashing anyway in 2.6 release. But I still think it's weird that we did not have to change any tests in this PR. It might be useful to have tests to explicitly verify that the hashes of some common AiiDA objects did not change, to make sure we don't introduce accidental changes in the future.

@@ -44,7 +45,7 @@ def _get_hash(self, ignore_errors: bool = True, **kwargs: t.Any) -> str | None:
:param ignore_errors: return ``None`` on ``aiida.common.exceptions.HashingError`` (logging the exception)
"""
try:
return make_hash(self._get_objects_to_hash(), **kwargs)
return make_hash(self.get_objects_to_hash(), **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielhollas thanks for dig this out. If I understand it correctly, initial scope was not changing the hashing. So for this line, I think it needs convert the dict into the original format (which is the list of values of the dict) that passing to make_hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not a mistake. We intentionally changed this now, because there were anyway a couple of other fixes that changed hashes, and these were all released together in 2.6.0. See also the entry in the CHANGELOG that explains this and tells users that nodes should be rehashed. So I wouldn't revert this.

Copy link
Member

@unkcpz unkcpz Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. But what you expect to be tested? Something that the hashing should changed after the change, then I think we need to hard code the hashing to the test case. Makes sense to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have tests that have hardcoded hash values, so that we're always aware that the hashes have changed. There was for example a cases were there was a new attribute added on the calcjob class and they were by mistake not included in the list of ignored attributes (I'll find the PR later).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @sphuber is right, we can't revert now even if this wasn't intentional, since then we'd have another hash invalidation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants