Caching: `NodeCaching._get_objects_to_hash` return type to `dict` #6323

sphuber · 2024-03-17T14:30:22Z

When debugging the caching functionality, it is often useful to look at the objects that are used to compute the hash, which are returned by `NodeCaching._get_objects_to_hash`. Because the method returns a list, it is difficult to identify what each object represents. Changing the return type to a dictionary gives each object a key that hints at what it represents, making debugging easier.
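To make the difference concrete, here is a minimal sketch using stand-in functions rather than the actual AiiDA implementation. The key names (`version`, `attributes`, `repository_hash`), the values, and the helper `diff_objects` are all hypothetical and only serve to illustrate the idea.

```python
# Stand-in data; the real method collects these values from the node.
def objects_as_list():
    # Positional entries: the reader has to know the order by heart.
    return ['2.5.1', {'energy': -1.5}, 'a1b2c3']


def objects_as_dict():
    # Same values, but each one is labelled by a (hypothetical) key.
    return {
        'version': '2.5.1',
        'attributes': {'energy': -1.5},
        'repository_hash': 'a1b2c3',
    }


def diff_objects(left, right):
    """Return the keys whose values differ between two dictionaries."""
    return sorted(key for key in left.keys() | right.keys() if left.get(key) != right.get(key))


# Simulate a second node that differs only in its repository content.
other = {**objects_as_dict(), 'repository_hash': 'ffffff'}
print(diff_objects(objects_as_dict(), other))
```

With a dictionary, comparing the objects of two nodes that unexpectedly hash differently directly names the entry responsible, which a bare list cannot do.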

danielhollas · 2024-03-17T22:21:37Z

@sphuber if this function is so useful, I wonder if it should be promoted to a public interface (drop the leading underscore?). Otherwise the changes looked reasonable to me.

sphuber · 2024-03-18T06:18:38Z

> @sphuber if this function is so useful, I wonder if it should be promoted to a public interface (drop the leading underscore?). Otherwise the changes looked reasonable to me.

It is private mostly to discourage plugins from overriding it in Data plugins, as doing so can significantly change the caching behavior. But I guess even that can be a valid use case in some situations. I guess we can make it public.

danielhollas

Thanks @sphuber. Since the function is now public, it would be nice to have some tests for it, but that is up to you. (I was a bit surprised that the changes in this PR did not require any tests to be updated, but I didn't make a fuss since it was private and I assume it is tested at a higher level?)

docs/source/topics/provenance/caching.rst

src/aiida/orm/nodes/caching.py

This method is useful when debugging caching behavior so it makes sense to make it public.

danielhollas · 2024-03-18T14:46:42Z

src/aiida/orm/nodes/process/calculation/calcjob.py

-                if entry.link_label not in self._hash_ignored_inputs
-            },
-        ]
+        objects = super().get_objects_to_hash()


As far as I can see, there is one functional change here: the class attribute is now added from the parent class? I am assuming that is desired?

You mean the `class` key with `str(self._node.__class__)` that I added in another PR yesterday? If so, yes, that is desired. The only exception for the `CalcJobNodeCaching` should be the removal of the repository hash, as added in #5998.

Yep, that's what I meant, all good then.
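The pattern discussed in this thread can be sketched as follows. This is a simplified illustration with stand-in classes, not the code from the PR: the class names, the dictionary keys (including `input_links`), and all values are assumptions; only `_hash_ignored_inputs` and the call to `super().get_objects_to_hash()` appear in the diff above.

```python
class NodeCachingSketch:
    """Stand-in for the base caching class (hypothetical keys and values)."""

    def __init__(self, inputs):
        self.inputs = inputs

    def get_objects_to_hash(self):
        return {
            'class': str(self.__class__),
            'attributes': {'resources': 1},
            'repository_hash': 'a1b2c3',
        }


class CalcJobCachingSketch(NodeCachingSketch):
    """Stand-in for the calculation job variant."""

    # Hypothetical input link labels that must not affect the hash.
    _hash_ignored_inputs = ('metadata',)

    def get_objects_to_hash(self):
        # Start from the parent's dictionary, so keys such as ``class`` are inherited.
        objects = super().get_objects_to_hash()
        # Drop the repository hash, the one intended exception for calculation jobs.
        objects.pop('repository_hash', None)
        # Add the input hashes, skipping the ignored link labels.
        objects['input_links'] = {
            label: value for label, value in self.inputs.items() if label not in self._hash_ignored_inputs
        }
        return objects


caching = CalcJobCachingSketch({'structure': 'hash-1', 'metadata': 'hash-2'})
print(sorted(caching.get_objects_to_hash()))
```

Building on the parent's dictionary means that any key added to the base class later is picked up automatically by the subclass, which is the functional change noted above.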

unkcpz · 2024-03-18T15:11:46Z

I went through the changes and everything looks good to me. Since @danielhollas is reviewing as well, I'll let him approve.

danielhollas · 2025-01-14T14:24:58Z

For the benefit of future git historians: in the review of this PR I seem not to have realized that this change altered the actual node hashes; it was not just an API change. I think this should have been noted in the PR message. It wouldn't matter much in practice since we were making changes to hashing anyway in the 2.6 release. But I still think it's weird that we did not have to change any tests in this PR. It might be useful to have tests that explicitly verify that the hashes of some common AiiDA objects did not change, to make sure we don't introduce accidental changes in the future.

unkcpz · 2025-01-14T15:41:48Z

src/aiida/orm/nodes/caching.py

@@ -44,7 +45,7 @@ def _get_hash(self, ignore_errors: bool = True, **kwargs: t.Any) -> str | None:
        :param ignore_errors: return ``None`` on ``aiida.common.exceptions.HashingError`` (logging the exception)
        """
        try:
-            return make_hash(self._get_objects_to_hash(), **kwargs)
+            return make_hash(self.get_objects_to_hash(), **kwargs)


@danielhollas thanks for digging this out. If I understand correctly, the initial scope did not include changing the hashing. So for this line, I think the dict needs to be converted back into the original format (the list of the dict's values) before it is passed to `make_hash`.

This was not a mistake. We intentionally changed this now, because there were a couple of other fixes that changed hashes anyway, and these were all released together in 2.6.0. See also the entry in the CHANGELOG that explains this and tells users that nodes should be rehashed. So I wouldn't revert this.
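For readers wondering why switching from a list to a dictionary alters the hashes at all, the following sketch may help. It uses a toy hash function (`toy_hash`, SHA-256 over a JSON dump) as a stand-in; it is not AiiDA's `make_hash`, and the object contents are hypothetical.

```python
import hashlib
import json


def toy_hash(obj):
    """Stand-in for ``make_hash``: SHA-256 of a canonical JSON dump."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


objects = {'class': 'Int', 'attributes': {'value': 1}}  # hypothetical content

hash_of_dict = toy_hash(objects)
hash_of_values = toy_hash(list(objects.values()))

# The keys become part of the hashed payload, so the two digests differ.
print(hash_of_dict != hash_of_values)
```

Passing `list(objects.values())` would have reproduced the old payload, which is the conversion suggested above; hashing the dictionary itself includes the keys and therefore yields new hashes.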

Okay. But what do you expect to be tested? If the point is to detect that the hash changes after such a change, then I think we need to hardcode the hash in the test case. Makes sense to me.

I think we should have tests with hardcoded hash values, so that we are always aware when the hashes have changed. There was, for example, a case where a new attribute was added to the calcjob class and by mistake was not included in the list of ignored attributes (I'll find the PR later).

Also, @sphuber is right, we can't revert now even if this wasn't intentional, since then we'd have another hash invalidation.
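A regression test of the kind proposed here could look roughly like the sketch below. It is an illustration only: `toy_hash` is a hypothetical stand-in for the real hashing, and the payload is deliberately trivial so that the pinned digest is the standard SHA-256 test vector for `abc`. A real test would pin the hashes of actual AiiDA nodes instead.

```python
import hashlib


def toy_hash(objects):
    """Stand-in for the real hashing: SHA-256 over the sorted key/value pairs."""
    payload = ''.join(f'{key}{value}' for key, value in sorted(objects.items()))
    return hashlib.sha256(payload.encode()).hexdigest()


# Pinned reference digest: the payload below is 'abc', a standard SHA-256 test vector.
EXPECTED_HASH = 'ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad'


def test_hash_is_stable():
    # Fails as soon as the hashing logic or the hashed objects change.
    assert toy_hash({'a': 'bc'}) == EXPECTED_HASH


def test_new_attribute_changes_hash():
    # An attribute that is accidentally not ignored shows up as a changed digest.
    assert toy_hash({'a': 'bc', 'new_attribute': 1}) != EXPECTED_HASH


test_hash_is_stable()
test_new_attribute_changes_hash()
```

Because the expected value is hardcoded, any intentional change to the hashes forces an explicit update of the constant, which makes the change visible in review.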

sphuber force-pushed the fix/get-objects-to-hash-dictionary branch from 2bbcd6b to dfd973a Compare March 18, 2024 12:12

sphuber requested review from danielhollas and unkcpz March 18, 2024 12:13

danielhollas requested changes Mar 18, 2024

sphuber force-pushed the fix/get-objects-to-hash-dictionary branch from dfd973a to 3f39c48 Compare March 18, 2024 13:45

Caching: Make NodeCaching._get_object_to_hash public

2dfa3db

This method is useful when debugging caching behavior so it makes sense to make it public.

sphuber force-pushed the fix/get-objects-to-hash-dictionary branch from 3f39c48 to 2dfa3db Compare March 18, 2024 14:23

danielhollas reviewed Mar 18, 2024

sphuber requested a review from danielhollas March 18, 2024 14:59

danielhollas approved these changes Mar 18, 2024

sphuber merged commit e330004 into aiidateam:main Mar 18, 2024
20 checks passed

sphuber deleted the fix/get-objects-to-hash-dictionary branch March 18, 2024 15:14

danielhollas mentioned this pull request Dec 4, 2024

WIP: Support AiiDA v2.6 aiidateam/aiida-test-cache#85

Closed

This was referenced Jan 8, 2025

Support AiiDA v2.6 - take 2 aiidateam/aiida-test-cache#96

Open

Run mypy on src/aiida/orm/nodes/caching.py #6703

Merged

unkcpz reviewed Jan 14, 2025
