
Add caching for mimedata comparison #282

Merged
merged 4 commits into jupyter:master from vidartf's optimize branch on Apr 21, 2017

Conversation

@vidartf (Collaborator) commented Apr 20, 2017

It seems that with the hierarchical algorithm we end up comparing the same things several times. Due to their low level in the hierarchy, and comparatively large size, MIME data comparisons are especially affected by this. This PR addresses this by simply adding an LRU cache on the MIME data comparison.

Related to #237.
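For illustration only, here is a minimal sketch of what an LRU cache layered on a MIME data comparison could look like; the function names and signature are hypothetical, not nbdime's actual API:

```python
from functools import lru_cache

def _compare_mimedata(mimetype, a, b):
    # Placeholder for the expensive, recursive comparison work.
    return a == b

# lru_cache requires hashable arguments, so this caching layer only
# applies when the MIME data values are plain strings.
@lru_cache(maxsize=1024)
def compare_mimedata_cached(mimetype, a, b):
    return _compare_mimedata(mimetype, a, b)
```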

It seems that with the hierarchical algorithm we end up comparing the
same things several times. Due to their low level in the hierarchy, and
comparatively large size, MIME data comparisons are especially affected
by this. For this reason, we simply add an LRU cache on the comparison.
@vidartf requested a review from @martinal on April 20, 2017 at 11:33
We can only cache string-type MIME data comparisons (hashable). Also
added a cache on compare_text_approximate, as profiling showed that a
cache there yields a reasonable number of hits.
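Extending the hypothetical sketch above, the guard described in this commit could look roughly like the following; again, the names are illustrative rather than nbdime's actual code:

```python
def compare_mimedata(mimetype, a, b):
    if isinstance(a, str) and isinstance(b, str):
        # Strings are hashable, so the cached path is safe to use.
        return compare_mimedata_cached(mimetype, a, b)
    # Dicts, lists, etc. are unhashable; fall back to the uncached path.
    return _compare_mimedata(mimetype, a, b)
```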
@minrk (Member) commented Apr 21, 2017

Are there any higher-level comparisons that should be similarly memoized?

@vidartf (Collaborator, Author) commented Apr 21, 2017

I think ideally, you would want the memoization on output comparison, but currently the outputs include lists, which are unhashable. We could ensure that all such lists are tuples before we start processing, but I'm not sure how much other code that would affect.

Another point of improvement could be to make the memoization commutative on the two comparison operands, assuming that the overhead for such a feature is small (e.g., simply sort the two hashes before combining), but I'm not sure the gains are worth the implementation and maintenance effort.
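As a rough sketch of the commutative idea (assuming the comparison itself is symmetric, and using illustrative names):

```python
from functools import lru_cache

def _expensive_compare(x, y):
    # Placeholder for the real, symmetric comparison.
    return x == y

@lru_cache(maxsize=1024)
def _compare_ordered(x, y):
    return _expensive_compare(x, y)

def compare_commutative(a, b):
    # Normalize the operand order so compare(a, b) and compare(b, a)
    # hit the same cache entry. Works as long as the operands are
    # mutually orderable (e.g. both strings).
    x, y = sorted((a, b))
    return _compare_ordered(x, y)
```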

@minrk (Member) commented Apr 21, 2017

Fair enough. Merging this now, but we can explore further memoization later.

Since we shouldn't be modifying notebooks during diff, we could do a one-time transform to immutable / hashable containers (list -> tuple, dict would need a custom hashable subclass, I think).
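A minimal sketch of such a one-time transform (the HashableDict class and freeze helper are hypothetical, and assume the containers are never mutated afterwards):

```python
class HashableDict(dict):
    """A dict usable as a cache key, assuming it is never mutated."""
    def __hash__(self):
        return hash(frozenset(self.items()))

def freeze(obj):
    # Recursively convert lists to tuples and dicts to HashableDict,
    # leaving scalars (str, int, ...) untouched.
    if isinstance(obj, dict):
        return HashableDict((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return tuple(freeze(v) for v in obj)
    return obj
```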

@minrk minrk merged commit 73f5f70 into jupyter:master Apr 21, 2017
@vidartf deleted the optimize branch on April 21, 2017 at 12:12