
Add caching for mimedata comparison #282

Merged
merged 4 commits into jupyter:master from vidartf's optimize branch on Apr 21, 2017

Conversation

@vidartf (Collaborator) commented Apr 20, 2017

It seems that with the hierarchical algorithm we end up comparing the same things several times. Due to their low level in the hierarchy, and comparatively large size, MIME data comparisons are especially affected by this. This PR addresses this by simply adding an LRU cache on the MIME data comparison.

Related to #237.
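For illustration only, here is a minimal sketch of what an LRU cache layered on a MIME data comparison could look like; the function names and signature are hypothetical, not nbdime's actual API:

```python
from functools import lru_cache

def _compare_mimedata(mimetype, a, b):
    # Placeholder for the expensive, recursive comparison work.
    return a == b

# lru_cache requires hashable arguments, so this caching layer only
# applies when the MIME data values are plain strings.
@lru_cache(maxsize=1024)
def compare_mimedata_cached(mimetype, a, b):
    return _compare_mimedata(mimetype, a, b)
```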

It seems that with the hierarchical algorithm we end up comparing the
same things several times. Due to their low level in the hierarchy, and
comparatively large size, MIME data comparisons are especially affected
by this. For this reason, we simply add an LRU cache on the comparison.
@vidartf requested a review from @martinal on April 20, 2017 at 11:33
We can only cache string-type MIME data comparisons (hashable). Also
added a cache on compare_text_approximate, as profiling showed that a
cache there yields a reasonable number of hits.
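Extending the hypothetical sketch above, the guard described in this commit could look roughly like the following; again, the names are illustrative rather than nbdime's actual code:

```python
def compare_mimedata(mimetype, a, b):
    if isinstance(a, str) and isinstance(b, str):
        # Strings are hashable, so the cached path is safe to use.
        return compare_mimedata_cached(mimetype, a, b)
    # Dicts, lists, etc. are unhashable; fall back to the uncached path.
    return _compare_mimedata(mimetype, a, b)
```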
@minrk (Member) commented Apr 21, 2017

Are there any higher-level comparisons that should be similarly memoized?

@vidartf (Collaborator, Author) commented Apr 21, 2017

I think ideally, you would want the memoization on output comparison, but currently the outputs include lists, which are unhashable. We could ensure that all such lists are tuples before we start processing, but I'm not sure how much other code that would affect.

Another point of improvement could be to make the memoization commutative on the two comparison operands, assuming that the overhead for such a feature is small (e.g., simply sort the two hashes before combining), but I'm not sure the gains are worth the implementation and maintenance effort.
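As a rough sketch of the commutative idea (assuming the comparison itself is symmetric, and using illustrative names):

```python
from functools import lru_cache

def _expensive_compare(x, y):
    # Placeholder for the real, symmetric comparison.
    return x == y

@lru_cache(maxsize=1024)
def _compare_ordered(x, y):
    return _expensive_compare(x, y)

def compare_commutative(a, b):
    # Normalize the operand order so compare(a, b) and compare(b, a)
    # hit the same cache entry. Works as long as the operands are
    # mutually orderable (e.g. both strings).
    x, y = sorted((a, b))
    return _compare_ordered(x, y)
```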

@minrk (Member) commented Apr 21, 2017

Fair enough. Merging this now, but we can explore further memoization later.

Since we shouldn't be modifying notebooks during diff, we could do a one-time transform to immutable / hashable containers (list -> tuple, dict would need a custom hashable subclass, I think).
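A minimal sketch of such a one-time transform (the HashableDict class and freeze helper are hypothetical, and assume the containers are never mutated afterwards):

```python
class HashableDict(dict):
    """A dict usable as a cache key, assuming it is never mutated."""
    def __hash__(self):
        return hash(frozenset(self.items()))

def freeze(obj):
    # Recursively convert lists to tuples and dicts to HashableDict,
    # leaving scalars (str, int, ...) untouched.
    if isinstance(obj, dict):
        return HashableDict((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return tuple(freeze(v) for v in obj)
    return obj
```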

@minrk minrk merged commit 73f5f70 into jupyter:master Apr 21, 2017
@vidartf deleted the optimize branch on April 21, 2017 at 12:12