Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parent nodes are not deduplicated #21

Open
Tarmean opened this issue Mar 8, 2023 · 1 comment
Open

Parent nodes are not deduplicated #21

Tarmean opened this issue Mar 8, 2023 · 1 comment

Comments

@Tarmean
Copy link

Tarmean commented Mar 8, 2023

The node list is a set, so the semigroup instance automatically deduplicates. But the parents are stored as a list with cached size and after merges the list often contains duplicates. The worklist does get nubbed, so it is not catastrophic for performance, but I figured I'd mention it.

This discussion mentions nubbing the parent list as well, so it probably wasn't intentional? #13

Here is the relevant line:

& _parents %~ (sub_class^._parents <>)

Edit: looked it up and seems like egg doesn't do it either. egraphs-good/egg#113
Probably not crucial then, though it is a tad annoying that the graphviz graph shows them as duplicate nodes.

@alt-romes
Copy link
Owner

alt-romes commented Mar 8, 2023

@Tarmean thanks for your interest in the library!
The graphviz module was broken at some point, and I never got around to making it work correctly despite being a useful tool.

Hmhm, I see what you mean, though I'm not too sure if it's worth it to do something about it. I don't recall but I believe I did try having sets for the parents rather than lists and it performed worse. It might be worth accounting for it in the graphviz visualisation to avoid showing duplicates.

Regarding performance, the last time I benchmarked I believe the bottleneck was the find operation on the union find. It was recently suggested to me that we might freeze/thaw the union find on rebuild which would keep the purely functional semantics of the data structure, but might fix the bottleneck. However, performance is currently quite good -- a use case in which performance is not good enough would be more compelling to attempt such a change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants