-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Union implementation #444
Conversation
This is WIP for a few reasons at the moment:
|
Documentation and tests now fixed. Benchmarking would still be good. |
This example is hit very badly:
Used to take 12ms, now takes 21,456ms. While that would be fairly easy to fix with an explicit check for when all input ranges have skip 1, and the output also has skip 1, similar issues arise with skips as well.
Used to take 10ms, now 4,274ms. I've been talking this over, I think just special casing some important cases (small number of ranges, all ranges dense, ranges all with same skip) might be easier and faster, if less satisfying / does less merging. |
These hit a heuristic that I think I got wrong. If you comment out lines 2929 to 2937 it's much faster. |
That is much faster. It's still a little slower (if I push up to 1,000,000, then old gap takes about 1.6s, new gap 4.8s). However, the old code slows down quite a bit if I shuffle the list. I wrote a little extension (which can be seen at https://github.com/ChrisJefferson/gap/tree/union_sweep ), which special cases when all ranges, and the resulting range, are dense, and then does a simple one-pass, which speeds up the algorithm in that case back to the old one. |
Another bad case:
This takes about 1.3s in old GAP, 76s in head. I can greatly improve the performance by sorting by first element instead of largest first on line 2881. I think this is probably a better order, as it will reduce the amount of splitting of ranges which goes on (we could then also add an optimisation which checks if the first element of the remaining values is ever greater than the first element of the first range, and if so abort early). |
I think the latest version deals with this pretty well. I have switched to the order you use (by leftmost entry) and added a variation of your sweep that deals with all the matching stride inputs without creating new ranges. Then whatever is left goes on to be checked against non-matching stride inputs and non-ranges. |
One small comment, based on what you said you seem to sort first by stride, then first element: Other than that (and if you did that on purpose), then this is great. I can imagine no better union algorithm :) |
…new test. Improve test file to get 100% coverage of new code. Fix some bugs. Document change. Add a sweep phase to deal with the matching stride case faster.
Ready for merge I think. Fixes a bug so maybe should be in stable, but it's quite a lot of new code |
Looks good to me right now. |
New Union implementation
@stevelinton New diffs in manual examples after this PR, but only the 1st one points to a genuine problem:
|
Fixed in a commit a minute ago. S
|
The new Union code breaks test code in the CRISP package, namely Union (TrivialGroups, TrivialGroups); The reason is that the new code tries to apply 'Set' to to an object representing a proper class (lines 2728 and 2755). Does it make sense to use 'Set' only if the object in question is a list and to return the collection otherwise? An attempt to unite two collections which happen to be identical might run into a similar problem, trying to expand them into a (possiby huge) ssorted list. I suspect that there is a similar problem with resclasses. |
@bh11 you're quite right. The special path for union of a single object shouldn't blindly apply Set. |
Have pushed a further fix. |
Another related error could be in SimpComp package:
|
@stevelinton Thanks. Btw., I think that line 2730 should read 'return Set(x)', not just 'Set(x)'. |
@alex-konovalov This one is slightly ugly. simpcomp creates some objects that do not have IsListOrCollection but for which it installs a Union2 method. Since my code checks IsListOrCollection, it gives the error before ever getting to Union2. This is easy enough to fix. I can remove the check and let bad cases go through to a No Method Found for Union2. However, what am I supposed to do if asked for the Union of just ONE such object? If I don't do the check then you get daft things like Union([Z(5)]) returning Z(5), but I can't easily tell whether a arbitrary object is going to have a method for Union2. I coudl return Union2(x,x) in such a situation, but it seems rather a kludge. I wonder if the right answer here is to suggest that simpcomp should set IsListOrCollection for these objects. |
…eak the Union command for simplicial complexes, see gap-system/gap#444 (comment)
Avoids bug reported by Nick Loughlin and detects all cases where the result can be represented as a range without unpacking ranges in the arguments.