proposal: sort: return equal values in non-deterministic order #13884
Comments
Interesting. Perhaps we only do it for "small" inputs (most likely: tests), where it would go especially unnoticed. |
Given my experience fixing Google tests, I would do it on large inputs too.
The larger the inputs, the more the time required fades into the noise.
|
There's one thing to consider. The work the runtime does to randomize map iteration order is 100% justified because every time you iterate over a map you get a different order, so you see the effect of the randomization 100% of the time. That wouldn't be true for sort.Sort randomization. You write

> It would only take N swaps to shuffle things quite well

but you don't really care about "shuffling the array well". What you really care about is the chance that N swaps change the order of the sorted array. Suppose you do the math and find out that this chance is really small unless the array has a lot of repeated entries. That would mean you are always paying a price for a benefit that has zero chance of affecting the user for arrays with no repeated entries, a very small chance of being seen for arrays with a moderate number of repeated entries, and a decent chance of being a real benefit only for arrays with many repeated entries. The bottom line is that it's one thing to pay a (maybe) small price for something with a visible effect 100% of the time, and another to pay it for something with a (maybe) very small chance of actually doing anything useful. If the final result of the sort is unchanged, the work spent shuffling is 100% wasted. Just a random thought; I didn't do any math or run any benchmarks. |
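A tiny demo of that last point (hypothetical values; rand.Shuffle stands in for any pre-sort shuffle):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

func main() {
	// With all-distinct values, a pre-sort shuffle is invisible in the
	// result: sorting erases it completely, so the shuffle work is pure
	// overhead. Only equal elements could ever make it observable.
	nums := []int{5, 3, 9, 1}
	rand.Shuffle(len(nums), func(i, j int) { nums[i], nums[j] = nums[j], nums[i] })
	sort.Ints(nums)
	fmt.Println(nums) // always [1 3 5 9], shuffled or not
}
```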
I like it. If people want stable sort they should use stable sort. |
@ALTree The math works out OK. If you have two equal elements and sort.Sort puts them in one order, then if you swap them sort.Sort will necessarily put them in the other order, since it can't tell them apart. For any pair of places they might be, as long as the chance of one being before the other is 50/50, it works out. Another possibility is a final pass over the sorted list, looking for equal entries and swapping them randomly. That's N comparisons instead of N swaps, though, and swaps are usually cheaper. The price here is being paid on behalf of future compatibility; it's always 100% wasted if you ignore that. It would only make sense if the price were low enough to pay all the time. (And the larger the data being sorted, the lower the price.) |
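A sketch of that final pass, assuming a per-pair coin flip over adjacent elements (shuffleEqual and the package name are hypothetical, not part of package sort):

```go
package sortutil

import (
	"math/rand"
	"sort"
)

// shuffleEqual walks already-sorted data once and flips a coin for
// each adjacent equal pair. Since the data is sorted, data[i-1] <=
// data[i] always holds, so a pair is equal exactly when
// !data.Less(i-1, i). The pass costs n-1 extra comparisons and at
// most n-1 swaps, and swapping equal elements keeps the data sorted.
func shuffleEqual(data sort.Interface, rnd *rand.Rand) {
	for i := 1; i < data.Len(); i++ {
		if !data.Less(i-1, i) && rnd.Intn(2) == 0 {
			data.Swap(i, i-1)
		}
	}
}
```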
We could simply randomly flip the order of the insertion sort we use in the final step. If the comparison function is not fully specified, we should get different results. That would be essentially free. |
What does "randomly flip the order of the insertion sort" mean? If X is
before Y at the start of the insertion sort, it will stay X before Y
throughout, no? I guess the insertion sort could go out of its way to drag
equal elements past each other.
|
This gives me a bad taste. You're (mildly) punishing all users to make it easier for you (you in particular) to find bugs in others' code triggered by changes in the code base. I understand the argument for maps, but there the cost is minute and the frequency of dependence high. Here you need multiple sort operations even to create the possibility of the issue. Much as I understand your frustration in dealing with the problem, it bothers me that your solution is to take it out on everyone. |
The insertion sort code is the usual inner loop: sink each element while it is strictly less than its predecessor. We randomly flip between that version and one that keeps sinking while the element is not greater than its predecessor, so equal elements get dragged past each other. I probably have an off-by-one error in there, but that's the basic idea. |
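A sketch of the two variants being described, modeled on the insertion sort inside package sort (function and package names are illustrative):

```go
package sortutil

import "sort"

// insertionSort is the conventional form: sink each element while it
// is strictly less than its predecessor. Equal elements are never
// swapped, so they keep their input order.
func insertionSort(data sort.Interface, a, b int) {
	for i := a + 1; i < b; i++ {
		for j := i; j > a && data.Less(j, j-1); j-- {
			data.Swap(j, j-1)
		}
	}
}

// insertionSortFlipped sinks each element while it is NOT greater
// than its predecessor. The result is still sorted, but equal
// elements are dragged past each other, reversing their relative
// order. Flipping a coin per call between the two variants would
// randomize ties essentially for free.
func insertionSortFlipped(data sort.Interface, a, b int) {
	for i := a + 1; i < b; i++ {
		for j := i; j > a && !data.Less(j-1, j); j-- {
			data.Swap(j, j-1)
		}
	}
}
```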
I like this idea. WRT the performance hit: people who need really fast sorting aren't using package sort anyway. |
It's not the performance hit that bothers me, it's the message: We will deliberately surprise you with our algorithms because of something that doesn't really matter to you. I'm not totally opposed, I just find it distasteful. |
I'm with Rob. I understand the sentiment of this and the precedent of the […]

To give some background, juju is close to 200k lines and many of our tests […]

However I've also spent an equal amount of time internally defending the […]

This sort change may help people protect themselves from themselves, but […]

Thanks

Dave |
@davecheney I understand the sentiment in this case. But a future, perhaps more efficient, implementation may lead to a similar unanticipated change of order, with the same effect on code that makes implicit order assumptions. It seems to me it would be better to eliminate those errors earlier rather than later (or possibly even multiple times). It would be interesting to run the experiment: GO16SORTEXPERIMENT=1 could enable it (disabled by default). That way teams willing to take the plunge can try it out and see if it exposes any real errors. It would be fine to remove it again for 1.7 if there's no real benefit. |
I think if we're going to make it an experiment, it
should be on by default, and people can opt out if
necessary.
|
Thanks for your comments Robert. My position is that it was not that this […]

I honestly believe if a change to the sort algorithm caused the effect that […]

Re making this optional: nobody used the vendor experiment til it was […]

Thanks

Dave |
@davecheney, regardless of this issue, sort.Sort is changing in Go 1.6 (it
already has). The new code runs significantly (10-20%) faster than the old
code, but it also leaves ostensibly equal elements in a different order
than Go 1.5's sort.Sort did. We are not including the old sort algorithm as
a user-controlled option in Go 1.6. From what you wrote, it sounds like
your colleagues, if they are affected, will be surprised that sort.Sort has
fewer guarantees than they realized and upset that Go 1.6 has changed a
detail they didn't realize could change. If so, I am sorry.
But that situation is _exactly_ the rationale for making sort.Sort just a
little unpredictable in its handling of equal elements. It keeps people
from getting too comfortable with the new sort.Sort implementation and
being surprised the next time we take a significant optimization
opportunity. It also keeps people from being surprised if there's ever a
competing Go implementation with a different algorithm.
|
Hi Russ,

This is fine, sort.Sort is still operating within its contract. I'm sure […]

I just cannot relate this paragraph to the previous one. In the previous […]

But what this proposal is suggesting to me is deliberately introducing […]

Here's how the conversation usually goes on IRC, the mailing list, or internal […]

A: hey, sort.Sort broke in 1.6, it doesn't give me the same results! […]

But instead I'm worried about the conversation going like this:

A: hey, sort.Sort broke in 1.6, it doesn't give me the same results! […]

I'd much rather be having the first conversation, not the second.

Thanks for your time

Dave |
@davecheney "A: And now sort.Sort is always unstable, so now I have to update all my |
Please note that this issue is not for Go 1.6. It is marked as a crazy idea
for Go 1.7, one that we could try early and see (or not).
Please also note that the suggestion is to flip a coin to decide the order
of equal values. Shuffling the input was an implementation detail, and Ian
came up with a simpler, cheaper implementation. But the implementation is
not important, just the effect. I've retitled the bug accordingly.
The only possible visible difference between sorting algorithms is how they
arrange equal values. There was discussion on the CL about whether to
accept the Go 1.6 change at all, exactly because it would break things. I
decided that we couldn't reasonably turn down a 10-20% improvement for fear
of breaking code depending on unspecified behavior. But as expected it does
break things.
Re "now I have to update all my code and check my tests", that's basically
true for Go 1.6 with just the algorithmic change. The idea in this issue -
flipping a coin for equal values, so that all future algorithmic changes
are invisible - is meant to make it possible to respond "and this is the
last time you'll need to do that when you update to a new Go version."
But again, it's a crazy idea, and it's not for Go 1.6.
|
Thanks for pointing out that this is for 1.7, I had lost that part of the […]

Thanks for listening to my side of the story.

Dave |
How about a much weaker compromise: state more strongly in the documentation that the sort is not stable and that the order of equal values may change between versions and implementations. |
We should emphasize that in the docs, yes, but I am not sure it will have
much effect. Most of us here know this already and don't need the reminder
and yet (1) the package example for sort had this bug until we fixed it
earlier this cycle, and (2) the Go compiler had this bug in its register
allocator, which I discovered as different compiler output from the C
version (using OS X qsort) and the Go version (using sort.Sort).
People will write code that depends on the internals of sort and not
realize it. It's incredibly easy. We do it too. The question is should we
leave that kind of bug undetected, a landmine to discover when moving
between different versions or implementations of Go, or should we make it
likely to show up in ordinary testing and use? I honestly don't know. I see
the arguments on both sides.
It will be interesting to see how much trouble the sort change gives people
updating to Go 1.6 (once it is out).
I will note that we had a similar problem last cycle with programs
(especially tests) implicitly depending on scheduling order and breaking
from Go 1.4 to Go 1.5, so we put some randomness into the scheduler when
using -race, precisely to make that kind of bug likely to show up. In that
case we did not see a way to randomize the relevant decisions in the
general case without a significant slowdown. There are things that are
different about sort.Sort, but not everything.
|
(Aside: the scheduler randomness for -race […]) |
Would it make sense to have sort (and other potentially-future-different interfaces) behave more randomly under "go test", combined with flags to let you control where the wiggles are? That makes testing more aggressive and production more predictable (usually). So the randomness would be opt-out for testing, opt-in for production. In the SSA code I get a lot of mileage from a hash-based match for where to (not) activate buggy code; I'm not sure how we would do that for APIs, but it would be nifty to be able to isolate the exact buggy call in the same way that I can (now automatically) isolate exactly which function is incorrectly compiled. Contra that, there's a bit of a learning curve to managing all the random-activation knobs. |
I wish we'd done this when we introduced sort.Sort, but I think it's too late to do the experiment now. Proposal withdrawn/declined. |
Crazy idea, but what if sort.Sort randomly permuted its input a bit before starting?
Go 1.6 has a different sort.Sort than Go 1.5 and I've seen at least a dozen test failures at Google that were implicitly depending on the old algorithm. The usual scenario is that you sort a slice of structs by one field in the struct. If there are entries with that field equal but others unequal and you expect a specific order for the structs at the end, you are depending on sort.Sort's algorithm. A later sort.Sort might make a different choice and produce a different order. This makes programs not portable from one version of Go to another, much like map hash differences used to make programs not portable from one architecture to another. We solved maps by randomizing the iteration order a bit. In the map case it's not a full permutation, but just enough variation to make tests obviously flaky.
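A minimal reconstruction of that scenario (the types and values here are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

type user struct {
	Age  int
	Name string
}

// byAge orders users by Age only; users with equal ages are
// indistinguishable to the sort.
type byAge []user

func (s byAge) Len() int           { return len(s) }
func (s byAge) Less(i, j int) bool { return s[i].Age < s[j].Age }
func (s byAge) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

func main() {
	users := []user{{30, "alice"}, {25, "bob"}, {30, "carol"}}
	sort.Sort(byAge(users))
	// alice and carol compare equal under Less, so their relative order
	// here is unspecified. A test asserting [{25 bob} {30 alice} {30 carol}]
	// depends on the algorithm, not the contract; sort.Stable (or a Less
	// that breaks ties on Name) pins the order down.
	fmt.Println(users)
}
```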
I wonder if we should do the same for sort.Sort. It would only take N swaps to shuffle things quite well, and we already use N*log(N) swaps, so N*(log(N)+1) is not likely to be noticed. That would also protect a bit against malicious inputs.
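For concreteness, one reading of "N swaps" (a hypothetical sketch; the proposal doesn't pin down an algorithm):

```go
package sortutil

import (
	"math/rand"
	"sort"
)

// preShuffle runs a Fisher-Yates permutation through the Interface's
// own Swap before the real sort. It costs exactly n-1 swaps and no
// comparisons, so next to the sort's ~N*log(N) swaps it is cheap.
func preShuffle(data sort.Interface, rnd *rand.Rand) {
	for i := data.Len() - 1; i > 0; i-- {
		data.Swap(i, rnd.Intn(i+1))
	}
}
```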
It would surprise people, especially people who think sort.Sort == sort.Stable. But the rationale is that it's better to surprise them the first time they run the code instead of however many Go releases later.
/cc @robpike, @griesemer, @ianlancetaylor, @aclements, @dr2chase