Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: exp/slices: change slices.BinarySearchFunc E, T ordering in the closure #59260

Closed
twmb opened this issue Mar 27, 2023 · 22 comments
Closed

Comments

@twmb
Copy link
Contributor

twmb commented Mar 27, 2023

Today we have:

func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)

The proposal is to switch it to:

func BinarySearchFunc[E, T any](x []E, target T, cmp func(T, E) int) (int, bool)

Why?

tldr: the current ordering feels like sort.Search, which is awkward because "you have to think in terms of greater-than rather than less-than"

The original BinarySearchFunc was:

func BinarySearchFunc[E any](x []E, target E, cmp func(E, E) int) (int, bool)

Meaning that if I wanted to implement an inline closure, it was not obvious which argument my target would be. I would most likely implement this as general less function: given left and right, return whether left is less than right.

#57348 proposed allowing the target to be a different type. @rsc asked whether E, T is the standard order here, and @aclements mentioned that it is, given Python's bisect and Java's binarySearch. I think the @aclements reply was actually talking about the first two arguments to the binary search function, not the closure's arguments: the first two arguments are always array, then target.

No other language that has a binary search function uses a comparison function similar to BinarySearchFunc's closure: no other language explicitly has the target being the first argument or the second argument. C++'s std::binary_search existed before lambda expressions and the Compare function never calls out argument ordering (in fact, it flips: here then here). C# is similar to C++, C is just completely type blind through void *. Java's Arrays.binarySearch existed since 1.6, before lambdas were introduced in 1.8, and the input Comparator never mentions argument ordering. Rust and Ruby do not even have something similar to slices.BinarySearchFunc -- they have something similar to sort.Find.

In #50340, @rogpeppe mentions that sort.Search is awkward and one of the reasons is that "you have to think in terms of greater-than rather than less-than". This is the basis of the proposal.

When the comparison function uses identical types a person never needs to consider the target in the comparison function. Now that the comparison function supports a different type, if you use a different type, you are thinking backwards. If a person wants to implement the comparator function against a target whose type T is different from the input slice element type E, they now have to think in terms of greater-than rather than less-than. The new comparison function feels like sort.Search.

Example of how I want to write slices.BinarySearchFunc using sort.Find, following with how I always actually do write a buggy comparison function and then spend too much time thinking about conditionals: https://go.dev/play/p/RLsTqohcXcA.


Also as a last unimportant aside that might be in a later proposal, I'd actually prefer:

  • making it much clearer why I'd use sort.Find vs. slices.BinarySearchFunc
  • revert x/exp/slices: Allow different types for haystack/needle in BinarySearchFunc #57348 because these multi-type use cases seems better suited for sort.Find
  • mention sort.Find in the docs of slices.BinarySearchFunc (mostly because I completely forgot about sort.Find and personally prefer the closure way of doing things)
  • switch BinarySearchFunc to be similar to sort.Find with an index and no target -- the current form seems tailored for primitive types, not large structs
@twmb twmb added the Proposal label Mar 27, 2023
@gopherbot gopherbot added this to the Proposal milestone Mar 27, 2023
@crisman
Copy link
Contributor

crisman commented Mar 27, 2023

Example of how I want to write slices.BinarySearchFunc using sort.Find, following with how I always actually do write a buggy comparison function and then spend too much time thinking about conditionals: https://go.dev/play/p/4v5SyTKnoMu.

I am confused on what this example is trying to show.

Why do both examples ignore the possible 0 return out of cmp?

And in what I think is on the same topic given:

I would most likely implement this as general less function: given left and right, return whether left is less than right.

With a three output cmp why reduce that to a two output "whether left is less than right"?

@twmb
Copy link
Contributor Author

twmb commented Mar 27, 2023

@crisman the proposal isn't about the three way comparison aspect -- I'm +1 to keeping a three way comparison. My example was ripped last night from code I was writing where I only needed two way comparison -- I always want to write my logic with the target on the left, and returning comparisons based on the target being less than the other value. With the current E, T, then the target is on the right and the closure implementation is based on the target being greater than the other value.

@ianlancetaylor
Copy link
Contributor

CC @eliben

@ianlancetaylor
Copy link
Contributor

According to the man page, C's bsearch function passes target first to the comparator function.

@eliben
Copy link
Member

eliben commented Mar 27, 2023

Thanks for the detailed proposal, @twmb - this is indeed interesting to consider.

@Merovius who proposed #57348

@Merovius
Copy link
Contributor

Merovius commented Mar 27, 2023

I don't understand the argument TBH. You are passing a three-way comparison function, with IMO pretty standard and clear definition of in what order the arguments should be compared. So I don't get why you'd have to "think in terms of greater-than". You return different things for "greater-than" and "less-than", so you always think in terms of both.

My example was ripped last night from code I was writing where I only needed two way comparison

Did you "only need two way comparison", or did you "need two way comparison only"? i.e. why couldn't you still return 0 in the equality case?

This is a compelling argument to leave the order as is, in my opinion.

@eliben
Copy link
Member

eliben commented Mar 27, 2023

I've just been running some tests and seem to have been thinking in a similar direction as @Merovius

Consider this playground link: https://gotipplay.golang.org/p/I0MHhBKveLI
If we flip the order of passing T, E to the comparator, existing comparators like strings.Compare will not work properly.

@crisman
Copy link
Contributor

crisman commented Mar 27, 2023

The doc for exp/slices says

The slice must be sorted in increasing order, where "increasing" is defined by cmp.

If that's true then something like a perl sort {$a cmp $b} @items would have either the negative and positive outputs of cmp backwards or the list sorted the wrong way.

@crisman
Copy link
Contributor

crisman commented Mar 27, 2023

I also like the current usage of cmp() as when mapped over the whole range of the input slice the outputs will be a set of -1s,0s, and finally 1s.

@Merovius
Copy link
Contributor

I guess if we flipped the order of the argument we'd flip the order of the logic as well, which makes string.Compare work fine? Or I might be confused.

I think as long as we can still pass the natural strings.Compare functions to it, I don't have a strong opinion. But then, if we can do that, I don't really get the problem. Because ISTM it wouldn't solve any confusion about the order of arguments, if the function would look the same?

@twmb
Copy link
Contributor Author

twmb commented Mar 27, 2023

The equality case is not a factor in this proposal. My point is that if you use a different type, the current argument ordering primes you to think in terms of greater than. I've edited my example above to emphasize that the equality case is not a factor -- the equality case was never mentioned in my first comment, and this is my second comment clarifying that equality is not the confusion.

@Merovius your order-of-the-logic flipping comment is exactly my proposal. I should've led with the full implementation rather than just the type signature.

This is a clearer example using two different types, showing sort.Find, a bugged slices.BinarySearchFunc, a correct slices.BinarySearchFunc, and my commented out preference: https://go.dev/play/p/RLsTqohcXcA
(I've updated the proposal link to this example)

@crisman
Copy link
Contributor

crisman commented Mar 27, 2023

Why do insist on greater than and -1?

If you want less than you can just return 1 and it works fine: https://go.dev/play/p/nJZ7v9qOguX

@Merovius
Copy link
Contributor

Merovius commented Mar 27, 2023

@twmb It is this assertion, that "the current argument ordering primes you to think in terms of greater than", which I do not understand. The semantics of the comparison function with a single type are

func compare[T constraints.Ordered](a, b T) int {
    switch {
    case a < b:
        return -1
    case a > b:
        return 1
    default:
        return 0
    }
}

This seems like a pretty well-known, standard way to define comparison functions. "If a is less, return -1, if a is greater return 1, if they are equal, return 0".

I fail to see your example making a coherent case as to why "your way" of thinking is the better one. You name the parameter more suggestively (though I'm not sure why) but even then it's not clear to me why you have to put "needle" first in the comparison - especially given that it appears second in the argument list.

Just put them into the same order as they appear in the argument list. And then "less" is negative, "greater" is positive… feels pretty intuitive to me.

Again, I'm not really passionate about this. And if C's bsearch uses the other order, then maybe that's an argument in and off itself. But the argument you are making just isn't understandable to me.

@twmb
Copy link
Contributor Author

twmb commented Mar 27, 2023

@Merovius how does that comparison function look when the types are different, i.e. for #57348 (edit: off by one here, whoops)? I'm not sure how much clearer I could show / phrase that different types prime you to think in terms of greater than, and these replies haven't been discussing the different-types aspect of the problem.

@Merovius
Copy link
Contributor

Merovius commented Mar 27, 2023

@twmb

func compare(p Person, name string) int {
    switch {
    case p.Name < name:
        return -1
    case p.Name > name:
        return 1
    default:
        return 0
    }
}

@Merovius
Copy link
Contributor

In my opinion, with different types the confusion goes away entirely, because the types themselves make clear which argument is which. And at that point, again, the order of arguments and the order of the operator just have to align with the sign of the return, which seems intuitive.

The reason I've been talking about equal type arguments is that that's the only way where I see a confusion happening.

@twmb
Copy link
Contributor Author

twmb commented Mar 27, 2023

It's more obvious when the compare function is defined separately from using BinarySearchFunc. I'd like to note that in #57348, the first example in the proposal is framed in terms of greater-than first:

lo, lok := slices.BinarySearchFunc(s, i.Min, func(i Interval[T], v T) int {
	if v > j.Max {
		return -1
	}
	if v+1 < j.Min {
		return 1
	}
	return 0
}

I'm not sure how this doesn't feel like sort.Search. I don't have any more to say / argue that I haven't said above, so at this point I'd just like to hear what other people think.

@Merovius
Copy link
Contributor

Merovius commented Mar 27, 2023

It's more obvious when the compare function is defined separately from using BinarySearchFunc.

I do actually tend to agree. Which is why I think one key to this is to stop thinking about it in that term. I do genuinely think 90% of the reason you find the order of arguments confusing is because you assign a meaning like "needle" to the argument, instead of just treating it as a plain comparison function.

FWIW I don't even think that's invalid. I think "I'm thinking in terms of putting the needle first into the comparison" is a reasonable way to look at the API. But if we applied your change and in a week someone made a proposal to revert that change because they are "thinking in terms of putting the needle last into the comparison", I wouldn't know what to tell them. Neither way seems intrinsically better, to me. Or at least I can't see a reason why it would be.

That's why we asked in #57348 and why we looked at other languages. "Other languages do it that way" would be such an objective reason.

I'd like to note that in #57348, the first example in the proposal is framed in terms of greater-than first:

I don't think I wrote that particularly well (you can tell from the fact that it wouldn't compile). And I also think you should make allowance for just how exotic that comparison function is - note how there are four different terms which are compared against each other. [edit] also, you should allow for the fact that these functions where initially formulated in terms of sort.Search and took a lot of tinkering to get right. Like, they aren't really "my first intuitive thought", but the product of an hour or so of hellish debugging. They appeared in the issue only in their function of demonstrating the typing issue. [/edit]

I have since we applied #57348 used BinarySearchFunc perhaps a dozen times, mostly with different argument types and in each case it looked exactly like this.

@earthboundkid
Copy link
Contributor

I would write twmb's code snippet like this, so it's using less than:

case  v.Int < needle:
	return -1

v comes first in the arguments to the function, so it makes sense that it comes first in the case statement too.

I think this is one of those things that is just going to be inherently confusing depending on random factors in how you think about the code.

@ianlancetaylor
Copy link
Contributor

We're all agreed that cmp(a, b) returns a negative number if a is "less" than b, and a positive number if a is "greater" than b. That is true regardless of the order in which cmp takes the types. Like others, I don't understand why the type order of cmp makes you think in terms of greater-than. No matter what, you are comparing two values. When the values have different types, then at least conceptually you are going to apply some conversion to give them the same type. That conversion might be on either E or T (or in some cases you might have to convert both). Either way, you will wind up with two values of the same type. Then you will compare them and return a value depending on whether a is less than b.

For example, see the test of cmp with different types. The comparison function is

	cmp := func(a int, b string) int {
		return strings.Compare(strconv.Itoa(a), b)
	}

We convert a to a string, so that a and b have the same type, and then we compare them.

I don't see any principled argument for why the order of the types passed to cmp matters.

@twmb
Copy link
Contributor Author

twmb commented May 24, 2023

In isolation, cmp(a, b) regardless of the types is noncontroversial. In tandem with the needle one argument prior, T, cmp(a, T), my needle is on the right -- I'm no longer blind to the order of arguments to cmp.

Regardless, this proposal isn't moving forward. I wish I noticed the original proposal to support a second type in cmp. Perhaps if I chimed in there, there'd be more willingness to consider the type ordering. This bit me the first time I used it because I wrote it in the version of "needle on left", even though the needle was the right argument in cmp (and yes, I know that strictly speaking, there is no "needle" in the cmp function -- but seeing the types primed me personally to think that way). I was looking to avoid other people experiencing this same papercut. I haven't seen any argument in favor of "keep the needle's type on the right" besides "because it's the way it is right now". Since nobody else seems to experience this papercut, I'll retract this proposal, close the issue, and avoid the function in favor of sort.Find.

@twmb twmb closed this as completed May 24, 2023
@Merovius
Copy link
Contributor

Merovius commented May 25, 2023

FWIW I think if the only argument in favor of a change is "I'd like it better that way", then "this is how it is right now" is a pretty good argument. Nevertheless, here is a concrete argument in favor of the current ordering:

func BinarySearchFunc[E, T any](x []E, target T, cmp func(E, T) int) (int, bool)

The order of arguments passed to cmp matches the order of arguments passed to BinarySearchFunc itself - []E, T.

@golang golang locked and limited conversation to collaborators May 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants