Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACP: Range::cmp_scalar; comparison (less/equal/greater) to a primitive of the Range #115

Closed
golddranks opened this issue Sep 29, 2022 · 5 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@golddranks
Copy link

golddranks commented Sep 29, 2022

Proposal

Problem statement

Comparing ranges to their underlying primitives could be easier; currently one needs to compare with the endpoints directly, which is prone to logic and one-off errors. A comparison operation seems like a natural thing to support for a range.

Motivation, use-cases

I found the motivation for this API with code like this – searching through a list of segmented files to find a range that contains the desired point:

let files = vec![
    File { seqnum: 0, range: 0..1000 },
    File { seqnum: 1, range: 1000..2200 },
    File { seqnum: 2, range: 2200..3900 },
    File { seqnum: 3, range: 3900..5000 },
];
let target = 1600;
let index = files.binary_search_by(|f| f.range.cmp_scalar(target))?;
assert_eq!(files[index].seqnum, 1);

Solution sketches

I actually started by sending a PR: rust-lang/rust#102343 There are some alternative designs one could consider, mentioned in the PR: first, it was pointed out that returning just Ordering doesn't support the case where the Range is degenerate, i.e. the start point being after the endpoint. In this case an ordering makes hardly any sense. Perhaps returning Option<Ordering> would account for this?

I'll copy the other points for discussion here from the PR, for ease of review:

Why compare a range to a scalar, and not scalar to range?

I.e. why (3..9).cmp_scalar(5) instead of 5.cmp_range(3..9)?

Originally, I thought that an API to compare a number to a range would make more sense intuitively, and set out to implement it. However, it turned out that there were two kinds of problems:

  1. The order in the original motivation gets reversed, and reverse() must be called after this operation to fix it:
    files.binary_search_by(|f| target.cmp_range(f.range)).reverse()?;
    Of course, cmp_range would make sense, if searching for a single number among of many, that fits the range. However, I had hard time imagining a common case where this would be the requirement, whereas the motivation for cmp_scalar was clear from the case introduced above.

  2. The implementation gets gnarly, as the API must take the range as a generic parameter, and the different kind of Ranges are different types. I tried to implement it as generic T: Into<RangeBounds>, but the ergonomics kind of sucked. Also, the problem of ranges not being Copy could be mitigated when the range is passed as the &self parameter because auto-ref works in that position.

Other points for discussion

  • Does the name cmp_scalar make sense? To me it kind of does, but I see no precedent with calling stuff "scalars" in the stdlib. I wonder if there are better alternatives.
  • Ord vs PartialOrd? With contains, PartialOrd makes sense, but with a proper ordering returned by this API, I think that Ord is the only sensible choice.
  • The method currently takes the scalar by copy. It might make sense to take it by reference to support textual ranges? That worsens the ergonomics for numbers though.
  • Does it make sense to make it more generic (to support &str vs String comparisons, for example)? That might worsen the ergonomics because it makes type inference harder.
  • Does it make sense to actually just implement PartialOrd<T> for Range<T>, instead of having this as a separate method? Can we consider this kind of comparison "canonical" enough to be able to provide a standard implementation?

Links and related work

Again, here's my attempt at an implementation: rust-lang/rust#102343

I also checked out whether C++ supports comparing ranges. It does seem only support comparing two ranges with a lexicographical comparison: https://cplusplus.com/reference/algorithm/lexicographical_compare/ In my mind, comparing a range to a single scalar actually makes more sense, because it isn't clear what the answer should be when the ranges overlap. (Of course, the word lexicographical accounts for this, but I find the operation not well-motivated.)

@golddranks golddranks added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Sep 29, 2022
@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/libs-api meeting. We didn't finish going through "Other points for discussion" and still need to, but we found the binary_search_by example compelling and we liked the idea of having a method like this. We'll need to finish going through "Other points for discussion" to determine the signature of the method.

@golddranks
Copy link
Author

Thanks for the heads up. When do you expect to continue the discussion?

@Amanieu Amanieu added the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Jul 9, 2023
@Amanieu
Copy link
Member

Amanieu commented Jul 9, 2023

It slipped from the agenda in the last meeting, but I've nominated it for discussion this week.

@Amanieu Amanieu removed the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Jul 11, 2023
@Amanieu
Copy link
Member

Amanieu commented Jul 11, 2023

We discussed this in the libs-api meeting today.

One of the key points that was mentioned was that this isn't the best way to perform a binary search since this performs multiple comparisons at each step. Instead, this can be implemented much more efficiently like this:

let files = vec![
    File { seqnum: 0, range: 0..1000 },
    File { seqnum: 1, range: 1000..2200 },
    File { seqnum: 2, range: 2200..3900 },
    File { seqnum: 3, range: 3900..5000 },
];
let target = 1600;
let index = files.partition_point(|f| f.range.end > target);
let index = files.get(index).and_then(|f| target >= f.range.start).expect("no matching range");
assert_eq!(files[index].seqnum, 1);

Perhaps it might be worth looking into a different API that makes this pattern easier to use.


We did discuss the specific questions, although because of the previous concern it's not clear that this API is the best way to solve the motivating example.

  • Does the name cmp_scalar make sense? To me it kind of does, but I see no precedent with calling stuff "scalars" in the stdlib. I wonder if there are better alternatives.

We couldn't think of anything better, but this can be revisited during FCP.

  • Ord vs PartialOrd? With contains, PartialOrd makes sense, but with a proper ordering returned by this API, I think that Ord is the only sensible choice.

We prefer to just have Ord for now, and potentially a separate function for PartialOrd if there is demand for it.

  • The method currently takes the scalar by copy. It might make sense to take it by reference to support textual ranges? That worsens the ergonomics for numbers though.

This should take the scalar by reference like contains.

  • Does it make sense to make it more generic (to support &str vs String comparisons, for example)? That might worsen the ergonomics because it makes type inference harder.

Yes, because contains does the same thing.

  • Does it make sense to actually just implement PartialOrd<T> for Range<T>, instead of having this as a separate method? Can we consider this kind of comparison "canonical" enough to be able to provide a standard implementation?

(We didn't get around to discussing this question, because the issue around partition_point came up)

@m-ou-se
Copy link
Member

m-ou-se commented Mar 5, 2024

We discussed this again in the libs-api meeting, but we feel we cannot continue with this ACP without discussing whether this is actually the right solution for the use case, considering what was mentioned in the previous comment:

One of the key points that was mentioned was that this isn't the best way to perform a binary search since this performs multiple comparisons at each step. Instead, this can be implemented much more efficiently like this

Closing due to the lack of reponse since the comment above, but feel free to re-open at any time.

@m-ou-se m-ou-se closed this as completed Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants