Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slice::sort_by_cached_key as a memoised sort_by_key #48639

Merged
merged 15 commits into from
Mar 28, 2018

Conversation

varkor
Copy link
Member

@varkor varkor commented Mar 1, 2018

At present, slice::sort_by_key calls its key function twice for each comparison that is made. When the key function is expensive (which can often be the case when sort_by_key is chosen over sort_by), this can lead to very suboptimal behaviour.

To address this, I've introduced a new slice method, sort_by_cached_key, which has identical semantic behaviour to sort_by_key, except that it guarantees the key function will only be called once per element.

Where there are n elements and the key function is O(m):

  • slice::sort_by_cached_key time complexity is O(m n log m n), compared to slice::sort_by_key's O(m n + n log n).
  • slice::sort_by_cached_key space complexity remains at O(n + m). (Technically, it now reserves a slice of size n, whereas before it reserved a slice of size n/2.)

slice::sort_unstable_by_key has not been given an analogue, as it is important that unstable sorts are in-place, which is not a property that is guaranteed here. However, this also means that slice::sort_unstable_by_key is likely to be slower than slice::sort_by_cached_key when the key function does not have negligible complexity. We might want to explore this trade-off further in the future.

Benchmarks (for a vector of 100 i32s):

# Lexicographic: `|x| x.to_string()`
test bench_sort_by_key ... bench:      112,638 ns/iter (+/- 19,563)
test bench_sort_by_cached_key ... bench:       15,038 ns/iter (+/- 4,814)

# Identity: `|x| *x`
test bench_sort_by_key ... bench:        1,346 ns/iter (+/- 238)
test bench_sort_by_cached_key ... bench:        1,839 ns/iter (+/- 765)

# Power: `|x| x.pow(31)`
test bench_sort_by_key ... bench:        3,624 ns/iter (+/- 738)
test bench_sort_by_cached_key ... bench:        1,997 ns/iter (+/- 311)

# Abs: `|x| x.abs()`
test bench_sort_by_key ... bench:        1,546 ns/iter (+/- 174)
test bench_sort_by_cached_key ... bench:        1,668 ns/iter (+/- 790)

(So it seems functions that are single operations do perform slightly worse with this method, but for pretty much any more complex key, you're better off with this optimisation.)

I've definitely found myself using expensive keys in the past and wishing this optimisation was made (e.g. for #47415). This feels like both desirable and expected behaviour, at the small cost of slightly more stack allocation and minute degradation in performance for extremely trivial keys.

Resolves #34447.

@rust-highfive
Copy link
Collaborator

r? @bluss

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 1, 2018
@petrochenkov
Copy link
Contributor

cc @stjepang

@hanna-kruppe
Copy link
Contributor

The numbers for the case of a very expensive key function are promising, but can we get some numbers for trivial key functions as well to quantify how much slower those become?

@varkor
Copy link
Member Author

varkor commented Mar 1, 2018

I'm not feeling very inspired when it comes to comparators, so I've added some simpler tests (that may or may not be semantically meaningful) — you can see there is a slight regression for extremely simple keys.

If you have any suggestions for benchmarks, I'll add them above.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR! The benchmark numbers look great. :)

The part after sorting that applies the permutation to the original slice looks a little suspicious. Does it work correctly and in linear time?

AFAIK, the best way of applying a permutation is by traversing each cycle in it, swapping elements in the original slice and updating indices in the permutation as you go from one index to the next. See this article for an example. Your code looks much simpler than that, which is why I'm asking. :)

Could you do a few more benchmarks? Let's make sure that this optimization doesn't regress in any cases. In particular, I'm interested in the following:

  • Sort 50000 random integers using v.sort_by_key(|x| -x)
  • Sort 50000 random strings using v.sort_by_key(|x| x.len())

Also, I think we should avoid the allocation when sorting short slices. Here's a suggestion:

// Play a little with benchmarks to find the best constant...
const ARRAY_LEN: usize = 16;

let mut array: [ManuallyDrop<(K, usize)>; ARRAY_LEN];
let mut vec: Vec<ManuallyDrop<(K, usize)>>;
let mut indices: &mut [ManuallyDrop<(K, usize)>];

if self.len() <= ARRAY_LEN {
    unsafe {
        array = mem::uninitialized();
        for ((i, k), elem) in self.iter().map(f).enumerate().zip(array.iter_mut()) {
            ptr::write(elem, ManuallyDrop::new((k, i)));
        }
    }
    indices = &mut array[0 .. self.len()];
} else {
    vec = self.iter().map(f).enumerate().map(|(i, k)| ManuallyDrop::new((k, i))).collect();
    indices = &mut vec[..];
}

indices.sort_unstable();
// ... use `indices` to permute `self`

for elem in indices[0 .. self.len()] {
    unsafe {
        ManuallyDrop::drop(elem);
    }
}

Finally, regarding sort_unstable_key - unfortunately, we cannot use dynamic allocation with it, but we can still optimize it somewhat. Note that the partitioning phase picks a pivot and then compares every element in the slice with it. Currently, we're recomputing the key for the pivot on each comparison, but we could compute it just once before partitioning. That'd effectively halve the number of key computations.

index = indices[index].1;
}
self.swap(i, index);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the time complexity of this for loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was too concerned about correctness to notice that I might have slipped up with the time complexity! I think this is quadratic in the worst case, given some thought. I'll switch to the one you suggested — that seems like a safer choice in any respect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think I can easily modify my method to reduce it to linear. It also seems to have a lower constant factor overhead than the one in the article, so a win both ways :)

@@ -1315,8 +1314,7 @@ impl<T> [T] {
/// It is designed to be very fast in cases where the slice is nearly sorted, or consists of
/// two or more sorted sequences concatenated one after another.
///
/// Also, it allocates temporary storage half the size of `self`, but for short slices a
/// non-allocating insertion sort is used instead.
/// The algorithm allocates temporary storage the size of `self`.
Copy link

@ghost ghost Mar 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be a bit more precise here. Temporary storage is a Vec<(K, usize)> of the same length as self.

@@ -1328,10 +1326,21 @@ impl<T> [T] {
/// ```
#[stable(feature = "slice_sort_by_key", since = "1.7.0")]
#[inline]
pub fn sort_by_key<B, F>(&mut self, mut f: F)
pub fn sort_by_key<B, F>(&mut self, f: F)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: B is an odd choice for the key type. :)
How about we change it to K here and in sort_unstable_by_key?

/// It is typically faster than stable sorting, except in a few special cases, e.g. when the
/// slice consists of several concatenated sorted sequences.
/// Due to its key calling strategy, `sort_unstable_by_key` is likely to be slower than
/// `sort_by_key` in cases where the key function is expensive.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to turn sort_unstable_by_key and sort_by_key into actual links to their documentation.

@bluss
Copy link
Member

bluss commented Mar 1, 2018

I would a priori assume that this change would not be in our plans.. such a decorate sort undecorated solution is quite different from the current sort by key, and could exist along side it.

If this is in the absolute majority of cases a better way to do it, then it makes sense to switch.

Making the sort indirect (on indices) is a loss for cache efficiency compared to the original sort and a cheap key function. I may not have the time to review this properly, but stjepang's review is of course more than enough.

@ghost
Copy link

ghost commented Mar 1, 2018

Let's not forget to run benchmarks on an ARM processor and make sure there are no significant regressions.

Another idea for improving performance: use u32 rather than usize for indices whenever possible.

@bluss The sort is not indirect. We're sorting pairs of values and indices, not bare indices. We lose some cache efficiency only by the fact that every element is one word larger.

@bluss
Copy link
Member

bluss commented Mar 1, 2018

Yes, that's a good point. The sorting has the relatively cache local behavior of merge sort, but the apply permutation step doesn't.

@scottmcm
Copy link
Member

scottmcm commented Mar 2, 2018

Can you try a benchmark for sorting by a https://doc.rust-lang.org/std/cmp/struct.Reverse.html key? If that regresses (the easiest "sort it descending" way of which I know), then I'm against this.

It feels likely that people would know when their comparator is expensive, and thus could explicitly do it a different way. Maybe there's a way we can provide an "apply this permutation" helper to make doing the Schwartzian transform version easy? That can use sort_unstable since having the index there turns it into a total order, so the total allocation is the same n-item Vec as this proposed change.

@ghost
Copy link

ghost commented Mar 2, 2018

@scottmcm

Sorting a random Vec<u64> of 10 million numbers in descending order, three different ways:

  1. v.sort_by_key(|x| Reverse(*x))
  2. v.sort_unstable_by_key(|x| Reverse(*x));
  3. // Note: We should also permute the original vector at the end.
    let mut indices: Vec<_> = v.iter()
        .map(|x| Reverse(*x))
        .enumerate()
        .map(|(i, k)| (k, i))
        .collect();
    indices.sort_unstable();

Results on my x86-64 laptop and ARMv7 Raspberry Pi:

Sorting method x86-64 ARM
1. method 0.99 sec 3.56 sec
2. method 0.39 sec 3.39 sec
3. method 0.65 sec 5.24 sec

While sort_unstable brings a huge speedup over sort on x86-64 thus allowing the new sort_by_key a lot of leeway, on ARM it doesn't help that much so in the end the new sort_by_key regresses considerably.

Unless we find a way to somehow alleviate this problem, I find it a deal breaker at the moment.

@varkor
Copy link
Member Author

varkor commented Mar 2, 2018

After addressing some accidental quadraticity, here are some new benchmark results as requested by @stjepang and @scottmcm:

Method Old New
50,000 i32 by -x 2,304,420 (+/- 411,811) 2,214,489 (+/- 395,984)
50,000 strings (len 0..100) by s.len() 15,012,777 (+/- 3,407,938) 15,000,316 (+/- 2,833,091)
100,000 u64 by Reverse(*x) 6,095,566 (+/- 919,154) 6,074,391 (+/- 1,036,496)

Bearing in mind that these keys are essentially worst cases for the new method (i.e. functions with very low overhead), the fact it now outperforms the original is very encouraging.
This comes with the caveat that I've only been testing on x86-64 so far, and as @stjepang's results show, we might not be getting such a gain on ARM.

However, @stjepang's suggestion to use u32 where possible is promising, and I think it should improve the benchmarks even more (albeit not for these cases, where we need to use at least a u64 to index). I haven't tried the small slice optimisation yet either.

@ghost
Copy link

ghost commented Mar 2, 2018

@varkor

My ARM machine is 32-bit so unfortunately using u32 won't help in improving my benchmark results. We're still lagging far behind the current implementation when calling v.sort_by_key(|x| Reverse(*x)). We need to fix that regression somehow. :(

Note that, currently, the fastest and easiest way of sorting a slice in descending order is typically by sorting by Reverse(*x) as the key. We cannot allow this idiom to suddenly become much slower on ARM platforms.

By the way, another issue we haven't really explored is performance on pre-sorted (ascending/descending) and partially sorted (pre-sorted intervals or subsequences) inputs. Those cases mustn't regress too much either.

@varkor
Copy link
Member Author

varkor commented Mar 2, 2018

@stjepang: Ah, okay — yeah, that's frustrating. I'll give it some thought. At worst, we could split the method off into two variants, but it'd be nice if we could optimise this one enough that it was never significantly worse than the original.

@varkor
Copy link
Member Author

varkor commented Mar 8, 2018

@scottmcm @stjepang: just so I have a bit more context in trying to figure how to improve this case, what's the motivation for doing v.sort_by_key(|x| Reverse(*x)) instead of v.sort(); v.reverse()? In my benchmarks, the sort-then-reverse method is faster and (at least to me) seems more intuitive. It gives slightly different behaviour for equal elements ("reverse stable"?), but I imagine this exact behaviour is very rarely required.

In addition, even switching to the unstable sort with this change would fix any regressions. So to be clear, the regression here seems to be: large performance-sensitive stable sorts with trivial keys on ARM (if I haven't missed something).

To be clear, I'm not against adding a new method specifically for a "cached key" sort, but it would certainly be simpler for users if we could boost the performance of 90% of their sort-by-keys without any effort on their part :)

@ghost
Copy link

ghost commented Mar 8, 2018

In my benchmarks, the sort-then-reverse method is faster and (at least to me) seems more intuitive.

Are you sure? Reversing the slice after sorting in ascending order is just extra work. And Reverse is a zero-cost abstraction so sorting by the key Reverse(*x) should really be the fastest way to sort in descending order.

So to be clear, the regression here seems to be: large performance-sensitive stable sorts with trivial keys on ARM (if I haven't missed something).

Yes, pretty much. Actually, the regression probably appears on all machines which don't have highly parallel instruction pipelining (only modern x86-64 machines do, AFAIK).

To be clear, I'm not against adding a new method specifically for a "cached key" sort, but it would certainly be simpler for users if we could boost the performance of 90% of their sort-by-keys without any effort on their part :)

Do you perhaps have any concrete examples of how you're using sort_by_key, like what is the key type and what is the key extraction function? I'd just like to have a bit more context. :)

My guess is that most of the time the key extraction function is slow because it's just a .clone() on strings. And this is usually the case because the function cannot borrow from the element, which is because we don't have HKTs yet (I think).

@varkor
Copy link
Member Author

varkor commented Mar 8, 2018

Are you sure? Reversing the slice after sorting in ascending order is just extra work.

The three methods are very close, but a representative benchmark (on my machine, at least):

test bench_sort_by_key_new   ... bench:     174,708 ns/iter (+/- 37,198)
test bench_sort_by_key_old   ... bench:     178,405 ns/iter (+/- 101,724)
test bench_sort_then_reverse ... bench:     166,406 ns/iter (+/- 12,145)

I haven't really looked into why this is — perhaps Reverse(*x) as a key turns out not to be quite zero cost.

Do you perhaps have any concrete examples of how you're using sort_by_key, like what is the key type and what is the key extraction function?

I can find more later, but one off the top of my head is:

codegen_units.sort_by_key(|cgu| usize::MAX - cgu.size_estimate());

(Additional work was done to avoid recomputing expensive functions that could have been avoided if the implementation of sort_by_key did not recompute the key). (There are other similar related examples in rustc too.)

My guess is that most of the time the key extraction function is slow because it's just a .clone() on strings.

I think this is one occurrence, but it is definitely not the prevalent (for example, looking through the Rust codebase gives other examples.) Regardless of which patterns necessitate non-trivial keys, the fact that they are frequently used indicates an improvement would be welcome.

@ghost
Copy link

ghost commented Mar 8, 2018

I haven't really looked into why this is — perhaps Reverse(*x) as a key turns out not to be quite zero cost.

Hmm, that's odd. The bench_sort_by_key_old seems to have highly variable results (+/- 101,724), which might be worth looking into. Reverse is literally just a simple wrapper defined as struct Reverse<T>(T) - it has to be zero cost. :)

(Additional work was done to avoid recomputing expensive functions that could have been avoided if the implementation of sort_by_key did not recompute the key). (There are other similar related examples in rustc too.)

Thank you, having such examples is really helpful!

I think this is one occurrence, but it is definitely not the prevalent (for example, looking through the Rust codebase gives other examples.) Regardless of which patterns necessitate non-trivial keys, the fact that they are frequently used indicates an improvement would be welcome.

Agreed - we should do something about this. However, I'm afraid that the current state of the PR will make some people happy and others sad - there are improvements in some cases and regressions in others.

I've often wondered if we should extend itertools with methods on slices. See this comment. Perhaps a decorate-sort-undecorate function might be a good fit for it? What do you think - would it make sense to contribute such a function to an external crate and then include it as a dependency in rustc?

@bluss
Copy link
Member

bluss commented Mar 8, 2018

@stjepang No slice functionality in itertools -- having separate slicetools would be lovely.

@varkor
Copy link
Member Author

varkor commented Mar 9, 2018

Hmm, that's odd. The bench_sort_by_key_old seems to have highly variable results (+/- 101,724), which might be worth looking into.

It seemed to have been a one-off — following up on that after spotting it, the variance was much lower, with around the same average.

Reverse is literally just a simple wrapper defined as struct Reverse<T>(T) - it has to be zero cost. :)

I'm not sure then! It does seem slightly odd.

However, I'm afraid that the current state of the PR will make some people happy and others sad - there are improvements in some cases and regressions in others.

Yes, I think you're right. I think, intrinsically, it won't be possible to improve every case and in general, although the delta isn't large, I do expect that the majority of trivial key cases would regress a small amount if the current implementation is replaced. If we don't want this to happen, then perhaps a new method is the best way forward. In theory, we could have a clippy lint that suggests using the "key-cached" method for occurrences of sort_by_key that use non-trivial keys.

Scanning through a GitHub search for sort_by_key, both trivial and non-trivial keys occur reasonably commonly. I think this is definitely a common-enough use case (also considering that some languages, like Python, have deemed its properties important enough to make it the default implementation of sorting-by-key) that it should definitely be available in the standard library, as opposed to in an external crate (I presume you meant including it as a dependency just within rustc, and not making it available in the standard library?). That does raise other questions, but this is a very common pattern when sorting and I think it makes sense to include it directly.

@varkor varkor force-pushed the sort_by_key-cached branch from 78f2a06 to bdcc6f9 Compare March 16, 2018 14:00
@scottmcm
Copy link
Member

I don't know which PR will get in first, but there's a perfect situation for using this in https://github.com/rust-lang/rust/pull/49196/files#diff-c16751a917081c702c37a8959be40f50R1440

@varkor
Copy link
Member Author

varkor commented Mar 20, 2018

@scottmcm: I was thinking of going through rustc, finding good candidates for the cached sort, as a follow-up 👍

@varkor
Copy link
Member Author

varkor commented Mar 21, 2018

Is this at a suitable point to start a FCP for the method?

@kennytm
Copy link
Member

kennytm commented Mar 21, 2018

Given that the method is unstable I don't think an FCP is needed at this point.

@pietroalbini
Copy link
Member

Ping from triage @bluss! The author pushed new commits, could you review them?

///
/// For expensive key functions (e.g. functions that are not simple property accesses or
/// basic operations), [`sort_by_cached_key`](#method.sort_by_cached_key) is likely to be
/// significantly faster, as it does not recompute element keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good pointer to make but I think we are normally cautious and omit this; otherwise we have the docs for a stable method recommending an experimental method (for the next few releases).

This is how we handled sort_unstable_by; the mention of it in sort_by's doc first showed up in Rust 1.20 when it went stable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's a reasonable decision. I'll get rid of those comments for now.

@bluss
Copy link
Member

bluss commented Mar 26, 2018

Thanks everyone that worked on this PR, @varkor and the reviewers, @stjepang with others.

I think we should go without the sort_by_key doc change, so it's ready to go after that's fixed (other reviewers can r=me when it's done).

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 26, 2018
@varkor
Copy link
Member Author

varkor commented Mar 26, 2018

I've removed the mention of sort_cached_by_key from sort_by_key and sort_unstable_by_key, though I've left in their more precise time complexities, as I think they're still an improvement over the previous documentation regardless :)

@bluss
Copy link
Member

bluss commented Mar 26, 2018

@bors r+ rollup

thanks!

@bors
Copy link
Contributor

bors commented Mar 26, 2018

📌 Commit 9c7b69e has been approved by bluss

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 26, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Mar 27, 2018
Add slice::sort_by_cached_key as a memoised sort_by_key

At present, `slice::sort_by_key` calls its key function twice for each comparison that is made. When the key function is expensive (which can often be the case when `sort_by_key` is chosen over `sort_by`), this can lead to very suboptimal behaviour.

To address this, I've introduced a new slice method, `sort_by_cached_key`, which has identical semantic behaviour to `sort_by_key`, except that it guarantees the key function will only be called once per element.

Where there are `n` elements and the key function is `O(m)`:
- `slice::sort_by_cached_key` time complexity is `O(m n log m n)`, compared to `slice::sort_by_key`'s `O(m n + n log n)`.
- `slice::sort_by_cached_key` space complexity remains at `O(n + m)`. (Technically, it now reserves a slice of size `n`, whereas before it reserved a slice of size `n/2`.)

`slice::sort_unstable_by_key` has not been given an analogue, as it is important that unstable sorts are in-place, which is not a property that is guaranteed here. However, this also means that `slice::sort_unstable_by_key` is likely to be slower than `slice::sort_by_cached_key` when the key function does not have negligible complexity. We might want to explore this trade-off further in the future.

Benchmarks (for a vector of 100 `i32`s):
```
# Lexicographic: `|x| x.to_string()`
test bench_sort_by_key ... bench:      112,638 ns/iter (+/- 19,563)
test bench_sort_by_cached_key ... bench:       15,038 ns/iter (+/- 4,814)

# Identity: `|x| *x`
test bench_sort_by_key ... bench:        1,346 ns/iter (+/- 238)
test bench_sort_by_cached_key ... bench:        1,839 ns/iter (+/- 765)

# Power: `|x| x.pow(31)`
test bench_sort_by_key ... bench:        3,624 ns/iter (+/- 738)
test bench_sort_by_cached_key ... bench:        1,997 ns/iter (+/- 311)

# Abs: `|x| x.abs()`
test bench_sort_by_key ... bench:        1,546 ns/iter (+/- 174)
test bench_sort_by_cached_key ... bench:        1,668 ns/iter (+/- 790)
```
(So it seems functions that are single operations do perform slightly worse with this method, but for pretty much any more complex key, you're better off with this optimisation.)

I've definitely found myself using expensive keys in the past and wishing this optimisation was made (e.g. for rust-lang#47415). This feels like both desirable and expected behaviour, at the small cost of slightly more stack allocation and minute degradation in performance for extremely trivial keys.

Resolves rust-lang#34447.
bors added a commit that referenced this pull request Mar 27, 2018
Rollup of 11 pull requests

- Successful merges: #48639, #49223, #49333, #49369, #49381, #49395, #49399, #49401, #49417, #49202, #49426
- Failed merges:
bors added a commit that referenced this pull request Mar 28, 2018
Rollup of 11 pull requests

- Successful merges: #48639, #49223, #49333, #49369, #49381, #49395, #49399, #49401, #49417, #49202, #49426
- Failed merges:
@bors bors merged commit 9c7b69e into rust-lang:master Mar 28, 2018
@varkor varkor deleted the sort_by_key-cached branch March 28, 2018 07:58
kennytm added a commit to kennytm/rust that referenced this pull request Apr 11, 2018
…n, r=scottmcm

Use sort_by_cached_key where appropriate

A follow-up to rust-lang#48639, converting various slice sorting calls to `sort_by_cached_key` when the key functions are more expensive.
kennytm added a commit to kennytm/rust that referenced this pull request Feb 16, 2019
…ey, r=SimonSapin

Stabilize slice_sort_by_cached_key

I was going to ask on the tracking issue (rust-lang#34447), but decided to just send this and hope for an FCP here.  The method was added last March by rust-lang#48639.

Signature: https://doc.rust-lang.org/std/primitive.slice.html#method.sort_by_cached_key
```rust
impl [T] {
    pub fn sort_by_cached_key<K, F>(&mut self, f: F)
        where F: FnMut(&T) -> K, K: Ord;
}
```

That's an identical signature to the existing `sort_by_key`, so I think the questions are just naming, implementation, and the usual "do we want this?".

The implementation seems to have proven its use in rustc at least, which many uses: https://github.com/rust-lang/rust/search?l=Rust&q=sort_by_cached_key

(I'm asking because it's exactly what I just needed the other day:
```rust
    all_positions.sort_by_cached_key(|&n|
        data::CITIES.iter()
            .map(|x| *metric_closure.get_edge(n, x.pos).unwrap())
            .sum::<usize>()
    );
```
since caching that key is a pretty obviously good idea.)

Closes rust-lang#34447
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants