Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slice::sort_by_cached_key as a memoised sort_by_key #48639

Merged
merged 15 commits into from
Mar 28, 2018
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/liballoc/benches/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
#![feature(i128_type)]
#![feature(rand)]
#![feature(repr_simd)]
#![feature(slice_sort_by_cached_key)]
#![feature(test)]

extern crate rand;
Expand Down
15 changes: 15 additions & 0 deletions src/liballoc/benches/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,17 @@ macro_rules! sort_expensive {
}
}

macro_rules! sort_lexicographic {
($f:ident, $name:ident, $gen:expr, $len:expr) => {
#[bench]
fn $name(b: &mut Bencher) {
let v = $gen($len);
b.iter(|| v.clone().$f(|x| x.to_string()));
b.bytes = $len * mem::size_of_val(&$gen(1)[0]) as u64;
}
}
}

sort!(sort, sort_small_ascending, gen_ascending, 10);
sort!(sort, sort_small_descending, gen_descending, 10);
sort!(sort, sort_small_random, gen_random, 10);
Expand Down Expand Up @@ -312,6 +323,10 @@ sort!(sort_unstable, sort_unstable_large_big, gen_big_random, 10000);
sort_strings!(sort_unstable, sort_unstable_large_strings, gen_strings, 10000);
sort_expensive!(sort_unstable_by, sort_unstable_large_expensive, gen_random, 10000);

sort_lexicographic!(sort_by_key, sort_by_key_lexicographic, gen_random, 10000);
sort_lexicographic!(sort_unstable_by_key, sort_unstable_by_key_lexicographic, gen_random, 10000);
sort_lexicographic!(sort_by_cached_key, sort_by_cached_key_lexicographic, gen_random, 10000);

macro_rules! reverse {
($name:ident, $ty:ty, $f:expr) => {
#[bench]
Expand Down
94 changes: 85 additions & 9 deletions src/liballoc/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ use core::mem::size_of;
use core::mem;
use core::ptr;
use core::slice as core_slice;
use core::{u8, u16, u32};

use borrow::{Borrow, BorrowMut, ToOwned};
use boxed::Box;
Expand Down Expand Up @@ -1302,7 +1303,12 @@ impl<T> [T] {

/// Sorts the slice with a key extraction function.
///
/// This sort is stable (i.e. does not reorder equal elements) and `O(n log n)` worst-case.
/// This sort is stable (i.e. does not reorder equal elements) and `O(m n log(m n))`
/// worst-case, where the key function is `O(m)`.
///
/// For expensive key functions (e.g. functions that are not simple property accesses or
/// basic operations), [`sort_by_cached_key`](#method.sort_by_cached_key) is likely to be
/// significantly faster, as it does not recompute element keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good pointer to make but I think we are normally cautious and omit this; otherwise we have the docs for a stable method recommending an experimental method (for the next few releases).

This is how we handled sort_unstable_by; the mention of it in sort_by's doc first showed up in Rust 1.20 when it went stable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's a reasonable decision. I'll get rid of those comments for now.

///
/// When applicable, unstable sorting is preferred because it is generally faster than stable
/// sorting and it doesn't allocate auxiliary memory.
Expand All @@ -1328,12 +1334,82 @@ impl<T> [T] {
/// ```
#[stable(feature = "slice_sort_by_key", since = "1.7.0")]
#[inline]
pub fn sort_by_key<B, F>(&mut self, mut f: F)
where F: FnMut(&T) -> B, B: Ord
pub fn sort_by_key<K, F>(&mut self, mut f: F)
where F: FnMut(&T) -> K, K: Ord
{
merge_sort(self, |a, b| f(a).lt(&f(b)));
}

/// Sorts the slice with a key extraction function.
///
/// During sorting, the key function is called only once per element.
///
/// This sort is stable (i.e. does not reorder equal elements) and `O(m n + n log n)`
/// worst-case, where the key function is `O(m)`.
///
/// For simple key functions (e.g. functions that are property accesses or
/// basic operations), [`sort_by_key`](#method.sort_by_key) is likely to be
/// faster.
///
/// # Current implementation
///
/// The current algorithm is based on [pattern-defeating quicksort][pdqsort] by Orson Peters,
/// which combines the fast average case of randomized quicksort with the fast worst case of
/// heapsort, while achieving linear time on slices with certain patterns. It uses some
/// randomization to avoid degenerate cases, but with a fixed seed to always provide
/// deterministic behavior.
///
/// In the worst case, the algorithm allocates temporary storage in a `Vec<(K, usize)>` the
/// length of the slice.
///
/// # Examples
///
/// ```
/// #![feature(slice_sort_by_cached_key)]
/// let mut v = [-5i32, 4, 32, -3, 2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[01:04:41] ---- slice.rs - slice::[T]::sort_by_cached_key (line 1367) stdout ----
[01:04:41] 	error[E0658]: use of unstable library feature 'slice_sort_by_cached_key' (see issue #34447)
[01:04:41]  --> slice.rs:1370:3
[01:04:41]   |
[01:04:41] 6 | v.sort_by_cached_key(|k| k.to_string());
[01:04:41]   |   ^^^^^^^^^^^^^^^^^^
[01:04:41]   |
[01:04:41]   = help: add #![feature(slice_sort_by_cached_key)] to the crate attributes to enable

///
/// v.sort_by_cached_key(|k| k.to_string());
/// assert!(v == [-3, -5, 2, 32, 4]);
/// ```
///
/// [pdqsort]: https://github.com/orlp/pdqsort
#[unstable(feature = "slice_sort_by_cached_key", issue = "34447")]
#[inline]
pub fn sort_by_cached_key<K, F>(&mut self, f: F)
where F: FnMut(&T) -> K, K: Ord
{
// Helper macro for indexing our vector by the smallest possible type, to reduce allocation.
macro_rules! sort_by_key {
($t:ty, $slice:ident, $f:ident) => ({
let mut indices: Vec<_> =
Copy link

@ghost ghost Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to be able to use SmallVec here and save an allocation when sorting short slices.

Just a suggestion, though. Maybe leave that for a future PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea; I might leave that for a future PR though, yeah.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than pulling in a full crate (which is mostly about encapsulating stuff in a single type that can be moved as a unit), you can reproduce that functionality with a couple of stack variables. Something like:

pub fn foo<T>(s: &[T], f: &Fn(&T) -> T) {
    let iter = s.iter().map(f).enumerate().map(|(i, k)| (k, i as u16));
    let mut vec: Vec<_>;
    let mut array: [_; ARRAY_LEN];
    const ARRAY_LEN: usize = 32;
    let len = iter.len();
    let mapped = if len <= ARRAY_LEN {
        array = unsafe {
            std::mem::uninitialized()
        };
        for (x, slot) in iter.zip(&mut array) {
            *slot = x
        }
        &mut array[..len]
    } else {
        vec = iter.collect();
        &mut vec[..]
    };
    // …
}

Copy link
Member

@bluss bluss Mar 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just a caveat, that's the gist of it and the array code needs to take more precautions to be safe.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bluss Oh, do you mean panic-safe? Yes, I forgot about that, the array should be inside a ManuallyDrop.

Copy link
Member

@bluss bluss Mar 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And use ptr::write or the equivalent with a ManuallyDrop around each element, that's the two things I could spot.

$slice.iter().map($f).enumerate().map(|(i, k)| (k, i as $t)).collect();
// The elements of `indices` are unique, as they are indexed, so any sort will be
// stable with respect to the original slice. We use `sort_unstable` here because
// it requires less memory allocation.
indices.sort_unstable();
for i in 0..$slice.len() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need some kind of proof (even an informal one would be okay) that this loop takes linear time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the pictures in this article.

I'm having a harder time proving it is correct rather than linear time, but it looks good to me.

It should be linear time because the inner while loop is walking one of those cycles (illustrated in the article after “will create the following set of cycles:”).

Whenever we walked k + 1 links in a cycle, the indices[i].1 = index write means that we shorten the cycle by k links. (All the now-unreachable links have an index that is < i, so they are not going to be used again.)

That means that the inner while loop's body can only be visited O(self.len()) times.

let mut index = indices[i].1;
while (index as usize) < i {
index = indices[index as usize].1;
}
indices[i].1 = index;
$slice.swap(i, index as usize);
}
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the time complexity of this for loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was too concerned about correctness to notice that I might have slipped up with the time complexity! I think this is quadratic in the worst case, given some thought. I'll switch to the one you suggested — that seems like a safer choice in any respect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think I can easily modify my method to reduce it to linear. It also seems to have a lower constant factor overhead than the one in the article, so a win both ways :)


let sz_u8 = mem::size_of::<(K, u8)>();
let sz_u16 = mem::size_of::<(K, u16)>();
let sz_u32 = mem::size_of::<(K, u32)>();
let sz_usize = mem::size_of::<(K, usize)>();

let len = self.len();
if sz_u8 < sz_u16 && len <= ( u8::MAX as usize) { return sort_by_key!( u8, self, f) }
if sz_u16 < sz_u32 && len <= (u16::MAX as usize) { return sort_by_key!(u16, self, f) }
if sz_u32 < sz_usize && len <= (u32::MAX as usize) { return sort_by_key!(u32, self, f) }
sort_by_key!(usize, self, f)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we have four cases, and each one will use a separate monomorphized version of sort_unstable. This creates some code bloat, but the opportunity of choosing the most suitable version at runtime is hopefully worth it (any benchmarks?).

However, my feeling is that in many cases size_of::<(K, u8)>() will be equal to size_of::<(K, usize)>() anyway so it doesn't matter which index type we choose.

Let's add another set of conditions to these ifs to remove the code bloat whenever possible, like this:

let size_u8    = mem::size_of::<(K, u8)>();
let size_u16   = mem::size_of::<(K, u16)>();
let size_u32   = mem::size_of::<(K, u32)>();
let size_usize = mem::size_of::<(K, usize)>();

if size_u8  < size_u16   && len <= ( u8::MAX as usize) { ... }
if size_u16 < size_u32   && len <= (u16::MAX as usize) { ... }
if size_u32 < size_usize && len <= (u32::MAX as usize) { ... }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did check that modifying the sizes like this had a positive effect, though I don't think I kept the results now; I can do more later if you'd like to see them.

}

/// Sorts the slice, but may not preserve the order of equal elements.
///
/// This sort is unstable (i.e. may reorder equal elements), in-place (i.e. does not allocate),
Expand Down Expand Up @@ -1410,7 +1486,7 @@ impl<T> [T] {
/// elements.
///
/// This sort is unstable (i.e. may reorder equal elements), in-place (i.e. does not allocate),
/// and `O(n log n)` worst-case.
/// and `O(m n log(m n))` worst-case, where the key function is `O(m)`.
///
/// # Current implementation
///
Expand All @@ -1420,8 +1496,9 @@ impl<T> [T] {
/// randomization to avoid degenerate cases, but with a fixed seed to always provide
/// deterministic behavior.
///
/// It is typically faster than stable sorting, except in a few special cases, e.g. when the
/// slice consists of several concatenated sorted sequences.
/// Due to its key calling strategy, [`sort_unstable_by_key`](#method.sort_unstable_by_key)
/// is likely to be slower than [`sort_by_cached_key`](#method.sort_by_cached_key) in
/// cases where the key function is expensive.
///
/// # Examples
///
Expand All @@ -1435,9 +1512,8 @@ impl<T> [T] {
/// [pdqsort]: https://github.com/orlp/pdqsort
#[stable(feature = "sort_unstable", since = "1.20.0")]
#[inline]
pub fn sort_unstable_by_key<B, F>(&mut self, f: F)
where F: FnMut(&T) -> B,
B: Ord
pub fn sort_unstable_by_key<K, F>(&mut self, f: F)
where F: FnMut(&T) -> K, K: Ord
{
core_slice::SliceExt::sort_unstable_by_key(self, f);
}
Expand Down
1 change: 1 addition & 0 deletions src/liballoc/tests/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
#![feature(pattern)]
#![feature(placement_in_syntax)]
#![feature(rand)]
#![feature(slice_sort_by_cached_key)]
#![feature(splice)]
#![feature(str_escape)]
#![feature(string_retain)]
Expand Down
19 changes: 16 additions & 3 deletions src/liballoc/tests/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,14 @@ fn test_sort() {
v.sort_by(|a, b| b.cmp(a));
assert!(v.windows(2).all(|w| w[0] >= w[1]));

// Sort in lexicographic order.
let mut v1 = orig.clone();
let mut v2 = orig.clone();
v1.sort_by_key(|x| x.to_string());
v2.sort_by_cached_key(|x| x.to_string());
assert!(v1.windows(2).all(|w| w[0].to_string() <= w[1].to_string()));
assert!(v1 == v2);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sort_by_cached_key is a stable sort, let's also verify its stability in the fn test_sort_stability() below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

// Sort with many pre-sorted runs.
let mut v = orig.clone();
v.sort();
Expand Down Expand Up @@ -477,24 +485,29 @@ fn test_sort_stability() {
// the second item represents which occurrence of that
// number this element is, i.e. the second elements
// will occur in sorted order.
let mut v: Vec<_> = (0..len)
let mut orig: Vec<_> = (0..len)
.map(|_| {
let n = thread_rng().gen::<usize>() % 10;
counts[n] += 1;
(n, counts[n])
})
.collect();

// only sort on the first element, so an unstable sort
let mut v = orig.clone();
// Only sort on the first element, so an unstable sort
// may mix up the counts.
v.sort_by(|&(a, _), &(b, _)| a.cmp(&b));

// this comparison includes the count (the second item
// This comparison includes the count (the second item
// of the tuple), so elements with equal first items
// will need to be ordered with increasing
// counts... i.e. exactly asserting that this sort is
// stable.
assert!(v.windows(2).all(|w| w[0] <= w[1]));

let mut v = orig.clone();
v.sort_by_cached_key(|&(x, _)| x);
assert!(v.windows(2).all(|w| w[0] <= w[1]));
}
}
}
Expand Down