From 10cbf3cc855d4c6cfccd3144f07c7095927a2cdf Mon Sep 17 00:00:00 2001 From: Jeremie Knuesel Date: Sat, 8 Jul 2023 00:14:40 +0200 Subject: [PATCH] Improve documentation of sort-related functions (#48387) * document the `order` keyword in `sort!` * list explicitly the required properties of `lt` in `sort!` * clarify the sequence of "by" transformations if both `by` and `order` are given * show default values in the signatures for `searchsorted` and related functions * note that `by` is also applied to searched value in `searchsorted` and related * add `isunordered` to the manual (it's already exported) --------- Co-authored-by: Lilith Orion Hafner (cherry picked from commit a660798e47ed38c1f8039b0d7af3ff5a451f53e8) --- base/operators.jl | 12 +-- base/ordering.jl | 8 +- base/sort.jl | 209 +++++++++++++++++++++++++++----------- doc/src/base/base.md | 1 + doc/src/base/sort.md | 2 +- doc/src/manual/missing.md | 2 +- 6 files changed, 161 insertions(+), 73 deletions(-) diff --git a/base/operators.jl b/base/operators.jl index 3f51be737ca5c..3f0f8bc49b164 100644 --- a/base/operators.jl +++ b/base/operators.jl @@ -143,7 +143,7 @@ isequal(x::AbstractFloat, y::Real ) = (isnan(x) & isnan(y)) | signequal( isless(x, y) Test whether `x` is less than `y`, according to a fixed total order (defined together with -[`isequal`](@ref)). `isless` is not defined on all pairs of values `(x, y)`. However, if it +[`isequal`](@ref)). `isless` is not defined for pairs `(x, y)` of all types. However, if it is defined, it is expected to satisfy the following: - If `isless(x, y)` is defined, then so is `isless(y, x)` and `isequal(x, y)`, and exactly one of those three yields `true`. @@ -154,13 +154,13 @@ Values that are normally unordered, such as `NaN`, are ordered after regular values. [`missing`](@ref) values are ordered last. -This is the default comparison used by [`sort`](@ref). +This is the default comparison used by [`sort!`](@ref). # Implementation Non-numeric types with a total order should implement this function. Numeric types only need to implement it if they have special values such as `NaN`. Types with a partial order should implement [`<`](@ref). -See the documentation on [Alternate orderings](@ref) for how to define alternate +See the documentation on [Alternate Orderings](@ref) for how to define alternate ordering methods that can be used in sorting and related functions. # Examples @@ -335,6 +335,8 @@ New types with a canonical partial order should implement this function for two arguments of the new type. Types with a canonical total order should implement [`isless`](@ref) instead. +See also [`isunordered`](@ref). + # Examples ```jldoctest julia> 'a' < 'b' @@ -1352,7 +1354,7 @@ corresponding position in `collection`. To get a vector indicating whether each in `items` is in `collection`, wrap `collection` in a tuple or a `Ref` like this: `in.(items, Ref(collection))` or `items .∈ Ref(collection)`. -See also: [`∉`](@ref). +See also: [`∉`](@ref), [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). # Examples ```jldoctest @@ -1390,8 +1392,6 @@ julia> [1, 2] .∈ ([2, 3],) 0 1 ``` - -See also: [`insorted`](@ref), [`contains`](@ref), [`occursin`](@ref), [`issubset`](@ref). """ in diff --git a/base/ordering.jl b/base/ordering.jl index d0c9cb99f9c72..5383745b1dd1f 100644 --- a/base/ordering.jl +++ b/base/ordering.jl @@ -87,8 +87,8 @@ By(by) = By(by, Forward) """ Lt(lt) -`Ordering` which calls `lt(a, b)` to compare elements. `lt` should -obey the same rules as implementations of [`isless`](@ref). +`Ordering` that calls `lt(a, b)` to compare elements. `lt` must +obey the same rules as the `lt` parameter of [`sort!`](@ref). """ struct Lt{T} <: Ordering lt::T @@ -146,8 +146,8 @@ Construct an [`Ordering`](@ref) object from the same arguments used by Elements are first transformed by the function `by` (which may be [`identity`](@ref)) and are then compared according to either the function `lt` or an existing ordering `order`. `lt` should be [`isless`](@ref) or a function -which obeys similar rules. Finally, the resulting order is reversed if -`rev=true`. +that obeys the same rules as the `lt` parameter of [`sort!`](@ref). Finally, +the resulting order is reversed if `rev=true`. Passing an `lt` other than `isless` along with an `order` other than [`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, diff --git a/base/sort.jl b/base/sort.jl index 90f8755d3b1a4..786d8e110e6e2 100644 --- a/base/sort.jl +++ b/base/sort.jl @@ -65,8 +65,8 @@ end """ issorted(v, lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Test whether a vector is in sorted order. The `lt`, `by` and `rev` keywords modify what -order is considered to be sorted just as they do for [`sort`](@ref). +Test whether a collection is in sorted order. The keywords modify what +order is considered sorted, as described in the [`sort!`](@ref) documentation. # Examples ```jldoctest @@ -81,6 +81,9 @@ false julia> issorted([(1, "b"), (2, "a")], by = x -> x[2], rev=true) true + +julia> issorted([1, 2, -2, 3], by=abs) +true ``` """ issorted(itr; @@ -96,14 +99,17 @@ maybeview(v, k) = view(v, k) maybeview(v, k::Integer) = v[k] """ - partialsort!(v, k; by=, lt=, rev=false) + partialsort!(v, k; by=identity, lt=isless, rev=false) -Partially sort the vector `v` in place, according to the order specified by `by`, `lt` and -`rev` so that the value at index `k` (or range of adjacent values if `k` is a range) occurs +Partially sort the vector `v` in place so that the value at index `k` (or +range of adjacent values if `k` is a range) occurs at the position where it would appear if the array were fully sorted. If `k` is a single index, that value is returned; if `k` is a range, an array of values at those indices is returned. Note that `partialsort!` may not fully sort the input array. +For the keyword arguments, see the documentation of [`sort!`](@ref). + + # Examples ```jldoctest julia> a = [1, 2, 4, 3, 4] @@ -150,9 +156,9 @@ partialsort!(v::AbstractVector, k::Union{Integer,OrdinalRange}; partialsort!(v, k, ord(lt,by,rev,order)) """ - partialsort(v, k, by=, lt=, rev=false) + partialsort(v, k, by=identity, lt=isless, rev=false) -Variant of [`partialsort!`](@ref) which copies `v` before partially sorting it, thereby returning the +Variant of [`partialsort!`](@ref) that copies `v` before partially sorting it, thereby returning the same thing as `partialsort!` but leaving `v` unmodified. """ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = @@ -161,7 +167,7 @@ partialsort(v::AbstractVector, k::Union{Integer,OrdinalRange}; kws...) = # reference on sorted binary search: # http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary -# index of the first value of vector a that is greater than or equal to x; +# index of the first value of vector a that is greater than or equivalent to x; # returns lastindex(v)+1 if x is greater than all values in v. function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer hi = hi + T(1) @@ -180,7 +186,7 @@ function searchsortedfirst(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::key return lo end -# index of the last value of vector a that is less than or equal to x; +# index of the last value of vector a that is less than or equivalent to x; # returns firstindex(v)-1 if x is less than all values of v. function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keytype(v) where T<:Integer u = T(1) @@ -197,7 +203,7 @@ function searchsortedlast(v::AbstractVector, x, lo::T, hi::T, o::Ordering)::keyt return lo end -# returns the range of indices of v equal to x +# returns the range of indices of v equivalent to x # if v does not contain x, returns a 0-length range # indicating the insertion point of x function searchsorted(v::AbstractVector, x, ilo::T, ihi::T, o::Ordering)::UnitRange{keytype(v)} where T<:Integer @@ -290,16 +296,19 @@ for s in [:searchsortedfirst, :searchsortedlast, :searchsorted] end """ - searchsorted(a, x; by=, lt=, rev=false) + searchsorted(v, x; by=identity, lt=isless, rev=false) -Return the range of indices of `a` which compare as equal to `x` (using binary search) -according to the order specified by the `by`, `lt` and `rev` keywords, assuming that `a` -is already sorted in that order. Return an empty range located at the insertion point -if `a` does not contain values equal to `x`. +Return the range of indices in `v` where values are equivalent to `x`, or an +empty range located at the insertion point if `v` does not contain values +equivalent to `x`. The vector `v` must be sorted according to the order defined +by the keywords. Refer to [`sort!`](@ref) for the meaning of the keywords and +the definition of equivalence. Note that the `by` function is applied to the +searched value `x` as well as the values in `v`. -See [`sort!`](@ref) for an explanation of the keyword arguments `by`, `lt` and `rev`. +The range is generally found using binary search, but there are optimized +implementations for some inputs. -See also: [`insorted`](@ref), [`searchsortedfirst`](@ref), [`sort`](@ref), [`findall`](@ref). +See also: [`searchsortedfirst`](@ref), [`sort!`](@ref), [`insorted`](@ref), [`findall`](@ref). # Examples ```jldoctest @@ -324,15 +333,19 @@ julia> searchsorted([1=>"one", 2=>"two", 2=>"two", 4=>"four"], 2=>"two", by=firs """ searchsorted """ - searchsortedfirst(a, x; by=, lt=, rev=false) + searchsortedfirst(v, x; by=identity, lt=isless, rev=false) -Return the index of the first value in `a` greater than or equal to `x`, according to the -specified order. Return `lastindex(a) + 1` if `x` is greater than all values in `a`. -`a` is assumed to be sorted. +Return the index of the first value in `v` greater than or equivalent to `x`. +If `x` is greater than all values in `v`, return `lastindex(v) + 1`. -`insert!`ing `x` at this index will maintain sorted order. +The vector `v` must be sorted according to the order defined by the keywords. +`insert!`ing `x` at the returned index will maintain the sorted order. Refer to +[`sort!`](@ref) for the meaning of the keywords and the definition of +"greater than" and equivalence. Note that the `by` function is applied to the +searched value `x` as well as the values in `v`. -See [`sort!`](@ref) for an explanation of the keyword arguments `by`, `lt` and `rev`. +The index is generally found using binary search, but there are optimized +implementations for some inputs. See also: [`searchsortedlast`](@ref), [`searchsorted`](@ref), [`findfirst`](@ref). @@ -353,19 +366,24 @@ julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 9) # no match, insert at end julia> searchsortedfirst([1, 2, 4, 5, 5, 7], 0) # no match, insert at start 1 -julia> searchsortedfirst([1=>"one", 2=>"two", 4=>"four"], 3=>"three", by=first) # Compare the keys of the pairs +julia> searchsortedfirst([1=>"one", 2=>"two", 4=>"four"], 3=>"three", by=first) # compare the keys of the pairs 3 ``` """ searchsortedfirst """ - searchsortedlast(a, x; by=, lt=, rev=false) + searchsortedlast(v, x; by=identity, lt=isless, rev=false) + +Return the index of the last value in `v` less than or equivalent to `x`. +If `x` is less than all values in `v` the function returns `firstindex(v) - 1`. -Return the index of the last value in `a` less than or equal to `x`, according to the -specified order. Return `firstindex(a) - 1` if `x` is less than all values in `a`. `a` is -assumed to be sorted. +The vector `v` must be sorted according to the order defined by the keywords. +Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of +"less than" and equivalence. Note that the `by` function is applied to the +searched value `x` as well as the values in `v`. -See [`sort!`](@ref) for an explanation of the keyword arguments `by`, `lt` and `rev`. +The index is generally found using binary search, but there are optimized +implementations for some inputs # Examples ```jldoctest @@ -390,12 +408,16 @@ julia> searchsortedlast([1=>"one", 2=>"two", 4=>"four"], 3=>"three", by=first) # """ searchsortedlast """ - insorted(x, a; by=, lt=, rev=false) -> Bool + insorted(x, v; by=identity, lt=isless, rev=false) -> Bool + +Determine whether a vector `v` contains any value equivalent to `x`. +The vector `v` must be sorted according to the order defined by the keywords. +Refer to [`sort!`](@ref) for the meaning of the keywords and the definition of +equivalence. Note that the `by` function is applied to the searched value `x` +as well as the values in `v`. -Determine whether an item `x` is in the sorted collection `a`, in the sense that -it is [`==`](@ref) to one of the values of the collection according to the order -specified by the `by`, `lt` and `rev` keywords, assuming that `a` is already -sorted in that order, see [`sort`](@ref) for the keywords. +The check is generally done using binary search, but there are optimized +implementations for some inputs. See also [`in`](@ref). @@ -415,6 +437,9 @@ false julia> insorted(0, [1, 2, 4, 5, 5, 7]) # no match false + +julia> insorted(2=>"TWO", [1=>"one", 2=>"two", 4=>"four"], by=first) # compare the keys of the pairs +true ``` !!! compat "Julia 1.6" @@ -741,8 +766,8 @@ Insertion sort traverses the collection one element at a time, inserting each element into its correct, sorted position in the output vector. Characteristics: -* *stable*: preserves the ordering of elements which compare equal -(e.g. "a" and "A" in a sort of letters which ignores case). +* *stable*: preserves the ordering of elements that compare equal +(e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *quadratic performance* in the number of elements to be sorted: it is well-suited to small collections but should not be used for large ones. @@ -981,8 +1006,8 @@ is treated as the first or last index of the input, respectively. `lo` and `hi` may be specified together as an `AbstractUnitRange`. Characteristics: - * *stable*: preserves the ordering of elements which compare equal - (e.g. "a" and "A" in a sort of letters which ignores case). + * *stable*: preserves the ordering of elements that compare equal + (e.g. "a" and "A" in a sort of letters that ignores case). * *not in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`QuickSort`](@ref). * *linear runtime* if `length(lo:hi)` is constant @@ -1330,16 +1355,54 @@ defalg(v::AbstractArray{Union{}}) = DEFAULT_UNSTABLE # for method disambiguation """ sort!(v; alg::Algorithm=defalg(v), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) -Sort the vector `v` in place. A stable algorithm is used by default. You can select a -specific algorithm to use via the `alg` keyword (see [Sorting Algorithms](@ref) for -available algorithms). The `by` keyword lets you provide a function that will be applied to -each element before comparison; the `lt` keyword allows providing a custom "less than" -function (note that for every `x` and `y`, only one of `lt(x,y)` and `lt(y,x)` can return -`true`); use `rev=true` to reverse the sorting order. `rev=true` preserves forward stability: -Elements that compare equal are not reversed. These options are independent and can -be used together in all possible combinations: if both `by` and `lt` are specified, the `lt` -function is applied to the result of the `by` function; `rev=true` reverses whatever -ordering specified via the `by` and `lt` keywords. +Sort the vector `v` in place. A stable algorithm is used by default: the +ordering of elements that compare equal is preserved. A specific algorithm can +be selected via the `alg` keyword (see [Sorting Algorithms](@ref) for available +algorithms). + +Elements are first transformed with the function `by` and then compared +according to either the function `lt` or the ordering `order`. Finally, the +resulting order is reversed if `rev=true` (this preserves forward stability: +elements that compare equal are not reversed). The current implemention applies +the `by` transformation before each comparison rather than once per element. + +Passing an `lt` other than `isless` along with an `order` other than +[`Base.Order.Forward`](@ref) or [`Base.Order.Reverse`](@ref) is not permitted, +otherwise all options are independent and can be used together in all possible +combinations. Note that `order` can also include a "by" transformation, in +which case it is applied after that defined with the `by` keyword. For more +information on `order` values see the documentation on [Alternate +Orderings](@ref). + +Relations between two elements are defined as follows (with "less" and +"greater" exchanged when `rev=true`): + +* `x` is less than `y` if `lt(by(x), by(y))` (or `Base.Order.lt(order, by(x), by(y))`) yields true. +* `x` is greater than `y` if `y` is less than `x`. +* `x` and `y` are equivalent if neither is less than the other ("incomparable" + is sometimes used as a synonym for "equivalent"). + +The result of `sort!` is sorted in the sense that every element is greater than +or equivalent to the previous one. + +The `lt` function must define a strict weak order, that is, it must be + +* irreflexive: `lt(x, x)` always yields `false`, +* asymmetric: if `lt(x, y)` yields `true` then `lt(y, x)` yields `false`, +* transitive: `lt(x, y) && lt(y, z)` implies `lt(x, z)`, +* transitive in equivalence: `!lt(x, y) && !lt(y, x)` and `!lt(y, z) && !lt(z, + y)` together imply `!lt(x, z) && !lt(z, x)`. In words: if `x` and `y` are + equivalent and `y` and `z` are equivalent then `x` and `z` must be + equivalent. + +For example `<` is a valid `lt` function for `Int` values but `≤` is not: it +violates irreflexivity. For `Float64` values even `<` is invalid as it violates +the fourth condition: `1.0` and `NaN` are equivalent and so are `NaN` and `2.0` +but `1.0` and `2.0` are not equivalent. + +See also [`sort`](@ref), [`sortperm`](@ref), [`sortslices`](@ref), +[`partialsort!`](@ref), [`partialsortperm`](@ref), [`issorted`](@ref), +[`searchsorted`](@ref), [`insorted`](@ref), [`Base.Order.ord`](@ref). # Examples ```jldoctest @@ -1366,6 +1429,29 @@ julia> v = [(1, "c"), (3, "a"), (2, "b")]; sort!(v, by = x -> x[2]); v (3, "a") (2, "b") (1, "c") + +julia> sort(0:3, by=x->x-2, order=Base.Order.By(abs)) # same as sort(0:3, by=abs(x->x-2)) +4-element Vector{Int64}: + 2 + 1 + 3 + 0 + +julia> sort([2, NaN, 1, NaN, 3]) # correct sort with default lt=isless +5-element Vector{Float64}: + 1.0 + 2.0 + 3.0 + NaN + NaN + +julia> sort([2, NaN, 1, NaN, 3], lt=<) # wrong sort due to invalid lt. This behavior is undefined. +5-element Vector{Float64}: + 2.0 + NaN + 1.0 + NaN + 3.0 ``` """ function sort!(v::AbstractVector{T}; @@ -1443,15 +1529,15 @@ merge(x::NTuple, y::NTuple, o::Ordering) = ## partialsortperm: the permutation to sort the first k elements of an array ## """ - partialsortperm(v, k; by=, lt=, rev=false) + partialsortperm(v, k; by=ientity, lt=isless, rev=false) Return a partial permutation `I` of the vector `v`, so that `v[I]` returns values of a fully sorted version of `v` at index `k`. If `k` is a range, a vector of indices is returned; if `k` is an integer, a single index is returned. The order is specified using the same -keywords as `sort!`. The permutation is stable, meaning that indices of equal elements -appear in ascending order. +keywords as `sort!`. The permutation is stable: the indices of equal elements +will appear in ascending order. -Note that this function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. +This function is equivalent to, but more efficient than, calling `sortperm(...)[k]`. # Examples ```jldoctest @@ -1477,7 +1563,7 @@ partialsortperm(v::AbstractVector, k::Union{Integer,OrdinalRange}; kwargs...) = partialsortperm!(similar(Vector{eltype(k)}, axes(v,1)), v, k; kwargs...) """ - partialsortperm!(ix, v, k; by=, lt=, rev=false) + partialsortperm!(ix, v, k; by=identity, lt=isless, rev=false) Like [`partialsortperm`](@ref), but accepts a preallocated index vector `ix` the same size as `v`, which is used to store (a permutation of) the indices of `v`. @@ -1543,7 +1629,7 @@ end Return a permutation vector or array `I` that puts `A[I]` in sorted order along the given dimension. If `A` has more than one dimension, then the `dims` keyword argument must be specified. The order is specified using the same keywords as [`sort!`](@ref). The permutation is guaranteed to be stable even -if the sorting algorithm is unstable, meaning that indices of equal elements appear in +if the sorting algorithm is unstable: the indices of equal elements will appear in ascending order. See also [`sortperm!`](@ref), [`partialsortperm`](@ref), [`invperm`](@ref), [`indexin`](@ref). @@ -1777,7 +1863,8 @@ end sort!(A; dims::Integer, alg::Algorithm=defalg(A), lt=isless, by=identity, rev::Bool=false, order::Ordering=Forward) Sort the multidimensional array `A` along dimension `dims`. -See [`sort!`](@ref) for a description of possible keyword arguments. +See the one-dimensional version of [`sort!`](@ref) for a description of +possible keyword arguments. To sort slices of an array, refer to [`sortslices`](@ref). @@ -1931,8 +2018,8 @@ algorithm. Partial quick sort returns the smallest `k` elements sorted from smal to largest, finding them and sorting them using [`QuickSort`](@ref). Characteristics: - * *not stable*: does not preserve the ordering of elements which - compare equal (e.g. "a" and "A" in a sort of letters which + * *not stable*: does not preserve the ordering of elements that + compare equal (e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -1969,8 +2056,8 @@ Indicate that a sorting function should use the quick sort algorithm, which is *not* stable. Characteristics: - * *not stable*: does not preserve the ordering of elements which - compare equal (e.g. "a" and "A" in a sort of letters which + * *not stable*: does not preserve the ordering of elements that + compare equal (e.g. "a" and "A" in a sort of letters that ignores case). * *in-place* in memory. * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref). @@ -1988,8 +2075,8 @@ subcollection at each step, until the entire collection has been recombined in sorted form. Characteristics: - * *stable*: preserves the ordering of elements which compare - equal (e.g. "a" and "A" in a sort of letters which ignores + * *stable*: preserves the ordering of elements that compare + equal (e.g. "a" and "A" in a sort of letters that ignores case). * *not in-place* in memory. * *divide-and-conquer* sort strategy. diff --git a/doc/src/base/base.md b/doc/src/base/base.md index 5851439408775..a4f90ba6c6d0a 100644 --- a/doc/src/base/base.md +++ b/doc/src/base/base.md @@ -129,6 +129,7 @@ Core.:(===) Core.isa Base.isequal Base.isless +Base.isunordered Base.ifelse Core.typeassert Core.typeof diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md index 16e1839cf64a2..455a95d39617c 100644 --- a/doc/src/base/sort.md +++ b/doc/src/base/sort.md @@ -163,7 +163,7 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays. -## Alternate orderings +## Alternate Orderings By default, `sort` and related functions use [`isless`](@ref) to compare two elements in order to determine which should come first. The diff --git a/doc/src/manual/missing.md b/doc/src/manual/missing.md index 9bddcdfbb2ac2..8c8e801ccac9a 100644 --- a/doc/src/manual/missing.md +++ b/doc/src/manual/missing.md @@ -88,7 +88,7 @@ true ``` The [`isless`](@ref) operator is another exception: `missing` is considered -as greater than any other value. This operator is used by [`sort`](@ref), +as greater than any other value. This operator is used by [`sort!`](@ref), which therefore places `missing` values after all other values: ```jldoctest