
Faster slice PartialOrd #28436

Merged 6 commits on Sep 17, 2015

Commits on Sep 16, 2015

  1. Specialize PartialOrd for totally ordered primitive types

    Knowing the result of equality comparison can enable additional
    optimizations in LLVM.
    
    Additionally, this makes it obvious that `partial_cmp` on totally
    ordered types cannot return `None`.
    ranma42 committed Sep 16, 2015 · 1614173
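
    A minimal sketch of the idea, using a hypothetical `Val` newtype in place of the
    macro-generated impls in `core::cmp`: `partial_cmp` always returns `Some(_)`, and
    the boolean operators map straight onto the primitive comparisons.

    ```
    use std::cmp::Ordering;

    // Hypothetical stand-in for a totally ordered primitive such as u32;
    // this is an illustration, not the real libcore code.
    #[derive(PartialEq, Eq)]
    struct Val(u32);

    impl PartialOrd for Val {
        #[inline]
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            // A total order never yields `None`, which LLVM can exploit.
            Some(if self.0 == other.0 {
                Ordering::Equal
            } else if self.0 < other.0 {
                Ordering::Less
            } else {
                Ordering::Greater
            })
        }
        // Each boolean operator is a single primitive comparison instead of
        // a call to `partial_cmp` followed by a match on the result.
        #[inline]
        fn lt(&self, other: &Self) -> bool { self.0 < other.0 }
        #[inline]
        fn le(&self, other: &Self) -> bool { self.0 <= other.0 }
        #[inline]
        fn gt(&self, other: &Self) -> bool { self.0 > other.0 }
        #[inline]
        fn ge(&self, other: &Self) -> bool { self.0 >= other.0 }
    }
    ```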
  2. Improve PartialOrd for slices

    Reusing the same idea as in rust-lang#26884, we can exploit the fact that the
    lengths of the slices are known: a counted loop over indices replaces the
    iterators, so only a single counter needs to be incremented and checked,
    instead of one pointer per iterator.
    
    Using the generic implementations of the boolean comparison operators
    (`lt`, `le`, `gt`, `ge`) provides a further speedup for simple types:
    the loop only scans for the first element-wise inequality, then dispatches
    to a single element comparison or, if the common prefix is equal, to a
    length comparison.

    Before:

    ```
    test u8_cmp          ... bench:      14,043 ns/iter (+/- 1,732)
    test u8_lt           ... bench:      16,156 ns/iter (+/- 1,864)
    test u8_partial_cmp  ... bench:      16,250 ns/iter (+/- 2,608)
    test u16_cmp         ... bench:      15,764 ns/iter (+/- 1,420)
    test u16_lt          ... bench:      19,833 ns/iter (+/- 2,826)
    test u16_partial_cmp ... bench:      19,811 ns/iter (+/- 2,240)
    test u32_cmp         ... bench:      15,792 ns/iter (+/- 3,409)
    test u32_lt          ... bench:      18,577 ns/iter (+/- 2,075)
    test u32_partial_cmp ... bench:      18,603 ns/iter (+/- 5,666)
    test u64_cmp         ... bench:      16,337 ns/iter (+/- 2,511)
    test u64_lt          ... bench:      18,074 ns/iter (+/- 7,914)
    test u64_partial_cmp ... bench:      17,909 ns/iter (+/- 1,105)
    ```

    After:

    ```
    test u8_cmp          ... bench:       6,511 ns/iter (+/- 982)
    test u8_lt           ... bench:       6,671 ns/iter (+/- 919)
    test u8_partial_cmp  ... bench:       7,118 ns/iter (+/- 1,623)
    test u16_cmp         ... bench:       6,689 ns/iter (+/- 921)
    test u16_lt          ... bench:       6,712 ns/iter (+/- 947)
    test u16_partial_cmp ... bench:       6,725 ns/iter (+/- 780)
    test u32_cmp         ... bench:       7,704 ns/iter (+/- 1,294)
    test u32_lt          ... bench:       7,611 ns/iter (+/- 3,062)
    test u32_partial_cmp ... bench:       7,640 ns/iter (+/- 1,149)
    test u64_cmp         ... bench:       7,517 ns/iter (+/- 2,164)
    test u64_lt          ... bench:       7,579 ns/iter (+/- 1,048)
    test u64_partial_cmp ... bench:       7,629 ns/iter (+/- 1,195)
    ```
    ranma42 committed Sep 16, 2015 · d04b8b5
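
    The counted-loop shape described above can be sketched roughly as follows
    (`slice_partial_cmp` is a stand-in name for illustration, not the actual
    libcore method):

    ```
    use std::cmp::Ordering;

    // Rough sketch: one index drives both slices instead of two iterators.
    fn slice_partial_cmp<T: PartialOrd>(lhs: &[T], rhs: &[T]) -> Option<Ordering> {
        let len = std::cmp::min(lhs.len(), rhs.len());

        for i in 0..len {
            match lhs[i].partial_cmp(&rhs[i]) {
                Some(Ordering::Equal) => (),   // keep scanning the common prefix
                non_eq => return non_eq,       // first difference decides the result
            }
        }

        // The common prefix is equal: fall back to comparing lengths.
        lhs.len().partial_cmp(&rhs.len())
    }
    ```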
  3. Reuse cmp in totally ordered types

    Instead of manually defining it, `partial_cmp` can simply wrap the
    result of `cmp` for totally ordered types.
    ranma42 committed Sep 16, 2015 · bf9254a
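
    Concretely, for a totally ordered type the manual `partial_cmp` body collapses
    to a one-liner (again a hypothetical `Val` newtype, for illustration only):

    ```
    use std::cmp::Ordering;

    #[derive(PartialEq, Eq)]
    struct Val(u32); // hypothetical stand-in for a totally ordered type

    impl Ord for Val {
        fn cmp(&self, other: &Self) -> Ordering {
            self.0.cmp(&other.0)
        }
    }

    impl PartialOrd for Val {
        // A totally ordered type can simply wrap the result of `cmp`.
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            Some(self.cmp(other))
        }
    }
    ```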
  4. Remove boundary checks in slice comparison operators

    In order to get rid of all range checks, the compiler needs to
    explicitly see that the slices it iterates over are as long as the
    loop variable upper bound.
    
    This further improves the performance of slice comparison:
    
    ```
    test u8_cmp          ... bench:       4,761 ns/iter (+/- 1,203)
    test u8_lt           ... bench:       4,579 ns/iter (+/- 649)
    test u8_partial_cmp  ... bench:       4,768 ns/iter (+/- 761)
    test u16_cmp         ... bench:       4,607 ns/iter (+/- 580)
    test u16_lt          ... bench:       4,681 ns/iter (+/- 567)
    test u16_partial_cmp ... bench:       4,607 ns/iter (+/- 967)
    test u32_cmp         ... bench:       4,448 ns/iter (+/- 891)
    test u32_lt          ... bench:       4,546 ns/iter (+/- 992)
    test u32_partial_cmp ... bench:       4,415 ns/iter (+/- 646)
    test u64_cmp         ... bench:       4,380 ns/iter (+/- 1,184)
    test u64_lt          ... bench:       5,684 ns/iter (+/- 602)
    test u64_partial_cmp ... bench:       4,663 ns/iter (+/- 1,158)
    ```
    ranma42 committed Sep 16, 2015 · 369a9dc
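
    The trick can be sketched by re-slicing both inputs to the common length
    before the loop (a stand-in sketch, not the actual libcore code): once `l`
    and `r` are visibly exactly `len` elements long, LLVM can prove every
    indexed access is in bounds and drop the per-iteration checks.

    ```
    use std::cmp::Ordering;

    fn slice_partial_cmp<T: PartialOrd>(lhs: &[T], rhs: &[T]) -> Option<Ordering> {
        let len = std::cmp::min(lhs.len(), rhs.len());
        // Explicit slicing: both views are now provably `len` elements long.
        let (l, r) = (&lhs[..len], &rhs[..len]);

        for i in 0..len {
            match l[i].partial_cmp(&r[i]) {
                Some(Ordering::Equal) => (),
                non_eq => return non_eq,
            }
        }

        lhs.len().partial_cmp(&rhs.len())
    }
    ```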
  5. Remove inline attribute

    Be more conservative with inlining.
    ranma42 committed Sep 16, 2015 · 08b9edf
  6. Explain explicit slicing in slice cmp and partial_cmp methods

    The explicit slicing is needed in order to enable additional range
    check optimizations in the compiler.
    ranma42 committed Sep 16, 2015 · 74dc146