Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize equality for [T] and comparison for [u8] to use memcmp when possible #32699

Merged
merged 3 commits into from
Apr 7, 2016

Conversation

bluss
Copy link
Member

@bluss bluss commented Apr 3, 2016

Specialize equality for [T] and comparison for [u8] to use memcmp when possible

Where T is a type that can be compared for equality bytewise, we can use
memcmp. We can also use memcmp for PartialOrd, Ord for [u8].

Use specialization to call memcmp in PartialEq for slices for certain element types. This PR does not change the user visible API since the implementation uses an intermediate trait. See commit messages for more information.

The memcmp signature was changed from *const i8 to *const u8 which is in line with how the memcmp function is defined in C (taking const void * arguments, interpreting the values as unsigned bytes for purposes of the comparison).

@rust-highfive
Copy link
Collaborator

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss
Copy link
Member Author

bluss commented Apr 3, 2016

Previous specialization PR: #32586

@alexcrichton
Copy link
Member

Could we avoid the addition of a new sys module and just move the definition? The eq_slice lang item seems like it should be able to just call as_bytes() == as_bytes() now, right?

@bluss
Copy link
Member Author

bluss commented Apr 3, 2016

I have a completely unsubstantiated fear that messing with eq_slice will result in ICEs. We can see what happens if we try.


/// Trait implemented for types that can be compared for equality using
/// their bytewise representation
trait BytewiseEquality { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t we be able to implement this ∀T: Copy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example f32 and &i32 are two copy types that don't compare for equality byte by byte

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this would be helpful for fast comparisons/hashing.

Where T is a type that can be compared for equality bytewise, we can use
memcmp. We can also use memcmp for PartialOrd, Ord for [u8] and by
extension &str.

This is an improvement for example for the comparison [u8] == [u8] that
used to emit a loop that compared the slices byte by byte.

One worry here could be that this introduces function calls to memcmp
in contexts where it should really inline the comparison or even
optimize it out, but llvm takes care of recognizing memcmp specifically.
The old test for Ord used no asserts, and appeared to have a wrong test. (!).
@bluss
Copy link
Member Author

bluss commented Apr 5, 2016

I have updated (rebased) the branch. Moving memcmp into slice.rs required specializing PartialOrd, Ord too, so that &str could use that. I'll let travis run the full testsuite on this.

@bluss bluss changed the title Specialize &[u8] equality (and other primitive types) to use memcmp Specialize equality for [T] and comparison for [u8] to use memcmp when possible Apr 5, 2016
}
}

impl_marker_for!(BytewiseEquality,
u8 i8 u16 i16 u32 i32 u64 i64 usize isize char bool);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is room to eek out more performance here if we specialize for [u16], [u32] and etc. I did some experimentation with trying to speed up memcmp a year or so ago, and most implementations just check a byte at a time until the slices align on a usize. For these larger types, we could just skip some of these alignment checks since we already know they're aligned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! That sounds like exactly the motivation needed to make memcmp(*const u8, *const u8, usize) into an llvm intrinsic, so that it uses the alignment information from the pointer (like memcpy already does). http://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bluss: Exactly. There was a few questions a number of years ago (1, 2) to add one, but it didn't get any traction.

PS: I think I found when I was talking about this on #rust-internals with you, @bluss, back in 2015 :) I can't say that this would be a big win, it might just shave off a few conditionals, which may or may not really matter in real code. I also found my old benchmarks, which I've uploaded to https://github.com/erickt/rust-memcmp-benches.

@alexcrichton
Copy link
Member

@bors: r+ 28c4d12

@bors
Copy link
Contributor

bors commented Apr 6, 2016

⌛ Testing commit 28c4d12 with merge 523ae98...

@Manishearth
Copy link
Member

http://buildbot.rust-lang.org/builders/auto-win-msvc-64-opt-rustbuild/builds/597/steps/test/logs/stdio

Linkcheck stage2 (x86_64-pc-windows-msvc)
std\primitive.bool.html:110: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.char.html:746: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.i16.html:826: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.i32.html:826: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.i64.html:826: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.i8.html:826: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.isize.html:826: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.slice.html:664: broken link - core\slice\trait.SliceOrd.html
std\primitive.slice.html:664: broken link - core\slice\trait.SliceOrd.html
std\primitive.slice.html:665: broken link - core\slice\trait.SliceOrd.html
std\primitive.slice.html:665: broken link - core\slice\trait.SliceOrd.html
std\primitive.slice.html:666: broken link - core\slice\trait.SlicePartialOrd.html
std\primitive.slice.html:666: broken link - core\slice\trait.SlicePartialOrd.html
std\primitive.slice.html:667: broken link - core\slice\trait.SlicePartialOrd.html
std\primitive.slice.html:667: broken link - core\slice\trait.SlicePartialOrd.html
std\primitive.slice.html:668: broken link - core\slice\trait.SlicePartialEq.html
std\primitive.slice.html:668: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.slice.html:668: broken link - core\slice\trait.SlicePartialEq.html
std\primitive.slice.html:669: broken link - core\slice\trait.SlicePartialEq.html
std\primitive.slice.html:670: broken link - core\slice\trait.SlicePartialEq.html
std\primitive.slice.html:670: broken link - core\slice\trait.SlicePartialEq.html
std\primitive.slice.html:671: broken link - core\slice\trait.SlicePartialEq.html
std\primitive.u16.html:751: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.u32.html:751: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.u64.html:751: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.u8.html:751: broken link - core\slice\trait.BytewiseEquality.html
std\primitive.usize.html:751: broken link - core\slice\trait.BytewiseEquality.html


command did not execute successfully: "C:\\bot\\slave\\auto-win-msvc-64-opt-rustbuild\\build\\obj\\build\\x86_64-pc-windows-msvc\\stage2-rustc\\x86_64-pc-windows-msvc\\release\\linkchecker.exe" "C:\\bot\\slave\\auto-win-msvc-64-opt-rustbuild\\build\\obj\\build\\x86_64-pc-windows-msvc\\doc"
expected success, got: exit code: 101

@bors
Copy link
Contributor

bors commented Apr 6, 2016

⛄ The build was interrupted to prioritize another pull request.

@Manishearth
Copy link
Member

@bors r-

(feel free to re-r when the bug is fixed)

@bluss
Copy link
Member Author

bluss commented Apr 6, 2016

oh, thanks Manishearth.

I guess we need doc(hidden) on the traits, even if they are private?

This should avoid the trait impls showing up in rustdoc.
@bluss
Copy link
Member Author

bluss commented Apr 6, 2016

ok, all other private traits seem to use doc(hidden). resubmitting with that fixed.

@bors r=alexcrichton

@bors
Copy link
Contributor

bors commented Apr 6, 2016

📌 Commit a6c27be has been approved by alexcrichton

Manishearth added a commit to Manishearth/rust that referenced this pull request Apr 7, 2016
Specialize equality for [T] and comparison for [u8] to use memcmp when possible

Specialize equality for [T] and comparison for [u8] to use memcmp when possible

Where T is a type that can be compared for equality bytewise, we can use
memcmp. We can also use memcmp for PartialOrd, Ord for [u8].

Use specialization to call memcmp in PartialEq for slices for certain element types. This PR does not change the user visible API since the implementation uses an intermediate trait. See commit messages for more information.

The memcmp signature was changed from `*const i8` to `*const u8` which is in line with how the memcmp function is defined in C (taking const void * arguments, interpreting the values as unsigned bytes for purposes of the comparison).
bors added a commit that referenced this pull request Apr 7, 2016
Rollup of 11 pull requests

- Successful merges: #32016, #32583, #32699, #32729, #32731, #32738, #32741, #32745, #32748, #32757, #32786
- Failed merges: #32773
Manishearth added a commit to Manishearth/rust that referenced this pull request Apr 7, 2016
Specialize equality for [T] and comparison for [u8] to use memcmp when possible

Specialize equality for [T] and comparison for [u8] to use memcmp when possible

Where T is a type that can be compared for equality bytewise, we can use
memcmp. We can also use memcmp for PartialOrd, Ord for [u8].

Use specialization to call memcmp in PartialEq for slices for certain element types. This PR does not change the user visible API since the implementation uses an intermediate trait. See commit messages for more information.

The memcmp signature was changed from `*const i8` to `*const u8` which is in line with how the memcmp function is defined in C (taking const void * arguments, interpreting the values as unsigned bytes for purposes of the comparison).
bors added a commit that referenced this pull request Apr 7, 2016
Rollup of 7 pull requests

- Successful merges: #32674, #32699, #32711, #32745, #32748, #32757, #32789
- Failed merges:
@bors bors merged commit a6c27be into rust-lang:master Apr 7, 2016
@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Apr 9, 2016
Centril added a commit to Centril/rust that referenced this pull request Jul 11, 2019
core: check for pointer equality when comparing Eq slices

Because `Eq` types must be reflexively equal, an equal-length slice to the same memory location must be equal.

This is related to rust-lang#33892 (and rust-lang#32699) answering this comment from that PR:

> Great! One more easy question: why does this optimization not apply in the non-BytewiseEquality implementation directly above?

Because slices of non-reflexively equal types (like `f64`) are not equal even if it's the same slice. But if the types are `Eq`, we can use this same-address optimization, which this PR implements. Obviously this changes behavior if types violate the reflexivity condition of `Eq`, because their impls of `PartialEq` will no longer be called per-item, but 🤷‍♂ .

It's not clear how often this optimization comes up in the real world outside of the same-`&str` case covered by rust-lang#33892, so **I'm requesting a perf run** (on MacOS today, so can't run `rustc_perf` myself). I'm going ahead and making the PR on the basis of being surprised things didn't already work this way.

This is my first time hacking rust itself, so as a perf sanity check I ran `./x.py bench --stage 0 src/lib{std,alloc}`, but the differences were noisy.

To make the existing specialization for `BytewiseEquality` explicit, it's now a supertrait of `Eq + Copy`. `Eq` should be sufficient, but `Copy` was included for clarity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants