To digit simplification #82094
Merged
r? @dtolnay (rust-highfive has picked a reviewer for you, use r? to override)
rust-highfive added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) label Feb 14, 2021
gilescope force-pushed the to_digit_speedup2 branch from 63c34a1 to b5d68cf February 14, 2021 12:18
gilescope force-pushed the to_digit_speedup2 branch from b5d68cf to 845c14d February 14, 2021 17:06
m-ou-se reviewed Feb 14, 2021
m-ou-se added the T-libs (Relevant to the library team, which will review and decide on the PR/issue.) label Feb 14, 2021
Co-authored-by: Mara <m-ou.se@m-ou.se>
Remove unused const
@bors r+
📌 Commit d2ba68b has been approved by
bors added S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) labels Feb 15, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 16, 2021
To digit simplification

I found out the other day that the low four bits of every ASCII digit hold the digit's value, as one would hope (e.g. char `2` ends in `0b0010`). Two further bits mark the digit range (`0b0011_0000`). For a true digit, every bit above those two is 0, since ASCII is the lowest part of the Unicode u32 spectrum. So XORing with `0b11_0000` should give us either a number in 0-9 or, alternatively, a larger number in the u32 space. If we get something that's not 0-9, it is discarded because it is not less than the radix.

The code seems so fast, though, that there's quite a lot of noise in the benchmarks, so it's not easy to prove conclusively that it's faster as well as fewer instructions.

For the non-fast path, I was also toying with the following, wondering whether we could do this, since we'd then have only one return and fewer instructions still:

```
match self {
    'a'..='z' => self as u32 - 'a' as u32 + 10,
    'A'..='Z' => self as u32 - 'A' as u32 + 10,
    _ => { radix = 10; self as u32 ^ ASCII_DIGIT_MASK },
}
```

Here's the [godbolt](https://godbolt.org/z/883c9n).

(H/T to `@byteshadow` for pointing out xor was what I needed)
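To make the XOR reasoning concrete, here is a minimal standalone sketch of the idea. It is not the code that was merged: the function name `to_digit_xor` is hypothetical, the `radix <= 10` split is an assumption about where the fast path applies, and `ASCII_DIGIT_MASK` borrows the constant name from the snippet above.

```rust
/// The two bits that mark the ASCII digit range ('0' is 0b0011_0000).
/// The name follows the snippet above; the value comes from the description.
const ASCII_DIGIT_MASK: u32 = 0b11_0000;

/// Hypothetical free-function version of the XOR idea, for illustration only.
fn to_digit_xor(c: char, radix: u32) -> Option<u32> {
    assert!(radix <= 36, "radix is too high (maximum 36)");
    let val = if radix <= 10 {
        // Assumed fast path: XOR clears the two marker bits of a true digit,
        // leaving 0..=9. Any other char yields a value of at least 10,
        // which the final comparison rejects.
        c as u32 ^ ASCII_DIGIT_MASK
    } else {
        match c {
            '0'..='9' => c as u32 ^ ASCII_DIGIT_MASK,
            'a'..='z' => c as u32 - 'a' as u32 + 10,
            'A'..='Z' => c as u32 - 'A' as u32 + 10,
            _ => return None,
        }
    };
    // A value is only a valid digit if it is strictly below the radix.
    if val < radix { Some(val) } else { None }
}
```

For example, `to_digit_xor('7', 10)` gives `Some(7)`, while `to_digit_xor(':', 10)` gives `None` because `0x3A ^ 0x30` is 10, which is not below the radix. The sketch keeps an explicit early return in the slow path rather than the single-return variant toyed with above, since that variant relies on overwriting `radix`.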
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2021
…laumeGomez

Rollup of 11 pull requests

Successful merges:

- rust-lang#79981 (Add 'consider using' message to overflowing_literals)
- rust-lang#82094 (To digit simplification)
- rust-lang#82105 (Don't fail to remove files if they are missing)
- rust-lang#82136 (Fix ICE: Use delay_span_bug for mismatched subst/hir arg)
- rust-lang#82169 (Document that `assert!` format arguments are evaluated lazily)
- rust-lang#82174 (Replace File::create and write_all with fs::write)
- rust-lang#82196 (Add caveat to Path::display() about lossiness)
- rust-lang#82198 (Use internal iteration in Iterator::is_sorted_by)
- rust-lang#82204 (Update books)
- rust-lang#82207 (rustdoc: treat edition 2021 as unstable)
- rust-lang#82231 (Add long explanation for E0543)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup