Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i8 and u8::to_string() specialisation (far less asm). #82576

Merged
merged 6 commits into from
May 3, 2021

Conversation

gilescope
Copy link
Contributor

@gilescope gilescope commented Feb 27, 2021

Take 2. Around 1/6th of the assembly to without specialisation.

https://godbolt.org/z/bzz8Mq

(partially fixes #73533 )

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 27, 2021
@rust-log-analyzer

This comment has been minimized.

@gilescope
Copy link
Contributor Author

-1 isn’t a u8 is it? This failing test is a bit of a curveball.

@SkiFire13
Copy link
Contributor

I guess it's some shenanigans with type inference. Before it was being inferred as i32, but now that it's being inferred as u8. Consider this example (playground), without the specialized impl of Foo for u8 it compiles fine, with it it throws an error.

@the8472
Copy link
Member

the8472 commented Feb 27, 2021

Type inference breaking changes are in principle allowed by the stability rules, but you could try routing the default implementation (impl<T: fmt::Display + ?Sized> ToString for T {)) through a private specialization trait instead. Or try implementing it for all integer types.

@gilescope
Copy link
Contributor Author

Ok, well if the easiest thing is to make the rest faster I'm gaim with that. Double or quits. That would completely close the associated ticket.

@SkiFire13
Copy link
Contributor

If you prefer splitting the PR then you could first specialize for i32 since that's picked over the other integer types anyway.

@Mark-Simulacrum
Copy link
Member

r? @KodrAus since it sounds like there's inference breakage here so likely needs T-libs decision

The new assembly looks pretty unfortunate to me FWIW, even though it does seem better than what was there before.

@gilescope
Copy link
Contributor Author

I don't want to break anything. Will update this PR with a more comprehensive solution.

@gilescope
Copy link
Contributor Author

"The new assembly looks pretty unfortunate to me" - really interested in this. I'm just getting into understanding the asm side of things. I've found a construction that allows the String with_capacity to be set to the exact length without the vec growth asm being included but it does add an extra 20+ instructions. I guess memory versus instructions is a tradeoff. At the moment b'a'.to_string().capacity() returns 8 so whether it's 3 or the exact size it's a lot smaller in memory?

@Mark-Simulacrum
Copy link
Member

Oh, I was mainly thinking that there's several multiplications in the generated code - it feels like a u8 should be convertible to decimal without going through that overhead, but I'm not really familiar with the state of the art here.

@gilescope
Copy link
Contributor Author

gilescope commented Mar 3, 2021

I tried lots of things and couldn't get it below 65ns (The original is around 85ns for me). Then I removed everything and just returned String::with_capacity(3) - took 65ns. Alas there's no way that I can avoid that allocation so I guess we should optimise for min code size and trying to match the capacity closely.
(edit) I switched to jemalloc and it's now about 22ns. That makes it a bit easier to tell if changes are making a perf difference.

@rust-log-analyzer

This comment has been minimized.

library/alloc/src/string.rs Outdated Show resolved Hide resolved
@gilescope
Copy link
Contributor Author

gilescope commented Mar 4, 2021

Sometimes it looks like the compiler can figure out when there's no possibility of growth in a vec and avoid generating the instructions but quite often it can't tell: both these approches trigger it including the grow: https://godbolt.org/z/zxG7hK
Is that just something that's fine becaue it gets amortised over a big program or will it dump out those instructions every time it's inlined?

This is probably the best example of catching it in the act:

https://godbolt.org/z/ePvYrs

Digging into this MIR seems the same as well as LLIR so alas this seems to be in llvm that it is deciding to do something very different. (Edit: Am now wondering if the above differences are just godbolt having an off day or if it's really that different.)

library/alloc/src/string.rs Outdated Show resolved Hide resolved
library/alloc/src/string.rs Outdated Show resolved Hide resolved
library/alloc/src/string.rs Outdated Show resolved Hide resolved
library/alloc/src/string.rs Outdated Show resolved Hide resolved
library/alloc/src/string.rs Outdated Show resolved Hide resolved
After much tweaking found a way to get similar asm size as the
u8 to_string implementation.
@gilescope
Copy link
Contributor Author

Well godbolt ( https://rust.godbolt.org/z/Keb5ao ) at least seems happy with the i8 impl. It's about as good as I can get it and it's pretty fast - the compulsary alloc consumes the vast amount of the time, but with jemalloc it's over twice as fast as the old way and with the system allocator it seems to shave off 25%.

If anyone has any other sugestions of things I've not tried yet please let me know, but if not I think its ready for another review.

@the8472
Copy link
Member

the8472 commented Mar 7, 2021

Using a vec for at most 4 bytes (since it's all ascii) seems quite complicated. Have you tried operating on a local array instead and only allocating the output string once the length is known?

Copy link
Contributor

@LingMan LingMan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit like we're solving the wrong problem by hacking around compiler deficiencies. I would expect the compiler to generate decent assembly for something like this:

pub fn to_string(elf: &i8) -> String {
    let mut s = String::with_capacity(4);
    if elf.is_negative() {
        s.push('-');
    }
    let mut n = elf.unsigned_abs();
    if n >= 10 {
        if n >= 100 {
            n -= 100;
            s.push('1');
        }
        s.push((b'0' + n/10) as char);
        n %= 10;
    }
    s.push((b'0' + n) as char);
    s.shrink_to_fit();
    s
}

However, as you already noticed in #82576 (comment), it can no longer tell that the Vec won't grow once you add the fourth push. Could you open an issue for that if there isn't one already?

The question is now: Do we want to push a simpler, safe version, in the hopes that the compiler will better optimize it in the future? Or do we want to push the more complex, unsafe version and hopefully remember to replace it once the compiler can deal with the simpler one?

In any case I've left some minor comments on the current code.

library/alloc/src/string.rs Outdated Show resolved Hide resolved
library/alloc/src/string.rs Outdated Show resolved Hide resolved
@LingMan
Copy link
Contributor

LingMan commented Mar 8, 2021

Looks like the String-based version optimizes fine when MIR optimizations are disabled, which suggests that it's the same problem as #82801.
(Dropped the shrink_to_fit, since it generates quite a bit of assembly and wasn't a fair comparison.)

@LingMan
Copy link
Contributor

LingMan commented Mar 11, 2021

Just to make sure there's no misunderstanding: It's not my approval you need to land this. I'm just a random dude without any power in that regard.
Maybe update the PR's title now that it covers i8 in addition to u8. Can't comment on the stability attributes. The code looks fine to me.

That said, it would be cool if we could do this and have the compiler optimize it well:

impl ToString for i8 {
    #[inline]
    fn to_string(&self) -> String {
        let mut buf = String::with_capacity(4);
        if self.is_negative() {
            buf.push('-');
        }
        buf.push_str(&self.unsigned_abs().to_string());
        buf
    }
}

Alas, it can't do that (yet?).

@gilescope gilescope changed the title u8::to_string() specialisation (far less asm). i8 and u8::to_string() specialisation (far less asm). Mar 12, 2021
@gilescope
Copy link
Contributor Author

@LingMan thank you for all your help on this PR, it's been really appreciated! Frankly I think to_string() is technically a little broken. Ideally something like write_to_string(&mut String) would be better. That way you could do something like you suggest above nicely but as it is that would be two allocations.

@gilescope
Copy link
Contributor Author

@Mark-Simulacrum this one's ready for review at your leasure.

@crlf0710 crlf0710 added T-libs Relevant to the library team, which will review and decide on the PR/issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 9, 2021
@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 26, 2021
@JohnTitor
Copy link
Member

r? @Amanieu

@rust-highfive rust-highfive assigned Amanieu and unassigned KodrAus Apr 26, 2021
@gilescope
Copy link
Contributor Author

Knowing what I have recently learned I can probably shave a bit off these timings by pulling the adds a little earlier in the function. Give me an evening to revisit this.

@gilescope
Copy link
Contributor Author

No maybe not. I think that's as good as it gets.

@Amanieu
Copy link
Member

Amanieu commented Apr 27, 2021

That's great! Do you think you could implement this for other integer sizes too?

@gilescope
Copy link
Contributor Author

Idk, the way the larger integer types are currently handled makes sense - the two at a time digits makes sense - it's just at the short end that the general implemention seems overkill. Mostly I'm depressed that anything done here is dwarfed by the alloc that we can't ever avoid due to the signature. Rather I'm focusing on the other end - parsing ints which isn't bounded by allocation times.

@Amanieu
Copy link
Member

Amanieu commented May 2, 2021

@bors r+

@bors
Copy link
Contributor

bors commented May 2, 2021

📌 Commit 05330aa has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 2, 2021
@bors
Copy link
Contributor

bors commented May 2, 2021

⌛ Testing commit 05330aa with merge 8a8ed07...

@bors
Copy link
Contributor

bors commented May 3, 2021

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 8a8ed07 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 3, 2021
@bors bors merged commit 8a8ed07 into rust-lang:master May 3, 2021
@rustbot rustbot added this to the 1.54.0 milestone May 3, 2021
@@ -2224,6 +2224,47 @@ impl ToString for char {
}
}

#[stable(feature = "u8_to_string_specialization", since = "1.999.0")]
Copy link
Contributor

@pickfire pickfire May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is with the since tag? Why 1.999?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed on master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, wasn't sure what to put for that. Yes 1.54 makes sense, but at the time I didn't want to presume so put something a little further out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bad performance of byte and integer to_string() vs str literal to_string()