Vec<u8> clone in rustc 1.33.0 is 3 times slower than rustc 1.29.0 #57437

Closed
breezewish opened this issue Jan 8, 2019 · 10 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@breezewish

breezewish commented Jan 8, 2019

Benchmark code:

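// Requires a nightly toolchain; place at the crate root (e.g. benches/clone.rs).
#![feature(test)]
extern crate test;

#[bench]
fn bench(b: &mut test::Bencher) {
    let raw = vec![0u8; 1000]; // allocated once, outside the timed loop
    b.iter(|| {
        // black_box keeps the optimizer from eliding the clone
        test::black_box(test::black_box(&raw).clone());
    });
}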

In rustc 1.29.0-nightly (4f3c7a4 2018-07-17): 32 ns/iter (+/- 34)
In rustc 1.33.0-nightly (9eac386 2018-12-31): 127 ns/iter (+/- 45)
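For readers without a nightly toolchain, a rough equivalent of the benchmark above can be sketched on stable Rust with std::time::Instant and std::hint::black_box (stable since Rust 1.66, so not available to the toolchains in this thread). This is a hand-rolled timing loop, not the libtest harness: it reports no +/- figure and does no outlier filtering, so its numbers are only loosely comparable to #[bench] output. The helper name clone_ns_per_iter is hypothetical.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: average nanoseconds per `Vec<u8>::clone`
/// for a vector of `len` bytes, measured over `iters` iterations.
fn clone_ns_per_iter(len: usize, iters: u32) -> f64 {
    let raw = vec![0u8; len];
    // Warm up the allocator and caches before timing.
    for _ in 0..1_000 {
        black_box(black_box(&raw).clone());
    }
    let start = Instant::now();
    for _ in 0..iters {
        // Each clone allocates a new buffer and copies `len` bytes;
        // the copy is dropped (freed) at the end of the statement.
        black_box(black_box(&raw).clone());
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    for &len in &[1_000usize, 1_000_000] {
        println!("len = {:>9}: {:.1} ns/iter", len, clone_ns_per_iter(len, 10_000));
    }
}
```

Run it with cargo run --release; timings from a debug build say little about the allocator.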

@killercup
Member

Can you try this? #47745 (comment)

@killercup killercup added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jan 8, 2019
@breezewish breezewish changed the title from "&[u8] clone in rustc 1.33.0 is 50% slower than rustc 1.29.0" to "Vec<u8> clone in rustc 1.33.0 is 3 times slower than rustc 1.29.0" Jan 8, 2019
@breezewish
Author

breezewish commented Jan 8, 2019

@killercup

Hi, I tried the following profiles:

[profile.bench]
lto = false
opt-level = 3
debug = true
codegen-units = 1

[profile.release]
lto = false
opt-level = 3
debug = true
codegen-units = 1

and

[profile.bench]
lto = true
opt-level = 3
debug = true
codegen-units = 1

[profile.release]
lto = true
opt-level = 3
debug = true
codegen-units = 1

The outcome is similar in both cases.

@ollie27
Member

ollie27 commented Jan 8, 2019

I'd guess this is due to #55238. You could try using jemallocator to confirm.
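The allocator behind Vec is chosen per binary with the #[global_allocator] attribute, which is what makes this check possible. As a minimal sketch, the program below pins the allocator explicitly to std::alloc::System, wrapped in a hypothetical CountingAlloc that counts calls to alloc. It shows that, in current std, one Vec<u8> clone of a non-empty vector performs exactly one allocation plus a copy, so a slower malloc shows up directly in this benchmark. To compare against jemalloc instead, the jemallocator crate (an extra Cargo dependency, not included here) provides a jemallocator::Jemalloc type that can sit in the same #[global_allocator] static.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and
/// counts how many times `alloc` is called.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds GlobalAlloc::alloc's contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in this binary, including Vec's, goes through here.
// Swapping in `jemallocator::Jemalloc` would route them to jemalloc instead.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations a single `Vec<u8>` clone performs.
fn allocs_per_clone(len: usize) -> usize {
    let raw = vec![0u8; len];
    let before = ALLOCS.load(Ordering::Relaxed);
    let copy = raw.clone();
    let after = ALLOCS.load(Ordering::Relaxed);
    assert_eq!(copy, raw);
    after - before
}

fn main() {
    println!("allocations per 1000-byte clone: {}", allocs_per_clone(1_000));
}
```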

@brson
Contributor

brson commented Jan 8, 2019

@ollie27 Oh, very good guess. That would explain a lot. I do think that @breeswish's benchmarks are running against the system malloc in the 'after' run. We are running at least one set of benchmarks with jemalloc in both the before and after runs (https://gist.github.com/brson/13586d9f12f3af5c8377628c3d0f12d0#file-benchcmp-tikv) and have seen regressions there too, but have not investigated them yet.

We'll fix our side to make sure we are comparing jemalloc to jemalloc, and then see how our benchmarks look.

@brson
Contributor

brson commented Jan 9, 2019

What I reported yesterday about not comparing allocator to allocator looks to be incorrect. @breeswish's benchmarks may have been using the same jemalloc. Still investigating.

@mati865
Contributor

mati865 commented Jan 10, 2019

What is your system?

Since the switch to the system allocator, I'm seeing a small performance increase on three systems with glibc 2.28 (Arch Linux, Fedora and Ubuntu).

With your benchmark I was getting results so close they weren't reliable.
These are the results with the vector enlarged to one million bytes (let raw = vec![0u8; 1000000];):

$ cargo +nightly-2018-07-17 bench
[...]
test bench ... bench:      19,454 ns/iter (+/- 241)

$ cargo +nightly-2018-07-17 bench
[...]
test bench ... bench:      19,422 ns/iter (+/- 207)

$ cargo +nightly-2018-12-31 bench
[...]
test bench ... bench:      19,378 ns/iter (+/- 2,560)

$ cargo +nightly-2018-12-31 bench
[...]
test bench ... bench:      19,374 ns/iter (+/- 422)

$ cargo +nightly bench
[...]
test bench ... bench:      19,352 ns/iter (+/- 7,552)

$ cargo +nightly bench
[...]
test bench ... bench:      19,342 ns/iter (+/- 7,586)
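To put these numbers in perspective, a few lines of arithmetic on the six reported means compare the spread between runs with the reported +/- figures. The helper rel_spread is hypothetical; it simply computes (max - min) / min over the ns/iter values copied from the output above.

```rust
/// Hypothetical helper: relative spread (max - min) / min of the samples.
fn rel_spread(samples: &[f64]) -> f64 {
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    (max - min) / min
}

fn main() {
    // ns/iter means reported above for the 1,000,000-byte clone:
    // two runs each on nightly-2018-07-17, nightly-2018-12-31 and nightly.
    let means = [19_454.0, 19_422.0, 19_378.0, 19_374.0, 19_352.0, 19_342.0];
    let spread = rel_spread(&means);
    // Widest reported +/- figure (7,586 on the last nightly run), relative to its mean.
    let widest_band = 7_586.0 / 19_342.0;
    println!("spread between runs: {:.2}%", spread * 100.0);
    println!("widest +/- relative to its mean: {:.1}%", widest_band * 100.0);
}
```

With these inputs the slowest and fastest means differ by about 0.58%, while the widest +/- figure is roughly 39% of its mean, which fits the reading that the differences are within noise.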

@breezewish
Author

breezewish commented Jan 10, 2019

Hi @mati865, my OS is macOS 10.12.6. I will try again with jemalloc linked. In the meantime, you can look at the results from Travis CI (they may not be very stable, but are still useful for reference): https://travis-ci.com/breeswish/vec_clone_play

@mati865
Contributor

mati865 commented Jan 10, 2019

@breeswish I don't use macOS, so I can't speak for it, but on older Linux distributions like the one in that Travis run, jemallocator should fix the performance.

@brson
Contributor

brson commented Jan 23, 2019

After further investigation, there indeed wasn't a problem with Vec<u8>, so this can be closed.

@breezewish
Author

I forced the use of jemalloc and found no notable difference in the case reported by this issue, so I'm closing it.


5 participants