MIR 4x slower #33828

Closed
MagaTailor opened this issue May 23, 2016 · 12 comments
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@MagaTailor

Running the benchmark suite from ethcore/parity, compiled with a recent nightly, produces mostly equal results, except for one type of benchmark which seems to take a hit:

MIR:

test bench_stream_1000_empty_lists   ... bench:     458,104 ns/iter (+/- 21,291)
test bench_stream_nested_empty_lists ... bench:      16,640 ns/iter (+/- 500)
test bench_stream_u256_value         ... bench:      26,468 ns/iter (+/- 353)
test bench_stream_u64_value          ... bench:      13,893 ns/iter (+/- 272)

Ye Olde:

test bench_stream_1000_empty_lists   ... bench:      91,131 ns/iter (+/- 3,154)
test bench_stream_nested_empty_lists ... bench:       4,702 ns/iter (+/- 160)
test bench_stream_u256_value         ... bench:       6,478 ns/iter (+/- 383)
test bench_stream_u64_value          ... bench:       3,047 ns/iter (+/- 132)

Found on ARM Linux but hopefully not exclusive to that platform. Not sure if MIR should be held accountable at this early stage but here it is.
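
For readers who want to quantify the gap, here is a minimal sketch that pulls the ns/iter figure out of bench lines in the format shown above and divides the MIR time by the old-trans time. The helper names `parse_ns_per_iter` and `slowdown` are made up for illustration and are not part of any tool mentioned in this thread. On the ARM numbers above, the ratios come out at roughly 3.5x to 5x.

```rust
/// Extract the ns/iter value from one bench output line, e.g.
/// "test bench_stream_u64_value ... bench: 13,893 ns/iter (+/- 272)".
/// Returns None if the line does not have that shape.
/// (Hypothetical helper, for illustration only.)
fn parse_ns_per_iter(line: &str) -> Option<u64> {
    // Everything after the "bench:" marker.
    let after = line.split("bench:").nth(1)?;
    // The number sits before "ns/iter" and uses thousands separators.
    let number = after.split("ns/iter").next()?.trim();
    number.replace(',', "").parse().ok()
}

/// How many times slower the MIR build is than the old-trans build.
fn slowdown(mir_ns: u64, old_ns: u64) -> f64 {
    mir_ns as f64 / old_ns as f64
}

fn main() {
    // Lines copied from the ARM results above: (MIR, old trans).
    let pairs = [
        (
            "test bench_stream_1000_empty_lists   ... bench:     458,104 ns/iter (+/- 21,291)",
            "test bench_stream_1000_empty_lists   ... bench:      91,131 ns/iter (+/- 3,154)",
        ),
        (
            "test bench_stream_u64_value          ... bench:      13,893 ns/iter (+/- 272)",
            "test bench_stream_u64_value          ... bench:       3,047 ns/iter (+/- 132)",
        ),
    ];

    for (mir_line, old_line) in pairs.iter() {
        let mir = parse_ns_per_iter(mir_line).expect("MIR line should parse");
        let old = parse_ns_per_iter(old_line).expect("old-trans line should parse");
        println!("{:.2}x slower with MIR", slowdown(mir, old));
    }
}
```

The sketch ignores the `(+/- ...)` spread, so it only gives a point estimate of the regression.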

@huonw huonw added I-slow Issue: Problems and improvements with respect to performance of generated code. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html labels May 23, 2016
@alexcrichton
Member

cc @rust-lang/compiler

@nikomatsakis
Contributor

@petevine great, thanks! I want to get more organized about MIR benchmarks for runtime in particular.

@nagisa
Member

nagisa commented May 23, 2016

x86_64:

Orbit

test bench_stream_1000_empty_lists   ... bench:      26,937 ns/iter (+/- 3,985)
test bench_stream_nested_empty_lists ... bench:         842 ns/iter (+/- 131)
test bench_stream_u256_value         ... bench:       1,405 ns/iter (+/- 54)
test bench_stream_u64_value          ... bench:         582 ns/iter (+/- 19)

time to run ~/.cargo/bin/rustup run nightly cargo bench -p ethcore-util --bench rlp -j1
99% (600.02 real, 13.73 kernel, 581.89 user); 468572k resident

Plain old

test bench_stream_1000_empty_lists   ... bench:      11,147 ns/iter (+/- 362)
test bench_stream_nested_empty_lists ... bench:         278 ns/iter (+/- 247)
test bench_stream_u256_value         ... bench:         609 ns/iter (+/- 40)
test bench_stream_u64_value          ... bench:         154 ns/iter (+/- 13)

time to run ~/.cargo/bin/rustup run nightly cargo bench -p ethcore-util --bench rlp -j1
99% (563.41 real, 14.16 kernel, 544.92 user); 454708k resident

While the difference is not as big, it is still very noticeable.

@nagisa
Member

nagisa commented Jun 5, 2016

More recent results:

test bench_stream_1000_empty_lists   ... bench:      23,499 ns/iter (+/- 1,156)
test bench_stream_nested_empty_lists ... bench:         748 ns/iter (+/- 30)
test bench_stream_u256_value         ... bench:       1,294 ns/iter (+/- 45)
test bench_stream_u64_value          ... bench:         563 ns/iter (+/- 16)

99% (527.78 real, 28.33 kernel, 497.98 user); 467064k resident

Thus, a small improvement, but still a big regression. Waiting for the nightly with static drops to see whether, and how much, that helps.

@eddyb
Member

eddyb commented Jun 5, 2016

@nagisa I suppose I can try with my build of master.

@eddyb
Member

eddyb commented Jun 5, 2016

I get the following results with -Z orbit:

test bench_stream_1000_empty_lists   ... bench:      11,624 ns/iter (+/- 632)
test bench_stream_nested_empty_lists ... bench:         398 ns/iter (+/- 4)
test bench_stream_u256_value         ... bench:         631 ns/iter (+/- 5)
test bench_stream_u64_value          ... bench:         240 ns/iter (+/- 2)

EDIT: With old trans:

test bench_stream_1000_empty_lists   ... bench:      11,544 ns/iter (+/- 88)
test bench_stream_nested_empty_lists ... bench:         486 ns/iter (+/- 4)
test bench_stream_u256_value         ... bench:         731 ns/iter (+/- 22)
test bench_stream_u64_value          ... bench:         305 ns/iter (+/- 3)

@alexbool
Contributor

alexbool commented Jun 5, 2016

@eddyb It would also be nice to see the results of old trans on your hardware.

@eddyb
Member

eddyb commented Jun 5, 2016

@alexbool Yes, I just added those, they hadn't finished yet.

EDIT: Wait, I might not have done that right, let me make sure.
EDIT2: Phew, for a second there I thought I was on #34096. Results are valid.

@alexbool
Contributor

alexbool commented Jun 5, 2016

@eddyb Thanks, that looks extremely promising.

@nagisa
Member

nagisa commented Jun 5, 2016

These are my results for -Z orbit on master (the non-orbit results are above):

test bench_stream_1000_empty_lists   ... bench:       9,954 ns/iter (+/- 626)
test bench_stream_nested_empty_lists ... bench:         267 ns/iter (+/- 17)
test bench_stream_u256_value         ... bench:         574 ns/iter (+/- 52)
test bench_stream_u64_value          ... bench:         163 ns/iter (+/- 19)

99% (454.66 real, 8.66 kernel, 443.18 user); 437932k resident

Thus, a considerable speed-up in both run and compile times. I consider this resolved, so closing.

@nagisa nagisa closed this as completed Jun 5, 2016
@MagaTailor
Author

A quick heads-up: Ye Olde strikes back in bigint! (Actually, MIR trans regresses; for reference, see the final part of this comment.)

name           gcc6-llvm3.9-mir ns/iter  gcc6-llvm3.9-old ns/iter  diff ns/iter   diff %
u128_mul       1,046,807                 629,104                       -417,703  -39.90%
u256_add       1,186,208                 628,804                       -557,404  -46.99%
u256_full_mul  21,829,551                21,381,247                    -448,304   -2.05%
u256_mul       1,699,611                 1,198,108                     -501,503  -29.51%
u256_sub       1,186,208                 628,804                       -557,404  -46.99%
u512_add       1,015,707                 1,016,607                          900    0.09%
u512_sub       1,066,007                 1,066,007                            0    0.00%
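
The thread does not say how the last two columns were produced, but the figures are consistent with diff = old - mir and diff % = diff / mir * 100. A minimal sketch under that assumption (the function name `diff_percent` is made up for illustration):

```rust
/// Returns (diff, percent), where diff = old - mir and the percentage is
/// taken relative to the MIR time. A negative value means the old-trans
/// build is faster. This formula is inferred from the table, not confirmed
/// by the thread.
fn diff_percent(mir_ns: i64, old_ns: i64) -> (i64, f64) {
    let diff = old_ns - mir_ns;
    (diff, diff as f64 / mir_ns as f64 * 100.0)
}

fn main() {
    // (name, MIR ns/iter, old-trans ns/iter), copied from the table above.
    let rows = [
        ("u128_mul", 1_046_807, 629_104),
        ("u256_add", 1_186_208, 628_804),
        ("u512_add", 1_015_707, 1_016_607),
    ];

    for (name, mir, old) in rows.iter() {
        let (diff, pct) = diff_percent(*mir, *old);
        println!("{:<10} {:>10} {:>8.2}%", name, diff, pct);
    }
}
```

Read this way, -39.90% for u128_mul means the old-trans build needs about 40% less time per iteration than the MIR build.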

@nagisa
Member

nagisa commented Aug 4, 2016

File a new issue, please.
