MIR 4x slower #33828

MagaTailor · 2016-05-23T21:42:22Z

Running the benchmark suite from ethcore/parity, compiled with a recent nightly, produces mostly equal results, except for one group of benchmarks that is hit hard:

MIR:

test bench_stream_1000_empty_lists   ... bench:     458,104 ns/iter (+/- 21,291)
test bench_stream_nested_empty_lists ... bench:      16,640 ns/iter (+/- 500)
test bench_stream_u256_value         ... bench:      26,468 ns/iter (+/- 353)
test bench_stream_u64_value          ... bench:      13,893 ns/iter (+/- 272)

Ye Olde:

test bench_stream_1000_empty_lists   ... bench:      91,131 ns/iter (+/- 3,154)
test bench_stream_nested_empty_lists ... bench:       4,702 ns/iter (+/- 160)
test bench_stream_u256_value         ... bench:       6,478 ns/iter (+/- 383)
test bench_stream_u64_value          ... bench:       3,047 ns/iter (+/- 132)

Found on ARM Linux but hopefully not exclusive to that platform. Not sure if MIR should be held accountable at this early stage but here it is.
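To put a number on the gap, the two listings above can be compared line by line. The following is a minimal sketch, not part of the original report: `parse_bench_line` and `slowdown` are hypothetical helper names, and the parser assumes the `test NAME ... bench: N ns/iter (+/- M)` line shape shown above.

```rust
/// Parse one bench output line into (name, ns_per_iter).
/// Hypothetical helper; assumes the line shape shown in the listings above.
fn parse_bench_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, tail) = rest.split_once("... bench:")?;
    let (value, _) = tail.split_once("ns/iter")?;
    // Drop thousands separators and padding, keep only the digits.
    let digits: String = value.chars().filter(|c| c.is_ascii_digit()).collect();
    let ns = digits.parse().ok()?;
    Some((name.trim().to_string(), ns))
}

/// How many times slower the MIR build is than the old-trans build.
fn slowdown(mir_ns: u64, old_ns: u64) -> f64 {
    mir_ns as f64 / old_ns as f64
}

fn main() {
    // Lines copied from the ARM results above.
    let mir = [
        "test bench_stream_1000_empty_lists   ... bench:     458,104 ns/iter (+/- 21,291)",
        "test bench_stream_nested_empty_lists ... bench:      16,640 ns/iter (+/- 500)",
        "test bench_stream_u256_value         ... bench:      26,468 ns/iter (+/- 353)",
        "test bench_stream_u64_value          ... bench:      13,893 ns/iter (+/- 272)",
    ];
    let old = [
        "test bench_stream_1000_empty_lists   ... bench:      91,131 ns/iter (+/- 3,154)",
        "test bench_stream_nested_empty_lists ... bench:       4,702 ns/iter (+/- 160)",
        "test bench_stream_u256_value         ... bench:       6,478 ns/iter (+/- 383)",
        "test bench_stream_u64_value          ... bench:       3,047 ns/iter (+/- 132)",
    ];
    for (m, o) in mir.iter().zip(old.iter()) {
        let (name, mir_ns) = parse_bench_line(m).expect("malformed MIR line");
        let (_, old_ns) = parse_bench_line(o).expect("malformed old-trans line");
        println!("{:<36} {:.2}x slower", name, slowdown(mir_ns, old_ns));
    }
}
```

On these numbers the ratios fall between roughly 3.5x and 5x, which is where the "4x" in the title comes from.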


alexcrichton · 2016-05-23T22:16:14Z

cc @rust-lang/compiler

nikomatsakis · 2016-05-23T22:22:40Z

@petevine great, thanks! I want to get more organized about MIR benchmarks for runtime in particular.

nagisa · 2016-05-23T23:10:26Z

x86_64:

Orbit

test bench_stream_1000_empty_lists   ... bench:      26,937 ns/iter (+/- 3,985)
test bench_stream_nested_empty_lists ... bench:         842 ns/iter (+/- 131)
test bench_stream_u256_value         ... bench:       1,405 ns/iter (+/- 54)
test bench_stream_u64_value          ... bench:         582 ns/iter (+/- 19)

time to run ~/.cargo/bin/rustup run nightly cargo bench -p ethcore-util --bench rlp -j1
99% (600.02 real, 13.73 kernel, 581.89 user); 468572k resident

Plain old

test bench_stream_1000_empty_lists   ... bench:      11,147 ns/iter (+/- 362)
test bench_stream_nested_empty_lists ... bench:         278 ns/iter (+/- 247)
test bench_stream_u256_value         ... bench:         609 ns/iter (+/- 40)
test bench_stream_u64_value          ... bench:         154 ns/iter (+/- 13)

time to run ~/.cargo/bin/rustup run nightly cargo bench -p ethcore-util --bench rlp -j1
99% (563.41 real, 14.16 kernel, 544.92 user); 454708k resident

While the difference is not as big, it is still very noticeable.

nagisa · 2016-06-05T18:21:27Z

More recent results:

test bench_stream_1000_empty_lists   ... bench:      23,499 ns/iter (+/- 1,156)
test bench_stream_nested_empty_lists ... bench:         748 ns/iter (+/- 30)
test bench_stream_u256_value         ... bench:       1,294 ns/iter (+/- 45)
test bench_stream_u64_value          ... bench:         563 ns/iter (+/- 16)

99% (527.78 real, 28.33 kernel, 497.98 user); 467064k resident

Thus, a small improvement, but still a big regression. Waiting for the nightly with static drops to see whether, and how much, that helps.

eddyb · 2016-06-05T18:22:51Z

@nagisa I suppose I can try with my build of master.

eddyb · 2016-06-05T18:40:21Z

I get the following results with -Z orbit:

test bench_stream_1000_empty_lists   ... bench:      11,624 ns/iter (+/- 632)
test bench_stream_nested_empty_lists ... bench:         398 ns/iter (+/- 4)
test bench_stream_u256_value         ... bench:         631 ns/iter (+/- 5)
test bench_stream_u64_value          ... bench:         240 ns/iter (+/- 2)

EDIT: With old trans:

test bench_stream_1000_empty_lists   ... bench:      11,544 ns/iter (+/- 88)
test bench_stream_nested_empty_lists ... bench:         486 ns/iter (+/- 4)
test bench_stream_u256_value         ... bench:         731 ns/iter (+/- 22)
test bench_stream_u64_value          ... bench:         305 ns/iter (+/- 3)

alexbool · 2016-06-05T18:46:53Z

@eddyb It would also be nice to see the old trans results on your hardware.

eddyb · 2016-06-05T18:49:58Z

@alexbool Yes, I just added those, they hadn't finished yet.

EDIT: Wait, I might not have done that right, let me make sure.
EDIT2: Phew, for a second there I thought I was on #34096. Results are valid.

alexbool · 2016-06-05T18:54:38Z

@eddyb thanks, looks extremely promising

nagisa · 2016-06-05T19:06:01Z

These are my results for -Z orbit on master (the non-orbit results are above):

test bench_stream_1000_empty_lists   ... bench:       9,954 ns/iter (+/- 626)
test bench_stream_nested_empty_lists ... bench:         267 ns/iter (+/- 17)
test bench_stream_u256_value         ... bench:         574 ns/iter (+/- 52)
test bench_stream_u64_value          ... bench:         163 ns/iter (+/- 19)

99% (454.66 real, 8.66 kernel, 443.18 user); 437932k resident

Thus, a considerable speed-up in both run and compile times. Considered resolved, thus closing.

MagaTailor · 2016-08-04T17:14:57Z

A quick heads-up, Ye Olde strikes back in bigint! (actually MIR trans regresses, for reference see the final part of this comment)

name                     gcc6-llvm3.9-mir ns/iter       gcc6-llvm3.9-old ns/iter                    diff ns/iter   diff %
u128_mul                         1,046,807                     629,104                               -417,703  -39.90%
u256_add                         1,186,208                     628,804                               -557,404  -46.99%
u256_full_mul                    21,829,551                    21,381,247                            -448,304   -2.05%
u256_mul                         1,699,611                     1,198,108                             -501,503  -29.51%
u256_sub                         1,186,208                     628,804                               -557,404  -46.99%
u512_add                         1,015,707                     1,016,607                                  900    0.09%
u512_sub                         1,066,007                     1,066,007                                    0    0.00%
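For reading the table above: the `diff %` column appears to be the difference taken relative to the first (MIR) column, so a negative value means old trans needs fewer ns/iter. That convention is an assumption inferred from the numbers, not something stated in the comment. A minimal sketch that reproduces the column under that assumption (`percent_diff` is a hypothetical name):

```rust
/// Percentage difference of `other_ns` relative to `baseline_ns`.
/// Assumption: the table uses the first (MIR) column as the baseline.
fn percent_diff(baseline_ns: u64, other_ns: u64) -> f64 {
    (other_ns as f64 - baseline_ns as f64) / baseline_ns as f64 * 100.0
}

fn main() {
    // (name, MIR ns/iter, old-trans ns/iter), copied from the table above.
    let rows = [
        ("u128_mul", 1_046_807u64, 629_104u64),
        ("u256_add", 1_186_208, 628_804),
        ("u256_full_mul", 21_829_551, 21_381_247),
        ("u256_mul", 1_699_611, 1_198_108),
        ("u512_add", 1_015_707, 1_016_607),
    ];
    for (name, mir_ns, old_ns) in rows {
        // Signed integer difference, matching the "diff ns/iter" column.
        let diff = old_ns as i64 - mir_ns as i64;
        println!("{:<14} {:>10} {:>8.2}%", name, diff, percent_diff(mir_ns, old_ns));
    }
}
```

Read this way, the u128 and u256 add/sub/mul rows show old trans ahead by roughly 30% to 47%, while the u512 rows are effectively unchanged.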

nagisa · 2016-08-04T17:19:04Z

File a new issue, please.

huonw added the I-slow and A-MIR labels May 23, 2016

alexbool mentioned this issue Jun 5, 2016

Switch to MIR-based translation by default. #34096

Merged

nagisa closed this as completed Jun 5, 2016
