Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change twice used large const table to static #82676

Merged
merged 1 commit into from
Mar 2, 2021
Merged

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Mar 1, 2021

This table is used twice in core::num::dec2flt::algorithm::power_of_ten. According to the semantics of const, a separate huge definition of the table is inlined at both places.

fn power_of_ten(e: i16) -> Fp {
assert!(e >= table::MIN_E);
let i = e - table::MIN_E;
let sig = table::POWERS.0[i as usize];
let exp = table::POWERS.1[i as usize];
Fp { f: sig, e: exp }
}

Theoretically this gets cleaned up by optimization passes, but in practice I am experiencing a miscompile from LTO on this code. Making the table a static, which would only be defined a single time and not require attention from LTO, eliminates the miscompile and seems semantically more appropriate anyway. A separate bug report on the LTO bug is forthcoming.

Original addition of const is from #27307.

This table is used twice in core::num::dec2flt::algorithm::power_of_ten.
According to the semantics of const, a separate huge definition of the
table is inlined at both places.

    fn power_of_ten(e: i16) -> Fp {
        assert!(e >= table::MIN_E);
        let i = e - table::MIN_E;
        let sig = table::POWERS.0[i as usize];
        let exp = table::POWERS.1[i as usize];
        Fp { f: sig, e: exp }
    }

Theoretically this gets cleaned up by optimization passes, but in
practice I am experiencing a miscompile from LTO on this code. Making
the table a static, which would only be defined a single time and not
require attention from LTO, eliminates the miscompile and seems
semantically more appropriate anyway. A separate bug report on the LTO
bug is forthcoming.
@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 1, 2021
@Mark-Simulacrum
Copy link
Member

I wouldn't expect any significant difference unless there's some kind of const propagation being done by LLVM today, but have you checked disassembly or run some benchmarks? We could run rustc-perf but it's not really a float parsing workload, but can't hurt either - @bors try @rust-timer queue

Otherwise this seems fine to me. An alternative could be splitting the constant in two so that there's just one use per "part", but doesn't feel obviously better.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 1, 2021
@bors
Copy link
Contributor

bors commented Mar 1, 2021

⌛ Trying commit bd51dea with merge 015314cb764d47a719b5c1c60c0f12101e8c0cc5...

@dtolnay
Copy link
Member Author

dtolnay commented Mar 1, 2021

@Mark-Simulacrum yeah here's the disassembly:

Before

core::num::dec2flt::algorithm::power_of_ten::hf417b425dc5ff8a8:
  0xd839f00 <+0>:  pushq  %rax
  0xd839f01 <+1>:  movl   %edi, %eax
  0xd839f03 <+3>:  movswl %ax, %ecx
  0xd839f06 <+6>:  cmpl   $0xfffffecf, %ecx
  0xd839f0c <+12>: jl     0xd839f36              ; <+54> at algorithm.rs:17:5
  0xd839f0e <+14>: addl   $0x131, %eax
  0xd839f13 <+19>: movswq %ax, %rdi
  0xd839f17 <+23>: movzwl %ax, %eax
  0xd839f1a <+26>: cmpl   $0x263, %eax
  0xd839f1f <+31>: jae    0xd839f51              ; <+81> at algorithm.rs:19:15
  0xd839f21 <+33>: leaq   0x83bc830(%rip), %rcx
  0xd839f28 <+40>: movq   (%rcx,%rdi,8), %rax
  0xd839f2c <+44>: movzwl 0x1318(%rcx,%rdi,2), %edx
  0xd839f34 <+52>: popq   %rcx
  0xd839f35 <+53>: retq
  0xd839f36 <+54>: leaq   0x83bc7cc(%rip), %rdi
  0xd839f3d <+61>: leaq   0x65ab04c(%rip), %rdx
  0xd839f44 <+68>: movl   $0x23, %esi
  0xd839f49 <+73>: callq  *0x65ae839(%rip)
  0xd839f4f <+79>: ud2
  0xd839f51 <+81>: leaq   0x65ab050(%rip), %rdx
  0xd839f58 <+88>: movl   $0x263, %esi
  0xd839f5d <+93>: callq  *0x65ae81d(%rip)
  0xd839f63 <+99>: ud2

Expected table address in this assembly is 0xd839f21 <+33>: leaq 0x83bc830(%rip), %rcx = 0x15bf6758. The data at that address is something totally different. The actual placement of the expected data in the process's memory is at 0x12b6de68.

$ (lldb) x/4xg 0x83bc830+0xd839f28
0x15bf6758: 0xf7c45042f7c45042 0xf7c45042f7c45042
0x15bf6768: 0xf7c45042f7c45042 0xf7c45042f7c45042

$ (lldb) memory find -c2 -e 0xe0b62e2929aba83c -- 0x00400000 0x20000000
data found at location: 0x12b6de68
0x12b6de68: 3c a8 ab 29 29 2e b6 e0 26 49 0b ba d9 dc 71 8c
0x12b6de78: 6f 1b 8e 28 10 54 8e af 4b a2 b1 32 14 e9 71 db
no more matches within the range.

After

core::num::dec2flt::algorithm::power_of_ten::hf417b425dc5ff8a8:
  0xd839ec0 <+0>:  pushq  %rax
  0xd839ec1 <+1>:  movl   %edi, %eax
  0xd839ec3 <+3>:  movswl %ax, %ecx
  0xd839ec6 <+6>:  cmpl   $0xfffffecf, %ecx
  0xd839ecc <+12>: jl     0xd839ef6              ; <+54> at algorithm.rs:17:5
  0xd839ece <+14>: addl   $0x131, %eax
  0xd839ed3 <+19>: movswq %ax, %rdi
  0xd839ed7 <+23>: movzwl %ax, %eax
  0xd839eda <+26>: cmpl   $0x263, %eax
  0xd839edf <+31>: jae    0xd839f11              ; <+81> at algorithm.rs:19:15
  0xd839ee1 <+33>: leaq   0x53409c0(%rip), %rcx  ; core::num::dec2flt::table::POWERS::hf668a1d7d9ba717f
  0xd839ee8 <+40>: movq   (%rcx,%rdi,8), %rax
  0xd839eec <+44>: movzwl 0x1318(%rcx,%rdi,2), %edx
  0xd839ef4 <+52>: popq   %rcx
  0xd839ef5 <+53>: retq
  0xd839ef6 <+54>: leaq   0x83bc80c(%rip), %rdi
  0xd839efd <+61>: leaq   0x65ab08c(%rip), %rdx
  0xd839f04 <+68>: movl   $0x23, %esi
  0xd839f09 <+73>: callq  *0x65ae879(%rip)
  0xd839f0f <+79>: ud2
  0xd839f11 <+81>: leaq   0x65ab090(%rip), %rdx
  0xd839f18 <+88>: movl   $0x263, %esi
  0xd839f1d <+93>: callq  *0x65ae85d(%rip)
  0xd839f23 <+99>: ud2

Expected table address: 0xd839ee1 <+33>: leaq 0x53409c0(%rip), %rcx = 0x12b7a8a8. Indeed the table is correctly at that address:

$ (lldb) x/4xg 0x53409c0+0xd839ee8
0x12b7a8a8: 0xe0b62e2929aba83c 0x8c71dcd9ba0b4926
0x12b7a8b8: 0xaf8e5410288e1b6f 0xdb71e91432b1a24b

pub const POWERS: ([u64; 611], [i16; 611]) = (
[
0xe0b62e2929aba83c,
0x8c71dcd9ba0b4926,
0xaf8e5410288e1b6f,
0xdb71e91432b1a24b,

@leonardo-m
Copy link

Is it worth adding a warning in rustc for such situations?

@Mark-Simulacrum
Copy link
Member

Looks good to me. Presuming perf doesn't show anything surprising, r=me

@dtolnay
Copy link
Member Author

dtolnay commented Mar 1, 2021

I would be shocked if there is a perf difference, since power_of_ten is exactly the same machine instructions either way. The offset of the table in memory relative to power_of_ten could have cache effects but I wouldn't expect perf to be representative when it comes to those.

@Mark-Simulacrum
Copy link
Member

Yeah, I would also be shocked. If it's urgent feel free to approve before then, but I figured a few hours wouldn't really matter.

@bors
Copy link
Contributor

bors commented Mar 1, 2021

☀️ Try build successful - checks-actions
Build commit: 015314cb764d47a719b5c1c60c0f12101e8c0cc5 (015314cb764d47a719b5c1c60c0f12101e8c0cc5)

@rust-timer
Copy link
Collaborator

Queued 015314cb764d47a719b5c1c60c0f12101e8c0cc5 with parent 5233edc, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (015314cb764d47a719b5c1c60c0f12101e8c0cc5): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 1, 2021
@dtolnay
Copy link
Member Author

dtolnay commented Mar 1, 2021

@bors r=Mark-Simulacrum rollup

@bors
Copy link
Contributor

bors commented Mar 1, 2021

📌 Commit bd51dea has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 1, 2021
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 2, 2021
…laumeGomez

Rollup of 7 pull requests

Successful merges:

 - rust-lang#80734 (check that first arg to `panic!()` in const is `&str`)
 - rust-lang#81932 (Always compile rustdoc with debug logging enabled when `download-rustc` is set)
 - rust-lang#82018 (Remove the dummy cache in `DocContext`; delete RenderInfo)
 - rust-lang#82598 (Check stability and feature attributes in rustdoc)
 - rust-lang#82655 (Highlight identifier span instead of whole pattern span in `unused` lint)
 - rust-lang#82662 (Warn about unknown doc attributes)
 - rust-lang#82676 (Change twice used large const table to static)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 9a0ac7c into rust-lang:master Mar 2, 2021
@rustbot rustbot added this to the 1.52.0 milestone Mar 2, 2021
@dtolnay dtolnay deleted the powers branch March 9, 2021 06:15
@dtolnay dtolnay added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants