Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trans: Avoid weak linkage for closures when linking with MinGW. #34830

Merged
merged 2 commits into from
Aug 1, 2016

Conversation

michaelwoerister
Copy link
Member

@michaelwoerister michaelwoerister commented Jul 14, 2016

This PR proposes one possible solution to #34793, the problem that prevents servo/servo#12393 from landing. It applies the same strategy, that we already use for monomorphizations, to closures, that is, instead of emitting symbols with weak_odr linkage in order to avoid symbol conflicts, we emit them with internal linkage, with the side effect that we have to copy code instead of just linking to it, if more than one codegen unit is involved.
With this PR, the compiler will only apply this strategy for targets where we would actually run into a problem when using weak_odr linkage, in other words nothing will change for platforms except for MinGW.

The solution implemented here has one restriction that could be lifted with some more effort, but it does not seem to be worth the trouble since it will go away once we use only MIR-trans: If someone compiles code

  1. on MinGW,
  2. with more than one codegen unit,
  3. not using MIR-trans,
  4. and runs into a closure inlined from another crate

then the compiler will abort and suggest to compile either with just one codegen unit or -Zorbit.

What's nice about this is that I lays a foundation for also doing the same for generics: using weak linkage where possible and thus enabling some more space optimizations that the linker can do.

This PR also contains a test case for compiling a program that contains more than 2^15 closures. It's a huge, generated file with almost 100K LOCs. I did not commit the script for generating the file but could do so. Alternatively, maybe someone wants to come up with a way of doing this with macros.
The test file is implemented via macros now (thanks @alexcrichton!)

Opinions?

Fixes #34793.

cc @rust-lang/compiler

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member

I'd personally prefer to avoid a 3.2MB test case if possible, but I can definitely see how it'd be quite nice to have a regression test for this. Could something like this suffice though for generating a ton of closures?

macro_rules! a {                   
    ($mac:ident) => ($mac!());     
    ($mac:ident 1 $($t:tt)*) => (  
        a!($mac $($t)*);           
        a!($mac $($t)*);           
    )                              
}                                  

fn main() {                        
    macro_rules! doit {            
        () => (let _x = |a: u32| a + 4;)
    }                              
    a!(doit 1 1 1 1 1 1 1 1 1 1);  
}                                  

That should generate 1k closures on the stack I believe, although do the closure bodies have to all be different?

@michaelwoerister
Copy link
Member Author

michaelwoerister commented Jul 14, 2016

I'd personally prefer to avoid a 3.2MB test case if possible

What? Every self-respecting project of a certain size has to have one of those!

That should generate 1k closures on the stack I believe

Cool! I'll try to adapt it so it doesn't generate all closures in the same function. I've tried that before and it was really bad for compile times.

@michaelwoerister
Copy link
Member Author

OK, I've updated the test case. Much better now :)

@alexcrichton
Copy link
Member

🏋

@nagisa
Copy link
Member

nagisa commented Jul 15, 2016

Is this symbol count restriction per link-session or per-object/crate/library?

@eddyb
Copy link
Member

eddyb commented Jul 15, 2016

@michaelwoerister You should update the PR description as well (about the testcase generation).

@michaelwoerister
Copy link
Member Author

@nagisa
It can occur in many different situations, I guess, but per-object is probably the most accurate categorization. The problem is that the linker interprets section numbers as signed 16-bit integers (negative values are invalid except for -1 and -2) when really they are signed 32-bit integers. So when it tries to read a symbol reference where the 16 bits it reads for the section represent a negative number, it will fail.
So yes, the restriction applies at least per input object.

@michaelwoerister
Copy link
Member Author

@eddyb Done.

@nagisa
Copy link
Member

nagisa commented Jul 15, 2016

It can occur in many different situations, I guess, but per-object is probably the most accurate categorization. The problem is that the linker interprets section numbers as signed 16-bit integers (negative values are invalid except for -1 and -2) when really they are signed 32-bit integers. So when it tries to read a symbol reference where the 16 bits it reads for the section represent a negative number, it will fail.
So yes, the restriction applies at least per input object.

I’m figuring that if we wanted to go crazy, we could, then, keep a counter per-object which would assign weak_odr linkage for first 2^16 candidates and then make everything else internal, but I realised it is way too much effort for what it provides.

@michaelwoerister
Copy link
Member Author

I’m figuring that if we wanted to go crazy, we could, then, keep a counter per-object which would assign weak_odr linkage for first 2^16 candidates and then make everything else internal, but I realised it is way too much effort for what it provides.

Yeah, that might prevent the problem at the input level, but then we would probably run into the restriction for the output binary because we are asking ld to generate a file with more than 2^15 sections.

bors added a commit that referenced this pull request Jul 18, 2016
…=eddyb

Run base::internalize_symbols() even for single-codegen-unit crates.

The initial linkage-assignment (especially for closures) is a conservative one that makes some symbols more visible than they need to be. While this is not a correctness problem, it does force the LLVM inliner to be more conservative too, which results in poor performance. Once translation is based solely on MIR, it will be easier to also make the initial linkage assignment a better fitting one. Until then `internalize_symbols()` does a good job of preventing most performance regressions.

This should solve the regressions reported in #34891 and maybe also those in #34831.

As a side-effect, this will also solve most of the problematic cases described in #34793. Not reliably so, however. For that, we still need a solution like the one implement in #34830.

cc @rust-lang/compiler
@nikomatsakis
Copy link
Contributor

I am OK with this setup.

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Jul 29, 2016

📌 Commit d5ef24e has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Jul 29, 2016

⌛ Testing commit d5ef24e with merge 2fded70...

@bors
Copy link
Contributor

bors commented Jul 29, 2016

💔 Test failed - auto-win-msvc-64-opt-rustbuild

@retep998
Copy link
Member

sh: ../configure: No such file or directory

Well that doesn't sound very good.

@alexcrichton
Copy link
Member

@bors: retry

@bors
Copy link
Contributor

bors commented Jul 29, 2016

⌛ Testing commit d5ef24e with merge 3a651d8...

@bors
Copy link
Contributor

bors commented Jul 29, 2016

💔 Test failed - auto-win-msvc-64-opt

@michaelwoerister
Copy link
Member Author

@bors r=nikomatsakis

This was a legitimate failure of a test case that had been added after the PR was made. Should be fixed.

@bors
Copy link
Contributor

bors commented Aug 1, 2016

📌 Commit eaea4ac has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Aug 1, 2016

⌛ Testing commit eaea4ac with merge 5ef1e7e...

bors added a commit that referenced this pull request Aug 1, 2016
…akis

trans: Avoid weak linkage for closures when linking with MinGW.

This PR proposes one possible solution to #34793, the problem that prevents servo/servo#12393 from landing. It applies the same strategy, that we already use for monomorphizations, to closures, that is, instead of emitting symbols with `weak_odr` linkage in order to avoid symbol conflicts, we emit them with `internal` linkage, with the side effect that we have to copy code instead of just linking to it, if more than one codegen unit is involved.
With this PR, the compiler will only apply this strategy for targets where we would actually run into a problem when using `weak_odr` linkage, in other words nothing will change for platforms except for MinGW.

The solution implemented here has one restriction that could be lifted with some more effort, but it does not seem to be worth the trouble since it will go away once we use only MIR-trans: If someone compiles code

1. on MinGW,
2. with more than one codegen unit,
3. *not* using MIR-trans,
4. and runs into a closure inlined from another crate

then the compiler will abort and suggest to compile either with just one codegen unit or `-Zorbit`.

What's nice about this is that I lays a foundation for also doing the same for generics: using weak linkage where possible and thus enabling some more space optimizations that the linker can do.

~~This PR also contains a test case for compiling a program that contains more than 2^15 closures. It's a huge, generated file with almost 100K LOCs. I did not commit the script for generating the file but could do so. Alternatively, maybe someone wants to come up with a way of doing this with macros.~~
The test file is implemented via macros now (thanks @alexcrichton!)

Opinions?

Fixes #34793.

cc @rust-lang/compiler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants