Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Try to ensure we don’t emit duplicate symbols #22811

Closed
wants to merge 4 commits into from

Conversation

nagisa
Copy link
Member

@nagisa nagisa commented Feb 25, 2015

Ok, this was a hard one, and the description will also be somewhat long. We provide tools to tell what exact symbols to emit for any fn or static, but don’t quite check if that won’t cause any issues later on. Some of the issues include LLVM mangling our names again and our names pointing to wrong locations, us generating dumb foreign call wrappers, linker errors, extern functions resolving to different symbols altogether (extern {fn fail();} fail(); in some cases calling fail1()), etc.

Before the commit we had a function called note_unique_llvm_symbol, so it is clear somebody was aware of the issue at some point, but the function was barely used, mostly in irrelevant locations.

I published this PR mainly as RFC, because I’m not sure about approach taken here, and want to hear people’s involved with trans thoughts in general.

P.S. This is [breaking-change] because it makes dumbly written code (such as issue-15562.rs test) properly invalid.
P.S.S. this in its current form fixes all those issues about #[no_mangle] on trait methods.
P.S.S.S. were this PR received positively, I’d like to clean it up some before merge.

@rust-highfive
Copy link
Collaborator

r? @pcwalton

(rust_highfive has picked a reviewer for you, use r? to override)

@huonw
Copy link
Member

huonw commented Feb 25, 2015

I vaguely recall parallel codegen was broken due to duplicate symbols or something, maybe this helps with that? (cc @nick29581)

(Sorry for vagueness.)

@nrc
Copy link
Member

nrc commented Feb 26, 2015

cc @epdtry - was that the problem with parallel codegen? Does this help?

@nagisa
Copy link
Member Author

nagisa commented Feb 26, 2015

It probably does not, since the compiler still can pass the checks for the same symbol multiple times and then register all of them simultaneously. It would need locking in trans to fix that, AFAICT.

@nagisa
Copy link
Member Author

nagisa commented Feb 26, 2015

Hmm, probably checking for a symbol and registering one with LLVM at the same time (with a global lock, probably) and then using the registered ValueRef everywhere in trans would be a better choice overall. That would make parallel codegen correct as well at least wrt symbol clashes.

@nagisa
Copy link
Member Author

nagisa commented Feb 26, 2015

OK, I think I got what the issue with parallel codegen was and it is, on the retrospective, very obvious. The thing is that we use codegen-units number of LLVM contexts during trans and those LLVM contexts are mostly isolated from each other during the trans and optimisation phases. Objects produced by those contexts are merged during linking which either fails or does nothing sensible in presence of duplicate symbols.

@alexcrichton
Copy link
Member

I vaguely recall parallel codegen was broken due to duplicate symbols or something

The specific issue in question I believe was #18243, but I don't think that a small reproduction was ever generated. I do think, however, that -C codegen-units=N is broken today on master so this might enable it to work in stage1+. It also depends on what linker you're using I think... (fun times!)

Hmm, probably checking for a symbol and registering one with LLVM at the same time (with a global lock, probably) and then using the registered ValueRef everywhere in trans would be a better choice overall. That would make parallel codegen correct as well at least wrt symbol clashes.

Note that the parallel part of parallel codegen is only the LLVM passes. We continue to translate the entire AST serially, the strategy was just to round-robin translations into one of N LLVM contexts. Each context is then independently serialize.

Along those lines it should be possible to have a shared global context (I think this already exists?) which keeps track of symbol names with unsynchronized access (as all translation is done serially).

Objects produced by those contexts are merged during linking which either fails or does nothing sensible in presence of duplicate symbols.

I believe this is indeed the problem! In theory though we should never be producing duplicate symbols. If I remember correctly it was one of our own compiler-generated symbols, not a user-input symbols (so this may not help too too much).


I published this PR mainly as RFC, because I’m not sure about approach taken here, and want to hear people’s involved with trans thoughts in general.

I think this is a great idea! We've run into so may bugs in the past of duplicate names in codegen and they're always nasty and terrible to both find and fix.

I think the strategy taken here is relatively ok, but I would prefer to see the calls to functions like LLVMAddGlobal or register_fn to all get funneled through one common location. This one common location could then keep track of the symbols previously registered (and may pick up more instances here and there. Basically, if we're going to add this, I think we will want it to be as robust as possible which means pushing it as far down into the LLVM stack as possible.

@nagisa
Copy link
Member Author

nagisa commented Feb 27, 2015

I would prefer to see the calls to functions like LLVMAddGlobal ... to all get funneled through one common location

Yeah, that's exactly the strategy I am pursuing since yesterday, now.
Will post a new patchset sometime soon.

Along those lines it should be possible to have a shared global context (I think this already exists?) which keeps track of symbol names with unsynchronized access (as all translation is done serially).

We have a shared context for metadata, will investigate whether that
can be used for keeping track of the symbols as well.

On 27 February 2015 at 03:59, Alex Crichton notifications@github.com wrote:

I vaguely recall parallel codegen was broken due to duplicate symbols or
something

The specific issue in question I believe was #18243, but I don't think that
a small reproduction was ever generated. I do think, however, that -C
codegen-units=N is broken today on master so this might enable it to work in
stage1+. It also depends on what linker you're using I think... (fun times!)

Hmm, probably checking for a symbol and registering one with LLVM at the
same time (with a global lock, probably) and then using the registered
ValueRef everywhere in trans would be a better choice overall. That would
make parallel codegen correct as well at least wrt symbol clashes.

Note that the parallel part of parallel codegen is only the LLVM passes. We
continue to translate the entire AST serially, the strategy was just to
round-robin translations into one of N LLVM contexts. Each context is then
independently serialize.

Along those lines it should be possible to have a shared global context (I
think this already exists?) which keeps track of symbol names with
unsynchronized access (as all translation is done serially).

Objects produced by those contexts are merged during linking which either
fails or does nothing sensible in presence of duplicate symbols.

I believe this is indeed the problem! In theory though we should never be
producing duplicate symbols. If I remember correctly it was one of our own
compiler-generated symbols, not a user-input symbols (so this may not help
too too much).


I published this PR mainly as RFC, because I’m not sure about approach taken
here, and want to hear people’s involved with trans thoughts in general.

I think this is a great idea! We've run into so may bugs in the past of
duplicate names in codegen and they're always nasty and terrible to both
find and fix.

I think the strategy taken here is relatively ok, but I would prefer to see
the calls to functions like LLVMAddGlobal or register_fn to all get funneled
through one common location. This one common location could then keep track
of the symbols previously registered (and may pick up more instances here
and there. Basically, if we're going to add this, I think we will want it to
be as robust as possible which means pushing it as far down into the LLVM
stack as possible.


Reply to this email directly or view it on GitHub.

Regards,
Simonas Kazlauskas

@bors
Copy link
Contributor

bors commented Mar 3, 2015

☔ The latest upstream changes (presumably #22532) made this pull request unmergeable. Please resolve the merge conflicts.

@nagisa
Copy link
Member Author

nagisa commented Mar 3, 2015

@bors not like I care 😒

@nagisa nagisa closed this Mar 3, 2015
bors added a commit that referenced this pull request Apr 12, 2015
We provide tools to tell what exact symbols to emit for any fn or static, but
don’t quite check if that won’t cause any issues later on. Some of the issues
include LLVM mangling our names again and our names pointing to wrong locations,
us generating dumb foreign call wrappers, linker errors, extern functions
resolving to different symbols altogether (`extern {fn fail();} fail();` in some
cases calling `fail1()`), etc.

Before the commit we had a function called `note_unique_llvm_symbol`, so it is
clear somebody was aware of the issue at some point, but the function was barely
used, mostly in irrelevant locations.

Along with working on it I took liberty to start refactoring trans/base into
a few smaller modules. The refactoring is incomplete and I hope I will find some
motivation to carry on with it.

This is possibly a [breaking-change] because it makes dumbly written code
properly invalid.

This fixes all those issues about incorrect use of #[no_mangle] being not reported/misreported/ICEd by the compiler.

NB. This PR does not attempt to tackle the parallel codegen issue that was mentioned in #22811, but I believe it should be very straightforward in a follow up PR by modifying `trans::declare::get_defined_value` to look at all the contexts.

cc @alexcrichton @huonw @nrc because you commented on the original RFC issue.

EDIT: wow, this became much bigger than I initially intended.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants