trans: Make type names in LLVM IR independent of crate-nums and source locations. #37640

michaelwoerister · 2016-11-07T21:21:05Z

UPDATE:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code and the order in which extern crates are loaded. The new type names look like the old ones, except for closures and the <crate-num>. prefix being gone. Resolution of name clashes (e.g. of the same type in different crate versions) is left to LLVM (which will just append .<counter> to the name).
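To illustrate the clash handling described above, here is a minimal sketch of counter-based disambiguation. It is a toy model, not LLVM's code: `NameTable` and `declare` are hypothetical names, and the exact counter values LLVM picks may differ.

```rust
use std::collections::HashMap;

/// Toy stand-in for the renaming LLVM performs on clashing type names.
/// `NameTable` and `declare` are hypothetical names for this sketch.
#[derive(Default)]
struct NameTable {
    // How many times each base name has been declared so far.
    seen: HashMap<String, u32>,
}

impl NameTable {
    /// Returns the name actually used: the base name on first use,
    /// `<base>.<counter>` on every later clash.
    fn declare(&mut self, base: &str) -> String {
        let count = self.seen.entry(base.to_string()).or_insert(0);
        let name = if *count == 0 {
            base.to_string()
        } else {
            format!("{}.{}", base, *count)
        };
        *count += 1;
        name
    }
}

fn main() {
    let mut table = NameTable::default();
    // The same type path arriving from two versions of one crate.
    println!("{}", table.declare("mod1::Struct<u32>"));
    println!("{}", table.declare("mod1::Struct<u32>"));
}
```

The point of the design is that the compiler itself no longer has to guarantee unique names; uniqueness only matters within one LLVM context.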

ORIGINAL TEXT:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code. Before, the type names of closures contained the closure's definition location. The new naming scheme follows the same pattern that we already use for symbol names: We have a human-readable prefix followed by a hash that makes sure we don't have any collisions. Here is an example of what the new names look like:

// prog.rs - example program

mod mod1
{
    pub struct Struct<T>(pub T);
}

fn main() {
    use mod1::Struct;

    let _s = Struct(0u32);
    let _t = Struct('h');
    let _x = Struct(Struct(0i32));
}

Old:

%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }

New:

%"prog::mod1::Struct<u32>::ejDrT" = type { i32 }
%"prog::mod1::Struct<char>::2eEAU" = type { i32 }
%"prog::mod1::Struct<prog::mod1::Struct<i32>>::ehCqR" = type { %"prog::mod1::Struct<i32>::$fAo2" }
%"prog::mod1::Struct<i32>::$fAo2" = type { i32 }

As you can see, the new names are slightly more verbose, but also more consistent. There is no difference now between a local type and one from another crate (before, non-local types were prefixed with <crate-num>. as in 2.std::mod1::Type1).
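A rough sketch of the "readable prefix plus hash" idea follows. It assumes the hash is computed from the crate name and the printed type path with the standard library's `DefaultHasher` and rendered as five hex digits; the actual hash input and encoding used by the compiler are not shown in this thread, so `hashed_type_name` is purely illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Builds `<crate>::<path>::<short-hash>`. The hash input and the hex
/// encoding are assumptions of this sketch, not what rustc does.
fn hashed_type_name(crate_name: &str, type_path: &str) -> String {
    let mut hasher = DefaultHasher::new();
    crate_name.hash(&mut hasher);
    type_path.hash(&mut hasher);
    // Keep only the low 20 bits so the suffix is five hex digits.
    let suffix = hasher.finish() & 0xF_FFFF;
    format!("{}::{}::{:05x}", crate_name, type_path, suffix)
}

fn main() {
    println!("{}", hashed_type_name("prog", "mod1::Struct<u32>"));
    println!("{}", hashed_type_name("prog", "mod1::Struct<char>"));
}
```

Nothing that feeds the name here depends on a crate number or a source position, which is the property the scheme is after.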

There is a bit of design space here. For example, we could leave off the crate name for local definitions (making names shorter but less consistent):

%"mod1::Struct<u32>::ejDrT" = type { i32 }
%"mod1::Struct<char>::2eEAU" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>::ehCqR" = type { %"mod1::Struct<i32>::$fAo2" }
%"mod1::Struct<i32>::$fAo2" = type { i32 }

We could also put the hash in front, which might be more readable:

%"ejDrT.mod1::Struct<u32>" = type { i32 }
%"2eEAU.mod1::Struct<char>" = type { i32 }
%"ehCqR.mod1::Struct<mod1::Struct<i32>>" = type { %"$fAo2.mod1::Struct<i32>" }
%"$fAo2.mod1::Struct<i32>" = type { i32 }

We could probably also get rid of the hash if we used full DefPaths and crate-nums (though I'm not yet 100% sure whether crate-nums could mess with incremental compilation).

%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }
%"2.std::mod1::Type1" = type { ... }
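The concern about crate-nums can be sketched as follows, assuming crate numbers are handed out in the order in which extern crates are loaded. `old_style_name` and `new_style_name` are hypothetical helpers, and numbering from 1 is an assumption of the sketch.

```rust
/// Renders a type name in the `<crate-num>.<crate>::<path>` style, taking
/// the crate number from the crate's position in the load order.
fn old_style_name(load_order: &[&str], krate: &str, path: &str) -> Option<String> {
    let index = load_order.iter().position(|c| *c == krate)?;
    // Numbering from 1 is an assumption of this sketch.
    Some(format!("{}.{}::{}", index + 1, krate, path))
}

/// Renders the same type without any crate number.
fn new_style_name(krate: &str, path: &str) -> String {
    format!("{}::{}", krate, path)
}

fn main() {
    let first_build = ["core", "std"];
    let second_build = ["std", "core"];
    // Same type, different name, only because the load order changed.
    println!("{:?}", old_style_name(&first_build, "std", "mod1::Type1"));
    println!("{:?}", old_style_name(&second_build, "std", "mod1::Type1"));
    // The crate-num-free name is identical in both builds.
    println!("{}", new_style_name("std", "mod1::Type1"));
}
```

If cached LLVM IR contains the numbered form, a change in load order alone would make otherwise identical IR look different.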

I would prefer the solution with the hashes because it is nice and consistent conceptually, but visually it's admittedly a bit uglier. Maybe @rust-lang/compiler would like to bikeshed a little about this.

On a related note: Has anyone ever checked whether the LTO linker will merge equal types with different names?
(^ @brson, @alexcrichton ^)
If not, that would be a reason to make type names more consistent.

rust-highfive · 2016-11-07T21:21:17Z

r? @Aatch

(rust_highfive has picked a reviewer for you, use r? to override)

michaelwoerister · 2016-11-07T21:21:29Z

r? @nikomatsakis

nagisa · 2016-11-07T22:12:12Z

@michaelwoerister

The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions.

Collision handling makes little sense to me. Types will never get their “LLVM names” exposed (except maybe through debug info?), nor should we ever use them to influence what code does (as opposed to using ValueRefs or some such). LLVM also happens to handle name collisions automatically during code generation, AFAIR, by appending a number to the string. Basically all these names are is a debugging aid for somebody reading the IR.

I feel like all this extra work the compiler would end up doing is just a waste.

eddyb · 2016-11-07T22:37:49Z

I agree with @nagisa here. In fact, I'd venture to say LLVM's "type system" is a misguided attempt at some sort of higher-level-than-machine "MIR" for C and C++, and their regrets alone won't rewrite LLVM.

It's less insightful than the debuginfo which is still presented in a deplorable manner in LLVM IR.
On top of that, it lures optimization pass authors into the trap of "Information Morgana" - relying on "types" for shortcuts, which leads to overall inefficiency and missed opportunities where the "types" lie.

I vote for not emitting any type or value names (other than static and fn symbols) by default.
Where LLVM IR is to be introspected, a flag can be passed to the compiler to enable such verbosity.
Either way, accuracy or uniqueness, as @nagisa has already stated, are irrelevant for these entities.

michaelwoerister · 2016-11-07T23:10:04Z

Yes, I agree, the more I think about it. How about we have something that is readable (for debugging), like the item path, with any disambiguation done through a cgu-local counter (which might already be provided by LLVM, as @nagisa mentioned)? That should be fine for incr. comp. For that we only need to get rid of source locations and crate nums in the names.
I'll update the PR tomorrow.


nagisa · 2016-11-07T23:36:27Z

I feel like we can tell LLVM to not preserve any names (other than statics and fns, as eddyb already noted; there’s a switch for that somewhere, I believe) unless --emit=llvm-ir or -C llvm-args is passed. This could help with memory use too.

michaelwoerister · 2016-11-08T15:22:07Z

OK, I did some tests with and without human-readable names and I could hardly detect a difference in space requirements or compile times. E.g. the libstd rlib was 5784 KB with readable names and 5750 KB without, libcore 8402 KB with and 8397 KB without, so roughly 0.6% and 0.06%.
I'm not sure if it's worth the trouble of supporting two different modes.
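For reference, the relative differences can be recomputed from the sizes quoted above. The helper name `relative_saving_percent` is made up for this sketch, and the saving is taken relative to the size with readable names.

```rust
/// Percentage by which the artifact shrinks when names are dropped,
/// relative to the size with readable names.
fn relative_saving_percent(with_names_kb: f64, without_names_kb: f64) -> f64 {
    (with_names_kb - without_names_kb) / with_names_kb * 100.0
}

fn main() {
    // Sizes in KB as reported in the comment above.
    println!("libstd:  {:.2}%", relative_saving_percent(5784.0, 5750.0));
    println!("libcore: {:.2}%", relative_saving_percent(8402.0, 8397.0));
}
```

This comes out at about 0.59% for libstd and 0.06% for libcore, so the saving stays well under one percent in both cases.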

eddyb · 2016-11-08T15:36:23Z

Are trans times not affected? In that case just making the type names simpler is enough IMO.

michaelwoerister · 2016-11-08T17:33:09Z

Are trans times not affected?

Not as far as I can tell.

michaelwoerister · 2016-11-08T19:26:35Z

Update: See text of PR description.

bors · 2016-11-09T23:14:12Z

☔ The latest upstream changes (presumably #37670) made this pull request unmergeable. Please resolve the merge conflicts.

michaelwoerister · 2016-11-10T16:05:46Z

Rebased. I think there is nothing controversial about this PR anymore. It fixes the incr. comp. stability problems without making type names worse.

brson · 2016-11-11T03:20:33Z

@bors r+

bors · 2016-11-11T03:20:34Z

📌 Commit b69bc15 has been approved by brson

bors · 2016-11-11T04:39:07Z

☔ The latest upstream changes (presumably #37104) made this pull request unmergeable. Please resolve the merge conflicts.

michaelwoerister · 2016-11-11T14:34:41Z

@bors r=brson

Rebased.

bors · 2016-11-11T14:34:42Z

📌 Commit 47b8656 has been approved by brson

nikomatsakis · 2016-11-11T14:55:37Z

@bors r+

bors · 2016-11-11T14:55:38Z

💡 This pull request was already approved, no need to approve it again.

There's another pull request that is currently being tested, blocking this pull request: Show one error for duplicated type definitions #37447

bors · 2016-11-11T14:55:38Z

📌 Commit 47b8656 has been approved by nikomatsakis

nikomatsakis · 2016-11-11T14:55:59Z

Oops, I see @brson r+'d already... well, I approve anyhow. =)

Rollup of 30 pull requests - Successful merges: #37190, #37368, #37481, #37503, #37527, #37535, #37551, #37584, #37600, #37613, #37615, #37659, #37662, #37669, #37682, #37688, #37690, #37692, #37693, #37694, #37695, #37696, #37698, #37699, #37705, #37708, #37709, #37716, #37724, #37727 - Failed merges: #37640, #37689, #37717

bors · 2016-11-12T12:15:45Z

🔒 Merge conflict

bors · 2016-11-12T12:15:54Z

☔ The latest upstream changes (presumably #37730) made this pull request unmergeable. Please resolve the merge conflicts.

michaelwoerister · 2016-11-13T01:11:50Z

@bors r=brson

Rebased.

bors · 2016-11-13T01:11:51Z

📌 Commit 4a32030 has been approved by brson

bors · 2016-11-13T02:53:49Z

🔒 Merge conflict

bors · 2016-11-13T02:54:12Z

☔ The latest upstream changes (presumably #37675) made this pull request unmergeable. Please resolve the merge conflicts.

Before this PR, type names could depend on the cratenum being used for a given crate and also on the source location of closures. Both are undesirable for incremental compilation where we cache LLVM IR and don't want it to depend on formatting or in which order crates are loaded.

michaelwoerister · 2016-11-14T00:51:02Z

@bors r=brson

Rebased.

bors · 2016-11-14T00:51:02Z

📌 Commit 276f052 has been approved by brson

bors · 2016-11-14T03:46:34Z

⌛ Testing commit 276f052 with merge 435246b...

@brson


bors · 2016-11-14T07:31:51Z

rust-highfive assigned Aatch Nov 7, 2016

rust-highfive assigned nikomatsakis and unassigned Aatch Nov 7, 2016

michaelwoerister force-pushed the llvm-type-names branch from 4ed52fb to 5e13f0c Compare November 8, 2016 19:20

michaelwoerister force-pushed the llvm-type-names branch from 5e13f0c to b69bc15 Compare November 10, 2016 16:03

alexcrichton added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Nov 10, 2016

brson assigned brson and unassigned nikomatsakis Nov 11, 2016

michaelwoerister force-pushed the llvm-type-names branch from b69bc15 to 47b8656 Compare November 11, 2016 14:33

eddyb mentioned this pull request Nov 11, 2016

Rollup of 29 pull requests #37726

Closed

eddyb mentioned this pull request Nov 12, 2016

Rollup of 30 pull requests #37730

Merged

michaelwoerister force-pushed the llvm-type-names branch from 47b8656 to 4a32030 Compare November 13, 2016 00:52

michaelwoerister added 4 commits November 13, 2016 19:49

Fix codegen test after change of llvm type naming scheme

fd4ee00

Adapt accidentally fixed test case.

790a2f9

Remove unused method CrateContext::rotate().

276f052

michaelwoerister force-pushed the llvm-type-names branch from 4a32030 to 276f052 Compare November 14, 2016 00:50

bors merged commit 276f052 into rust-lang:master Nov 14, 2016
