Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trans: Make type names in LLVM IR independent of crate-nums and source locations. #37640

Merged
merged 4 commits into from
Nov 14, 2016

Conversation

michaelwoerister
Copy link
Member

@michaelwoerister michaelwoerister commented Nov 7, 2016

UPDATE:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code and the order in which extern crates are loaded. The new type names look like the old ones, except for closures and the <crate-num>. prefix being gone. Resolution of name clashes (e.g. of the same type in different crate versions) is left to LLVM (which will just append .<counter> to the name).

ORIGINAL TEXT:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code. Before, the type of closures contained the closures definition location. The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions. Here is an example of what the new names look like:

// prog.rs - example program

mod mod1
{
    pub struct Struct<T>(pub T);
}

fn main() {
    use mod1::Struct;

    let _s = Struct(0u32);
    let _t = Struct('h');
    let _x = Struct(Struct(0i32));
}

Old:

%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }

New:

%"prog::mod1::Struct<u32>::ejDrT" = type { i32 }
%"prog::mod1::Struct<char>::2eEAU" = type { i32 }
%"prog::mod1::Struct<prog::mod1::Struct<i32>>::ehCqR" = type { %"prog::mod1::Struct<i32>::$fAo2" }
%"prog::mod1::Struct<i32>::$fAo2" = type { i32 }

As you can see, the new names are slightly more verbose, but also more consistent. There is no difference now between a local type and one from another crate (before, non-local types where prefixed with <crate-num>. as in 2.std::mod1::Type1).

There is a bit of design space here. For example, we could leave off the crate name for local definitions (making names shorter but less consistent):

%"mod1::Struct<u32>::ejDrT" = type { i32 }
%"mod1::Struct<char>::2eEAU" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>::ehCqR" = type { %"mod1::Struct<i32>::$fAo2" }
%"mod1::Struct<i32>::$fAo2" = type { i32 }

We could also put the hash in front, which might be more readable:

%"ejDrT.mod1::Struct<u32>" = type { i32 }
%"2eEAU.mod1::Struct<char>" = type { i32 }
%"ehCqR.mod1::Struct<mod1::Struct<i32>>" = type { %"$fAo2.mod1::Struct<i32>" }
%"$fAo2.mod1::Struct<i32>" = type { i32 }

We could probably also get rid of the hash if we used full DefPaths and crate-nums (though I'm not yet a 100% sure if crate-nums could mess with incremental compilation).

%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }
%"2.std::mod1::Type1" = type { ... }

I would prefer the solution with the hashes because it is nice and consistent conceptually, but visually it's admittedly a bit uglier. Maybe @rust-lang/compiler would like to bikeshed a little about this.

On a related note: Has anyone ever tried if the LTO-linker will merge equal types with different names?
(^ @brson, @alexcrichton ^)
If not, that would be a reason to make type names more consistent.

@rust-highfive
Copy link
Collaborator

r? @Aatch

(rust_highfive has picked a reviewer for you, use r? to override)

@michaelwoerister
Copy link
Member Author

r? @nikomatsakis

@rust-highfive rust-highfive assigned nikomatsakis and unassigned Aatch Nov 7, 2016
@nagisa
Copy link
Member

nagisa commented Nov 7, 2016

@michaelwoerister

The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions.

Collision handling make little sense to me. Types will never get their “LLVM names” exposed (except maybe through debug info?) nor should never use them to influence what code does (as opposed to using ValueRefs or some such. LLVM also happens to automatically handle name collisions during code generation, AFAIR, by appending a number to the string. Basically all these names are is a debugging aid for somebody reading the IR.

I feel like all this extra work the compiler would end up doing is just a waste.

@eddyb
Copy link
Member

eddyb commented Nov 7, 2016

I agree with @nagisa here. In fact, I'd venture to say LLVM's "type system" is a misguided attempt at some sort of higher-level-than-machine "MIR" for C and C++, and their regrets alone won't rewrite LLVM.

It's less insightful than the debuginfo which is still presented in a deplorable manner in LLVM IR.
On top of that, it lures optimization pass authors into the trap of "Information Morgana" - relying on "types" for shortcuts, which leads to overall inefficiency and missed opportunities where the "types" lie.

I vote for not emitting any type or value names (other than static and fn symbols) by default.
Where LLVM IR is to be introspected, a flag can be passed to the compiler to enable such verbosity.
Either way, accuracy or uniqueness, as @nagisa has already stated, are irrelevant for these entities.

@michaelwoerister
Copy link
Member Author

Yes, I agree, the more I think about it. How about we have something that is readable (for debugging), like the item path and any disambiguation is done through a cgu-local counter (which might already be provided by llvm, as @nagisa mentioned). That should be fine for incr. comp. For that we only need to get rid of source locations and crate nums in the names.
I'll update the PR tomorrow.

On November 7, 2016 5:38:02 PM EST, Eduard-Mihai Burtescu notifications@github.com wrote:

I agree with @nagisa here. In fact, I'd venture to say LLVM's "type
system" is a misguided attempt at some sort of
higher-level-than-machine "MIR" for C and C++, and their regrets alone
won't rewrite LLVM.

It's less insightful than the debuginfo which is still presented in a
deplorable manner in LLVM IR.
On top of that, it lures optimization pass authors into the trap of
"Information Morgana" - relying on "types" for shortcuts, which leads
to overall inefficiency and missed opportunities where the "types" lie.

I vote for not emitting any type or value names (other than static
and fn symbols) by default.
Where LLVM IR is to be introspected, a flag can be passed to the
compiler to enable such verbosity.
Either way, accuracy or uniqueness, as @nagisa has already stated, are
irrelevant for these entities.

You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#37640 (comment)

Sent from my Android device with K-9 Mail. Please excuse my brevity.

@nagisa
Copy link
Member

nagisa commented Nov 7, 2016

I feel like we can tell LLVM to not preserve any names (other than statics and fns, as eddyb already noted; there’s a switch for that somewhere, I believe) unless --emit=llvm-ir or -C llvm-args is passed. This could help with memory use too.

@michaelwoerister
Copy link
Member Author

OK, I did some tests with and without human-readable names and I could hardly detect a difference in space requirements or compile times. E.g. the libstd rlib was 5784 KB with readable names and 5750 KB without, libcore 8402 KB with and 8397 KB without - so 0.5% and 0.05%.
I'm not sure if it's worth the trouble of supporting two different modes.

@eddyb
Copy link
Member

eddyb commented Nov 8, 2016

Are trans times not affected? In that case just making the type names simpler is enough IMO.

@michaelwoerister
Copy link
Member Author

Are trans times not affected?

Not as far as I can tell.

@michaelwoerister
Copy link
Member Author

Update: See text of PR description.

@bors
Copy link
Contributor

bors commented Nov 9, 2016

☔ The latest upstream changes (presumably #37670) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member Author

Rebased. I think there is nothing controversial about this PR anymore. It fixes the incr. comp. stability problems with making type names worse.

@alexcrichton alexcrichton added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Nov 10, 2016
@brson
Copy link
Contributor

brson commented Nov 11, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Nov 11, 2016

📌 Commit b69bc15 has been approved by brson

@brson brson assigned brson and unassigned nikomatsakis Nov 11, 2016
@bors
Copy link
Contributor

bors commented Nov 11, 2016

☔ The latest upstream changes (presumably #37104) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member Author

@bors r=brson

Rebased.

@bors
Copy link
Contributor

bors commented Nov 11, 2016

📌 Commit 47b8656 has been approved by brson

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Nov 11, 2016

💡 This pull request was already approved, no need to approve it again.

@bors
Copy link
Contributor

bors commented Nov 11, 2016

📌 Commit 47b8656 has been approved by nikomatsakis

@nikomatsakis
Copy link
Contributor

Oops, I see @brson r+'d already... well, I approve anyhow. =)

bors added a commit that referenced this pull request Nov 12, 2016
@bors
Copy link
Contributor

bors commented Nov 12, 2016

🔒 Merge conflict

@bors
Copy link
Contributor

bors commented Nov 12, 2016

☔ The latest upstream changes (presumably #37730) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member Author

@bors r=brson

Rebased.

@bors
Copy link
Contributor

bors commented Nov 13, 2016

📌 Commit 4a32030 has been approved by brson

@bors
Copy link
Contributor

bors commented Nov 13, 2016

🔒 Merge conflict

@bors
Copy link
Contributor

bors commented Nov 13, 2016

☔ The latest upstream changes (presumably #37675) made this pull request unmergeable. Please resolve the merge conflicts.

Before this PR, type names could depend on the cratenum being used
for a given crate and also on the source location of closures.
Both are undesirable for incremental compilation where we cache
LLVM IR and don't want it to depend on formatting or in which
order crates are loaded.
@michaelwoerister
Copy link
Member Author

@bors r=brson

Rebased.

@bors
Copy link
Contributor

bors commented Nov 14, 2016

📌 Commit 276f052 has been approved by brson

@bors
Copy link
Contributor

bors commented Nov 14, 2016

⌛ Testing commit 276f052 with merge 435246b...

bors added a commit that referenced this pull request Nov 14, 2016
trans: Make type names in LLVM IR independent of crate-nums and source locations.

UPDATE:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code and the order in which extern crates are loaded. The new type names look like the old ones, except for closures and the `<crate-num>.` prefix being gone. Resolution of name clashes (e.g. of the same type in different crate versions) is left to LLVM (which will just append `.<counter>` to the name).

ORIGINAL TEXT:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code. Before, the type of closures contained the closures definition location. The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions. Here is an example of what the new names look like:

```rust
// prog.rs - example program

mod mod1
{
    pub struct Struct<T>(pub T);
}

fn main() {
    use mod1::Struct;

    let _s = Struct(0u32);
    let _t = Struct('h');
    let _x = Struct(Struct(0i32));
}
```
Old:
```llvm
%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }
```
New:
```llvm
%"prog::mod1::Struct<u32>::ejDrT" = type { i32 }
%"prog::mod1::Struct<char>::2eEAU" = type { i32 }
%"prog::mod1::Struct<prog::mod1::Struct<i32>>::ehCqR" = type { %"prog::mod1::Struct<i32>::$fAo2" }
%"prog::mod1::Struct<i32>::$fAo2" = type { i32 }
```

As you can see, the new names are slightly more verbose, but also more consistent. There is no difference now between a local type and one from another crate (before, non-local types where prefixed with `<crate-num>.` as in `2.std::mod1::Type1`).

There is a bit of design space here. For example, we could leave off the crate name for local definitions (making names shorter but less consistent):
```llvm
%"mod1::Struct<u32>::ejDrT" = type { i32 }
%"mod1::Struct<char>::2eEAU" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>::ehCqR" = type { %"mod1::Struct<i32>::$fAo2" }
%"mod1::Struct<i32>::$fAo2" = type { i32 }
```

We could also put the hash in front, which might be more readable:
```llvm
%"ejDrT.mod1::Struct<u32>" = type { i32 }
%"2eEAU.mod1::Struct<char>" = type { i32 }
%"ehCqR.mod1::Struct<mod1::Struct<i32>>" = type { %"$fAo2.mod1::Struct<i32>" }
%"$fAo2.mod1::Struct<i32>" = type { i32 }
```

We could probably also get rid of the hash if we used full DefPaths and crate-nums (though I'm not yet a 100% sure if crate-nums could mess with incremental compilation).

```llvm
%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }
%"2.std::mod1::Type1" = type { ... }
```
I would prefer the solution with the hashes because it is nice and consistent conceptually, but visually it's admittedly a bit uglier. Maybe @rust-lang/compiler would like to bikeshed a little about this.

On a related note: Has anyone ever tried if the LTO-linker will merge equal types with different names?
(^ @brson, @alexcrichton ^)
If not, that would be a reason to make type names more consistent.
@bors bors merged commit 276f052 into rust-lang:master Nov 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants