Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rudimentary name mangling support #4267

Merged
merged 46 commits into from
Sep 6, 2024

Conversation

dwblaikie
Copy link
Contributor

@dwblaikie dwblaikie commented Aug 29, 2024

This seems to be enough to avoid naming collisions for functions in any of the current test cases (verified by asserting that the name of the llvm::Function matches the name passed to create it - not triggering LLVM's numbering that happens when names collide)

It currently implements mangling for namespace scopes, class scopes, and impls.
Nothing generic is mangled yet - haven't looked at how that works, though evidently it's not covered by existing testing, I guess.

Follow-up change will document the current mangling algorithm in toolchain/docs/lower.md

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm glad to see the start on mangling! There are some bits that make sense to me, some questions about how mangling should really work.

For documentation, my sense would be that it's a design choice. If you'd like to write a proposal for that (with alternatives, etc), let me know and I'll help walk you through that. Otherwise, maybe just create a Google Doc explaining it for now? Ideally I'd say to update toolchain/docs/lower.md, but unfortunately #4242 is still in review.

toolchain/lower/mangler.h Outdated Show resolved Hide resolved
toolchain/lower/mangler.h Show resolved Hide resolved
}
std::string result;
llvm::raw_string_ostream os(result);
os << "_C";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was kind of hoping @chandlerc would review and handle this in particular. I know we'd discussed _C as an option, but actually, considering swift mangling uses $s, should we consider a similar $c? I think one of the concerns with _C might've been too high a risk of conflicting with a C-based name, but maybe using a special character in mangling is a good way to avoid that problem entirely? Or maybe c. if we think we can use . in names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to other ideas - The "_" form seems to be dominant, though (LLVM's demangler supports _Z for C++, _R for Rust, and _D for the D programming language).

Just looking at LLVM's demangler makes the fair point that a . prefix would be problematic (not that that's what you were suggesting) - because of the leading . in ELF symbol names, so the demangler strips those off by default.

Work backwards from _Z and use _Y (or perhaps _X, because X is cool)?

Though I'm really OK with whatever everyone else wants - and wouldn't totally object to using something following swift's pattern. So if $c is best for you, happy to switch to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can separate this out if you want to start with _C as probably good.

I was actually thinking further, and if C. or C$ were an option (upper or lower case), particularly using a non-identifier character as separators instead of prefix, it might both (a) make the name easy to read and (b) offer a name that couldn't conflict with a C name.

But I'm realizing the difference between _C and $c isn't actually that big for us: we should be suffixing all mangled names with the package scope (e.g., _CFoo.Main), which should already avoid name conflicts with C names if we're using . separators. So if you don't think C<char> is a good choice, let's stick with _C for cross-language consistency.

toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/file_context.cpp Show resolved Hide resolved
Copy link
Contributor Author

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - sorry for the drip feed of responses earlier. Working on my github review workflow some more.

I think I've addressed all the comments and things are a bit tidier now.

}
std::string result;
llvm::raw_string_ostream os(result);
os << "_C";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to other ideas - The "_" form seems to be dominant, though (LLVM's demangler supports _Z for C++, _R for Rust, and _D for the D programming language).

Just looking at LLVM's demangler makes the fair point that a . prefix would be problematic (not that that's what you were suggesting) - because of the leading . in ELF symbol names, so the demangler strips those off by default.

Work backwards from _Z and use _Y (or perhaps _X, because X is cool)?

Though I'm really OK with whatever everyone else wants - and wouldn't totally object to using something following swift's pattern. So if $c is best for you, happy to switch to that.

toolchain/lower/mangler.h Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the approach is looking good. I did realize we need to handle the package differently, but I'm fine splitting that out as a TODO if you'd rather just finish here. I think the necessary changes won't significantly affect the current logic.

BTW, might be worth adding a brief TODO about the generic/impl mangling you note in the PR description, but I'll leave that to you.

And toolchain/docs/lower.md is now merged for edits, but that can also be done separately.

}
std::string result;
llvm::raw_string_ostream os(result);
os << "_C";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can separate this out if you want to start with _C as probably good.

I was actually thinking further, and if C. or C$ were an option (upper or lower case), particularly using a non-identifier character as separators instead of prefix, it might both (a) make the name easy to read and (b) offer a name that couldn't conflict with a C name.

But I'm realizing the difference between _C and $c isn't actually that big for us: we should be suffixing all mangled names with the package scope (e.g., _CFoo.Main), which should already avoid name conflicts with C names if we're using . separators. So if you don't think C<char> is a good choice, let's stick with _C for cross-language consistency.

toolchain/lower/file_context.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.h Outdated Show resolved Hide resolved
toolchain/lower/mangler.h Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the approach is looking good. I did realize we need to handle the package differently, but I'm fine splitting that out as a TODO if you'd rather just finish here. I think the necessary changes won't significantly affect the current logic.

Ended up adding package support along the way.

BTW, might be worth adding a brief TODO about the generic/impl mangling you note in the PR description, but I'll leave that to you.

Ended up implementing the impl qualifier mangling in some of the review iterations, so that's done. I'll leave a FIXME for generic support.

And toolchain/docs/lower.md is now merged for edits, but that can also be done separately.

Mentioned that in the description, to be done in a follow-up.

toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
@dwblaikie
Copy link
Contributor Author

(with the fix for the mangling ending too early with package support and impls - now bazel check //... with a patch to assert an llvm::Function's name matches the name requested (ie: testing that we never request duplicate names and trigger LLVM's name deduplication support) doesn't fail - which is something :) Not sure if it's worth including that assertion in trunk... I guess it probably is?)

@@ -31,20 +31,20 @@ fn InstanceAccess(a: A) -> i32 {
// CHECK:STDOUT: ; ModuleID = 'extend_impl.carbon'
// CHECK:STDOUT: source_filename = "extend_impl.carbon"
// CHECK:STDOUT:
// CHECK:STDOUT: define i32 @F() !dbg !4 {
// CHECK:STDOUT: define i32 @"_CF.A.Main:I.Main.A.Main"() !dbg !4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mangling looks a little strange -- specifically repeating the .A.Main part twice. It also looks like the trailing portion might be ambiguous. How should we read this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yep - that's a bug.

Fixed.

The issue was that it was mangling in the parent name scopes, including the parent name scope of wherever the impl was defined - which, if I understand correctly, isn't relevant. (an impl is uniquely defined by its interface and constraint, regardless of the scope in which that impl is defined - ie: two impls of the same interface+constraint, written in different namespaces, are invalid/do collide. (well, I guess there's no syntax to define an impl in a namespace, so it only happens to collide between, I guess, two files both containing an impl or a single file containing an inline impl inside a class definition, and one outside, which fails with:

basic.carbon:11:1: ERROR: Redefinition of `impl B as A`.
impl B as A {
^~~~~~~~~~~~~
basic.carbon:6:3: Previous definition was here.
  extend impl as A {
  ^~~~~~~~~~~~~~~~~~

Fixed this by adding the lexical parent scope after the switch, and having the impl handling continue resulting in the parent scope addition being skipped.

Now there shouldn't be any trailing portion after an impl, it should be <function name>.<self name>:<constraint name>.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good now. For zygoloid's comment, if you can address it easily here feel free and I can take another look, but I think it'd also make sense to merge this and work in a separate PR since it'll have more narrow impact. (note, when impl.self_id is a class, 3 scopes get added to names_to_render, maybe 1 should be cut)

toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Show resolved Hide resolved
@dwblaikie
Copy link
Contributor Author

Very much messed this PR state up... trying to figure out how to fix it :/

dwblaikie and others added 20 commits September 6, 2024 22:03
In preparation for adding broader name mangling support.
_C was chosen without special care or consideration - that C++ didn't
choose _C for the itanium mangling probably says it's not the best
letter to pick, but we don't have any particular detail about how
different languages have chosen their mangling prefixes or other
conflict avoidance.
* mangle the fully qualified name of the class doing the implementing
* do not mangle the qualifiers of the interfaces yet
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Laying the foundation for removing the recursion from the mangler.

(also fixes the nested switch type, which didn't compile)
dwblaikie and others added 14 commits September 6, 2024 22:04
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
This had been causing mangling to stop for impls, after reaching the
package level of the first part of the name - missing the second part of
the name.
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
only the constraint and interface uniquely identify an impl
Co-authored-by: Carbon Infra Bot <carbon-external-infra@google.com>
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM, just pointing out a couple small things.

toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
toolchain/lower/mangler.cpp Outdated Show resolved Hide resolved
dwblaikie and others added 2 commits September 6, 2024 15:48
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
dwblaikie and others added 2 commits September 6, 2024 15:54
Co-authored-by: Carbon Infra Bot <carbon-external-infra@google.com>
@dwblaikie dwblaikie added this pull request to the merge queue Sep 6, 2024
Merged via the queue into carbon-language:trunk with commit a548eff Sep 6, 2024
8 checks passed
@dwblaikie dwblaikie deleted the mangle_symbols branch September 6, 2024 23:16
dwblaikie added a commit to dwblaikie/carbon-lang that referenced this pull request Sep 9, 2024
This seems to be enough to avoid naming collisions for functions in any
of the current test cases (verified by asserting that the name of the
`llvm::Function` matches the name passed to create it - not triggering
LLVM's numbering that happens when names collide)

It currently implements mangling for namespace scopes, class scopes, and
impls.
Nothing generic is mangled yet - haven't looked at how that works,
though evidently it's not covered by existing testing, I guess.

Follow-up change will document the current mangling algorithm in
`toolchain/docs/lower.md`

---------

Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Carbon Infra Bot <carbon-external-infra@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants