Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes LTO + build-std + Oz failed to resolve undefined symbols #109821

Conversation

DianQK
Copy link
Member

@DianQK DianQK commented Mar 31, 2023

Fixes #108853. Fixes #72140. Fixes #110606.

The problem is that compiler_builtins is not involved in LTO. When share-generics + Oz/s/1 is turned on, the symbols needed for compiler_builtins are defined in core or other target files are involved in LTO.
In this case, we should export the relevant symbols at LTO. And we can write similar code to reproduce it in the stable version. Please see https://github.com/DianQK/rust-109821-reproducible.

Before LTO:

$ nm libcompiler_builtins-xx.rlib
U <u32 as core::convert::TryInto<u8>>::try_into
U <u32 as core::convert::TryInto<u32>>::try_into
U <u32 as core::convert::TryInto<u16>>::try_into
$ nm 
nm foo-cgu.0.rcgu.lto.input.bc
T <u32 as core::convert::TryInto<u8>>::try_into
U <u32 as core::convert::TryInto<usize>>::try_into
T <u32 as core::convert::TryInto<u32>>::try_into
T <u32 as core::convert::TryInto<u16>>::try_into

After LTO (Before the PR):

$ nm 
nm foo-cgu.0.rcgu.lto.after-restriction.bc
t <u32 as core::convert::TryInto<u8>>::try_into
U <u32 as core::convert::TryInto<usize>>::try_into
t <u32 as core::convert::TryInto<u32>>::try_into
t <u32 as core::convert::TryInto<u16>>::try_into

After LTO (After the PR):

$ nm 
nm foo-cgu.0.rcgu.lto.after-restriction.bc
T <u32 as core::convert::TryInto<u8>>::try_into
U <u32 as core::convert::TryInto<usize>>::try_into
T <u32 as core::convert::TryInto<u32>>::try_into
T <u32 as core::convert::TryInto<u16>>::try_into

Why does the problem only occur in Windows?
The rustc adds the -gc-sections on Linux and -Wl,-dead_strip on macOS. This problem can also be reproduced if these parameters are removed.

Another candidate patch that allows compiler_builtins to generate LLVM IR to be involved in LTO. Let ignored_for_lto always return false.

TODO:

@rustbot
Copy link
Collaborator

rustbot commented Mar 31, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 31, 2023
@bjorn3
Copy link
Member

bjorn3 commented Mar 31, 2023

compiler-builtins shouldn't depend on symbols exported by libcore. Even without LTO this is an issue due to compiler-builtins being added at the end of the linker invocations and ld.bfd (and other classical unix linkers) don't allow object files to depend on symbols exported by anything passed earlier on the commandline. compiler-builtins needs to be at the end as everything else depends on compiler-builtins.

Another way to fix #108853 would be to make the respective try_into functions #[inline].

@cr1901
Copy link
Contributor

cr1901 commented Mar 31, 2023

@bjorn3 You would know this better than me, but... isn't inline, even inline(always) not actually guaranteed to inline? How does share-generics interact with inline?

I can think of two edge cases:

  • There was an inline hint on try_foo, so a symbol try_foo shouldn't be exported normally. But rustc decided not to inline try_foo in compiler-builtins, but inlined try_foo everywhere else. No problems there, b/c try_foo's definition is found in compiler-builtins?
  • rustc decided not to inline try_foo in compiler-builtins and a preceding crate, despite the hint. The preceding crate defines the symbol try_foo, compiler-builtins tries to use try_foo definition in the preceding crate, and then the LTO link fails in the same way as Latest nightly are failing to build non-toy projects at linking stage when using LTO optimization #108853 b/c:
    • compiler-builtins isn't participating in LTO.
    • As mentioned ld.bfd (and other classical unix linkers) don't allow object files to depend on symbols exported by anything passed earlier on the commandline.

@bjorn3
Copy link
Member

bjorn3 commented Mar 31, 2023

AFAIK #[inline] doesn't interact with share-generics at all and in the current implementation forces the function to be codegened in every crate that uses it even if it isn't actually inlined. This is exactly what we need here. If rustc ever changes this to not codegen #[inline] functions in every crate that uses them, we can change the way compiler-builtins is handled at the same time. Possibly by adding a special case to keep the old behavior just for compiler-builtins. It already has a lot of special casing anyway. For example excluding from LTO, including it in every linker invocation even if a dylib dep already includes it, setting codegen-units to a very high number (to make every intrinsic end up in their own object file) and disabling debug assertions to avoid references to any of the panic infrastructure.

@DianQK
Copy link
Member Author

DianQK commented Apr 1, 2023

@bjorn3 @cr1901 I tested with ld.bfd and ld.gold on the small reproducible. It works fine, maybe the behavior has changed with the recent release?

$ ld.bfd --version
GNU ld (GNU Binutils) 2.40
$ ld.gold --version
GNU gold (GNU Binutils 2.40) 1.16

I'm curious as to why compiler-builtins depend on libcore, but compiler-builtins shouldn't depend on symbols exported by libcore. It looks strange to me and causes other issues, such as compiler-builtins requiring the core::panic::xx symbol.

If this is ok, we have to copy all the symbols/definitions needed by compiler-builtins to compiler-builtins, even under Oz builds. For this case, the share-generics must always be no for compiler-builtins. I'm not sure if this is appropriate.

Let's think a bit further, about ignored_for_lto to potentially use on more crates. It may not be appropriate to make the same constraints as compiler-builtins. This patch will make these cases work fine.
If we look at LLVM Link Time Optimization: Design and Implementation, it allows any input file to be an object file or a bitcode file. The examples in this document are very similar to the current issue.
The main.o is a native object file, and the a.o(a.bc) is an LLVM bitcode file. The main.o requires the foo1 symbol that is defined at a.bc.

Phase 3 : Optimize Bitcode Files: After symbol resolution, the linker tells the LTO shared object which symbols are needed by native object files. In the example above, the linker reports that only foo1() is used by native object files using lto_codegen_add_must_preserve_symbol(). Next the linker invokes the LLVM optimizer and code generators using lto_codegen_compile() which returns a native object file creating by merging the LLVM bitcode files and applying various optimization passes.

If we think of the LTO of rustc as truly Link Time Optimization, this patch provides a similar stage of Symbol Resolution.
This is a missing stage of the LTO of rustc.

@bjorn3
Copy link
Member

bjorn3 commented Apr 1, 2023

I'm curious as to why compiler-builtins depend on libcore, but compiler-builtins shouldn't depend on symbols exported by libcore.

All crates that contain any function depend on a number of so called lang items. These are things like the Sized and Add traits. Libcore defines those lang items. It isn't possible for compiler-builtins itself to define them as there can be only a single implementation in the current crate and all it's dependencies combined. In addition moving it to compiler-builtins and then having libcore depend on compiler-builtins won't work due to coherency.

It looks strange to me and causes other issues, such as compiler-builtins requiring the core::panic::xx symbol.

This is why compiler-builtins must be built with -Cdebug-assertions=false and has to be careful not to do any operation that can panic.

Let's think a bit further, about ignored_for_lto to potentially use on more crates. It may not be appropriate to make the same constraints as compiler-builtins.

ignored_for_lto is meant only for compiler-builtins. If it were to participate in LTO, functions like memcpy would get "optimized" into a recursive call to themself as LLVM thinks that it can pattern match and replace it with a more optimized version from compiler-builtins, but that of course doesn't work for the version in compiler-builtins.

If we think of the LTO of rustc as truly Link Time Optimization, this patch provides a similar stage of Symbol Resolution.
This is a missing stage of the LTO of rustc.

In case of linker plugin LTO (-Clinker-plugin-lto) even before this PR it would have worked as the symbols aren't made private by LTO before the linker gets a chance to see that compiler-builtins needs them. In that case the only thing that makes symbols private is a version script (which rustc generates from the exported_symbols list that you didn't change here), which doesn't hide symbols from other object files and bitcode files in the same linker invocation. Compiler-builtins still doesn't participate in LTO as it is compiled directly to object files instead of to bitcode files like everything else.

@bjorn3
Copy link
Member

bjorn3 commented Apr 1, 2023

It works fine, maybe the behavior has changed with the recent release?

I don't expect them to do that. I expect it to be a breaking change in some cases. In any case here is an example:

cat >foo.c <<EOF
void foo() {}
EOF
cat >bar.c <<EOF
void foo();

void main() { foo(); }
EOF
gcc -c foo.c -o foo.o
ar q foo.a foo.o
gcc -c bar.c -o bar.o
ar q bar.a bar.o
gcc bar.a foo.a -o test # use comes before definition => works
gcc foo.a bar.a -o test # definition comes before use => doesn't work
/usr/bin/ld: bar.a(bar.o): in function `main':
bar.c:(.text+0xa): undefined reference to `foo'
collect2: error: ld returned 1 exit status
ld.bfd --version
GNU ld (GNU Binutils for Debian) 2.35.2
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

@cr1901
Copy link
Contributor

cr1901 commented Apr 1, 2023

AFAIK #[inline] doesn't interact with share-generics at all and in the current implementation forces the function to be codegened in every crate that uses it even if it isn't actually inlined. This is exactly what we need here.

In other words, #[inline] for a function foo forces a (private to that crate, presumably) symbol to be created for function foo, such that even if inlining foo doesn't occur, the function call will refer a (private) symbol foo that the linker can find?

@bjorn3
Copy link
Member

bjorn3 commented Apr 1, 2023

This is indeed the behavior of the current version of rustc. There is no guarantee that this will remain the case so user code shouldn't depend on it, but if it changed, we can easily change compiler-builtins at the same time to fix it.

@DianQK
Copy link
Member Author

DianQK commented Apr 2, 2023

All crates that contain any function depend on a number of so called lang items. These are things like the Sized and Add traits. Libcore defines those lang items. It isn't possible for compiler-builtins itself to define them as there can be only a single implementation in the current crate and all it's dependencies combined. In addition moving it to compiler-builtins and then having libcore depend on compiler-builtins won't work due to coherency.

I think I understand why compiler-builtins depend on libcore. Can you explain more why compiler-builtins shouldn't depend on symbols exported by libcore?

Forgive me for not understanding. Due to the Oz optimization, rustc puts some libcore function symbols into compiler-builtins as global symbol definitions. These symbols in the symbol resolution phase are later used, which is unique.


ignored_for_lto is meant only for compiler-builtins. If it were to participate in LTO, functions like memcpy would get "optimized" into a recursive call to themself as LLVM thinks that it can pattern match and replace it with a more optimized version from compiler-builtins, but that of course doesn't work for the version in compiler-builtins.

I don't think ignored_for_lto is meant only for compiler-builtins. For the current state, we can add #![no_builtins] to ignore LTO. And for more cases in the future, we may add the ignored_for_lto rule.


In case of linker plugin LTO (-Clinker-plugin-lto) even before this PR it would have worked as the symbols aren't made private by LTO before the linker gets a chance to see that compiler-builtins needs them.

According to my understanding and practice of LTO, by calling the lto_codegen_add_must_preserve_symbol() function to select the reserved symbols, the other symbols will change from Global(T) to Local(t).
Please check the small reproducible.
The majority of the *.lto.input.bc symbols are T, while *.lto.after-restriction.bc has only a small part of the reserved symbols that are T. This is exactly why the issue occurred.


I don't expect them to do that. I expect it to be a breaking change in some cases. In any case here is an example: ...

I figured out why using ld.bfd in my demo still works.
The rustc process is more like:

cat >foo.c <<EOF
void foo() {}
EOF
cat >bar.c <<EOF
void foo();

void main() { foo(); }
EOF
gcc -c foo.c -o foo.o
gcc -c bar.c -o bar.o
ar q bar.a bar.o
gcc foo.o bar.a -o test

The difference is that the foo symbol needed for bar.a is found in foo.o, not foo.a.

So I think this patch is still a reasonable solution.


I don't think inline is an appropriate solution. It should be a workaround.

  • If another function is called inside the function, we have to add an inline chain to resolve it.
  • An inline will affect all other crate calls to this function.
  • Or the body of this function is very large.

@bjorn3
Copy link
Member

bjorn3 commented Apr 2, 2023

The difference is that the foo symbol needed for bar.a is found in foo.o, not foo.a.

Both core and compiler-builtins will be passed to the linker as archives rather than raw object files, which means that even with the exception for raw object files, compiler-builtins is not allowed to depend on core.

I don't think inline is an appropriate solution. It should be a workaround.

* If another function is called inside the function, we have to add an inline chain to resolve it.

* An inline will affect all other crate calls to this function.

* Or the body of this function is very large.

The try_into methods in question are so small that inlining them will never bloat the binary. In fact it may reduce the binary size due to avoiding a call instruction and spilling registers. None of the core functions compiler-builtins calls are big enough that adding #[inline] leads to code bloat. We could add some code to codegen core functions called by compiler-builtins in compiler-builtins itself, but for now #[inline] works. This PR is IMO even more of a workaround than adding a couple of #[inline]. There should be absolutely no reason to read what symbols any of the compiled object files import. In addition it only helps for LTO, not for non-LTO cases. And it doesn't work on wasm as the object crate doesn't support reading the symbol table from a wasm object file, only from already linked wasm modules.

@DianQK DianQK force-pushed the fixes-failed-to-resolve-undefined-symbols-for-the-compiler_builtins branch from 12e7b2d to 52d6abc Compare April 2, 2023 15:40
@rustbot
Copy link
Collaborator

rustbot commented Apr 2, 2023

These commits modify the Cargo.lock file. Random changes to Cargo.lock can be introduced when switching branches and rebasing PRs.
This was probably unintentional and should be reverted before this PR is merged.

If this was intentional then you can ignore this comment.

@rust-log-analyzer

This comment has been minimized.

@DianQK
Copy link
Member Author

DianQK commented Apr 2, 2023

Both core and compiler-builtins will be passed to the linker as archives rather than raw object files, which means that even with the exception for raw object files, compiler-builtins is not allowed to depend on core.

It only occurs under non-LTO. This patch only affects LTO. In the case of LTO, the core merges into the LTO object file.


I read the When to #[inline].

On methods that have any generics in scope.

This missing method of adding inline may not be appropriate. And when it comes to the next missing method, we can't guarantee that inline will work.

I've added support for wasm, although more work is needed to address the license and the compilation warning.

@cr1901
Copy link
Contributor

cr1901 commented Apr 2, 2023

which means that even with the exception for raw object files, compiler-builtins is not allowed to depend on core.

Just to be clear, by "exception", you mean exception to:

Even without LTO this is an issue due to compiler-builtins being added at the end of the linker invocations and ld.bfd (and other classical unix linkers) don't allow object files to depend on symbols exported by anything passed earlier on the commandline.

,correct?

(Should the bold be "archive files"? It's the only way that makes sense to me, since there is an example of an archive depending on a symbol from something passed earlier on the command line here.)

The try_into methods in question are so small that inlining them will never bloat the binary. In fact it may reduce the binary size due to avoiding a call instruction and spilling registers.

My own experience w/ msp430 code is that this is often correct. You have to call small function a number of times before the overhead of calling to a single function elsewhere is smaller than the size cost of inlining at every call site. And for the smallest binaries, you're unlikely to reach this threshold b/c the number of calls to any given inlined function will inherently be small as well (there's only so much .text :P).

@bjorn3
Copy link
Member

bjorn3 commented Apr 2, 2023

It only occurs under non-LTO. This patch only affects LTO. In the case of LTO, the core merges into the LTO object file.

In case of non-LTO compiler-builtins is not allowed to depend on core with or without this PR. If compiler-builtins doesn't depend on core, this PR is not necessary.

I read the When to #[inline].

On methods that have any generics in scope.

This is under normal circumstances when -Zshare-generics isn't used and local codegen is not necessary for correctness.

This missing method of adding inline may not be appropriate. And when it comes to the next missing method, we can't guarantee that inline will work.

It will work.

I've added support for wasm, although more work is needed to address the license and the compilation warning.

Just enabling the wasm feature of object is not enough. That only enables support for reading the symbol table from linked wasm modules, not wasm object files. The object crate currently doesn't know how to parse the linking custom section: gimli-rs/object#471

@bjorn3
Copy link
Member

bjorn3 commented Apr 2, 2023

My own experience w/ msp430 code is that this is often correct. You have to call small function a number of times before the overhead of calling to a single function elsewhere is smaller than the size cost of inlining at every call site. And for the smallest binaries, you're unlikely to reach this threshold b/c the number of calls to any given inlined function will inherently be small as well (there's only so much .text :P).

Truncating integers as happens here should not generate any code if inlined, right? unwrap_unchecked is used on Result returned by try_into, so it doesn't generate a overflow check either.

Just to be clear, by "exception", you mean exception to:

Even without LTO this is an issue due to compiler-builtins being added at the end of the linker invocations and ld.bfd (and other classical unix linkers) don't allow object files to depend on symbols exported by anything passed earlier on the commandline.

,correct?

(Should the bold be "archive files"? It's the only way that makes sense to me, since there is an example of an archive depending on a symbol from something passed earlier on the command line here.)

Yes, and yes.

@DianQK
Copy link
Member Author

DianQK commented Apr 3, 2023

If compiler-builtins doesn't depend on core, this PR is not necessary.

If so, this PR may not be necessary. But the truth is that compiler-builtins depend on core. I know we can make compiler-builtins not depend on core symbolically by inlining or by turning off share-generics. However, there is some symbols dependency on core, which still ensures consistency of function implementation. So I'm still not sure why we want compiler-builtins to depend on core, and why we want compiler-builtins to not depend on core.


Even without LTO this is an issue due to compiler-builtins being added at the end of the linker invocations and ld.bfd (and other classical unix linkers) don't allow archive files to depend on symbols exported by anything passed earlier on the commandline.

I read https://man.openbsd.org/man1/ld.bfd.1 and https://lld.llvm.org/ELF/warn_backrefs.html. To be precise, this rule does not allow an undefined symbol to be found from an earlier archive file.

For example, we assume that bar.o depends on the _foo symbol of foo.o.

  • ld foo.a bar.a It's not allowed.
  • ld foo.o bar.a It's ok, because _foo has been loaded into the symbol table.

At this point, our case under LTO becomes ld [app+core].o compiler_builtins.a.

We can find the issue where the problem occurred much earlier, see #72140. This can be reproduced in the stable version. I have also added a smaller version, see https://github.com/DianQK/rust-109821-reproducible/tree/main/stable.


Just enabling the wasm feature of object is not enough. That only enables support for reading the symbol table from linked wasm modules, not wasm object files. The object crate currently doesn't know how to parse the linking custom section: gimli-rs/object#471.

Thank you for the reminder. I realize that the wasm-related fixes need more work.
I left a FIXME in the commit. And I'm glad to keep pushing for changes.
At least we can get the Tier 1 target platform to work properly.

@DianQK DianQK force-pushed the fixes-failed-to-resolve-undefined-symbols-for-the-compiler_builtins branch from 52d6abc to ec1e86d Compare April 3, 2023 12:41
@DianQK
Copy link
Member Author

DianQK commented Apr 3, 2023

What do you think of this patch, any ideas? @alexcrichton

@cr1901
Copy link
Contributor

cr1901 commented Apr 7, 2023

Another way to fix #108853 would be to make the respective try_into functions #[inline].

@bjorn3 I'm willing to file a PR to do this if no one else does (sometime next week). But will adding inline annotations fix #72140? This PR seems to fix both.

@bjorn3
Copy link
Member

bjorn3 commented Apr 7, 2023

#109983 landed like 2 hours ago. Can you verify if the next nightly works? The next nightly should be released in a couple of hours.

@DianQK
Copy link
Member Author

DianQK commented Apr 8, 2023

I tested the current master(23ee2af) branch. #108853 Fixed. #72140 and stable are not fixed.

@yujincheng08
Copy link
Contributor

yujincheng08 commented Apr 8, 2023

I tried the nightly build (c49c4fb) and now try_from and try_into do not show undefined reference. But compiler_builtins::mem::mem* are now showing undefined reference and -Zshare-generics=off no longer helps.

@wesleywiser
Copy link
Member

r? pnkfelix

@rustbot rustbot assigned pnkfelix and unassigned oli-obk Jun 22, 2023
@DianQK DianQK force-pushed the fixes-failed-to-resolve-undefined-symbols-for-the-compiler_builtins branch from ec1e86d to e7a98a5 Compare June 22, 2023 15:33
@pnkfelix
Copy link
Member

At this point, based on @bjorn3 's argument above (see comments here, here, here, and here), I'm inclined to close this PR.

However, I do not yet understand what is going on for #72140. Things marked with #![no_builtins] are not supposed to participate in LTO; but, clearly something is happening where combining #![no_builtins] with LTO is going wrong. I would like to understand what is going wrong for that case before we close this PR. (I'm trying to look into that now.)

@DianQK
Copy link
Member Author

DianQK commented Jul 14, 2023

#![no_builtins] did not participate in LTO, but some symbols it may depend on participated in LTO without being exported.

You can refer to https://github.com/DianQK/rust-109821-reproducible/tree/main/stable to see a simpler example.

@pnkfelix
Copy link
Member

some symbols it may depend on participated in LTO without being exported

Right, okay.

So I guess the implicit question for @bjorn3 here is: What is the right way to address that problem without a pass of the form being proposed in this PR.

Or in other words: What is supposed to be happening with those symbols? Should they be unconditionally exported (rather than having their export status be conditionalized on a pass like the one added by this PR)?

@DianQK
Copy link
Member Author

DianQK commented Jul 14, 2023

From an LTO perspective, it is normal to export symbols for object files that do not participate in LTO.
You can read the instructions at https://llvm.org/docs/LinkTimeOptimization.html.

Here's some highlighting of the key content:

  • Input source file a.c is compiled into LLVM bitcode form.
  • Input source file main.c is compiled into native object code.
--- a.h ---
extern int foo1(void);

--- a.c ---
#include "a.h"

int foo1(void) {
   ...
}

--- main.c ---
#include "a.h"

int main() {
  return foo1();
}

Phase 3 : Optimize Bitcode Files
After symbol resolution, the linker tells the LTO shared object which symbols are needed by native object files. In the example above, the linker reports that only foo1() is used by native object files using lto_codegen_add_must_preserve_symbol(). Next the linker invokes the LLVM optimizer and code generators using lto_codegen_compile() which returns a native object file creating by merging the LLVM bitcode files and applying various optimization passes.

The example of the instructions is just the right way to illustrate this issue.

If we say #![no_builtins] can't have external dependencies, that's fine.

But as it stands, I can create a #![no_builtins] dependency on an external project in the stable version.

@pnkfelix
Copy link
Member

If we say #![no_builtins] can't have external dependencies, that's fine.

that sounds like a very strong statement (potentially rendering the #![no_builtins] feature itself unusable in practice?)

but I think I should move my follow-up questions about #![no_builtins] over to #72140, maybe.

@DianQK
Copy link
Member Author

DianQK commented Jul 15, 2023

Based on the #35637 implementation, it appears that the semantics are more similar to lto = "never" + -fno-builtins.

If we consider the semantics to be similar to lto = "never", we can have the freedom to control which crates participate in LTO and which ones don't. We don't care about the dependencies of these crates. It's just that some crates produce object files, while others produce LLVM IR. If we consider a C source code file at the same level as a crate, this behavior is similar to that of clang/gcc.

If we say #![no_builtins] can't have external dependencies, it would be best to "copy" the code from core into compiler-builtins.

but I think I should move my follow-up questions about #![no_builtins] over to #72140, maybe.

Maybe you can consider this pull request as adding partial LTO support.

@DianQK
Copy link
Member Author

DianQK commented Jul 15, 2023

I read the documentation at https://doc.rust-lang.org/reference/attributes/codegen.html#the-no_builtins-attribute again.
I also found from the source code at

llvm::LLVMRustAddLibraryInfo(cpm, llmod, no_builtins);
that the semantics of #![no_builtins] are the same as -fno-builtin.

In terms of semantics, I think it does not explicitly statement that it prohibits external dependencies or excludes participation in LTO.

In that case, the fact that compiler-builtins cannot exhibit dependencies on core externally is a separate topic.


If I revisit the problem discussed in #35540.

This attribute is needed to prevent LLVM from optimizing a memcpy implementation into a recursive call. However this attribute is not preserved in LTO builds: the attribute only works if it is set at the top-level crate.

This attribute can be preserved in LTO, see https://clang.godbolt.org/z/Wod6KK6eq. It can work at a function level and be preserved.
However, the attribute is not preserved in Rust, likely because disableAllFunctions only takes effect during the current pass execution. We should consider adding this attribute.

The clang ensures that LTO is not lost by adding this attribute, see https://clang.godbolt.org/z/z4j6Wsod5.

I am currently attempting to add the no-builtins attribute to Rust, which should allow #![no_builtins] to participate in LTO again.


Finally, I think this can be transformed into two patches:

  • Add the no-builtins attribute to Rust.
  • Support lto = "never".

@pnkfelix
Copy link
Member

Based on my conversations with @DianQK (e.g. #72140 (comment)), I'm closing this PR in favor of the plan outlined in #113716 (comment)

@pnkfelix pnkfelix closed this Jul 18, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 18, 2023
…, r=pnkfelix

Add the `no-builtins` attribute to functions when `no_builtins` is applied at the crate level.

**When `no_builtins` is applied at the crate level, we should add the `no-builtins` attribute to each function to ensure it takes effect in LTO.**

This is also the reason why no_builtins does not take effect in LTO as mentioned in rust-lang#35540.

Now, `#![no_builtins]` should be similar to `-fno-builtin` in clang/gcc, see https://clang.godbolt.org/z/z4j6Wsod5.

Next, we should make `#![no_builtins]` participate in LTO again. That makes sense, as LTO also takes into consideration function-level instruction optimizations, such as the MachineOutliner. More importantly, when a user writes a large `#![no_builtins]` crate, they would like this crate to participate in LTO as well.

We should also add a function-level no_builtins attribute to allow users to have more control over it. This is similar to Clang's `__attribute__((no_builtin))` feature, see https://clang.godbolt.org/z/Wod6KK6eq. Before implementing this feature, maybe we should discuss whether to support more fine-grained control, such as `__attribute__((no_builtin("memcpy")))`.

Related discussions:
- rust-lang#109821
- rust-lang#35540

Next (a separate pull request?):
- [ ] Revert rust-lang#35637
- [ ] Add a function-level `no_builtin` attribute?
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 18, 2023
…, r=pnkfelix

Add the `no-builtins` attribute to functions when `no_builtins` is applied at the crate level.

**When `no_builtins` is applied at the crate level, we should add the `no-builtins` attribute to each function to ensure it takes effect in LTO.**

This is also the reason why no_builtins does not take effect in LTO as mentioned in rust-lang#35540.

Now, `#![no_builtins]` should be similar to `-fno-builtin` in clang/gcc, see https://clang.godbolt.org/z/z4j6Wsod5.

Next, we should make `#![no_builtins]` participate in LTO again. That makes sense, as LTO also takes into consideration function-level instruction optimizations, such as the MachineOutliner. More importantly, when a user writes a large `#![no_builtins]` crate, they would like this crate to participate in LTO as well.

We should also add a function-level no_builtins attribute to allow users to have more control over it. This is similar to Clang's `__attribute__((no_builtin))` feature, see https://clang.godbolt.org/z/Wod6KK6eq. Before implementing this feature, maybe we should discuss whether to support more fine-grained control, such as `__attribute__((no_builtin("memcpy")))`.

Related discussions:
- rust-lang#109821
- rust-lang#35540

Next (a separate pull request?):
- [ ] Revert rust-lang#35637
- [ ] Add a function-level `no_builtin` attribute?
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jul 18, 2023
…, r=pnkfelix

Add the `no-builtins` attribute to functions when `no_builtins` is applied at the crate level.

**When `no_builtins` is applied at the crate level, we should add the `no-builtins` attribute to each function to ensure it takes effect in LTO.**

This is also the reason why no_builtins does not take effect in LTO as mentioned in rust-lang#35540.

Now, `#![no_builtins]` should be similar to `-fno-builtin` in clang/gcc, see https://clang.godbolt.org/z/z4j6Wsod5.

Next, we should make `#![no_builtins]` participate in LTO again. That makes sense, as LTO also takes into consideration function-level instruction optimizations, such as the MachineOutliner. More importantly, when a user writes a large `#![no_builtins]` crate, they would like this crate to participate in LTO as well.

We should also add a function-level no_builtins attribute to allow users to have more control over it. This is similar to Clang's `__attribute__((no_builtin))` feature, see https://clang.godbolt.org/z/Wod6KK6eq. Before implementing this feature, maybe we should discuss whether to support more fine-grained control, such as `__attribute__((no_builtin("memcpy")))`.

Related discussions:
- rust-lang#109821
- rust-lang#35540

Next (a separate pull request?):
- [ ] Revert rust-lang#35637
- [ ] Add a function-level `no_builtin` attribute?
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Jul 19, 2023
…, r=pnkfelix

Add the `no-builtins` attribute to functions when `no_builtins` is applied at the crate level.

**When `no_builtins` is applied at the crate level, we should add the `no-builtins` attribute to each function to ensure it takes effect in LTO.**

This is also the reason why no_builtins does not take effect in LTO as mentioned in rust-lang#35540.

Now, `#![no_builtins]` should be similar to `-fno-builtin` in clang/gcc, see https://clang.godbolt.org/z/z4j6Wsod5.

Next, we should make `#![no_builtins]` participate in LTO again. That makes sense, as LTO also takes into consideration function-level instruction optimizations, such as the MachineOutliner. More importantly, when a user writes a large `#![no_builtins]` crate, they would like this crate to participate in LTO as well.

We should also add a function-level no_builtins attribute to allow users to have more control over it. This is similar to Clang's `__attribute__((no_builtin))` feature, see https://clang.godbolt.org/z/Wod6KK6eq. Before implementing this feature, maybe we should discuss whether to support more fine-grained control, such as `__attribute__((no_builtin("memcpy")))`.

Related discussions:
- rust-lang#109821
- rust-lang#35540

Next (a separate pull request?):
- [ ] Revert rust-lang#35637
- [ ] Add a function-level `no_builtin` attribute?
@DianQK DianQK deleted the fixes-failed-to-resolve-undefined-symbols-for-the-compiler_builtins branch August 7, 2023 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
9 participants