Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make #[used] work when linking with ld64 #133832

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 76 additions & 0 deletions compiler/rustc_codegen_ssa/src/back/apple.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,82 @@ pub(super) fn macho_platform(target: &Target) -> u32 {
}
}

/// Add relocation and section data needed for a symbol to be considered
/// undefined by ld64.
///
/// Inherently very architecture-specific (unfortunately).
///
/// # New architectures
///
/// The values here can be found by compiling the following program:
///
/// ```c
/// void foo(void);
///
/// void use_foo() {
/// foo();
/// }
/// ```
///
/// With:
///
/// ```console
/// $ clang -c foo.c -O2 -g0 -fno-exceptions -target $CLANG_TARGET
/// ```
///
/// And then inspecting with `objdump -d foo.o` and/or:
///
/// ```no_run
/// use object::read::macho::{MachHeader, MachOFile};
/// use object::{File, Object, ObjectSection};
///
/// fn read(file: MachOFile<'_, impl MachHeader>) {
/// for section in file.sections() {
/// dbg!(section.name().unwrap(), section.data().unwrap(), section.align());
/// for reloc in section.relocations() {
/// dbg!(reloc);
/// }
/// }
/// }
///
/// fn main() {
/// match File::parse(&*std::fs::read("foo.o").unwrap()).unwrap() {
/// File::MachO32(file) => read(file),
/// File::MachO64(file) => read(file),
/// _ => unimplemented!(),
/// }
/// }
/// ```
pub(super) fn add_data_and_relocation(
file: &mut object::write::Object<'_>,
section: object::write::SectionId,
symbol: object::write::SymbolId,
target: &Target,
) -> object::write::Result<()> {
let (data, align, addend, r_type): (&[u8], _, _, _) = match &*target.arch {
"arm" => (&[0xff, 0xf7, 0xfe, 0xbf], 2, 0, object::macho::ARM_THUMB_RELOC_BR22),
"aarch64" => (&[0, 0, 0, 0x14], 4, 0, object::macho::ARM64_RELOC_BRANCH26),
"x86_64" => (
&[0x55, 0x48, 0x89, 0xe5, 0x5d, 0xe9, 0, 0, 0, 0],
16,
-4,
object::macho::X86_64_RELOC_BRANCH,
),
"x86" => (&[0x55, 0x89, 0xe5, 0x5d, 0xe9, 0xf7, 0xff, 0xff, 0xff], 16, -4, 0),
arch => unimplemented!("unsupported Apple architecture {arch:?}"),
};
Comment on lines +81 to +92
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite unhappy with this, it feels very brittle to emit a manual relocation like this, but I think it's the only way to do it (without pulling in a bunch of LLVM codegen internals).

I'm also unsure of the correctness, there might be more target options and/or flags that we should be checking here? Or maybe we should emit different kinds of relocations based on the kind of symbol?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... emitting manual relocations like this feels super fragile and hard to maintain. At the very least this would need comments and ideally references.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think additional comments are needed here, except to explain what this data represents, such as [0, 0, 0, 0x14] represents the b instruction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bit concern about here, the symbol of #[used] is a global variable, the C code should be void foo(void);. I'm not sure what can happen here. I prefer an address offset instruction. This is more safe to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to add a local symbol to help ld64 strip these dead code.

$ objdump -d target/release/divan-example | head -n 10

target/release/divan-example:   file format mach-o arm64

Disassembly of section __TEXT,__text:

000000010000273c <__text>:
10000273c: 1400000a     b       0x100002764 <_main>
100002740: 14016d1e     b       0x10005dbb8 <__ZN3std9panicking11EMPTY_PANIC17hd7edc6c0e88a95c9E>
100002744: 1400eca4     b       0x10003d9d4 <_rust_eh_personality>
100002748: 14016632     b       0x10005c010 <__ZN13divan_example4PUSH17h56090c2dad718183E>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestions here sounds correct to me, I've also later been thinking that an address offset instruction would be safer than a jump (since address offsets can be done safely on both function pointers and data pointers).

You may want to add a local symbol

I'm not sure I understand what you mean by that? You want a local symbol that appears as the "entry" point for the section? Does that help the linker see that the code is unused?


let offset = file.section_mut(section).append_data(data, align);
file.add_relocation(section, object::write::Relocation {
offset,
addend,
symbol,
flags: object::write::RelocationFlags::MachO { r_type, r_pcrel: true, r_length: 2 },
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we could outsource part of this to object by using object::write::RelocationFlags::Generic, but we'd still have to write the machine code ourselves.

})?;

Ok(())
}

/// Deployment target or SDK version.
///
/// The size of the numbers in here are limited by Mach-O's `LC_BUILD_VERSION`.
Expand Down
29 changes: 28 additions & 1 deletion compiler/rustc_codegen_ssa/src/back/link.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2070,8 +2070,20 @@ fn add_linked_symbol_object(
file.set_mangling(object::write::Mangling::None);
}

// ld64 requires a relocation to load undefined symbols, see below.
// Not strictly needed if linking with lld, but might as well do it there too.
let ld64_section_helper = if file.format() == object::BinaryFormat::MachO {
Some(file.add_section(
file.segment_name(object::write::StandardSegment::Text).to_vec(),
"__text".into(),
object::SectionKind::Text,
))
} else {
None
};

for (sym, kind) in symbols.iter() {
file.add_symbol(object::write::Symbol {
let symbol = file.add_symbol(object::write::Symbol {
name: sym.clone().into(),
value: 0,
size: 0,
Expand All @@ -2085,6 +2097,21 @@ fn add_linked_symbol_object(
section: object::write::SymbolSection::Undefined,
flags: object::SymbolFlags::None,
});

// The linker shipped with Apple's Xcode, ld64, works a bit differently from other linkers.
// In particular, it completely ignores undefined symbols by themselves, and only consider
// undefined symbols if they have relocations.
//
// So to make this trick work on ld64, we need to actually insert a relocation. The
// relocation must be valid though, and hence must point to a valid piece of machine code,
// and hence this is all very architecture-specific.
//
// See the following for a few details on the design of ld64:
// https://github.com/apple-oss-distributions/ld64/blob/ld64-951.9/doc/design/linker.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This link doesn't seem to mention the details of how ld64 resolves undefined symbols, especially regarding relocations. Should we add links to the relevant code locations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was the closest I could find, but actual links to the code would be a good idea. I'll try to find the relevant parts when I get the time.

Note though that I think the new linker, -ld_prime, is not open source? So this will only be a weak reference in any case.

if let Some(section) = ld64_section_helper {
apple::add_data_and_relocation(&mut file, section, symbol, &sess.target)
.expect("failed adding relocation");
}
}

let path = tmpdir.join("symbols.o");
Expand Down
3 changes: 2 additions & 1 deletion tests/run-make/include-all-symbols-linking/lib.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
mod foo {
#[link_section = ".rodata.STATIC"]
#[cfg_attr(target_os = "linux", link_section = ".rodata.STATIC")]
#[cfg_attr(target_vendor = "apple", link_section = "__DATA,STATIC")]
#[used]
static STATIC: [u32; 10] = [1; 10];
}
Expand Down
15 changes: 10 additions & 5 deletions tests/run-make/include-all-symbols-linking/rmake.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,15 +7,20 @@
// See https://github.com/rust-lang/rust/pull/95604
// See https://github.com/rust-lang/rust/issues/47384

//@ only-linux
// Reason: differences in object file formats on OSX and Windows
// causes errors in the llvm_objdump step
//@ ignore-windows differences in object file formats causes errors in the llvm_objdump step.
//@ ignore-wasm differences in object file formats causes errors in the llvm_objdump step.

use run_make_support::{dynamic_lib_name, llvm_objdump, llvm_readobj, rustc};
use run_make_support::{dynamic_lib_name, llvm_objdump, llvm_readobj, rustc, target};

fn main() {
rustc().crate_type("lib").input("lib.rs").run();
rustc().crate_type("cdylib").link_args("-Tlinker.ld").input("main.rs").run();
let mut main = rustc();
main.crate_type("cdylib");
if target().contains("linux") {
main.link_args("-Tlinker.ld");
}
main.input("main.rs").run();

// Ensure `#[used]` and `KEEP`-ed section is there
llvm_objdump()
.arg("--full-contents")
Expand Down
Loading