-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make #[used]
work when linking with ld64
#133832
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,82 @@ pub(super) fn macho_platform(target: &Target) -> u32 { | |
} | ||
} | ||
|
||
/// Add relocation and section data needed for a symbol to be considered | ||
/// undefined by ld64. | ||
/// | ||
/// Inherently very architecture-specific (unfortunately). | ||
/// | ||
/// # New architectures | ||
/// | ||
/// The values here can be found by compiling the following program: | ||
/// | ||
/// ```c | ||
/// void foo(void); | ||
/// | ||
/// void use_foo() { | ||
/// foo(); | ||
/// } | ||
/// ``` | ||
/// | ||
/// With: | ||
/// | ||
/// ```console | ||
/// $ clang -c foo.c -O2 -g0 -fno-exceptions -target $CLANG_TARGET | ||
/// ``` | ||
/// | ||
/// And then inspecting with `objdump -d foo.o` and/or: | ||
/// | ||
/// ```no_run | ||
/// use object::read::macho::{MachHeader, MachOFile}; | ||
/// use object::{File, Object, ObjectSection}; | ||
/// | ||
/// fn read(file: MachOFile<'_, impl MachHeader>) { | ||
/// for section in file.sections() { | ||
/// dbg!(section.name().unwrap(), section.data().unwrap(), section.align()); | ||
/// for reloc in section.relocations() { | ||
/// dbg!(reloc); | ||
/// } | ||
/// } | ||
/// } | ||
/// | ||
/// fn main() { | ||
/// match File::parse(&*std::fs::read("foo.o").unwrap()).unwrap() { | ||
/// File::MachO32(file) => read(file), | ||
/// File::MachO64(file) => read(file), | ||
/// _ => unimplemented!(), | ||
/// } | ||
/// } | ||
/// ``` | ||
pub(super) fn add_data_and_relocation( | ||
file: &mut object::write::Object<'_>, | ||
section: object::write::SectionId, | ||
symbol: object::write::SymbolId, | ||
target: &Target, | ||
) -> object::write::Result<()> { | ||
let (data, align, addend, r_type): (&[u8], _, _, _) = match &*target.arch { | ||
"arm" => (&[0xff, 0xf7, 0xfe, 0xbf], 2, 0, object::macho::ARM_THUMB_RELOC_BR22), | ||
"aarch64" => (&[0, 0, 0, 0x14], 4, 0, object::macho::ARM64_RELOC_BRANCH26), | ||
"x86_64" => ( | ||
&[0x55, 0x48, 0x89, 0xe5, 0x5d, 0xe9, 0, 0, 0, 0], | ||
16, | ||
-4, | ||
object::macho::X86_64_RELOC_BRANCH, | ||
), | ||
"x86" => (&[0x55, 0x89, 0xe5, 0x5d, 0xe9, 0xf7, 0xff, 0xff, 0xff], 16, -4, 0), | ||
arch => unimplemented!("unsupported Apple architecture {arch:?}"), | ||
}; | ||
|
||
let offset = file.section_mut(section).append_data(data, align); | ||
file.add_relocation(section, object::write::Relocation { | ||
offset, | ||
addend, | ||
symbol, | ||
flags: object::write::RelocationFlags::MachO { r_type, r_pcrel: true, r_length: 2 }, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that we could outsource part of this to |
||
})?; | ||
|
||
Ok(()) | ||
} | ||
|
||
/// Deployment target or SDK version. | ||
/// | ||
/// The size of the numbers in here are limited by Mach-O's `LC_BUILD_VERSION`. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2070,8 +2070,20 @@ fn add_linked_symbol_object( | |
file.set_mangling(object::write::Mangling::None); | ||
} | ||
|
||
// ld64 requires a relocation to load undefined symbols, see below. | ||
// Not strictly needed if linking with lld, but might as well do it there too. | ||
let ld64_section_helper = if file.format() == object::BinaryFormat::MachO { | ||
Some(file.add_section( | ||
file.segment_name(object::write::StandardSegment::Text).to_vec(), | ||
"__text".into(), | ||
object::SectionKind::Text, | ||
)) | ||
} else { | ||
None | ||
}; | ||
|
||
for (sym, kind) in symbols.iter() { | ||
file.add_symbol(object::write::Symbol { | ||
let symbol = file.add_symbol(object::write::Symbol { | ||
name: sym.clone().into(), | ||
value: 0, | ||
size: 0, | ||
|
@@ -2085,6 +2097,21 @@ fn add_linked_symbol_object( | |
section: object::write::SymbolSection::Undefined, | ||
flags: object::SymbolFlags::None, | ||
}); | ||
|
||
// The linker shipped with Apple's Xcode, ld64, works a bit differently from other linkers. | ||
// In particular, it completely ignores undefined symbols by themselves, and only consider | ||
// undefined symbols if they have relocations. | ||
// | ||
// So to make this trick work on ld64, we need to actually insert a relocation. The | ||
// relocation must be valid though, and hence must point to a valid piece of machine code, | ||
// and hence this is all very architecture-specific. | ||
// | ||
// See the following for a few details on the design of ld64: | ||
// https://github.com/apple-oss-distributions/ld64/blob/ld64-951.9/doc/design/linker.html | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This link doesn't seem to mention the details of how ld64 resolves undefined symbols, especially regarding relocations. Should we add links to the relevant code locations? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, it was the closest I could find, but actual links to the code would be a good idea. I'll try to find the relevant parts when I get the time. Note though that I think the new linker, |
||
if let Some(section) = ld64_section_helper { | ||
apple::add_data_and_relocation(&mut file, section, symbol, &sess.target) | ||
.expect("failed adding relocation"); | ||
} | ||
} | ||
|
||
let path = tmpdir.join("symbols.o"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite unhappy with this, it feels very brittle to emit a manual relocation like this, but I think it's the only way to do it (without pulling in a bunch of LLVM codegen internals).
I'm also unsure of the correctness, there might be more target options and/or flags that we should be checking here? Or maybe we should emit different kinds of relocations based on the kind of symbol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... emitting manual relocations like this feels super fragile and hard to maintain. At the very least this would need comments and ideally references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think additional comments are needed here, except to explain what this data represents, such as
[0, 0, 0, 0x14]
represents theb
instruction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a bit concern about here, the symbol of
#[used]
is a global variable, the C code should bevoid foo(void);
. I'm not sure what can happen here. I prefer an address offset instruction. This is more safe to me.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to add a local symbol to help ld64 strip these dead code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestions here sounds correct to me, I've also later been thinking that an address offset instruction would be safer than a jump (since address offsets can be done safely on both function pointers and data pointers).
I'm not sure I understand what you mean by that? You want a local symbol that appears as the "entry" point for the section? Does that help the linker see that the code is unused?