Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] BOLT asserts when encoding out-of-range Pending Relocations #116817

Open
paschalis-mpeis opened this issue Nov 19, 2024 · 3 comments
Open
Assignees
Labels
BOLT crash Prefer [crash-on-valid] or [crash-on-invalid]

Comments

@paschalis-mpeis
Copy link
Member

What is the problem

When a function is ignored from BOLT optimization, some relocations are emitted to patch the original fixups in the original text. Those are generated during external reference scanning. This is an optimization for adjusting the initial calls to land back to the optimized code. The relevant relocations are of type R_AARCH64_CALL26.
In large binaries the encoding does not fit the supported range, leading to an assertion:

only PC +/- 128MB is allowed for direct call

Ignoring functions can be common. Some reasons include:

  • they are explicitly requested in skip-funcs (like in the below reproducer)
  • could be a cold function in split mode
  • or could be implicitly ignored for other reasons

Reproducer

How the reproducer triggers the crash

The error is triggered through instrumentation due to code size increases.

When instrumenting MongoDB, BOLT requires to ignore some functions in order to succeed. When doing so, some additional relocations are emitted. However, instrumentation ends up increasing code size by more than 3x, which goes beyond those relocations' range. This causes an assertion crash when trying to encode such relocations (R_AARCH64_CALL26).

Why BOLT requires to ignore some functions

This requirement was introduced by PR #89681. Before that stricter check, those functions were not required to be ignored and the bug was not triggered (ie, on LLVM 18 RC). PR #101466 improves support for entry-points other than the main one to the function, but still BOLT bails out in our use case.

Regardless, this issue will not focus on such improvements as the error can be triggered without it

Reproducer Instructions

  • Tested on: Ubuntu 22.04.5 LTS, (6.8.0-1018-aws)

Use use this Dockerfile and the below script to compile the input binary and pull it on an AArch64 Ubuntu host.
Alternatively, one can get it pre-compiled from here.

# Compile mongodb v7.0.5 with clang 18.1.8 (stage-0):
docker build --progress=plain --tag 'tmp-mdb-stage-0' .

# Then, 'pull' the binary from container to the host, given host is Ubuntu 22.04/24.04 (preferred)
docker run --rm --entrypoint cat tmp-mdb-stage-0 mongo/build/install/bin/mongod > mongod
chmod +x mongod

# once binary is retrieved the image could be deleted
docker rmi tmp-mdb-stage-0

Then, compile bolt version dcd62070cf45 with assertions ON and use it to instrument the binary with:

LLVM_DIR=path/to/llvm/bin/dir

$LLVM_DIR/llvm-bolt mongod -instrument -o mongod.instrumented \
	--instrumentation-file=prof.fdata \
	--instrumentation-sleep-time=60 \
	--instrumentation-no-counters-clear \
	--instrumentation-wait-forks \
	--skip-funcs=_ZL9InterpretP9JSContextRN2js8RunStateE/1,_ZN2v88internal12_GLOBAL__N_18RawMatchIhEENS0_19IrregexpInterpreter6ResultEPNS0_7IsolateENS0_9ByteArrayENS0_6StringENS0_6VectorIKT_EEPiiiijNS0_6RegExp10CallOriginEj/1,_ZN2v88internal12_GLOBAL__N_18RawMatchIDsEENS0_19IrregexpInterpreter6ResultEPNS0_7IsolateENS0_9ByteArrayENS0_6StringENS0_6VectorIKT_EEPiiiijNS0_6RegExp10CallOriginEj/1

Proposed Solutions

A draft PR around (S1) will follow.

(S1) Apply pending relocs only when in bounds

Keep the 'external reference scan' optimization on, and apply it on a per callsite basis.
That is, if a particular external reference is within range, then the encoding of that relocation happens, otherwise it is ignored.

This may require some checks that the PendingRelocations were added specifically for optimization, ie during external reference scanning and that are R_AARCH64_CALL26.

(S2) Hint users to use -no-scan

Instead of the current assertion, BOLT could escalate this to an exit with a recommendation to use -no-scan to overcome this crash. This is not very obvious unless someone is familiar with specific BOLT code regions. Also, by bailing out at all times, we ensure that BOLT does not patch the code in some wrong way?

Q: What if an 'external reference' points to a cold function? Is that pending relocation still flushed?
I assume that even if it does, it will have no effect.

@github-actions github-actions bot added the BOLT label Nov 19, 2024
@paschalis-mpeis paschalis-mpeis changed the title [AArch64] BOLT asserts when attempting to encode out-of-range Pending Relocations [AArch64] BOLT asserts when encoding out-of-range Pending Relocations Nov 19, 2024
@EugeneZelenko EugeneZelenko added the crash Prefer [crash-on-valid] or [crash-on-invalid] label Nov 19, 2024
@maksfb
Copy link
Contributor

maksfb commented Dec 4, 2024

S1 sounds good to me. We just have to make sure that patching of old code is enabled, e.g. via opts::ForcePatch.

Q: What if an 'external reference' points to a cold function? Is that pending relocation still flushed?
I assume that even if it does, it will have no effect.

If a function is not going to be emitted, then there will be no reference/relocation recorded. Cold functions are not emitted in "lite" mode which is default for x86, but not AArch64.

auto ignoreFunctionRef = [&](const BinaryFunction &Target) {
if (&Target == this)
return true;
// Note that later we may decide not to emit Target function. In that
// case, we conservatively create references that will be ignored or
// resolved to the same function.
if (!BC.shouldEmit(Target))
return true;
return false;
};

@paschalis-mpeis paschalis-mpeis self-assigned this Dec 4, 2024
@paschalis-mpeis
Copy link
Member Author

S1 sounds good to me. We just have to make sure that patching of old code is enabled, e.g. via opts::ForcePatch.

I may be missing some connection between scanExternalRefs+flushPendingRelocations and PendingPatches of PatchEntries.cpp.

Do you want to introduce another requirement for when scanExternalRefs/flushPendingRelocations should run? ie only when opts:ForcePatch was also set? (on top of requiring no-scan not to be set)

@maksfb
Copy link
Contributor

maksfb commented Feb 6, 2025

S1 sounds good to me. We just have to make sure that patching of old code is enabled, e.g. via opts::ForcePatch.

I may be missing some connection between scanExternalRefs+flushPendingRelocations and PendingPatches of PatchEntries.cpp.

Do you want to introduce another requirement for when scanExternalRefs/flushPendingRelocations should run? ie only when opts:ForcePatch was also set? (on top of requiring no-scan not to be set)

Let me add more context on why we need to patch the original function entries. Whenever we duplicate a function body for execution purposes, we also need to duplicate all associate metadata, e.g. .eh_frame info, exception ranges, DWARF debug info, etc. If the function uses jump tables, those need to be duplicated as well. Currently, BOLT cannot duplicate all of the accompanying metadata and instead we force the execution of the new/optimized version while prohibiting execution of the original code. This is achieved in PatchEntries pass (

Error PatchEntries::runOnFunctions(BinaryContext &BC) {
).

For the reasons described above, it makes sense to patch entries in all original functions in most of the cases. Since we care about the performance, we have an exception for "small" functions (

// Check if we can skip patching the function.
if (!opts::ForcePatch && !Function.hasEHRanges() &&
Function.getSize() < PatchThreshold)
continue;
). I plan to make the requirements for this exception case more strict.

There are also cases where the code cannot be patched due to size limitations. Whenever we fail to patch the original entries, we have to bail out on optimizing the function (

Function.setIgnored();
). Such cases are quite rare and we report them to the user.

Lastly, if we can guarantee that the old code never gets executed, we can skip patching altogether. This guarantee is achieved via successful run of scanExternalRefs() and the relocation processing that follows.

If we are going to bail out on relocation processing for scanExternalRefs(), we can no longer skip the patching process as the assumption that the old code is never executed becomes broken.

I'm going to experiment with enabling unconditional execution of PatchEntries on AArch64 to make "lite" mode functional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BOLT crash Prefer [crash-on-valid] or [crash-on-invalid]
Projects
None yet
Development

No branches or pull requests

3 participants