Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang] Retry add basic -mtune support #96688

Closed
wants to merge 1,574 commits into from

Conversation

AlexisPerry
Copy link
Contributor

This aims to reland the changes and fix the broken test from PR #95043

@@ -0,0 +1,9 @@
; RUN: mlir-translate -import-llvm -split-input-file %s | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test also be guarded with REQUIRES: x86-registered-target?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the catch! Yes, this guard is absolutely needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, ninja check-mlir worked for me just fine on Aarch64 🤔 I think that mlir-translate doesn't really care :) Put differently, REQUIRES is not needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting!

Did you build your LLVM with the x86 target enabled (besides Aarch64)? If not, mlir-translate may indeed be fine with the attribute independent of the target architecture. Note that mlir-translate runs llvm::verifyModule which I thought could be unhappy with unsupported target specific attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into failures if I didn't include the guard. I have separate builds for x86 and Aarch64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have X86 disabled - I just double checked. And yes, this test works for me just fine without the guard. Alexis, what error do you get when it fails for you?

To me, it would make sense if mlir-translate didn't care - it doesn't have to interpret these attributes, does it?

Btw @AlexisPerry , there's no need to clone "tune-cpu.f90" as tune-cpu-llvm-aarch.f90. Instead, you can use conditional RUN lines like here:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been trying to reproduce my error ever since saying I had one and have been unsuccessful. It now appears that mlir-translate doesn't care about the target.

@banach-space That is a much better way of doing things. Thank you for the suggestion and I will make the changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it would make sense if mlir-translate didn't care - it doesn't have to interpret these attributes, does it?

It doesn't do anything with them except for running the LLVM verifier. If things pass on Aarch64, then the guards are not necessary.

mlir/test/Target/LLVMIR/tune-cpu.mlir Outdated Show resolved Hide resolved
PeterChou1 and others added 23 commits June 27, 2024 14:50
…lang (llvm#96555)

Updates the install path for clang-doc to share/clang-doc instead
share/clang to avoid confusion
Removes stdexecpt from clang-doc test introduced in
llvm#93928
since it violates the rule that test must be freestanding
We use REAL() calls in interceptors, but
DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do
with them and only used for internal maintenance
threads.

This is done to avoid confusion like in llvm#96456.
This paper only matters for TS18661-3 integration.
According to https://reviews.llvm.org/D114250
this was to handle Mac specific issue, however
the test is Linux only.

The test effectively prevents to lock main allocator
on fork, but we do that on Linux for other
sanitizers for years, and need to do the same
for TSAN to avoid deadlocks.
r7 is reserved in thumb2 (typically for the frame pointer, as opposed to r11 in
ARM mode), so assigning to a variable with explicit register storage in r7 will
produce an error.

But r7 is where the Linux kernel expects the syscall number to be placed. We
can use a temporary to get the register allocator to pick a temporary, which we
save+restore the previous value of r7 in.

Fixes: llvm#93738
These functions used only for `fork`.

Unused parameter `child` will be used in followup patches.
Cap the alignment to 128 bytes as that is the maximum alignment
supported by PTX. The restriction is mentioned in the parameter passing
section (Note D) of the [PTX Writer's Guide to Interoperability]
(https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#parameter-passing)

> D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.
Reland: llvm#95456

This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
librariesm now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.
- Changes the style of the error messages to be composable, ie no full
stops.
- Adds an error message for when the ROCm toolkit can't be found but it
was required.
We do that for other Sanitizers, and we
should do the same for TSAN.
There are know deadlocks reports here.
This paper was a rewording of WG14 N1485, correcting terminology and
bringing the C11 feature slightly closer in line with the C++11
feature. There is nothing additional to be done or test to conform to
what was specified by WG14 N1537, so we'll remove the entry and lean on
N1485 to track status for atomics.
Found while skimming this code. Don't have a reproducible test case for
this but the nullptr check should clearly occur before we try to
dereference `location_sp`.
Summary:
The StringError class has a specialized method that creates the
inconvertible error code for you. It's much easier to read this way.
This reverts commit 05d167f.

Reverting the patch fixes the following under EXPENSIVE_CHECKS:

  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir
  LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir
  LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll
  LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll
  LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir
  LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
…e target link (llvm#91829)

This PR attempts to fix common block mapping for regular mapping of
these types as well as when they have been marked as "declare target
link". This PR should allow correct mapping of both the members of a
common block and the full common block via its block symbol.

The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and
adjustments to the way the we handle target kernel map argument
rebinding inside of the OMPIRBuilder.

For the Fortran OpenMP lowering were two changes, one to prevent the
implicit capture of common block members when the common block symbol
itself has been marked and the other creates intermediate member access
inside of the target region to be used in-place of those external to the
target region, this prevents external usages breaking the
IsolatedFromAbove pact.

In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles renaming of declare target symbols in
the module to the reference pointer, now it will only apply to those
within the kernel that is currently being generated and we also perform
a modification to replace constants with instructions as necessary as we
cannot replace these with our reference pointer (non-constant and
constants do not mix nicely).

In the case of the OpenMPIRBuilder some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs bound.

Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
…lvm#96537)

There are multiple ways to define a remainder operation. Depending on
the definition, the result could be either always positive or have the
sign of the dividend.
The pattern lowering `arith.remf` to LLVM assumes that the semantics
match `llvm.frem`, which seems to be reasonable. The folder, however, is
implemented via `APFloat::remainder()` which has different semantics.

This patch matches the folding behaviour to lowering behavior by using
`APFloat::mod()`, which matches the behavior of `llvm.frem` and libm's
`fmod()`. It also updates the documentation of `arith.remf` to explain
this behavior: The sign of the result of the remainder operation always
matches the sign of the dividend (LHS operand).

frem documentation: https://llvm.org/docs/LangRef.html#frem-instruction

Fix llvm#94431

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This PR XFAILS `llvm/test/DebugInfo/attr-btf_type_tag.ll` on AIX since
we we don’t have `.debug_addr` section.

Co-authored-by: Nivetha Kuruparan <nivetha@comp810.rtp.raleigh.ibm.com>
Describe steps for optimizing the Linux kernel with BOLT.
anhtuyenibm and others added 7 commits June 27, 2024 15:53
According to Zheng @chenzheng1030, there is no .debug_addr section on
AIX.
Due to its absence on AIX, the test case may produce inconsistent
results, either passing or failing. This PR ensures that the test case
is marked as not applicable for AIX.
In some scenarios based on the split-dwarf build process, the dwo file
is not generated as expected(That is to say, no dwo file path is stored
in the binary). When the llvm-dwp tool is called to generate the .dwp
file, it will exit without any warning.
 
So, the plan is to prompt a warning to tell the user that the dwo file
was not actually generated.
<img width="699" alt="image"
src="https://github.com/llvm/llvm-project/assets/150100070/5e5742f6-daad-450f-87e9-cb25449c3c7a">
…ntrinsics (llvm#89655)

This should replicate the old intrinsic behavior better when codegen of
the raw instruction will require metadata in the future.
@AlexisPerry AlexisPerry closed this Jul 9, 2024
@Endilll Endilll removed their request for review July 9, 2024 22:35
@ldionne ldionne removed request for a team July 11, 2024 17:19
banach-space pushed a commit that referenced this pull request Jul 16, 2024
Initial implementation for the -mtune flag in Flang.

This PR is a clean version of PR #96688, which is a re-land of PR #95043
@AlexisPerry AlexisPerry deleted the AlexisPerry/mtune-2 branch July 16, 2024 16:46
sayhaan pushed a commit to sayhaan/llvm-project that referenced this pull request Jul 16, 2024
Summary:
Initial implementation for the -mtune flag in Flang.

This PR is a clean version of PR llvm#96688, which is a re-land of PR llvm#95043

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D59822421
yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
Initial implementation for the -mtune flag in Flang.

This PR is a clean version of PR #96688, which is a re-land of PR #95043
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.