[flang] Retry add basic -mtune support #96688

AlexisPerry · 2024-06-25T19:36:11Z

This aims to reland the changes and fix the broken test from PR #95043

gysit · 2024-06-26T05:42:18Z

mlir/test/Target/LLVMIR/Import/tune-cpu.ll

@@ -0,0 +1,9 @@
+; RUN: mlir-translate -import-llvm -split-input-file %s | FileCheck %s


Should this test also be guarded with REQUIRES: x86-registered-target?

Thank you so much for the catch! Yes, this guard is absolutely needed.

Hm, ninja check-mlir worked for me just fine on Aarch64 🤔 I think that mlir-translate doesn't really care :) Put differently, REQUIRES is not needed here.

Interesting!

Did you build your LLVM with the x86 target enabled (besides Aarch64)? If not, mlir-translate may indeed be fine with the attribute independent of the target architecture. Note that mlir-translate runs llvm::verifyModule which I thought could be unhappy with unsupported target specific attributes.

I ran into failures if I didn't include the guard. I have separate builds for x86 and Aarch64.

I have X86 disabled - I just double checked. And yes, this test works for me just fine without the guard. Alexis, what error do you get when it fails for you?

To me, it would make sense if mlir-translate didn't care - it doesn't have to interpret these attributes, does it?

Btw @AlexisPerry , there's no need to clone "tune-cpu.f90" as tune-cpu-llvm-aarch.f90. Instead, you can use conditional RUN lines like here:

https://github.com/llvm/llvm-project/blob/main/flang/test/Driver/pic-flags.f90

I have been trying to reproduce my error ever since saying I had one and have been unsuccessful. It now appears that mlir-translate doesn't care about the target.

@banach-space That is a much better way of doing things. Thank you for the suggestion and I will make the changes.

To me, it would make sense if mlir-translate didn't care - it doesn't have to interpret these attributes, does it?

It doesn't do anything with them except for running the LLVM verifier. If things pass on Aarch64, then the guards are not necessary.

mlir/test/Target/LLVMIR/tune-cpu.mlir

…lang (llvm#96555) Updates the install path for clang-doc to share/clang-doc instead share/clang to avoid confusion

Removes stdexecpt from clang-doc test introduced in llvm#93928 since it violates the rule that test must be freestanding

We use REAL() calls in interceptors, but DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do with them and only used for internal maintenance threads. This is done to avoid confusion like in llvm#96456.

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>

…lvm#91327)

This paper only matters for TS18661-3 integration.

According to https://reviews.llvm.org/D114250 this was to handle Mac specific issue, however the test is Linux only. The test effectively prevents to lock main allocator on fork, but we do that on Linux for other sanitizers for years, and need to do the same for TSAN to avoid deadlocks.

r7 is reserved in thumb2 (typically for the frame pointer, as opposed to r11 in ARM mode), so assigning to a variable with explicit register storage in r7 will produce an error. But r7 is where the Linux kernel expects the syscall number to be placed. We can use a temporary to get the register allocator to pick a temporary, which we save+restore the previous value of r7 in. Fixes: llvm#93738

These functions used only for `fork`. Unused parameter `child` will be used in followup patches.

Cap the alignment to 128 bytes as that is the maximum alignment supported by PTX. The restriction is mentioned in the parameter passing section (Note D) of the [PTX Writer's Guide to Interoperability] (https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#parameter-passing) > D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.

Reland: llvm#95456 This patch improves the ROCDL gpu serialization API by: - Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking. - Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode librariesm now it will only load the requested libraries. - Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method. - Exposing `moduleToObjectImpl`, this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method. - It also avoids constructing the control variables if no device libraries are being used. - Changes the style of the error messages to be composable, ie no full stops. - Adds an error message for when the ROCm toolkit can't be found but it was required.

We do that for other Sanitizers, and we should do the same for TSAN. There are know deadlocks reports here.

This reduces codesize. As discussed in llvm#92707.

…llvm#95180) These instructions can be generated using regular LL intrinsics. Specified at: https://github.com/WebAssembly/half-precision/blob/29a9b9462c9285d4ccc1a5dc39214ddfd1892658/proposals/half-precision/Overview.md

This paper was a rewording of WG14 N1485, correcting terminology and bringing the C11 feature slightly closer in line with the C++11 feature. There is nothing additional to be done or test to conform to what was specified by WG14 N1537, so we'll remove the entry and lean on N1485 to track status for atomics.

Found while skimming this code. Don't have a reproducible test case for this but the nullptr check should clearly occur before we try to dereference `location_sp`.

Summary: The StringError class has a specialized method that creates the inconvertible error code for you. It's much easier to read this way.

This reverts commit 05d167f. Reverting the patch fixes the following under EXPENSIVE_CHECKS: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll

…e target link (llvm#91829) This PR attempts to fix common block mapping for regular mapping of these types as well as when they have been marked as "declare target link". This PR should allow correct mapping of both the members of a common block and the full common block via its block symbol. The main changes were some adjustments to the Fortran OpenMP lowering to HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and adjustments to the way the we handle target kernel map argument rebinding inside of the OMPIRBuilder. For the Fortran OpenMP lowering were two changes, one to prevent the implicit capture of common block members when the common block symbol itself has been marked and the other creates intermediate member access inside of the target region to be used in-place of those external to the target region, this prevents external usages breaking the IsolatedFromAbove pact. In the latter case, there was an adjustment to the size calculation for types to better handle cases where we pass an array as the type of a map (as opposed to the bounds and the type of the element), which occurs in the case of common blocks. There is also some adjustment to how handleDeclareTargetMapVar handles renaming of declare target symbols in the module to the reference pointer, now it will only apply to those within the kernel that is currently being generated and we also perform a modification to replace constants with instructions as necessary as we cannot replace these with our reference pointer (non-constant and constants do not mix nicely). In the case of the OpenMPIRBuilder some changes were made to defer global symbol rebinding to kernel arguments until all other arguments have been rebound. This makes sure we do not replace uses that may refer to the global (e.g. a GEP) but are themselves actually a separate argument that needs bound. Currently "declare target to" still needs some work, but this may be the case for all types in conjunction with "declare target to" at the moment.

…lvm#96537) There are multiple ways to define a remainder operation. Depending on the definition, the result could be either always positive or have the sign of the dividend. The pattern lowering `arith.remf` to LLVM assumes that the semantics match `llvm.frem`, which seems to be reasonable. The folder, however, is implemented via `APFloat::remainder()` which has different semantics. This patch matches the folding behaviour to lowering behavior by using `APFloat::mod()`, which matches the behavior of `llvm.frem` and libm's `fmod()`. It also updates the documentation of `arith.remf` to explain this behavior: The sign of the result of the remainder operation always matches the sign of the dividend (LHS operand). frem documentation: https://llvm.org/docs/LangRef.html#frem-instruction Fix llvm#94431 --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

This PR XFAILS `llvm/test/DebugInfo/attr-btf_type_tag.ll` on AIX since we we don’t have `.debug_addr` section. Co-authored-by: Nivetha Kuruparan <nivetha@comp810.rtp.raleigh.ibm.com>

Describe steps for optimizing the Linux kernel with BOLT.

@chenzheng1030

According to Zheng @chenzheng1030, there is no .debug_addr section on AIX. Due to its absence on AIX, the test case may produce inconsistent results, either passing or failing. This PR ensures that the test case is marked as not applicable for AIX.

and delete an unneeded setParent call.

In some scenarios based on the split-dwarf build process, the dwo file is not generated as expected(That is to say, no dwo file path is stored in the binary). When the llvm-dwp tool is called to generate the .dwp file, it will exit without any warning. So, the plan is to prompt a warning to tell the user that the dwo file was not actually generated. <img width="699" alt="image" src="https://github.com/llvm/llvm-project/assets/150100070/5e5742f6-daad-450f-87e9-cb25449c3c7a">

…ntrinsics (llvm#89655) This should replicate the old intrinsic behavior better when codegen of the raw instruction will require metadata in the future.

Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043

Summary: Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR llvm#96688, which is a re-land of PR llvm#95043 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D59822421

Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043

AlexisPerry requested review from banach-space, tarunprabhu, gysit and Dinistro June 25, 2024 19:37

AlexisPerry mentioned this pull request Jun 25, 2024

Fix for broken test following merge of PR #95043 #96683

Closed

gysit reviewed Jun 26, 2024

View reviewed changes

AlexisPerry mentioned this pull request Jun 27, 2024

Flang should handle the -mtune option #84947

Closed

PeterChou1 and others added 23 commits June 27, 2024 14:50

[clang-doc] update install path to share/clang-doc instead of share/c…

45ccae0

…lang (llvm#96555) Updates the install path for clang-doc to share/clang-doc instead share/clang to avoid confusion

[clang-doc] Remove stdexecpt from clang-doc test (llvm#96552)

8fb7e9d

Removes stdexecpt from clang-doc test introduced in llvm#93928 since it violates the rule that test must be freestanding

[sanitizer] Rename DEFINE_REAL_PTHREAD_FUNCTIONS (llvm#96527)

5b6911c

We use REAL() calls in interceptors, but DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do with them and only used for internal maintenance threads. This is done to avoid confusion like in llvm#96456.

[SLP] NFC. Refactor and add getAltInstrMask help function. (llvm#94709)

ce5090d

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>

[AMDGPU] Disallow negative s_load offsets in isLegalAddressingMode (l…

c32a256

…lvm#91327)

[C23] Move WG14 N2931 to the TS18661 section

b1e00ac

This paper only matters for TS18661-3 integration.

[nfc][tsan] Better name for locking functions (llvm#96598)

6f63727

These functions used only for `fork`. Unused parameter `child` will be used in followup patches.

[nfc][tsan] Clang format includes (llvm#96599)

7c165f7

[tsan] Lock/Unlock allocator and stacks on fork (llvm#96600)

b32c1d6

We do that for other Sanitizers, and we should do the same for TSAN. There are know deadlocks reports here.

[SelectionDAG] Lower llvm.ldexp.f32 to ldexp() on Windows. (llvm#95301)

7c30aae

This reduces codesize. As discussed in llvm#92707.

[lldb][LibCxx] Move incorrect nullptr check (llvm#96635)

4961d99

Found while skimming this code. Don't have a reproducible test case for this but the nullptr check should clearly occur before we try to dereference `location_sp`.

[LinkerWrapper][NFC] Simplify StringErrors (llvm#96650)

3686f59

Summary: The StringError class has a specialized method that creates the inconvertible error code for you. It's much easier to read this way.

XFAIL llvm/test/DebugInfo/attr-btf_type_tag.ll on AIX (llvm#96677)

7f90e44

This PR XFAILS `llvm/test/DebugInfo/attr-btf_type_tag.ll` on AIX since we we don’t have `.debug_addr` section. Co-authored-by: Nivetha Kuruparan <nivetha@comp810.rtp.raleigh.ibm.com>

[BOLT][docs] Add Linux kernel optimization guide (llvm#96669)

296bbc2

Describe steps for optimizing the Linux kernel with BOLT.

anhtuyenibm and others added 7 commits June 27, 2024 15:53

[MC] Chain together fragments only if Subsections.size() > 1

f038493

and delete an unneeded setParent call.

AMDGPU: Add amdgpu.no.fine.grained.memory when upgrading old atomic i…

da5a3b9

…ntrinsics (llvm#89655) This should replicate the old intrinsic behavior better when codegen of the raw instruction will require metadata in the future.

[CodeGen] Use range-based for loops (NFC) (llvm#96855)

ed10819

[flang] Add basic -mtune support

209ea54

mtune tests - remove unnecessary target guards, combine test files

48e468a

AlexisPerry requested review from JDevlieghere, cyndyishida, aaupov, maksfb, rafaelauler, ayermolo, dcci, tbaederr, Endilll, nikic and a team as code owners July 9, 2024 21:45

AlexisPerry closed this Jul 9, 2024

Endilll removed their request for review July 9, 2024 22:35

ldionne removed request for a team July 11, 2024 17:19

AlexisPerry mentioned this pull request Jul 11, 2024

Add basic -mtune support #98517

Merged

banach-space pushed a commit that referenced this pull request Jul 16, 2024

Add basic -mtune support (#98517)

f1d3fe7

Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043

AlexisPerry deleted the AlexisPerry/mtune-2 branch July 16, 2024 16:46

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024

Add basic -mtune support (#98517)

c11cff5

Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR #96688, which is a re-land of PR #95043

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[flang] Retry add basic -mtune support #96688

[flang] Retry add basic -mtune support #96688

AlexisPerry commented Jun 25, 2024

gysit Jun 26, 2024

AlexisPerry Jun 27, 2024

banach-space Jun 27, 2024

gysit Jun 27, 2024

AlexisPerry Jun 27, 2024

banach-space Jun 27, 2024

AlexisPerry Jul 2, 2024

gysit Jul 3, 2024

		@@ -0,0 +1,9 @@
		; RUN: mlir-translate -import-llvm -split-input-file %s \| FileCheck %s

[flang] Retry add basic -mtune support #96688

[flang] Retry add basic -mtune support #96688

Conversation

AlexisPerry commented Jun 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment