-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flang] Retry add basic -mtune support #96688
Conversation
@@ -0,0 +1,9 @@ | |||
; RUN: mlir-translate -import-llvm -split-input-file %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this test also be guarded with REQUIRES: x86-registered-target
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the catch! Yes, this guard is absolutely needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, ninja check-mlir
worked for me just fine on Aarch64 🤔 I think that mlir-translate
doesn't really care :) Put differently, REQUIRES
is not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting!
Did you build your LLVM with the x86 target enabled (besides Aarch64)? If not, mlir-translate
may indeed be fine with the attribute independent of the target architecture. Note that mlir-translate
runs llvm::verifyModule
which I thought could be unhappy with unsupported target specific attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into failures if I didn't include the guard. I have separate builds for x86 and Aarch64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have X86 disabled - I just double checked. And yes, this test works for me just fine without the guard. Alexis, what error do you get when it fails for you?
To me, it would make sense if mlir-translate
didn't care - it doesn't have to interpret these attributes, does it?
Btw @AlexisPerry , there's no need to clone "tune-cpu.f90" as tune-cpu-llvm-aarch.f90. Instead, you can use conditional RUN lines like here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been trying to reproduce my error ever since saying I had one and have been unsuccessful. It now appears that mlir-translate doesn't care about the target.
@banach-space That is a much better way of doing things. Thank you for the suggestion and I will make the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it would make sense if mlir-translate didn't care - it doesn't have to interpret these attributes, does it?
It doesn't do anything with them except for running the LLVM verifier. If things pass on Aarch64, then the guards are not necessary.
…lang (llvm#96555) Updates the install path for clang-doc to share/clang-doc instead share/clang to avoid confusion
Removes stdexecpt from clang-doc test introduced in llvm#93928 since it violates the rule that test must be freestanding
We use REAL() calls in interceptors, but DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do with them and only used for internal maintenance threads. This is done to avoid confusion like in llvm#96456.
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
This paper only matters for TS18661-3 integration.
According to https://reviews.llvm.org/D114250 this was to handle Mac specific issue, however the test is Linux only. The test effectively prevents to lock main allocator on fork, but we do that on Linux for other sanitizers for years, and need to do the same for TSAN to avoid deadlocks.
r7 is reserved in thumb2 (typically for the frame pointer, as opposed to r11 in ARM mode), so assigning to a variable with explicit register storage in r7 will produce an error. But r7 is where the Linux kernel expects the syscall number to be placed. We can use a temporary to get the register allocator to pick a temporary, which we save+restore the previous value of r7 in. Fixes: llvm#93738
These functions used only for `fork`. Unused parameter `child` will be used in followup patches.
Cap the alignment to 128 bytes as that is the maximum alignment supported by PTX. The restriction is mentioned in the parameter passing section (Note D) of the [PTX Writer's Guide to Interoperability] (https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#parameter-passing) > D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.
Reland: llvm#95456 This patch improves the ROCDL gpu serialization API by: - Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking. - Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode librariesm now it will only load the requested libraries. - Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method. - Exposing `moduleToObjectImpl`, this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method. - It also avoids constructing the control variables if no device libraries are being used. - Changes the style of the error messages to be composable, ie no full stops. - Adds an error message for when the ROCm toolkit can't be found but it was required.
We do that for other Sanitizers, and we should do the same for TSAN. There are know deadlocks reports here.
This reduces codesize. As discussed in llvm#92707.
…llvm#95180) These instructions can be generated using regular LL intrinsics. Specified at: https://github.com/WebAssembly/half-precision/blob/29a9b9462c9285d4ccc1a5dc39214ddfd1892658/proposals/half-precision/Overview.md
This paper was a rewording of WG14 N1485, correcting terminology and bringing the C11 feature slightly closer in line with the C++11 feature. There is nothing additional to be done or test to conform to what was specified by WG14 N1537, so we'll remove the entry and lean on N1485 to track status for atomics.
Found while skimming this code. Don't have a reproducible test case for this but the nullptr check should clearly occur before we try to dereference `location_sp`.
Summary: The StringError class has a specialized method that creates the inconvertible error code for you. It's much easier to read this way.
This reverts commit 05d167f. Reverting the patch fixes the following under EXPENSIVE_CHECKS: LLVM :: CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir LLVM :: CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir LLVM :: CodeGen/PowerPC/aix-xcoff-used-with-stringpool.ll LLVM :: CodeGen/PowerPC/merge-string-used-by-metadata.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool-large.ll LLVM :: CodeGen/PowerPC/mergeable-string-pool-pass-only.mir LLVM :: CodeGen/PowerPC/mergeable-string-pool.ll
…e target link (llvm#91829) This PR attempts to fix common block mapping for regular mapping of these types as well as when they have been marked as "declare target link". This PR should allow correct mapping of both the members of a common block and the full common block via its block symbol. The main changes were some adjustments to the Fortran OpenMP lowering to HLFIR/FIR, the lowering of the LLVM+OpenMP dialect to LLVM-IR and adjustments to the way the we handle target kernel map argument rebinding inside of the OMPIRBuilder. For the Fortran OpenMP lowering were two changes, one to prevent the implicit capture of common block members when the common block symbol itself has been marked and the other creates intermediate member access inside of the target region to be used in-place of those external to the target region, this prevents external usages breaking the IsolatedFromAbove pact. In the latter case, there was an adjustment to the size calculation for types to better handle cases where we pass an array as the type of a map (as opposed to the bounds and the type of the element), which occurs in the case of common blocks. There is also some adjustment to how handleDeclareTargetMapVar handles renaming of declare target symbols in the module to the reference pointer, now it will only apply to those within the kernel that is currently being generated and we also perform a modification to replace constants with instructions as necessary as we cannot replace these with our reference pointer (non-constant and constants do not mix nicely). In the case of the OpenMPIRBuilder some changes were made to defer global symbol rebinding to kernel arguments until all other arguments have been rebound. This makes sure we do not replace uses that may refer to the global (e.g. a GEP) but are themselves actually a separate argument that needs bound. Currently "declare target to" still needs some work, but this may be the case for all types in conjunction with "declare target to" at the moment.
…lvm#96537) There are multiple ways to define a remainder operation. Depending on the definition, the result could be either always positive or have the sign of the dividend. The pattern lowering `arith.remf` to LLVM assumes that the semantics match `llvm.frem`, which seems to be reasonable. The folder, however, is implemented via `APFloat::remainder()` which has different semantics. This patch matches the folding behaviour to lowering behavior by using `APFloat::mod()`, which matches the behavior of `llvm.frem` and libm's `fmod()`. It also updates the documentation of `arith.remf` to explain this behavior: The sign of the result of the remainder operation always matches the sign of the dividend (LHS operand). frem documentation: https://llvm.org/docs/LangRef.html#frem-instruction Fix llvm#94431 --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This PR XFAILS `llvm/test/DebugInfo/attr-btf_type_tag.ll` on AIX since we we don’t have `.debug_addr` section. Co-authored-by: Nivetha Kuruparan <nivetha@comp810.rtp.raleigh.ibm.com>
Describe steps for optimizing the Linux kernel with BOLT.
According to Zheng @chenzheng1030, there is no .debug_addr section on AIX. Due to its absence on AIX, the test case may produce inconsistent results, either passing or failing. This PR ensures that the test case is marked as not applicable for AIX.
and delete an unneeded setParent call.
In some scenarios based on the split-dwarf build process, the dwo file is not generated as expected(That is to say, no dwo file path is stored in the binary). When the llvm-dwp tool is called to generate the .dwp file, it will exit without any warning. So, the plan is to prompt a warning to tell the user that the dwo file was not actually generated. <img width="699" alt="image" src="https://github.com/llvm/llvm-project/assets/150100070/5e5742f6-daad-450f-87e9-cb25449c3c7a">
…ntrinsics (llvm#89655) This should replicate the old intrinsic behavior better when codegen of the raw instruction will require metadata in the future.
Summary: Initial implementation for the -mtune flag in Flang. This PR is a clean version of PR llvm#96688, which is a re-land of PR llvm#95043 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D59822421
This aims to reland the changes and fix the broken test from PR #95043