-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Add support for __regcall calling convention to spir targets. #4533
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have experience with the specifics of the change. Would request someone else more familiar with calling convention issues to do a more thorough review.
This generally seems ok, though I would probably suggest some additional testing for parameters. Additionally, I don't think we'd want to accept this without a spec written somewhere (I'd vastly prefer in the clang tree somewhere) that we can compare the implementation to. |
Thanks! Could you please suggest the test cases? Also, what specific place within clang tree for the doc do you have in mind? |
/clang/docs is where documentation goes. There doesn't seem to be a separate file for it, but there is a similar ABI document for Clang Blocks. As for tests: Though, your test names seem very confusing. It is odd that |
@erichkeane, ok will add more tests. What do you mean by "register out"? |
Typically with ABIs, we change how things are passed based on the number of 'registers' we have used so far. For example, actual reg-call will use 15 SSE registers and 8 Integer registers (IIRC). At a certain point we run out of registers, and have to pass on stack. |
I see. I have a test for that then - AMDGPUABIInfo will register out at 16, I removed that for the SPIRVABIInfo and added a test with 24 parameters - |
24 still sounds like a reasonable number of registers. I'm wondering how we behave with a more stressful stress test, like 500 parameters (or some other excessive number of registers that makes sense). Basically -- can we run out of registers with this calling convention in SPIR? |
@erichkeane, @AaronBallman - I'm resuming work on this after a big pause, sorry for delay. I addressed your comments - tests extended a lot (one has 130 ints to test for 'register out', various type combinations), also added a description of ABI. I added it to ESIMD, as I think we are not ready to officially support it for pure SYCL. Please review. |
This calling convention makes compiler return values and function arguments passed as values (through virtual registers) in most cases. The implementation is basically customizing SPIRABIInfo for CGFunctionInfo with the X86_RegCall calling convention. Code generation is mostly borrowed from AMDGPUABIInfo, but with some of the restrictions for passing by value removed. Signed-off-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not my strong suit, but I didn't see anything that concerned me.
sycl/doc/extensions/experimental/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md
Outdated
Show resolved
Hide resolved
* 5-8 bytes - array of 2 ints | ||
Floating point types are not merged. Structure field alignment rules can | ||
increase the calculated size compared to simple sum of `sizeof` of all the | ||
types. If the total size exceeds 8, then: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Structure field alignment rules can increase the calculated size compared to simple sum of
sizeof
of all the types.
I think this can be removed since it is mentioned above when describing how size is calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
// === TEST CASE: multi-level nested structs with one primitive type element at | ||
// the bottom, and one - at the top. The nested struct at the top is expected to | ||
// get "unwraped" by the compiler evaporating to the single element at the | ||
// bottom. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is a copy-and-paste error describing the previous case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, right, will fix
// CHECK-DAG: %struct.CharCharShortFloat = type { i8, i8, i16, float } | ||
|
||
template SYCL_DEVICE CharCharShortFloat __regcall func<CharCharShortFloat>(CharCharShortFloat x); | ||
// CHECK-DAG: define weak_odr x86_regcallcc [2 x i32] @_Z16__regcall3__funcI18CharCharShortFloatET_S1_([2 x i32] %{{[0-9a-zA-Z_.]+}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain this? The doc says floating point types are not merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching! Will update the doc.
I'll wait for @erichkeane's review to make review-related changes all at once to reduce stress on the CI system. |
I have nothing to add, Aaron and Prem covered everything I believe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor nits.
clang/lib/Basic/Targets/SPIR.h
Outdated
// Permit CC_X86RegCall which is used to mark external functions with | ||
// explicit simd or structure type arguments to pass them via registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Permit CC_X86RegCall which is used to mark external functions with | |
// explicit simd or structure type arguments to pass them via registers. | |
// Permit CC_X86RegCall which is used to mark external functions with | |
// explicit simd or structure type arguments to pass them via registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
clang/lib/Basic/Targets/SPIR.h
Outdated
return (CC == CC_SpirFunction || CC == CC_OpenCLKernel || | ||
// Permit CC_X86RegCall which is used to mark external functions | ||
// with | ||
// explicit simd or structure type arguments to pass them via | ||
// registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (CC == CC_SpirFunction || CC == CC_OpenCLKernel || | |
// Permit CC_X86RegCall which is used to mark external functions | |
// with | |
// explicit simd or structure type arguments to pass them via | |
// registers. | |
return (CC == CC_SpirFunction || CC == CC_OpenCLKernel || | |
// Permit CC_X86RegCall which is used to mark external functions | |
// with explicit simd or structure type arguments to pass them via | |
// registers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
// === TEST CASE: multi-level nested structs with one primitive type element at | ||
// the bottom, and one - at the top. The nested struct at the top is expected to | ||
// get "unwraped" by the compiler evaporating to the single element at the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// get "unwraped" by the compiler evaporating to the single element at the | |
// get "unwrapped" by the compiler evaporating to the single element at the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching!
sycl/doc/extensions/experimental/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md
Outdated
Show resolved
Hide resolved
Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
@intel/dpcpp-esimd-reviewers, friendly ping |
SYCL / Linux / L0 GEN9 LLVM Test Suite (pull_request) failed due to timeouts on unrelated tests: Timed Out Tests (7): Testing Time: 1080.59s |
@premanandrao, @elizabethandrews, @erichkeane, @AaronBallman - review comments addressed, please take a look/approve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I spotted one more typo in the docs
sycl/doc/extensions/experimental/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a couple of typos in the doc.
sycl/doc/extensions/experimental/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_intel_esimd/sycl_ext_intel_esimd.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several misprints mentioned by reviewers, should they be fixed before merging?
That would be my preference. |
Co-authored-by: Aaron Ballman <aaron@aaronballman.com> Co-authored-by: premanandrao <premanand.m.rao@intel.com>
2568239
misprints fixed |
There is a single failure below. It is unrelated to this change, internal ticket created. The test passes on Windows in all modes, on Linux with OpenCL BE, fails only on Linux with LevelZero BE. |
This calling convention makes compiler return values and function arguments
passed as values (through virtual registers) in most cases. The implementation
is basically customizing SPIRABIInfo for CGFunctionInfo with the X86_RegCall
calling convention. Code generation is mostly borrowed from AMDGPUABIInfo, but
with some of the restrictions for passing by value removed.
Signed-off-by: kbobrovs Konstantin.S.Bobrovsky@intel.com