Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add support for __regcall calling convention to spir targets. #4533

Merged
merged 3 commits into from
Mar 17, 2022

Conversation

kbobrovs
Copy link
Contributor

@kbobrovs kbobrovs commented Sep 9, 2021

This calling convention makes compiler return values and function arguments
passed as values (through virtual registers) in most cases. The implementation
is basically customizing SPIRABIInfo for CGFunctionInfo with the X86_RegCall
calling convention. Code generation is mostly borrowed from AMDGPUABIInfo, but
with some of the restrictions for passing by value removed.

Signed-off-by: kbobrovs Konstantin.S.Bobrovsky@intel.com

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have experience with the specifics of the change. Would request someone else more familiar with calling convention issues to do a more thorough review.

clang/test/CodeGenSYCL/regcall-cc-test.cpp Outdated Show resolved Hide resolved
@erichkeane
Copy link
Contributor

This generally seems ok, though I would probably suggest some additional testing for parameters.

Additionally, I don't think we'd want to accept this without a spec written somewhere (I'd vastly prefer in the clang tree somewhere) that we can compare the implementation to.

@kbobrovs
Copy link
Contributor Author

This generally seems ok, though I would probably suggest some additional testing for parameters.

Additionally, I don't think we'd want to accept this without a spec written somewhere (I'd vastly prefer in the clang tree somewhere) that we can compare the implementation to.

Thanks! Could you please suggest the test cases? Also, what specific place within clang tree for the doc do you have in mind?

@erichkeane
Copy link
Contributor

This generally seems ok, though I would probably suggest some additional testing for parameters.
Additionally, I don't think we'd want to accept this without a spec written somewhere (I'd vastly prefer in the clang tree somewhere) that we can compare the implementation to.

Thanks! Could you please suggest the test cases? Also, what specific place within clang tree for the doc do you have in mind?

/clang/docs is where documentation goes. There doesn't seem to be a separate file for it, but there is a similar ABI document for Clang Blocks.

As for tests:
-I'd like a few showing that we don't 'register out' like the rest of the ABIs.
-Cases where the same types end up getting used by both parameters and return values (to show we don't care, again, like in register cases).
-Tests that show alignment mismatches don't cause packing.
-Tests with more types. How are shorts/longs combined? How about chars and ints? etc.
-At least the documentation has to cover variadic arguments, I believe there is an effort to allow printf, so we'll likely need C variadics covered.

Though, your test names seem very confusing. It is odd that PassAsTwo32bitInts is passed as 8 i8s. But I expect such an ABI document to cover what you really mean.

@kbobrovs
Copy link
Contributor Author

@erichkeane, ok will add more tests. What do you mean by "register out"?

@erichkeane
Copy link
Contributor

@erichkeane, ok will add more tests. What do you mean by "register out"?

Typically with ABIs, we change how things are passed based on the number of 'registers' we have used so far. For example, actual reg-call will use 15 SSE registers and 8 Integer registers (IIRC). At a certain point we run out of registers, and have to pass on stack.

@kbobrovs
Copy link
Contributor Author

@erichkeane, ok will add more tests. What do you mean by "register out"?

Typically with ABIs, we change how things are passed based on the number of 'registers' we have used so far. For example, actual reg-call will use 15 SSE registers and 8 Integer registers (IIRC). At a certain point we run out of registers, and have to pass on stack.

I see. I have a test for that then - AMDGPUABIInfo will register out at 16, I removed that for the SPIRVABIInfo and added a test with 24 parameters - StillPassThroughRegisters

@AaronBallman
Copy link
Contributor

@erichkeane, ok will add more tests. What do you mean by "register out"?

Typically with ABIs, we change how things are passed based on the number of 'registers' we have used so far. For example, actual reg-call will use 15 SSE registers and 8 Integer registers (IIRC). At a certain point we run out of registers, and have to pass on stack.

I see. I have a test for that then - AMDGPUABIInfo will register out at 16, I removed that for the SPIRVABIInfo and added a test with 24 parameters - StillPassThroughRegisters

24 still sounds like a reasonable number of registers. I'm wondering how we behave with a more stressful stress test, like 500 parameters (or some other excessive number of registers that makes sense). Basically -- can we run out of registers with this calling convention in SPIR?

@kbobrovs kbobrovs requested review from a team as code owners March 9, 2022 08:14
@kbobrovs
Copy link
Contributor Author

kbobrovs commented Mar 9, 2022

@erichkeane, @AaronBallman - I'm resuming work on this after a big pause, sorry for delay. I addressed your comments - tests extended a lot (one has 130 ints to test for 'register out', various type combinations), also added a description of ABI. I added it to ESIMD, as I think we are not ready to officially support it for pure SYCL.

Please review.

This calling convention makes compiler return values and function arguments
passed as values (through virtual registers) in most cases. The implementation
is basically customizing SPIRABIInfo for CGFunctionInfo with the X86_RegCall
calling convention. Code generation is mostly borrowed from AMDGPUABIInfo, but
with some of the restrictions for passing by value removed.

Signed-off-by: kbobrovs <Konstantin.S.Bobrovsky@intel.com>
AaronBallman
AaronBallman previously approved these changes Mar 9, 2022
Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not my strong suit, but I didn't see anything that concerned me.

clang/lib/CodeGen/TargetInfo.cpp Show resolved Hide resolved
* 5-8 bytes - array of 2 ints
Floating point types are not merged. Structure field alignment rules can
increase the calculated size compared to simple sum of `sizeof` of all the
types. If the total size exceeds 8, then:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structure field alignment rules can increase the calculated size compared to simple sum of sizeof of all the types.

I think this can be removed since it is mentioned above when describing how size is calculated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

// === TEST CASE: multi-level nested structs with one primitive type element at
// the bottom, and one - at the top. The nested struct at the top is expected to
// get "unwraped" by the compiler evaporating to the single element at the
// bottom.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is a copy-and-paste error describing the previous case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, right, will fix

// CHECK-DAG: %struct.CharCharShortFloat = type { i8, i8, i16, float }

template SYCL_DEVICE CharCharShortFloat __regcall func<CharCharShortFloat>(CharCharShortFloat x);
// CHECK-DAG: define weak_odr x86_regcallcc [2 x i32] @_Z16__regcall3__funcI18CharCharShortFloatET_S1_([2 x i32] %{{[0-9a-zA-Z_.]+}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain this? The doc says floating point types are not merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching! Will update the doc.

clang/test/CodeGenSYCL/regcall-cc-test.cpp Show resolved Hide resolved
@kbobrovs
Copy link
Contributor Author

I'll wait for @erichkeane's review to make review-related changes all at once to reduce stress on the CI system.

@erichkeane
Copy link
Contributor

I have nothing to add, Aaron and Prem covered everything I believe.

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits.

Comment on lines 297 to 298
// Permit CC_X86RegCall which is used to mark external functions with
// explicit simd or structure type arguments to pass them via registers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Permit CC_X86RegCall which is used to mark external functions with
// explicit simd or structure type arguments to pass them via registers.
// Permit CC_X86RegCall which is used to mark external functions with
// explicit simd or structure type arguments to pass them via registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Comment on lines 131 to 135
return (CC == CC_SpirFunction || CC == CC_OpenCLKernel ||
// Permit CC_X86RegCall which is used to mark external functions
// with
// explicit simd or structure type arguments to pass them via
// registers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (CC == CC_SpirFunction || CC == CC_OpenCLKernel ||
// Permit CC_X86RegCall which is used to mark external functions
// with
// explicit simd or structure type arguments to pass them via
// registers.
return (CC == CC_SpirFunction || CC == CC_OpenCLKernel ||
// Permit CC_X86RegCall which is used to mark external functions
// with explicit simd or structure type arguments to pass them via
// registers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


// === TEST CASE: multi-level nested structs with one primitive type element at
// the bottom, and one - at the top. The nested struct at the top is expected to
// get "unwraped" by the compiler evaporating to the single element at the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// get "unwraped" by the compiler evaporating to the single element at the
// get "unwrapped" by the compiler evaporating to the single element at the

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching!

Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
@kbobrovs
Copy link
Contributor Author

@intel/dpcpp-esimd-reviewers, friendly ping

@kbobrovs
Copy link
Contributor Author

SYCL / Linux / L0 GEN9 LLVM Test Suite (pull_request) failed due to timeouts on unrelated tests:

Timed Out Tests (7):
SYCL :: KernelAndProgram/multiple-kernel-linking.cpp
SYCL :: Plugin/level_zero_batch_test.cpp
SYCL :: Plugin/level_zero_batch_test_copy_with_compute.cpp
SYCL :: Plugin/level_zero_dynamic_batch_test.cpp
SYCL :: Plugin/retain_events.cpp
SYCL :: Printf/char.cpp
SYCL :: Printf/float.cpp

Testing Time: 1080.59s
Unsupported : 184
Passed : 578
Expectedly Failed: 38
Timed Out : 7

@kbobrovs
Copy link
Contributor Author

@premanandrao, @elizabethandrews, @erichkeane, @AaronBallman - review comments addressed, please take a look/approve

AaronBallman
AaronBallman previously approved these changes Mar 14, 2022
Copy link
Contributor

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I spotted one more typo in the docs

premanandrao
premanandrao previously approved these changes Mar 14, 2022
Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple of typos in the doc.

v-klochkov
v-klochkov previously approved these changes Mar 14, 2022
Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several misprints mentioned by reviewers, should they be fixed before merging?

@premanandrao
Copy link
Contributor

There are several misprints mentioned by reviewers, should they be fixed before merging?

That would be my preference.

Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: premanandrao <premanand.m.rao@intel.com>
@kbobrovs
Copy link
Contributor Author

misprints fixed

@kbobrovs
Copy link
Contributor Author

There is a single failure below. It is unrelated to this change, internal ticket created. The test passes on Windows in all modes, on Linux with OpenCL BE, fails only on Linux with LevelZero BE.
[2022-03-15T05:57:52.148Z] Failed Tests (1):
[2022-03-15T05:57:52.148Z] SYCL :: Assert/assert_in_simultaneous_kernels.cpp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants