-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to reduce kernel launch overheads #794
Conversation
0e0d58d
to
129dc77
Compare
Unit test failures for iGPU OpenCL
among others |
What’s going on with those tests? They should be failing expectedly.
Is this the iGPU on which the tests fail? The tests should be failing as they try to launch kernels over the supported thread block size (1000 vs. 512): TEMPLATE_TEST_CASE("ABM_AddKernel_MultiTypeMultiSize", "", int, long, float, long long, double) {
auto size = GENERATE(as<size_t>{}, 100, 500, 1000);
// (...)
hipLaunchKernelGGL(add<TestType>, 1, size, 0, 0, d_a, d_b, d_c, size);
HIP_CHECK(hipGetLastError()); I see the tests failing on the main branch on my iGPU where thread block sizes limited to 256. If these tests are passing on the main branch on the CI’s iGPU - why are they passing in the first place? |
129dc77
to
29e59c2
Compare
@linehill tests are limited to 500 max so they don't fail on main for me not should they be failing. The test in question:
You probably looked in hip-tests which is not enabled by default. |
you just pasted a snippet where it says 512, not 256. |
yes, all tests that fail, fail on the iGPU OpenCL backend. |
512 is figure from CI's test log.
I see, I thought chipStar switched to hip-tests. |
29e59c2
to
84c81c1
Compare
No new changes, just rebase. |
Skip kernel info string construction when it is not going to be logged/displayed.
Reduce kernel launch overhead from __hipPushCallConfiguration() and __hipPopCallConfiguration() functions: * Remove unnecessary backend initialization (backend will be initialized at hipLaunchKernel()). * Avoid creating short living temporary, heap allocated chipStar::ExecItem.
Exit earlier from prepareDeviceVariablesNoLock().
Recycle OpenCL kernel handles instead of creating them for each exec-item.
84c81c1
to
0ff1aff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why linter complains for this PR
src/backend/OpenCL/CHIPBackendOpenCL.hh:42:10 [clang-diagnostic-error]
'CL/cl_ext.h' file not found
but other than that, great PR.
LGTM
A patch set for reducing kernel launch overheads. These improved HeCBench's mrc, floydwarshall and overlay benchmarks by 41-57% on PVC.
__hipPushCallConfiguration()
and__hipPopCallConfiguration()
functions:hipLaunchKernel()
).chipStar::ExecItem
.prepareDeviceVariablesNoLock()
.