Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ck kernels invocation refactoring #2379

Merged
merged 10 commits into from
Sep 12, 2023
Merged

Conversation

CAHEK7
Copy link
Contributor

@CAHEK7 CAHEK7 commented Sep 6, 2023

Simplified and unified CK kernel invocation.
Applied patches from #2338, #2369, #2305, #2336 to all other CK kernels and unified the code.

@atamazov

@CAHEK7
Copy link
Contributor Author

CAHEK7 commented Sep 7, 2023

@iq136boy can you take a look since it affects your PRs?
@amberhassaan that's exactly the change which brings similar CK invocation code into one place.

src/include/miopen/solver/implicitgemm_ck_util.hpp Outdated Show resolved Hide resolved
@CAHEK7
Copy link
Contributor Author

CAHEK7 commented Sep 8, 2023

CI is broken for a while. It can't even checkout the repo.

Copy link
Contributor

@amberhassaan amberhassaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Nice consolidation and improvement. I'd recommend introducing a base performance config class for these implicit-gemm solvers, and that may remove some more repeated code.

src/include/miopen/solver.hpp Show resolved Hide resolved
@JehandadKhan
Copy link
Collaborator

@junliume This can be merged as well

@junliume junliume merged commit 6bcd141 into develop Sep 12, 2023
138 checks passed
@junliume junliume added this to the ROCm 6.0 milestone Sep 12, 2023
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 1

case miopenDouble:
default:
MIOPEN_THROW(miopenStatusInternalError,
"ConvHipImplicitGemmFwdXdlops operation not implemented for this data type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConvHipImplicitGemm3DGroupFwdXdlops

auto ptr_iter = FindConvPtrByID(conv_ptrs, kernel_id);

if(ptr_iter == conv_ptrs.end())
MIOPEN_THROW("PerformanceConfig kernel '" + kernel_id + "' does not exist");
Copy link
Contributor

@atamazov atamazov Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is much better to return ConvSolution(miopenStatusInvalidValue) when this happens. Why: This won't break Find operation, i.e. would improve stability of releases at the users' sites. In all other cases (i.e. attempt to run a convolution) returning a solution with error code would lead to MIOPEN_THROW somewhere in the upper layers of the library.

Implementation: Maybe the most robust approach is throwing a custom exception and catching it in GetSolution (and then MIOPEN_LOG_E there). You can find example in comgr.cpp (which handles compilation errors).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just kept initial functionality as it is. But I agree with this comment and I guess it should be a separated issue - how to handle bad perf configs gracefully. Issue #2426 created.

Comment on lines +4669 to +4672
int index = 0;
std::string kernel_id = "";
std::vector<std::string> valid_kernels;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] AFAICS we use kernel_id as PerformanceConfig value, and we still need index and valid_kernels for traversing the ComputedContainer. Maybe we can get rid of kernel_id and store only index in the database? The positive side effect is smalled perf-db footprint.

Or this would make PerformanceConfig incompatible between CK versions because valid_kernels may become different?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't since we do not have any guarantees that the order of the kernels is persistent. And it may break backward compatibility as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Recommendation]
@CAHEK7 Please put C++ comment that summarizes this review thread, i.e. explains why we need string kernel_id as well as index and valid_kernels in the CK's PerformanceConfigs.

@atamazov
Copy link
Contributor

@CAHEK7 What are your plans about the unresolved post-merge review comments?

@CAHEK7
Copy link
Contributor Author

CAHEK7 commented Oct 2, 2023

Thanks for mentioning me, I missed all the comments.

I'm going to address all of them when #2385 and #2386 are merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants