Generic Tensor Reduction for Double (SWDEV-284915) #934
Conversation
…support Reduction on Double
@qianfengz Please fix the tidy issues. Do not hesitate to ask for help if needed. Thanks.
I am trying to create a Docker image to be used locally for running "make analyze", but it seems that from my location, libboost-1.72 cannot be accessed by the scripts used in MIOpen/Dockerfile. Is there any existing Docker image that can be pulled to run "make analyze"?
> libboost-1.72

@qianfengz Could you check whether the following URL is accessible from your location?
https://boostorg.jfrog.io/artifactory/main/release/1.72.0/source/boost_1_72_0.tar.gz
`make analyze` consists of two steps. For the former, you need to install cppcheck (see ./dev-requirements.txt). Then remake the build directory from scratch and run the analysis again.
@qianfengz You can also install cppcheck within the container and then re-run the analysis.
I can manually download boost 1.72 and install it for building MIOpen. But for "make analyze", the issue seems to be more complicated, since in the Dockerfile the failure of the boost download also breaks other downloads/installations.
Thanks, Artem.
One strange issue is that on an older ROCm, some cases failed to get the correct maximum value on my MI100/MI25, while they are able to pass on a newer ROCm. Watching the dumped results, it can be seen that in the failing cases the second-largest value rather than the largest one is picked by the GPU kernel; for example, the second-largest value is 0.999997 while the largest value is 0.999999 or 1.0, depending on the run. I suspect there is a compiler issue that causes the comparison expression to be evaluated incorrectly.
So the new feature revealed an issue found only in a previous version of ROCm? I think all CI nodes have already been updated to at least 4.1, and I doubt they will patch issues for an earlier version of the compiler, especially if it is already "fixed" in later versions. :)
No, our CI nodes are running 3.7. @shurale-nkn is working on the Dockerfile upgrade to 4.2.
Can you please check with the ROCm 4.2 release? Please see Confluence for where to pull it from.
@qianfengz If everything works fine with 4.2, then we can disable the failing test cases (workaround) for 3.7 and forget about this problem.
Please wait some time; I need to download 4.2 from home, since I cannot access repo.radeon.com from the office.
You can pull a docker image of 4.2 from
Almost good!
LGTM!
As I can see, a new data type has been added here, which means we need to add tests for reduction with the double type. But since we don't have a testing stage for double on Jenkins, we need to add a custom test with reduce_test --all and attach it to the full fp32 stages on the target GPUs.
Theoretically, we should add MIOPEN_TEST_DOUBLE stages, but yes, it seems too expensive to add two stages only to test reduction. Let's add custom reduction tests for double to the FULL FP32 stages.
@qianfengz Please merge from develop and add the tests. Do not hesitate to ask me if you have any questions or need assistance. Thanks.
So my understanding is that, since there is no stage/parallel sub-stage for
test/CMakeLists.txt (outdated)

```cmake
if(MIOPEN_TEST_FLOAT)
    add_custom_test(test_reduce_double SKIP_UNLESS_ALL FLOAT_ENABLED COMMAND $<TARGET_FILE:test_reduce_test> --double --all)
endif()
```
Custom tests are enabled for FLOAT by default, so let's remove FLOAT_ENABLED. gfx908 is disabled by default, so let's enable it explicitly:

```cmake
if(MIOPEN_TEST_FLOAT)
    add_custom_test(test_reduce_double SKIP_UNLESS_ALL GFX908_ENABLED
        COMMAND $<TARGET_FILE:test_reduce_test> --double --all --verbose
    )
endif()
```
Correct!
LGTM!
```diff
@@ -1053,3 +1052,7 @@ if(MIOPEN_TEST_CONV)
     COMMAND $<TARGET_FILE:test_conv2d> --verbose --input 1 48 7 7 --weights 1 48 5 5 --pads_strides_dilations 0 0 4 4 1 1
 )
 endif()
+
+if(MIOPEN_TEST_FLOAT)
```
[Notice] Well, this `if` is not actually required (but there is no harm). Ok.
Post-merge review 1 (part of work on https://ontrack-internal.amd.com/browse/SWDEV-293780)
```diff
@@ -605,6 +622,8 @@ void ReduceTensorDescriptor::ReduceTensor(const Handle& handle,

     std::string param2 = param + " -DCK_PARAM_GRIDSIZE=" + std::to_string(gridSize_2) + " ";

+    std::string network_config2 = network_config + "_C2";
```
@qianfengz Why is this change (appending "_C2") necessary?
The second call and the first call use different kernels even though they solve the same problem, so they should have different keys for mapping in the kernel cache.
One other thing I want to mention is that the kernel cache is actually not queried here, since Handle::AddKernel() only queries the Program cache, and we do not use Handle::HasKernel() + Handle::GetKernel() here.
> The second time and the first time use different kernels even though they are for solving the same problem, so they should have different key for mapping in the kernel cache.

I do not understand, sorry.
?
```diff
@@ -308,6 +308,9 @@ bool PerformanceConfigConvAsm1x1U::IsValid(const ConvolutionContext& config) con

 void PerformanceConfigConvAsm1x1U::HeuristicInit(const ConvolutionContext& config)
 {
+    if(config.in_data_type == miopenDouble)
```
It seems like this change is not needed. IsApplicable() of this solver should refuse configs with double.
?
* Add miopenDouble data type ID
* Update the miopenReduceTensor() host C++ interface implementation to support Reduction on Double
* Tiny update in reduction kernel layer to use built-in shuffle for Double
* Update in reduce_driver to support Reduction on Double
* Update in reduce_test to support Reduction on Double
* Update to remove compiler warnings caused by the adding of miopenDouble
* Add workaround for rocm 3.7 in reduce_test.cpp
* Add custom_test for testing reduce double
Adds `miopenDouble` to the public API.

Adds support for miopenReduceTensor() to be used on tensors of double-type data. All 8 reduction operations that are supported with fp32/fp16 tensors can be used with double tensors. The alpha/beta scaling factors follow the same specification as cudnnReduceTensor(): for fp32/fp16, alpha/beta should be float values passed by the application; for double, alpha/beta should be double values passed by the application.
This is expected to resolve https://ontrack-internal.amd.com/browse/SWDEV-284915.
The commits have gone through the following tests:
Regular MIOpen tests:
Manual testing: