Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Tensor Reduction for Double (SWDEV-284915) #934

Merged
merged 23 commits into from
Jun 6, 2021
Merged

Conversation

qianfengz
Copy link
Contributor

@qianfengz qianfengz commented May 17, 2021

Adds miopenDouble to the public API.

Adds support for miopenReduceTensor() to be used on Tensors of double type data. All the 8 Reduction Operations which are supported with fp32/fp16 Tensors can be used with double Tensors. The alpha/beta scaling factors follow the same specification as what is used by cuddnReduceTensor(), that is, for fp32/fp16, alpha/beta should be float type passed by the application; for double, alpha/beta should be double type passed by the application.

This is expected to resolve https://ontrack-internal.amd.com/browse/SWDEV-284915.

The commits have gone through the following tests:

Regular MIOpen tests:

$ ./bin/test_reduce_test --double --all
$ ./bin/test_reduce_test --all
$ ./bin/test_reduce_test --half --all

Manual testing:

#!/bin/bash

### for fp64
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 0 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 0 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 0 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 0 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 1 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 2 -N 0 -I 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 3 -N 0 -I 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 4 -N 0 -I 1 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 2 -N 0 -I 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 3 -N 0 -I 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 4 -N 0 -I 1 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 2 -N 0 -I 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 3 -N 0 -I 1 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 4 -N 0 -I 1 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 2 -N 0 -I 1 

## skip this on ROCm 3.7 due to unexpected numeric error
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 3 -N 0 -I 1 
## skip this on ROCm 3.7 due to unexpected numeric error
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 4 -N 0 -I 1 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 5 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 6 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0 -O 7 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 5 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 6 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1 -O 7 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 5 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 6 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2 -O 7 

bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 5 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 6 
bin/MIOpenDriver reducefp64 -D 64,3,280,81  -R 0,1,2,3 -O 7 

### for fp32
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 0
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 0 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 0 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 0 

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 1 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 1 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 1 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 1

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 2 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 3 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 4 -N 0 -I 1

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 2 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 3 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 4 -N 0 -I 1

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 2 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 3 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 4 -N 0 -I 1

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 2 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 3 -N 0 -I 1
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 4 -N 0 -I 1

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 5 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 6
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0 -O 7 

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 5 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 6 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1 -O 7 

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 5 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 6 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2 -O 7 

bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 5
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 6 
bin/MIOpenDriver reduce -D 64,3,280,81  -R 0,1,2,3 -O 7 

### for fp16
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 0
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 0 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 0 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 0 

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 1 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 1 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 1 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 1

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 2 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 3 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 4 -N 0 -I 1

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 2 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 3 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 4 -N 0 -I 1

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 2 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 3 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 4 -N 0 -I 1

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 2 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 3 -N 0 -I 1
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 4 -N 0 -I 1

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 5 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 6
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0 -O 7 

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 5 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 6 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1 -O 7 

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 5 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 6 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2 -O 7 

bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 5
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 6 
bin/MIOpenDriver reducefp16 -D 4,3,60,50  -R 0,1,2,3 -O 7 

@qianfengz qianfengz requested a review from atamazov May 17, 2021 08:06
@ghost

This comment has been minimized.

@atamazov atamazov requested a review from junliume May 18, 2021 11:23
@atamazov
Copy link
Contributor

@qianfengz Please fix tidy issues. Do not hesitate to ask for help, if needed. Thanks.

@qianfengz
Copy link
Contributor Author

@qianfengz Please fix tidy issues. Do not hesitate to ask for help, if needed. Thanks.

I am trying to create a docker to be used locally for doing "make analyze". But it seems from my location, the libboost-1.72 could not be accessed in the scripts used by MIOpen/Dockerfile. Any existing docker image can be pulled to do "make analyze" ?

Copy link
Collaborator

@junliume junliume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libboost-1.72

@qianfengz
Could you test if the following URL is not accessible?
https://boostorg.jfrog.io/artifactory/main/release/1.72.0/source/boost_1_72_0.tar.gz

test/reduce_test.cpp Outdated Show resolved Hide resolved
@atamazov
Copy link
Contributor

atamazov commented May 18, 2021

@qianfengz Please fix tidy issues. Do not hesitate to ask for help, if needed. Thanks.

I am trying to create a docker to be used locally for doing "make analyze". But it seems from my location, the libboost-1.72 could not be accessed in the scripts used by MIOpen/Dockerfile.

Any existing docker image can be pulled to do "make analyze"?

Make analyze consists of make cppcheck and make tidy.

For the former, you need to install cppcheck (see ./dev-requirements.txt)

git clone --single-branch https://github.com/danmar/cppcheck
cd cppcheck
git checkout dd05839a7e63ef04afd34711cb3e1e0ef742882f
mkdir build
cd build
cmake ..
make -j $(nproc)
make test
sudo make install

Then, remake build directory from scratch and then make cppcheck -j $(nproc)

@atamazov
Copy link
Contributor

atamazov commented May 18, 2021

@qianfengz For make tidy, you can docker pull rocm/miopen:miopen-rocm-37. It has everything required for building & running MIOpen (including boost) except cppcheck. Dependencies reside in /root/driver/MLOpen/deps_hip and in /root/driver/MLOpen/deps_opencl. There, build MIOpen as usual and then make tidy -j $(nproc).

You can also install cppcheck within the container and then docker commit it and use your personalized container for both checks.

test/reduce_test.cpp Outdated Show resolved Hide resolved
@qianfengz
Copy link
Contributor Author

libboost-1.72

@qianfengz
Could you test if the following URL is not accessible?
https://boostorg.jfrog.io/artifactory/main/release/1.72.0/source/boost_1_72_0.tar.gz

I can manually download boost 1.72 and install it for building MIOpen. But for "make analyze", the issue seems be more complicated since in Dockerfile, the breaking of boost downloading also breaks other downloading/installation

@qianfengz
Copy link
Contributor Author

@qianfengz For make tidy, you can docker pull rocm/miopen:miopen-rocm-37. It has everything required for building & running MIOpen (including boost) except cppcheck. Dependencies reside in /root/driver/MLOpen/deps_hip and in /root/driver/MLOpen/deps_opencl. There, build MIOpen as usual and then make tidy -j $(nproc).

You can also install cppcheck within the container and then docker commit it and use your personalized container for both checks.

Thank Artem. rocm/miopen:miopen-rocm-37 works perfectly for me.

@qianfengz
Copy link
Contributor Author

qianfengz commented May 19, 2021

One strange issue is that on rocm/miopen:miopen-rocm-37, the follow testings

  • bin/MIOpenDriver reducefp64 -D 64,3,280,81 -R 0,1,2,3 -O 4 -N 0 -I 1
  • bin/MIOpenDriver reducefp64 -D 64,3,280,81 -R 0,1,2,3 -O 3 -N 0 -I 1
  • bin/test_reduce_test --double --D 64 3 280 81 --R 0 1 2 3 --ReduceOp 4 --N 0 --I 1
  • bin/test_reduce_test --double --D 64 3 280 81 --R 0 1 2 3 --ReduceOp 3 --N 0 --I 1

failed to get correct maximum value on my MI100/MI25. While they are able to pass on ROCM 3.8, 3.10 and 4.1.1 on my M100/MI25 (installed from official repository by me).

Watching the dumped results, it can be seen that in cases of failure, the second largest value rather than the largest one is picked by the GPU kernel, for example the second largest value is 0.999997, the largest value is 0.999999 or 1.0 by different runnings.

I am doubting there are compiler issues which cause expression 0.999997 < 0.999999 return false result.

@junliume
Copy link
Collaborator

One strange issue is that on rocm/miopen:miopen-rocm-37, the follow testings

  • bin/MIOpenDriver reducefp64 -D 64,3,280,81 -R 0,1,2,3 -O 4 -N 0 -I 1
  • bin/MIOpenDriver reducefp64 -D 64,3,280,81 -R 0,1,2,3 -O 3 -N 0 -I 1
  • bin/test_reduce_test --double --D 64 3 280 81 --R 0 1 2 3 --ReduceOp 4 --N 0 --I 1
  • bin/test_reduce_test --double --D 64 3 280 81 --R 0 1 2 3 --ReduceOp 3 --N 0 --I 1

failed to get correct maximum value on my MI100/MI25. While they are able to pass on ROCM 3.8, 3.10 and 4.1.1 on my M100/MI25 (installed from official repository by me).

Watching the dumped results, it can be seen that in cases of failure, the second largest value rather than the largest one is picked by the GPU kernel, for example the second largest value is 0.999997, the largest value is 0.999999 or 1.0 by different runnings.

I am doubting there are compiler issues which cause expression 0.999997 < 0.999999 return false result.

So the new feature revealed an issue only found in previous version of ROCm? I think all CI nodes have already updated to 4.1 at least, and I doubt if they will patch issues for earlier version of compiler, especially if it is already "fixed" in later versions. :)

@atamazov
Copy link
Contributor

atamazov commented May 19, 2021

@junliume

I think all CI nodes have already updated to 4.1

No, our CI nodes are running 3.7. @shurale-nkn is working on the dockerfile upgrade to 4.2.

@atamazov
Copy link
Contributor

@qianfengz

I am doubting there are compiler issues which cause expression 0.999997 < 0.999999 return false result.

Can you please check with rocm 4.2 release? Please see confluence where to pull it from.

@atamazov
Copy link
Contributor

@qianfengz If everything works fine with 4.2, then we can disable failing test cases (workaround) for 3.7 and forget about this problem.

@qianfengz
Copy link
Contributor Author

@qianfengz If everything works fine with 4.2, then we can disable failing test cases (workaround) for 3.7 and forget about this problem.

Wait some time, I need download 4.2 from home, could not access repo.radeon.com from the office.

@atamazov
Copy link
Contributor

@qianfengz

Wait some time, I need download 4.2 from home, could not access repo.radeon.com from the office.

You can pull docker image of 4.2 from compute-artifactory.amd.com:5000/rocm-plus-docker/compute-rocm-rel-4.2:21-STG2.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good!

src/include/miopen/datatype.hpp Outdated Show resolved Hide resolved
test/driver.hpp Outdated Show resolved Hide resolved
test/reduce_test.cpp Show resolved Hide resolved
atamazov
atamazov previously approved these changes Jun 3, 2021
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@shurale-nkn shurale-nkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I can see, a new data type has been added here, which means need to add tests for reduction with double type. But since we don't have testing stage for double on jenkins, need add custom test with reduce_test --all and attach it for full fp32 stages on target GPUs.

@atamazov
Copy link
Contributor

atamazov commented Jun 3, 2021

need add custom test with reduce_test --all and attach it for full fp32 stages on target GPUs.

Theoretically, we should add MIOPEN_TEST_DOUBLE stages, but yes, it seems too expensive to add two stages only to test reduction. Let's add custom reduction tests for double to FULL FP32 stages.

@atamazov
Copy link
Contributor

atamazov commented Jun 3, 2021

@qianfengz Please merge from develop and add tests. Do not hesitate to ask me if you have any questions or need assistance. Thanks.

@atamazov atamazov changed the title Generic Tensor Reduction for Double data Generic Tensor Reduction for Double (SWDEV-284915) Jun 3, 2021
@qianfengz
Copy link
Contributor Author

@qianfengz Please merge from develop and add tests. Do not hesitate to ask me if you have any questions or need assistance. Thanks.

So my understand is that since there is no stage/paralleled sub-stage for double defined in the Jekinsfile, the bin/test_reduce_test --double --all has no chance to be executed during the CI. And so you want me to add bin/test_reduce_test --double --all using add_custom_test method so that it can be executed when MIOPEN_TEST_FLOAT and MIOPEN_TEST_ALL are enabled by some stages/sub-stages. Is this right ?

test/CMakeLists.txt Show resolved Hide resolved
Comment on lines 1055 to 1058

if(MIOPEN_TEST_FLOAT)
add_custom_test(test_reduce_double SKIP_UNLESS_ALL FLOAT_ENABLED COMMAND $<TARGET_FILE:test_reduce_test> --double --all)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom tests are enabled for FLOAT by default -> let's remove.
gfx908 are disabled by default -> let's enable explicitly.

Suggested change
if(MIOPEN_TEST_FLOAT)
add_custom_test(test_reduce_double SKIP_UNLESS_ALL FLOAT_ENABLED COMMAND $<TARGET_FILE:test_reduce_test> --double --all)
endif()
if(MIOPEN_TEST_FLOAT)
add_custom_test(test_reduce_double SKIP_UNLESS_ALL GFX908_ENABLED
COMMAND $<TARGET_FILE:test_reduce_test> --double --all --verbose
)
endif()

@atamazov
Copy link
Contributor

atamazov commented Jun 4, 2021

...And so you want me to add bin/test_reduce_test --double --all using add_custom_test method so that it can be executed when MIOPEN_TEST_FLOAT and MIOPEN_TEST_ALL are enabled by some stages/sub-stages. Is this right ?

Correct!

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -1053,3 +1052,7 @@ if(MIOPEN_TEST_CONV)
COMMAND $<TARGET_FILE:test_conv2d> --verbose --input 1 48 7 7 --weights 1 48 5 5 --pads_strides_dilations 0 0 4 4 1 1
)
endif()

if(MIOPEN_TEST_FLOAT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notice] Well, this if is not required actually (but there is no harm). Ok.

@atamazov atamazov merged commit 01ff3da into develop Jun 6, 2021
@atamazov
Copy link
Contributor

atamazov commented Jun 6, 2021

@qianfengz 🎉

@qianfengz qianfengz deleted the reduction-fp64 branch June 22, 2021 06:29
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-merge review 1 (part of work on https://ontrack-internal.amd.com/browse/SWDEV-293780)

@@ -605,6 +622,8 @@ void ReduceTensorDescriptor::ReduceTensor(const Handle& handle,

std::string param2 = param + " -DCK_PARAM_GRIDSIZE=" + std::to_string(gridSize_2) + " ";

std::string network_config2 = network_config + "_C2";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qianfengz Why this change (appending "_C2") is necessary?

Copy link
Contributor Author

@qianfengz qianfengz Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second time and the first time use different kernels even though they are for solving the same problem, so they should have different key for mapping in the kernel cache.

One other thing that I want to mention is that, kernel cache is actually not queried, since Handle::AddKernel() actually only queries the Program cache, and we does not use Handle::HasKernel() + Handle::GetKernel() here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qianfengz

The second time and the first time use different kernels even though they are for solving the same problem, so they should have different key for mapping in the kernel cache.

I do not understand, sorry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@@ -308,6 +308,9 @@ bool PerformanceConfigConvAsm1x1U::IsValid(const ConvolutionContext& config) con

void PerformanceConfigConvAsm1x1U::HeuristicInit(const ConvolutionContext& config)
{
if(config.in_data_type == miopenDouble)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this change is not needed. IsApplicable() of this Solver should refuse configs with double.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@atamazov
Copy link
Contributor

atamazov commented Jul 17, 2021

It seems like this PR is the reason of https://ontrack-internal.amd.com/browse/SWDEV-293780. I am in the process of double confirming this. Meanwhile please check from your side if this PR contains something wrong. Ignore this.

atamazov pushed a commit that referenced this pull request Jul 22, 2021
* Add miopenDouble data type ID
* Update the miopenReduceTensor() host C++ interface implementation to support Reduction on Double
* Tiny update in reduction kernel layer to use built-in shuffle for Double
* Update in reduce_driver to support Reduction on Double
* Update in reduce_test to support Reduction on Double
* Update to remove compiler warnings caused by the adding of miopenDouble
* Add workaround for rocm 3.7 in reduce_test.cpp
* Add custom_test for testing reduce double
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants