RPP Tensor Transpose ND on HOST and HIP #362

Merged
merged 149 commits into from
Jun 28, 2024
Changes from 139 commits
Commits
230a29d
Add transpose ref input/output
r-abishek Nov 28, 2023
edc1b9c
added initial support for generic ND transpose in HOST
sampath1117 Sep 4, 2023
69722bf
added golden outputs for transpose
sampath1117 Sep 5, 2023
63c1929
optimized 2D transpose with SSE instructions
sampath1117 Sep 7, 2023
4ff7b87
added initial SSE version for 3D inputs with last dimension fixed to 16
sampath1117 Sep 11, 2023
29be31a
minor changes
sampath1117 Sep 11, 2023
2b39906
added support for transposing 3D inputs when innermost dimension rema…
sampath1117 Sep 11, 2023
598b010
added support for transposing 3D inputs when innermost dimension rema…
sampath1117 Sep 12, 2023
7151677
minor cleanup
sampath1117 Sep 12, 2023
ddc54a1
optimized 2D transpose with AVX2 instructions
sampath1117 Sep 12, 2023
d8883bf
added generic case to do memcpy if permute order is same as input layout
sampath1117 Sep 13, 2023
b7b66e4
made changes to process w.r.t strides instead of ROI
sampath1117 Sep 13, 2023
bbcb707
made changes in test suite to run performance tests
sampath1117 Sep 13, 2023
3ca31a4
added initial transpose SSE version for 4D inputs
sampath1117 Sep 14, 2023
eccaa3c
optimized 4d transpose with avx2
sampath1117 Sep 15, 2023
b961d34
updated golden inputs and output with actual float data
sampath1117 Sep 19, 2023
79682e5
changed the name of transpose test suite file
sampath1117 Sep 19, 2023
117d4bc
ported transpose 3D 16 channel variant from SSE to AVX2 instructions
sampath1117 Sep 19, 2023
6f4c273
added a templated generic transpose kernel for other bitdepths
sampath1117 Sep 21, 2023
b0b85b1
changed ROI buffer to also have begin values
sampath1117 Sep 28, 2023
8637c82
made transpose test suite generic for supporting any ND kernel
sampath1117 Oct 4, 2023
c909277
added support to run transpose test suite from python
sampath1117 Oct 4, 2023
7aef9ca
changed file name of transpose test suite for better readability
sampath1117 Oct 4, 2023
8e2cc6e
minor changes
sampath1117 Oct 10, 2023
b143008
address review comments
sampath1117 Oct 12, 2023
4a60069
removed usage of malloc/calloc in transpose function
sampath1117 Oct 17, 2023
ad29567
fixed build error in test suite
sampath1117 Nov 29, 2023
1a41104
Add initial generic templated u8/f16/f32/i8 unvectorized transpose - …
r-abishek Nov 30, 2023
1817f63
Add initial hip misc tests for transpose
r-abishek Nov 30, 2023
b6d6fad
Increase max dims to 8
r-abishek Nov 30, 2023
a7df2eb
Add perf test with profiler option on misc hip tests
r-abishek Nov 30, 2023
4982b12
added more comments as per review comments
sampath1117 Dec 4, 2023
31fd4d8
Merge remote-tracking branch 'ar/transpose_tensor' into sr/transpose_…
sampath1117 Dec 4, 2023
81f12b7
modified comment for more clarity
sampath1117 Dec 4, 2023
4d696f5
added missing declaration for transpose gpu that got removed with merge
sampath1117 Dec 7, 2023
3e92ff8
removed unnecessary validation checks for transpose
sampath1117 Dec 8, 2023
a85993b
added correct interpretation of numDims
sampath1117 Dec 8, 2023
b26e68c
temp commit for vectorized version
r-abishek Dec 13, 2023
690eaf2
added seperate header for misc test suite
sampath1117 Dec 13, 2023
e17ff2b
Merge branch 'ar/transpose_tensor' into sr/transpose_pr_changes
r-abishek Dec 14, 2023
c2e2f81
Merge pull request #203 from sampath1117/sr/transpose_pr_changes
r-abishek Dec 14, 2023
1bfb89f
Fix 2d case for dst->src 1 pixel mapping
snehaa8 Dec 20, 2023
9531d40
Fix 3d and 4d transpose HIP dst->src mapping
snehaa8 Dec 21, 2023
f0f5552
Implement initial version for 8 pixel dst -> src mapping
snehaa8 Dec 22, 2023
4f825c3
Implement 8 pixel store with dst to src mapping
snehaa8 Jan 3, 2024
8d79265
Fix 8 pixel store with dst to src mapping
snehaa8 Jan 11, 2024
a883ed5
Cleanup and remove unused variables
snehaa8 Jan 12, 2024
8654457
Merge branch 'master' of https://github.com/snehaa8/rpp into sn/trans…
snehaa8 Jan 12, 2024
8791586
Optimize HIP kernel further
snehaa8 Jan 12, 2024
dcd54ac
experimental changes for adding qa mode for performance tests
sampath1117 Jan 18, 2024
50138ea
made changes to add display more information w.r.t QA results summary…
sampath1117 Jan 19, 2024
f3bea0e
minor changes
sampath1117 Jan 19, 2024
58a915a
Add changes to dump qa results to excel file
HazarathKumarM Jan 23, 2024
03c5d27
Add performance QA for three new tensor functions
HazarathKumarM Jan 23, 2024
a5d49a0
update prerequisites in readme
HazarathKumarM Jan 23, 2024
2e0f922
merged latest changes
HazarathKumarM Jan 23, 2024
c097f74
added changes to handle unsupported cases
sampath1117 Jan 23, 2024
ba860e4
removed treshold dictionary and added performance Noise treshold
HazarathKumarM Jan 24, 2024
a7f17f7
RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduc…
r-abishek Jan 26, 2024
f6ee505
Merge remote-tracking branch 'TOT/develop' into sr/qa_perf
snehaa8 Jan 29, 2024
b9c8cdc
Merge pull request #221 from sampath1117/sr/qa_perf
r-abishek Jan 29, 2024
fa0e3ab
Changes to the performane summary dataframe
HazarathKumarM Jan 30, 2024
35f26f0
Merge branch 'sr/qa_perf' of https://github.com/sampath1117/rpp into …
HazarathKumarM Jan 30, 2024
067d575
minor changes
HazarathKumarM Jan 30, 2024
8ed21ae
Update CMakeLists.txt to add ${CMAKE_CURRENT_SOURCE_DIR} for CI
r-abishek Jan 31, 2024
1cdfcde
Update CMakeLists.txt fix
r-abishek Jan 31, 2024
f82145e
Update CMakeLists.txt fix
r-abishek Jan 31, 2024
0c4f413
remove tabulate dependency
HazarathKumarM Jan 31, 2024
407360d
Merge pull request #229 from sampath1117/sr/qa_perf
r-abishek Jan 31, 2024
511a7af
Update README.md to remove tabulate pip install
r-abishek Jan 31, 2024
a1f4213
License - updates to 2024 and consistency changes (#298)
r-abishek Jan 31, 2024
7096c1d
Test - Update README.md for test_suite (#299)
r-abishek Jan 31, 2024
dab18e0
Merge branch 'master' of https://github.com/GPUOpen-ProfessionalCompu…
r-abishek Feb 1, 2024
07a5f66
Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sph…
dependabot[bot] Feb 6, 2024
a5e5679
Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sph…
dependabot[bot] Feb 7, 2024
37186bb
Fix for CI machine failure
r-abishek Feb 7, 2024
b889a79
Add note on performance
r-abishek Feb 8, 2024
e8aa6b2
Update doc codeowners (#303)
samjwu Feb 8, 2024
5787417
Merge pull request #297 from r-abishek/ar/test_suite_upgrade_5_qa_perf
paveltc Feb 8, 2024
3eb2e8e
Merge branch 'master' into sn/transpose_ND
r-abishek Feb 8, 2024
a921332
Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.3…
dependabot[bot] Feb 9, 2024
30bed4e
Test suite - upgrade 5 qa perf (#305)
kiritigowda Feb 9, 2024
5c423ab
RPP Color Temperature on HOST and HIP (#271)
r-abishek Feb 9, 2024
df6e2c9
RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)
r-abishek Feb 9, 2024
a4ed137
RPP Magnitude on HOST and HIP (#278)
r-abishek Feb 14, 2024
1976cbf
Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sph…
dependabot[bot] Feb 16, 2024
ec8f2f0
RPP Tensor Audio Support - Down Mixing (#296)
r-abishek Feb 16, 2024
29a5c82
RPP Voxel 3D Tensor Multiply scalar on HOST and HIP (#306)
r-abishek Feb 16, 2024
98a3c82
Test Suite Bugfix (#307)
r-abishek Feb 16, 2024
608225b
Bump rocm-docs-core[api_reference] from 0.34.2 to 0.35.0 in /docs/sph…
dependabot[bot] Feb 23, 2024
a7ef385
RPP Reduction - Tensor min and Tensor max on HOST and HIP (#260)
r-abishek Feb 24, 2024
473cde4
CI - Update precheckin.groovy
kiritigowda Feb 24, 2024
ce7dc91
Merge branch 'develop' of https://github.com/r-abishek/rpp into sn/tr…
r-abishek Feb 28, 2024
1147bfe
Update CMakeLists.txt
kiritigowda Apr 12, 2024
5e3fc7a
Bump rocm-docs-core[api_reference] from 0.38.1 to 1.0.0 in /docs/sphi…
dependabot[bot] Apr 18, 2024
6a942c8
Merge remote-tracking branch 'abishek/develop' into sn/transpose_ND
snehaa8 Apr 18, 2024
e3bc943
Merge latest changes
snehaa8 Apr 18, 2024
bfa195d
Cleanup HIP
snehaa8 Apr 24, 2024
c8ed055
fixed copyright and name changes for HOST kernels
sampath1117 Apr 24, 2024
b6b7cc5
Bump rocm-docs-core[api_reference] from 1.0.0 to 1.1.0 in /docs/sphin…
dependabot[bot] Apr 25, 2024
e16ad7a
RPP Gaussian Noise Voxel Tensor on HOST and HIP (#323)
r-abishek Apr 26, 2024
7da4b63
Docs - Bump rocm-docs-core[api_reference] from 1.1.0 to 1.1.1 in /doc…
dependabot[bot] Apr 29, 2024
d24f1ea
RPP - Package updates (#341)
kiritigowda Apr 29, 2024
84ebccd
Cleanup and optimize
snehaa8 Apr 30, 2024
178fc89
Merge branch 'develop' into sn/transpose_ND
snehaa8 Apr 30, 2024
485f181
minor change in comment
sampath1117 Apr 30, 2024
77e14ef
Minor common-fixes for HIP (#345)
r-abishek May 7, 2024
34f3f6d
Readme Updates: --usecase=rocm (#349)
kiritigowda May 8, 2024
ab52683
RPP Tensor Audio Support - Spectrogram (#312)
r-abishek May 8, 2024
ee0d6fe
Update CHANGELOG.md (#352)
r-abishek May 8, 2024
2decd32
RPP Tensor Audio Support - Slice (#325)
r-abishek May 8, 2024
30ce1d6
RPP Tensor Audio Support - MelFilterBank (#332)
r-abishek May 8, 2024
64ae74f
RPP Tensor Normalize ND on HOST and HIP (#335)
r-abishek May 9, 2024
1a3015c
SWDEV-459739 - Remove the package obsolete setting (#353)
raramakr May 9, 2024
4cb8d4b
Audio support merge commit fixes (#354)
r-abishek May 9, 2024
5a699f7
Merge branch 'develop' into sn/transpose_ND
sampath1117 May 10, 2024
9a6cfc4
Merge branch 'develop' into sn/transpose_ND
sampath1117 May 10, 2024
840b6e6
revert unnecessary changes happened with merge
sampath1117 May 10, 2024
bfc5598
added transpose test case
sampath1117 May 10, 2024
bf20223
removed .txt input files
sampath1117 May 10, 2024
9fea8f4
moved normalize inputs and outputs to another folder
sampath1117 May 10, 2024
52df0c3
made changes to make the golden input and output path generic for all…
sampath1117 May 10, 2024
b67fed3
removed .txt output files for transpose
sampath1117 May 10, 2024
a99727e
use hipMemcpyAsync instead of instead of hipMempcy in hip kernel
sampath1117 May 10, 2024
976f5f7
modified compare output function to do comparision for transpose case
sampath1117 May 10, 2024
35f04c8
added golden output for 3d inputs
sampath1117 May 10, 2024
e12d5c7
moved constant compute outside the loop
sampath1117 May 10, 2024
59355c0
minor change in description
sampath1117 May 10, 2024
9ec2f6b
updated print statement for usage in test suites
sampath1117 May 10, 2024
d4219f8
Bump rocm-docs-core[api_reference] from 1.1.1 to 1.1.2 in /docs/sphin…
dependabot[bot] May 17, 2024
264392b
Docker updates (#356)
LakshmiKumar23 May 17, 2024
9907a49
Version Updates (#359)
LakshmiKumar23 May 17, 2024
538b53d
Merge branch 'develop' into sn/transpose_ND
sampath1117 May 21, 2024
9d199da
added AVX2 flags and made helper functions inline in HOST kernel
sampath1117 May 21, 2024
8879af0
Merge pull request #216 from snehaa8/sn/transpose_ND
r-abishek May 21, 2024
024dfb4
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/transpo…
r-abishek May 28, 2024
d3f58b6
Remove re-definition
r-abishek May 28, 2024
7d900f8
Merge branch 'develop' into ar/transpose_tensor
kiritigowda May 29, 2024
200194b
Merge branch 'develop' into ar/transpose_tensor
r-abishek Jun 5, 2024
4589515
changed nDim to tensorDims
sampath1117 Jun 24, 2024
2ff9f21
minor change
sampath1117 Jun 24, 2024
825784d
Merge pull request #284 from sampath1117/sr/transpose_pr_changes
r-abishek Jun 24, 2024
10f343d
Merge branch 'develop' into ar/transpose_tensor
r-abishek Jun 24, 2024
90d7ba8
Fix to add endif
r-abishek Jun 24, 2024
97b5068
Update transpose.hpp to add comments
r-abishek Jun 24, 2024
fbdb259
Update transpose.hpp to remove stream sync and clarify comment
r-abishek Jun 24, 2024
8ccf8df
Merge branch 'develop' into ar/transpose_tensor
kiritigowda Jun 24, 2024
d9eb7ad
Merge branch 'develop' into ar/transpose_tensor
r-abishek Jun 27, 2024
66ae6c7
Merge branch 'develop' into ar/transpose_tensor
kiritigowda Jun 28, 2024
34 changes: 34 additions & 0 deletions include/rppt_tensor_geometric_augmentations.h
@@ -584,6 +584,40 @@ RppStatus rppt_flip_voxel_host(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDes
RppStatus rppt_flip_voxel_gpu(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32u *horizontalTensor, Rpp32u *verticalTensor, Rpp32u *depthTensor, RpptROI3DPtr roiGenericPtrSrc, RpptRoi3DType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT

/*! \brief Transpose Generic augmentation on HOST backend
* \details The transpose augmentation performs an input-permutation based transpose on a generic ND Tensor.
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcGenericDescPtr source tensor descriptor
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstGenericDescPtr destination tensor descriptor
* \param [in] permTensor permutation tensor for transpose operation
* \param [in] roiTensor ROI data for each element in source tensor (tensor of batchSize * number of dimensions * 2 values)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
* \ingroup group_tensor_geometric
*/
RppStatus rppt_transpose_host(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32u *permTensor, Rpp32u *roiTensor, rppHandle_t rppHandle);
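
To make the "input-permutation based transpose" described in the comment above concrete, here is a minimal scalar sketch. It is not the PR's vectorized kernel. It assumes the NumPy-style convention in which destination axis `d` is taken from source axis `perm[d]`; the header comment does not spell the convention out, so treat that as an assumption. `rowMajorStrides` and `transposeND` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major strides: strides[i] is the product of shape[i+1 .. n-1].
static std::vector<std::size_t> rowMajorStrides(const std::vector<std::size_t> &shape)
{
    std::vector<std::size_t> strides(shape.size(), 1);
    for (std::size_t i = shape.size(); i-- > 1; )
        strides[i - 1] = strides[i] * shape[i];
    return strides;
}

// Hypothetical reference transpose (assumed convention: dst axis d <- src axis perm[d]).
static std::vector<float> transposeND(const std::vector<float> &src,
                                      const std::vector<std::size_t> &srcShape,
                                      const std::vector<std::size_t> &perm)
{
    const std::size_t nDim = srcShape.size();
    assert(perm.size() == nDim);

    std::vector<std::size_t> dstShape(nDim);
    for (std::size_t d = 0; d < nDim; d++)
        dstShape[d] = srcShape[perm[d]];

    const std::vector<std::size_t> srcStrides = rowMajorStrides(srcShape);
    std::vector<float> dst(src.size());
    std::vector<std::size_t> idx(nDim, 0);   // multi-index into dst

    for (std::size_t out = 0; out < dst.size(); out++)
    {
        // Map the dst multi-index back to a flat src offset.
        std::size_t srcOffset = 0;
        for (std::size_t d = 0; d < nDim; d++)
            srcOffset += idx[d] * srcStrides[perm[d]];
        dst[out] = src[srcOffset];

        // Advance the dst multi-index, last axis fastest.
        for (std::size_t d = nDim; d-- > 0; )
        {
            if (++idx[d] < dstShape[d])
                break;
            idx[d] = 0;
        }
    }
    return dst;
}
```

With a 2x3 input and `perm = {1, 0}` this reduces to an ordinary matrix transpose; with the identity permutation it is a plain copy, which is the case one of the commits above handles with `memcpy`.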

#ifdef GPU_SUPPORT
/*! \brief Transpose Generic augmentation on HIP backend
* \details The transpose augmentation performs an input-permutation based transpose on a generic ND Tensor.
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcGenericDescPtr source tensor descriptor
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstGenericDescPtr destination tensor descriptor
* \param [in] permTensor permutation tensor for transpose operation in pinned memory
* \param [in] roiTensor ROI data for each element in source tensor (tensor of batchSize * number of dimensions * 2 values)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
* \ingroup group_tensor_geometric
*/
RppStatus rppt_transpose_gpu(RppPtr_t srcPtr, RpptGenericDescPtr srcGenericDescPtr, RppPtr_t dstPtr, RpptGenericDescPtr dstGenericDescPtr, Rpp32u *permTensor, Rpp32u *roiTensor, rppHandle_t rppHandle);
#endif // GPU_SUPPORT
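
The `roiTensor` parameter is documented only as "batchSize * number of dimensions * 2 values". The sketch below shows one way such a buffer could be filled for a full-tensor ROI. The per-sample layout used here (all begin values first, then all length values) is an assumption inferred from the commit "changed ROI buffer to also have begin values", not something the header states. `makeFullRoiTensor` is a hypothetical helper, not an RPP function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of RPP.
// Assumed per-sample layout: [begin_0 .. begin_{n-1}, length_0 .. length_{n-1}].
static std::vector<std::uint32_t> makeFullRoiTensor(std::uint32_t batchSize,
                                                    const std::vector<std::uint32_t> &shape)
{
    assert(!shape.empty());
    const std::size_t nDim = shape.size();

    // batchSize * nDim * 2 values, zero-initialized so every begin value is 0.
    std::vector<std::uint32_t> roi(static_cast<std::size_t>(batchSize) * nDim * 2, 0);
    for (std::size_t b = 0; b < batchSize; b++)
    {
        std::uint32_t *sampleRoi = roi.data() + b * nDim * 2;
        for (std::size_t d = 0; d < nDim; d++)
            sampleRoi[nDim + d] = shape[d];   // length covers the whole axis
    }
    return roi;
}
```

If the actual layout differs (for example interleaved begin/length pairs), only the index arithmetic inside the loop would change; the total size stays `batchSize * nDim * 2`.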

/*! @}
*/

37 changes: 37 additions & 0 deletions src/include/cpu/rpp_cpu_common.hpp
@@ -177,6 +177,21 @@ struct RPPTensorFunctionMetaData
};
#endif // GPU_SUPPORT

// Computes strides for ND Tensor
inline void compute_strides(Rpp32u *strides, Rpp32u *shape, Rpp32u tensorDim)
{
    if (tensorDim > 0)
    {
        Rpp32u v = 1;
        for (Rpp32u i = tensorDim - 1; i > 0; i--)
        {
            strides[i] = v;
            v *= shape[i];
        }
        strides[0] = v;
    }
}
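
As a worked example of what this helper produces: for shape `{2, 3, 4}` the loop yields strides `{12, 4, 1}`, so element `(1, 2, 3)` sits at flat offset `1*12 + 2*4 + 3*1 = 23`. The sketch below reproduces the same loop in a standalone form. The `Rpp32u` alias is a stand-in assumed to match RPP's 32-bit unsigned type, and `flat_offset` is a hypothetical helper added only for illustration.

```cpp
#include <cassert>
#include <cstdint>

using Rpp32u = std::uint32_t;   // assumption: stand-in for RPP's 32-bit unsigned typedef

// Same loop as the helper in the diff: row-major strides for an ND shape.
inline void compute_strides(Rpp32u *strides, Rpp32u *shape, Rpp32u tensorDim)
{
    if (tensorDim > 0)
    {
        Rpp32u v = 1;
        for (Rpp32u i = tensorDim - 1; i > 0; i--)
        {
            strides[i] = v;
            v *= shape[i];
        }
        strides[0] = v;
    }
}

// Hypothetical helper: flat offset of a multi-index, given its strides.
inline Rpp32u flat_offset(const Rpp32u *index, const Rpp32u *strides, Rpp32u tensorDim)
{
    assert(index != nullptr && strides != nullptr);
    Rpp32u offset = 0;
    for (Rpp32u i = 0; i < tensorDim; i++)
        offset += index[i] * strides[i];
    return offset;
}
```

Note that `shape[0]` never enters the computation: the outermost stride depends only on the inner extents.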

// Uses fast inverse square root algorithm from Lomont, C., 2003. FAST INVERSE SQUARE ROOT. [online] lomont.org. Available at: <http://www.lomont.org/papers/2003/InvSqrt.pdf>
inline float rpp_host_math_inverse_sqrt_1(float x)
{
@@ -6444,4 +6459,26 @@ inline void reduce_max_i48_host(__m128i *pMaxR, __m128i *pMaxG, __m128i *pMaxB,
result[0] = _mm_max_epi8(_mm_unpacklo_epi64(px[0], zero), _mm_unpackhi_epi64(px[0], zero));
}

// Transposes a 4x8 block of floats (4 rows in pSrc, 8 floats each) into an 8x4 block (8 rows in pDst, 4 floats each); pSrc is overwritten as scratch
inline void compute_transpose4x8_avx(__m256 *pSrc, __m128 *pDst)
{
    __m256 tmp0, tmp1, tmp2, tmp3;
    tmp0 = _mm256_shuffle_ps(pSrc[0], pSrc[1], 0x44);    /* shuffle to get [P01|P02|P09|P10|P05|P06|P13|P14] */
    tmp2 = _mm256_shuffle_ps(pSrc[0], pSrc[1], 0xEE);    /* shuffle to get [P03|P04|P11|P12|P07|P08|P15|P16] */
    tmp1 = _mm256_shuffle_ps(pSrc[2], pSrc[3], 0x44);    /* shuffle to get [P17|P18|P25|P26|P21|P22|P29|P30] */
    tmp3 = _mm256_shuffle_ps(pSrc[2], pSrc[3], 0xEE);    /* shuffle to get [P19|P20|P27|P28|P23|P24|P31|P32] */
    pSrc[0] = _mm256_shuffle_ps(tmp0, tmp1, 0x88);    /* shuffle to get [P01|P09|P17|P25|P05|P13|P21|P29] */
    pSrc[1] = _mm256_shuffle_ps(tmp0, tmp1, 0xDD);    /* shuffle to get [P02|P10|P18|P26|P06|P14|P22|P30] */
    pSrc[2] = _mm256_shuffle_ps(tmp2, tmp3, 0x88);    /* shuffle to get [P03|P11|P19|P27|P07|P15|P23|P31] */
    pSrc[3] = _mm256_shuffle_ps(tmp2, tmp3, 0xDD);    /* shuffle to get [P04|P12|P20|P28|P08|P16|P24|P32] */

    pDst[0] = _mm256_castps256_ps128(pSrc[0]);      /* extract [P01|P09|P17|P25] */
    pDst[1] = _mm256_castps256_ps128(pSrc[1]);      /* extract [P02|P10|P18|P26] */
    pDst[2] = _mm256_castps256_ps128(pSrc[2]);      /* extract [P03|P11|P19|P27] */
    pDst[3] = _mm256_castps256_ps128(pSrc[3]);      /* extract [P04|P12|P20|P28] */
    pDst[4] = _mm256_extractf128_ps(pSrc[0], 1);    /* extract [P05|P13|P21|P29] */
    pDst[5] = _mm256_extractf128_ps(pSrc[1], 1);    /* extract [P06|P14|P22|P30] */
    pDst[6] = _mm256_extractf128_ps(pSrc[2], 1);    /* extract [P07|P15|P23|P31] */
    pDst[7] = _mm256_extractf128_ps(pSrc[3], 1);    /* extract [P08|P16|P24|P32] */
}
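
The lane comments above (P01 to P32) describe a 4x8 to 8x4 transpose. A scalar reference that yields the same element layout can be handy for checking the shuffle masks. The sketch below is such a reference under the assumption that the four `__m256` inputs hold rows P01..P08, P09..P16, P17..P24 and P25..P32. `transpose4x8_reference` is a hypothetical name and is not part of the PR.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference, not part of the PR.
// src: 4 rows x 8 floats, row-major (32 values).
// dst: 8 rows x 4 floats, row-major (32 values).
inline void transpose4x8_reference(const float *src, float *dst)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t r = 0; r < 4; r++)
        for (std::size_t c = 0; c < 8; c++)
            dst[c * 4 + r] = src[r * 8 + c];   // dst row c collects column c of src
}
```

Filling `src` with the values 1 to 32 gives `dst` row 0 as `{1, 9, 17, 25}` and row 4 as `{5, 13, 21, 29}`, matching the `pDst[0]` and `pDst[4]` comments in the AVX version.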

#endif //RPP_CPU_COMMON_H
1 change: 1 addition & 0 deletions src/modules/cpu/host_tensor_geometric_augmentations.hpp
@@ -34,6 +34,7 @@ SOFTWARE.
#include "kernel/warp_affine.hpp"
#include "kernel/phase.hpp"
#include "kernel/slice.hpp"
#include "kernel/transpose.hpp"
#include "kernel/crop_and_patch.hpp"
#include "kernel/flip_voxel.hpp"

15 changes: 0 additions & 15 deletions src/modules/cpu/kernel/normalize.hpp
@@ -26,21 +26,6 @@ SOFTWARE.
#include "rpp_cpu_simd.hpp"
#include "rpp_cpu_common.hpp"

// Computes strides
void compute_strides(Rpp32u *strides, Rpp32u *shape, Rpp32u tensorDim)
{
    if (tensorDim > 0)
    {
        Rpp32u v = 1;
        for (Rpp32u i = tensorDim - 1; i > 0; i--)
        {
            strides[i] = v;
            v *= shape[i];
        }
        strides[0] = v;
    }
}

// Recursive reduction helper function to compute difference of input with mean and squares them up
template<typename T>
void compute_diff_square_sum(Rpp32f &output, T *input, Rpp32s inputStride, Rpp32s numElements, Rpp32f mean)