RPP Glitch on HOST and HIP #357

Merged
merged 129 commits into from
Jun 24, 2024
129 commits
7c81458
Adds Tensor Implementation for glitch Augmentation
snehaa8 Nov 29, 2022
8ebba31
Add c++ implementation for u8Pkd3-pkd3 pln3-pln3 for glitch augmenta…
HazarathKumarM Nov 30, 2022
3a1c15d
Add avx implementation for u8 pln3-pln3 for glitch augmentation
HazarathKumarM Dec 1, 2022
adb68b7
implement avx optimization for pln3-pln3 glitch augmentation
HazarathKumarM Dec 5, 2022
62b5fae
Add Avx implementation for pkd3-pkd3 and pkd3-pln3 conversions for gl…
HazarathKumarM Dec 5, 2022
4e4718e
Add pln3-pln3,pkd3-pln3 conversions and updates tensor performance sc…
HazarathKumarM Dec 6, 2022
557b548
Add avx implementation for u8 pln3-pkd3 and pkd3-pkd3 conversion in g…
HazarathKumarM Dec 7, 2022
aba8d03
Add tensor implementation for i8pln3-pln3 and i8pln3 - pkd3
HazarathKumarM Dec 8, 2022
bccb22b
Add f32-f32 tensor conversions for glitch augmentation
HazarathKumarM Dec 12, 2022
86dfb82
Add f16 Tensor implementations for glitch augmentations
HazarathKumarM Dec 12, 2022
d990b8c
Add changes in pkd3-pln3 tensor conversion
HazarathKumarM Dec 13, 2022
7796399
optimize Pkd3-pkd3 conversion
HazarathKumarM Dec 14, 2022
7610c05
Add Hip tensor implementation for glitch kernel
HazarathKumarM Dec 22, 2022
0914c3e
fixed bugs in Glitch tensor host backend
HazarathKumarM Dec 30, 2022
68aadc5
cleanup glitch Tensor HOST
HazarathKumarM Jan 10, 2023
5447365
resolve merge conflicts
HazarathKumarM Jan 17, 2023
fc5a354
resolve merge conflicts
HazarathKumarM Jan 18, 2023
6a91378
code cleanup
HazarathKumarM Jan 18, 2023
db74868
minor changes
HazarathKumarM Jan 18, 2023
6688de0
Merge remote-tracking branch 'TOT/master' into hk/glitch_tensor
fiona-gladwin Oct 3, 2023
933711d
Address review comments
HazarathKumarM Oct 3, 2023
f4dec17
Add glitch test case in new test suite
HazarathKumarM Oct 3, 2023
3bf34fa
fixed minor bugs with glitch addition in test suite
sampath1117 Oct 3, 2023
3415efe
Merge TOT develop into glitch branch
HazarathKumarM Nov 26, 2023
b5246b8
modify glitch host code to use AVX2 instructions
HazarathKumarM Dec 27, 2023
389da0b
modify glitch hip pln code to use 8 pixel load/store
HazarathKumarM Dec 27, 2023
bb57381
fix hip pln golden outputs mismatch
HazarathKumarM Jan 2, 2024
cc1d93f
modify glitch hip pkd code to use 8 pixel load/store
HazarathKumarM Jan 8, 2024
dcd54ac
experimental changes for adding qa mode for performance tests
sampath1117 Jan 18, 2024
50138ea
made changes to add display more information w.r.t QA results summary…
sampath1117 Jan 19, 2024
f3bea0e
minor changes
sampath1117 Jan 19, 2024
58a915a
Add changes to dump qa results to excel file
HazarathKumarM Jan 23, 2024
03c5d27
Add performance QA for three new tensor functions
HazarathKumarM Jan 23, 2024
a5d49a0
update prerequisites in readme
HazarathKumarM Jan 23, 2024
2e0f922
merged latest changes
HazarathKumarM Jan 23, 2024
c097f74
added changes to handle unsupported cases
sampath1117 Jan 23, 2024
ba860e4
removed treshold dictionary and added performance Noise treshold
HazarathKumarM Jan 24, 2024
a7f17f7
RPP Test Suite Upgrade 4 - CSV to BIN conversions for file size reduc…
r-abishek Jan 26, 2024
cafd1ae
Address review comments
HazarathKumarM Jan 29, 2024
4c96b0a
Merge with latest changes
HazarathKumarM Jan 29, 2024
1c1190d
Merge remote-tracking branch 'TOT/develop' into hk/glitch_tensor
HazarathKumarM Jan 29, 2024
d6a86ea
Add .bin golden output
HazarathKumarM Jan 29, 2024
f6ee505
Merge remote-tracking branch 'TOT/develop' into sr/qa_perf
snehaa8 Jan 29, 2024
b9c8cdc
Merge pull request #221 from sampath1117/sr/qa_perf
r-abishek Jan 29, 2024
f55dea7
Rgb offsets structure changes
HazarathKumarM Jan 30, 2024
fa0e3ab
Changes to the performane summary dataframe
HazarathKumarM Jan 30, 2024
35f26f0
Merge branch 'sr/qa_perf' of https://github.com/sampath1117/rpp into …
HazarathKumarM Jan 30, 2024
067d575
minor changes
HazarathKumarM Jan 30, 2024
827f164
Address review comments
HazarathKumarM Jan 30, 2024
5408894
minor changes
HazarathKumarM Jan 30, 2024
8ed21ae
Update CMakeLists.txt to add ${CMAKE_CURRENT_SOURCE_DIR} for CI
r-abishek Jan 31, 2024
1cdfcde
Update CMakeLists.txt fix
r-abishek Jan 31, 2024
f82145e
Update CMakeLists.txt fix
r-abishek Jan 31, 2024
0c4f413
remove tabulate dependency
HazarathKumarM Jan 31, 2024
407360d
Merge pull request #229 from sampath1117/sr/qa_perf
r-abishek Jan 31, 2024
511a7af
Update README.md to remove tabulate pip install
r-abishek Jan 31, 2024
a1f4213
License - updates to 2024 and consistency changes (#298)
r-abishek Jan 31, 2024
7096c1d
Test - Update README.md for test_suite (#299)
r-abishek Jan 31, 2024
dab18e0
Merge branch 'master' of https://github.com/GPUOpen-ProfessionalCompu…
r-abishek Feb 1, 2024
7e8f7c1
Merge remote-tracking branch 'abishek/ar/opt_glitch' into hk/glitch_t…
HazarathKumarM Feb 1, 2024
07a5f66
Bump rocm-docs-core[api_reference] from 0.33.0 to 0.33.1 in /docs/sph…
dependabot[bot] Feb 6, 2024
a5e5679
Bump rocm-docs-core[api_reference] from 0.33.1 to 0.33.2 in /docs/sph…
dependabot[bot] Feb 7, 2024
37186bb
Fix for CI machine failure
r-abishek Feb 7, 2024
b889a79
Add note on performance
r-abishek Feb 8, 2024
0bb77d0
Merge remote-tracking branch 'abishek/ar/test_suite_upgrade_5_qa_perf…
HazarathKumarM Feb 8, 2024
e8aa6b2
Update doc codeowners (#303)
samjwu Feb 8, 2024
a921332
Documentation - Bump rocm-docs-core[api_reference] from 0.33.2 to 0.3…
dependabot[bot] Feb 9, 2024
30bed4e
Test suite - upgrade 5 qa perf (#305)
kiritigowda Feb 9, 2024
5c423ab
RPP Color Temperature on HOST and HIP (#271)
r-abishek Feb 9, 2024
df6e2c9
RPP Voxel 3D Tensor Add/Subtract scalar on HOST and HIP (#272)
r-abishek Feb 9, 2024
a4ed137
RPP Magnitude on HOST and HIP (#278)
r-abishek Feb 14, 2024
0d50719
Change glitch Algorithm
sampath1117 Feb 14, 2024
1976cbf
Bump rocm-docs-core[api_reference] from 0.34.0 to 0.34.2 in /docs/sph…
dependabot[bot] Feb 16, 2024
ec8f2f0
RPP Tensor Audio Support - Down Mixing (#296)
r-abishek Feb 16, 2024
29a5c82
RPP Voxel 3D Tensor Multiply scalar on HOST and HIP (#306)
r-abishek Feb 16, 2024
98a3c82
Test Suite Bugfix (#307)
r-abishek Feb 16, 2024
4b3e32f
code cleanup
sampath1117 Feb 23, 2024
608225b
Bump rocm-docs-core[api_reference] from 0.34.2 to 0.35.0 in /docs/sph…
dependabot[bot] Feb 23, 2024
a7ef385
RPP Reduction - Tensor min and Tensor max on HOST and HIP (#260)
r-abishek Feb 24, 2024
473cde4
CI - Update precheckin.groovy
kiritigowda Feb 24, 2024
12dda99
Address review comments
sampath1117 Mar 4, 2024
d463776
merged latest changes
sampath1117 Mar 4, 2024
1c5d29e
change Algorithm for f32 variants
sampath1117 Mar 5, 2024
c33af22
Bump rocm-docs-core[api_reference] from 0.35.0 to 0.35.1 in /docs/sph…
dependabot[bot] Mar 6, 2024
14f6334
Bump rocm-docs-core[api_reference] from 0.35.1 to 0.36.0 in /docs/sph…
dependabot[bot] Mar 12, 2024
95c3272
Merge branch 'master' into develop
kiritigowda Mar 12, 2024
641f653
Docs - Bump rocm-docs-core[api_reference] from 0.36.0 to 0.37.0 in /d…
dependabot[bot] Mar 20, 2024
5568573
Link cleanup (#326)
LisaDelaney Mar 20, 2024
a6749ba
Update notes
LisaDelaney Mar 20, 2024
a255906
Docs - Bump rocm-docs-core[api_reference] from 0.37.0 to 0.37.1 in /d…
dependabot[bot] Mar 22, 2024
d3df761
RPP Voxel Flip on HIP and HOST (#285)
r-abishek Mar 23, 2024
ebecb42
RPP Vignette Tensor on HOST and HIP (#311)
r-abishek Mar 23, 2024
fc1410b
Bump rocm-docs-core[api_reference] from 0.37.1 to 0.38.0 in /docs/sph…
dependabot[bot] Mar 27, 2024
3ebd7c3
RPP Tensor Audio Support - Resample (#310)
r-abishek Apr 3, 2024
76f31df
Docs - Missing input and output images for Doxygen (#331)
r-abishek Apr 3, 2024
b83f910
Scratch buffers rename for HOST and HIP (#324)
r-abishek Apr 3, 2024
ebeb131
Update CMakeLists.txt
kiritigowda Apr 3, 2024
6930465
RPP BitwiseAND and BitwiseOR Tensor on HOST and HIP (#318)
r-abishek Apr 9, 2024
fb15035
Merge remote-tracking branch 'abishek/develop' into hk/glitch_tensor
sampath1117 Apr 10, 2024
2a81f7a
Merge remote-tracking branch 'TOT/develop' into hk/glitch_tensor
sampath1117 May 2, 2024
6287533
Address review comments
sampath1117 May 6, 2024
daabb36
Address review comments
sampath1117 May 7, 2024
77e14ef
Minor common-fixes for HIP (#345)
r-abishek May 7, 2024
34f3f6d
Readme Updates: --usecase=rocm (#349)
kiritigowda May 8, 2024
ab52683
RPP Tensor Audio Support - Spectrogram (#312)
r-abishek May 8, 2024
ee0d6fe
Update CHANGELOG.md (#352)
r-abishek May 8, 2024
2decd32
RPP Tensor Audio Support - Slice (#325)
r-abishek May 8, 2024
30ce1d6
RPP Tensor Audio Support - MelFilterBank (#332)
r-abishek May 8, 2024
64ae74f
RPP Tensor Normalize ND on HOST and HIP (#335)
r-abishek May 9, 2024
1a3015c
SWDEV-459739 - Remove the package obsolete setting (#353)
raramakr May 9, 2024
4cb8d4b
Audio support merge commit fixes (#354)
r-abishek May 9, 2024
5f8f024
Address review comments
sampath1117 May 10, 2024
d3f8ff7
Merge remote-tracking branch 'abishek/develop' into hk/glitch_tensor
sampath1117 May 10, 2024
ca2f88f
clean up the code
sampath1117 May 10, 2024
1d054f9
Address review comments
sampath1117 May 14, 2024
89a57cc
Add Compiler AVX2 compiler flag
sampath1117 May 15, 2024
a798a4a
Merge pull request #199 from fiona-gladwin/hk/glitch_tensor
r-abishek May 17, 2024
d42af4e
Merge branch 'develop' of https://github.com/ROCm/rpp into ar/opt_glitch
r-abishek May 28, 2024
9f29643
Merge branch 'develop' into ar/opt_glitch
kiritigowda May 29, 2024
9daf678
Merge branch 'develop' into ar/opt_glitch
kiritigowda Jun 4, 2024
6c4d5df
Merge branch 'develop' into ar/opt_glitch
r-abishek Jun 5, 2024
7d5855b
Add comments on compute_glitch_locs_hip()
r-abishek Jun 6, 2024
aa8d55e
Merge branch 'develop' into ar/opt_glitch
r-abishek Jun 6, 2024
89a49d4
remove the 75% width consideration for aligned length
HazarathKumarM Jun 7, 2024
6f295a4
Merge pull request #278 from fiona-gladwin/hk/glitch_tensor
r-abishek Jun 7, 2024
0d4a124
Minor fix
r-abishek Jun 7, 2024
6eda460
Merge branch 'develop' into ar/opt_glitch
r-abishek Jun 7, 2024
61a1260
Merge branch 'develop' into ar/opt_glitch
r-abishek Jun 11, 2024
443a502
Merge branch 'develop' into ar/opt_glitch
kiritigowda Jun 24, 2024
10 changes: 10 additions & 0 deletions include/rppdefs.h
@@ -446,6 +446,16 @@ typedef struct

} RpptRoiLtrb;

/*! \brief RPPT Tensor Channel Offsets struct
* \ingroup group_rppdefs
*/
typedef struct
{
RppiPoint r;
RppiPoint g;
RppiPoint b;
} RpptChannelOffsets;

/*! \brief RPPT Tensor 3D ROI LTFRBB struct
* \ingroup group_rppdefs
*/
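As a rough illustration of how the new struct might be filled, the sketch below mirrors it with local stand-in types so that it compiles without the RPP headers. `PointSketch`, `ChannelOffsetsSketch`, and `makeSampleOffsets` are hypothetical names, and the assumption that `RppiPoint` carries integer `x` and `y` members is inferred from the `point.x` / `point.y` wording in this PR's API documentation, not from this hunk.

```cpp
#include <cassert>

// Hypothetical stand-in for RppiPoint (ASSUMPTION: integer x and y members).
struct PointSketch
{
    int x;
    int y;
};

// Hypothetical mirror of RpptChannelOffsets: one (x, y) offset per colour channel.
struct ChannelOffsetsSketch
{
    PointSketch r;
    PointSketch g;
    PointSketch b;
};

// Build one set of offsets; per the PR documentation a single set applies to the whole batch.
inline ChannelOffsetsSketch makeSampleOffsets()
{
    ChannelOffsetsSketch offsets{};
    offsets.r = {5, 5};   // shift applied when sampling the R channel
    offsets.g = {10, 2};  // shift applied when sampling the G channel
    offsets.b = {3, 12};  // shift applied when sampling the B channel
    return offsets;
}
```

Giving each channel a different offset is what would produce the visible colour separation; identical offsets would only translate the image.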
44 changes: 44 additions & 0 deletions include/rppt_tensor_effects_augmentations.h
@@ -565,4 +565,48 @@ RppStatus rppt_erase_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPt
#ifdef __cplusplus
}
#endif

/*! \brief Glitch augmentation on HOST backend for a NCHW/NHWC layout tensor
* \details The glitch augmentation adds a glitch effect for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.jpg Sample Input
* \image html effects_augmentations_glitch_img150x150.jpg Sample Output
* \param [in] srcPtr source tensor in HOST memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] rgbOffsets RGB offset values to use for the glitch augmentation (A single set of 3 RppiPoint values that applies to all images in the batch.
* For each point and for each image in the batch: 0 < point.x < width, 0 < point.y < height)
* \param [in] roiTensorPtrSrc ROI data for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HOST handle created with <tt>\ref rppCreateWithBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_glitch_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptChannelOffsets *rgbOffsets, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);

#ifdef GPU_SUPPORT
/*! \brief Glitch augmentation on HIP backend for a NCHW/NHWC layout tensor
* \details The glitch augmentation adds a glitch effect for a batch of RGB(3 channel) / greyscale(1 channel) images with an NHWC/NCHW tensor layout.<br>
* - srcPtr depth ranges - Rpp8u (0 to 255), Rpp16f (0 to 1), Rpp32f (0 to 1), Rpp8s (-128 to 127).
* - dstPtr depth ranges - Will be same depth as srcPtr.
* \image html img150x150.jpg Sample Input
* \image html effects_augmentations_glitch_img150x150.jpg Sample Output
* \param [in] srcPtr source tensor in HIP memory
* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 1/3)
* \param [out] dstPtr destination tensor in HIP memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] rgbOffsets RGB offset values to use for the glitch augmentation (A 1D tensor in pinned/HOST memory containing a single set of 3 RppiPoint values that applies to all images in the batch.
* For each point and for each image in the batch: 0 < point.x < width, 0 < point.y < height)
* \param [in] roiTensorPtrSrc ROI data for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))
* \param [in] roiType ROI type used (RpptRoiType::XYWH or RpptRoiType::LTRB)
* \param [in] rppHandle RPP HIP handle created with <tt>\ref rppCreateWithStreamAndBatchSize()</tt>
* \return A <tt> \ref RppStatus</tt> enumeration.
* \retval RPP_SUCCESS Successful completion.
* \retval RPP_ERROR* Unsuccessful completion.
*/
RppStatus rppt_glitch_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, RpptChannelOffsets *rgbOffsets, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
#endif // GPU_SUPPORT
#endif // RPPT_TENSOR_EFFECTS_AUGMENTATIONS_H
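The documented constraint on `rgbOffsets` and the per-channel sampling idea can be sketched in scalar form. This is a minimal reference under stated assumptions, not the kernel from this PR: the types and function names are hypothetical stand-ins, the layout is a single packed (NHWC) U8 image, and the fallback to the unshifted pixel when a shifted location leaves the image is an assumption the diff shown here does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for RppiPoint / RpptChannelOffsets.
struct PointSketch { int x; int y; };
struct ChannelOffsetsSketch { PointSketch r, g, b; };

// Documented constraint: 0 < point.x < width and 0 < point.y < height for every channel.
inline bool offsetsInRange(const ChannelOffsetsSketch &o, int width, int height)
{
    const PointSketch pts[3] = {o.r, o.g, o.b};
    for (const PointSketch &p : pts)
        if (p.x <= 0 || p.x >= width || p.y <= 0 || p.y >= height)
            return false;
    return true;
}

// Scalar reference: each output channel is read from its own shifted source location.
// ASSUMPTION: if the shifted location is outside the image, the unshifted pixel is used.
inline std::vector<std::uint8_t> glitchReference(const std::vector<std::uint8_t> &src,
                                                 int width, int height,
                                                 const ChannelOffsetsSketch &o)
{
    assert(src.size() == static_cast<std::size_t>(width) * height * 3);
    std::vector<std::uint8_t> dst(src.size());
    const PointSketch pts[3] = {o.r, o.g, o.b};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < 3; ++c)
            {
                int sx = x + pts[c].x;
                int sy = y + pts[c].y;
                if (sx >= width || sy >= height)
                {
                    sx = x;
                    sy = y;
                }
                dst[(y * width + x) * 3 + c] = src[(sy * width + sx) * 3 + c];
            }
    return dst;
}
```

A caller would typically check `offsetsInRange` once per batch before running the augmentation, since one offset set is shared by all images.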
97 changes: 97 additions & 0 deletions src/include/cpu/rpp_cpu_simd.hpp
@@ -185,6 +185,10 @@ const __m256i avx_pxShufflePkd = _mm256_setr_m128(xmm_pxStore4Pkd, xmm_pxStore4P
const __m128i xmm_pxMask00 = _mm_setr_epi8(0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0, 1, 2, 3);
const __m128i xmm_pxMask04To11 = _mm_setr_epi8(4, 5, 6, 7, 8, 9, 10, 11, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80);

const __m256i avx_pxMaskR = _mm256_setr_epi8(0, 0x80, 0x80, 3, 0x80, 0x80, 6, 0x80, 0x80, 9, 0x80, 0x80, 12, 0x80, 0x80, 15, 0x80, 0x80, 18, 0x80, 0x80, 21, 0x80, 0x80, 24, 0x80, 0x80, 27, 0x80, 0x80, 0x80, 0x80);
const __m256i avx_pxMaskG = _mm256_setr_epi8(0x80, 1, 0x80, 0x80, 4, 0x80, 0x80, 7, 0x80, 0x80, 10, 0x80, 0x80, 13, 0x80, 0x80, 16, 0x80, 0x80, 19, 0x80, 0x80, 22, 0x80, 0x80, 25, 0x80, 0x80, 28, 0x80, 0x80, 0x80);
const __m256i avx_pxMaskB = _mm256_setr_epi8(0x80, 0x80, 2, 0x80, 0x80, 5, 0x80, 0x80, 8, 0x80, 0x80, 11, 0x80, 0x80, 14, 0x80, 0x80, 17, 0x80, 0x80, 20, 0x80, 0x80, 23, 0x80, 0x80, 26, 0x80, 0x80, 29, 0x80, 0x80);

// Print helpers

inline void rpp_mm_print_epi8(__m128i vPrintArray)
@@ -1021,6 +1025,99 @@ inline void rpp_load48_u8pkd3_to_f32pln3_avx(Rpp8u *srcPtr, __m256 *p)
p[5] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[2], xmm_pxMaskB), _mm_shuffle_epi8(px[3], xmm_pxMaskB))); /* Contains B09-16 */
}

inline void rpp_glitch_load24_u8pkd3_to_f32pln3_avx(Rpp8u *srcPtr, __m256 *p, int *srcLocs)
{
__m128i px[2];
px[0] = _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[0])); /* load [R01|G01|B01|R02|G02|B02|R03|G03|B03|R04|G04|B04|R05|G05|B05|R06] - Need R01-04 */
px[1] = _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[0] + 12)); /* load [R05|G05|B05|R06|G06|B06|R07|G07|B07|R08|G08|B08|R09|G09|B09|R10] - Need R05-08 */
p[0] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMaskR), _mm_shuffle_epi8(px[1], xmm_pxMaskR))); /* Contains R01-08 */

px[0] = _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[1])); /* load [R01|G01|B01|R02|G02|B02|R03|G03|B03|R04|G04|B04|R05|G05|B05|R06] - Need G01-04 */
px[1] = _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[1] + 12)); /* load [R05|G05|B05|R06|G06|B06|R07|G07|B07|R08|G08|B08|R09|G09|B09|R10] - Need G05-08 */
p[1] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMaskG), _mm_shuffle_epi8(px[1], xmm_pxMaskG))); /* Contains G01-08 */

px[0] = _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[2])); /* load [R01|G01|B01|R02|G02|B02|R03|G03|B03|R04|G04|B04|R05|G05|B05|R06] - Need B01-04 */
px[1] = _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[2] + 12)); /* load [R05|G05|B05|R06|G06|B06|R07|G07|B07|R08|G08|B08|R09|G09|B09|R10] - Need B05-08 */
p[2] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMaskB), _mm_shuffle_epi8(px[1], xmm_pxMaskB))); /* Contains B01-08 */
}

inline void rpp_glitch_load24_f32pkd3_to_f32pln3_avx(Rpp32f *srcPtr, __m256 *p, int *srcLocs)
{
Rpp32f *srcPtrTemp = srcPtr + srcLocs[0];
p[0] = _mm256_setr_ps(*srcPtrTemp, *(srcPtrTemp + 3), *(srcPtrTemp + 6), *(srcPtrTemp + 9),
*(srcPtrTemp + 12), *(srcPtrTemp + 15), *(srcPtrTemp + 18), *(srcPtrTemp + 21));
srcPtrTemp = srcPtr + srcLocs[1];
p[1] = _mm256_setr_ps(*(srcPtrTemp + 1), *(srcPtrTemp + 4), *(srcPtrTemp + 7), *(srcPtrTemp + 10),
*(srcPtrTemp + 13), *(srcPtrTemp + 16), *(srcPtrTemp + 19), *(srcPtrTemp + 22));
srcPtrTemp = srcPtr + srcLocs[2];
p[2] = _mm256_setr_ps(*(srcPtrTemp + 2), *(srcPtrTemp + 5), *(srcPtrTemp + 8), *(srcPtrTemp + 11),
*(srcPtrTemp + 14), *(srcPtrTemp + 17), *(srcPtrTemp + 20), *(srcPtrTemp + 23));
}

inline void rpp_glitch_load24_i8pkd3_to_f32pln3_avx(Rpp8s *srcPtr, __m256 *p, int *srcLocs)
{
__m128i px[2];
px[0] = _mm_add_epi8(xmm_pxConvertI8, _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[0]))); /* load [R01|G01|B01|R02|G02|B02|R03|G03|B03|R04|G04|B04|R05|G05|B05|R06] - Need R01-04 */
px[1] = _mm_add_epi8(xmm_pxConvertI8, _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[0] + 12))); /* load [R05|G05|B05|R06|G06|B06|R07|G07|B07|R08|G08|B08|R09|G09|B09|R10] - Need R05-08 */
p[0] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMaskR), _mm_shuffle_epi8(px[1], xmm_pxMaskR))); /* Contains R01-08 */

px[0] = _mm_add_epi8(xmm_pxConvertI8, _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[1]))); /* load [R01|G01|B01|R02|G02|B02|R03|G03|B03|R04|G04|B04|R05|G05|B05|R06] - Need G01-04 */
px[1] = _mm_add_epi8(xmm_pxConvertI8, _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[1] + 12))); /* load [R05|G05|B05|R06|G06|B06|R07|G07|B07|R08|G08|B08|R09|G09|B09|R10] - Need G05-08 */
p[1] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMaskG), _mm_shuffle_epi8(px[1], xmm_pxMaskG))); /* Contains G01-08 */

px[0] = _mm_add_epi8(xmm_pxConvertI8, _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[2]))); /* load [R01|G01|B01|R02|G02|B02|R03|G03|B03|R04|G04|B04|R05|G05|B05|R06] - Need B01-04 */
px[1] = _mm_add_epi8(xmm_pxConvertI8, _mm_loadu_si128((__m128i *)(srcPtr + srcLocs[2] + 12))); /* load [R05|G05|B05|R06|G06|B06|R07|G07|B07|R08|G08|B08|R09|G09|B09|R10] - Need B05-08 */
p[2] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMaskB), _mm_shuffle_epi8(px[1], xmm_pxMaskB))); /* Contains B01-08 */
}

inline void rpp_glitch_load30_u8pkd3_to_u8pkd3_avx(Rpp8u *srcPtr, int *srcLocs, __m256i &p)
{
__m256i px[3];
px[0] = _mm256_loadu_si256((__m256i *)(srcPtr + srcLocs[0])); // Load the source location1 values passed
px[1] = _mm256_loadu_si256((__m256i *)(srcPtr + srcLocs[1])); // Load the source location2 values passed
px[2] = _mm256_loadu_si256((__m256i *)(srcPtr + srcLocs[2])); // Load the source location3 values passed
px[0] = _mm256_shuffle_epi8(px[0], avx_pxMaskR); /* Shuffle to obtain R channel values */
px[1] = _mm256_shuffle_epi8(px[1], avx_pxMaskG); /* Shuffle to obtain G channel values */
px[2] = _mm256_shuffle_epi8(px[2], avx_pxMaskB); /* Shuffle to obtain B channel values */
px[0] = _mm256_or_si256(px[0], px[1]); /* Pack R and G channels to obtain RG format */
p = _mm256_or_si256(px[0], px[2]); /* Pack RG values and B channel to obtain RGB format */
}

inline void rpp_glitch_load30_i8pkd3_to_i8pkd3_avx(Rpp8s *srcPtr, int *srcLocs, __m256i &p)
{
__m256i px[3];
px[0] = _mm256_loadu_si256((__m256i *)(srcPtr + srcLocs[0])); // Load the source location1 values passed
px[1] = _mm256_loadu_si256((__m256i *)(srcPtr + srcLocs[1])); // Load the source location2 values passed
px[2] = _mm256_loadu_si256((__m256i *)(srcPtr + srcLocs[2])); // Load the source location3 values passed
px[0] = _mm256_shuffle_epi8(px[0], avx_pxMaskR); /* Shuffle to obtain R channel values */
px[1] = _mm256_shuffle_epi8(px[1], avx_pxMaskG); /* Shuffle to obtain G channel values */
px[2] = _mm256_shuffle_epi8(px[2], avx_pxMaskB); /* Shuffle to obtain B channel values */
px[0] = _mm256_or_si256(px[0], px[1]); /* Pack R and G channels to obtain RG format */
p = _mm256_or_si256(px[0], px[2]); /* Pack RG values and B channel to obtain RGB format */
}

inline void rpp_glitch_load6_f32pkd3_to_f32pkd3_avx(Rpp32f *srcPtr, int *srcLocs, __m256 &p)
{
p = _mm256_setr_ps(*(srcPtr + srcLocs[0]), *(srcPtr + srcLocs[1] + 1), *(srcPtr + srcLocs[2] + 2), *(srcPtr + srcLocs[0] + 3),
*(srcPtr + srcLocs[1] + 4), *(srcPtr + srcLocs[2] + 5), 0.0f, 0.0f);
}

inline void rpp_glitch_load48_u8pln3_to_f32pln3_avx(Rpp8u *srcPtrR, Rpp8u *srcPtrG, Rpp8u *srcPtrB, __m256 *p, int *srcLocs)
{
__m128i px[3];

px[0] = _mm_loadu_si128((__m128i *)(srcPtrR + srcLocs[0])); /* load [R01|R02|R03|R04|R05|R06|R07|R08|R09|R10|R11|R12|R13|R14|R15|R16] */
px[1] = _mm_loadu_si128((__m128i *)(srcPtrG + srcLocs[1])); /* load [G01|G02|G03|G04|G05|G06|G07|G08|G09|G10|G11|G12|G13|G14|G15|G16] */
px[2] = _mm_loadu_si128((__m128i *)(srcPtrB + srcLocs[2])); /* load [B01|B02|B03|B04|B05|B06|B07|B08|B09|B10|B11|B12|B13|B14|B15|B16] */
p[0] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMask00To03), _mm_shuffle_epi8(px[0], xmm_pxMask04To07))); /* Contains R01-08 */
p[1] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[0], xmm_pxMask08To11), _mm_shuffle_epi8(px[0], xmm_pxMask12To15))); /* Contains R09-16 */
p[2] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[1], xmm_pxMask00To03), _mm_shuffle_epi8(px[1], xmm_pxMask04To07))); /* Contains G01-08 */
p[3] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[1], xmm_pxMask08To11), _mm_shuffle_epi8(px[1], xmm_pxMask12To15))); /* Contains G09-16 */
p[4] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[2], xmm_pxMask00To03), _mm_shuffle_epi8(px[2], xmm_pxMask04To07))); /* Contains B01-08 */
p[5] = _mm256_cvtepi32_ps(_mm256_setr_m128i(_mm_shuffle_epi8(px[2], xmm_pxMask08To11), _mm_shuffle_epi8(px[2], xmm_pxMask12To15))); /* Contains B09-16 */
}

inline void rpp_load48_u8pkd3_to_f32pln3_mirror_avx(Rpp8u *srcPtr, __m256 *p)
{
__m128i px[4];
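The packed-to-packed glitch loads in this hunk (`rpp_glitch_load30_u8pkd3_to_u8pkd3_avx`, `rpp_glitch_load30_i8pkd3_to_i8pkd3_avx`, `rpp_glitch_load6_f32pkd3_to_f32pkd3_avx`) all appear to follow one pattern: element `i` of the output is taken from the stream starting at `srcLocs[i % 3]`, so the R, G and B slots each read from their own source location. A scalar model of that pattern is sketched below; `gatherPkd3Scalar` is a hypothetical helper written for illustration and is not part of RPP.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the packed gather: output element i (channel i % 3) is read from
// srcPtr + srcLocs[i % 3] + i, mirroring the shuffle-and-OR (U8/I8) and
// _mm256_setr_ps (F32) variants in the diff.
// N is the number of packed elements produced (30 for the U8/I8 loads, 6 for the F32 load).
template <typename T, std::size_t N>
inline std::array<T, N> gatherPkd3Scalar(const T *srcPtr, const int *srcLocs)
{
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = srcPtr[srcLocs[i % 3] + static_cast<int>(i)];
    return out;
}
```

A model like this can serve as a golden reference when unit-testing the vectorized loads, since it makes the per-channel source locations explicit. Note that the U8/I8 AVX2 variants read a full 32 bytes from each location, so a caller needs at least that much readable memory past the largest offset.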
1 change: 1 addition & 0 deletions src/modules/cpu/host_tensor_effects_augmentations.hpp
@@ -31,6 +31,7 @@ SOFTWARE.
#include "kernel/noise_shot.hpp"
#include "kernel/noise_gaussian.hpp"
#include "kernel/non_linear_blend.hpp"
#include "kernel/glitch.hpp"
#include "kernel/water.hpp"
#include "kernel/ricap.hpp"
#include "kernel/vignette.hpp"