Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce tensor support #70

Merged
merged 63 commits into from
Sep 23, 2021
Merged

Conversation

r-abishek
Copy link
Member

  • Adds all RPP framework headers and changes needed for tensor support.
  • Adds all unit test and performance test support for brightness tensor.

d_half8 dst_h8;

dst_h8.x.x = __float22half2_rn(*(float2 *)&(dst_f8->x));
dst_h8.x.y = __float22half2_rn(*((float2 *)&(dst_f8->x) + 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please try to avoid this +1 in your code. Use the x, y, z, and w components of the dst_f8->x.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


__device__ __forceinline__ void rpp_hip_load24_pkd3_and_unpack_to_float24_pln3(float *srcPtr, uint srcIdx, d_float24 *src_f24)
{
src_f24->x.x.x = srcPtr[srcIdx];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function needs to be rewritten. please don't access the srcPtr memory like the way you've written here by adding some offset. read a float4 each time and then use the x, y, z, w components to fill the src_f24.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


__device__ __forceinline__ void rpp_hip_load24_pkd3_and_unpack_to_float24_pln3(half *srcPtr, uint srcIdx, d_float24 *src_f24)
{
src_f24->x.x.x = __half2float(srcPtr[srcIdx]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the same comments mentioned above for rpp_hip_load24_pkd3_and_unpack_to_float24_pln3 function to rewrite this function as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

srcPtrG = srcPtrR + increment;
srcPtrB = srcPtrG + increment;

src_f24->x.x.x = *srcPtrR;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use vector float4 for reading srcPtr memory and use its components when filling the src_f24.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

typedef enum
{
rppStatusSuccess = 0,
rppStatusBadParm = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is better to give -ve values for error codes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please redo the macros and fix codacy warnings

Copy link
Contributor

@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paveltc : Please run all the unit-tests before merge

@r-abishek
Copy link
Member Author

@asalmanp The I8 images now match exactly with the F16/F32/U8 visually. Please let me know if you have any other thoughts too. We could now proceed for the final testing and merge @paveltc


__device__ void brightness_hip_compute(signed char *srcPtr, d_float8 *src_f8, d_float8 *dst_f8, float4 *alpha_f4, float4 *beta_f4)
{
dst_f8->x = rpp_hip_pixel_check((src_f8->x + 128) * *alpha_f4 + *beta_f4) - 128;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src_f8->x is a float4 so if your intention is to add each of the x/y/z/w with 128 then you need to use (float4)128 here.

similarly, you would need to use (float4)128 to subtract in the end of this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats true, but I believe it was doing an auto typecast to float4. In any case, I have added a manual typecast now to be on the safer side.

{
int id_x = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x) * 4;

roiTensorPtrSrc[id_x + 2] -= (roiTensorPtrSrc[id_x] - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use int2 to read from the roiTensorPtrSrc instead of using +1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have changed it now. Also, here roiTensorPtrSrc has 0/1/2/3, so I have used int4.

return layoutParams;
}

// inline void copy_roiTensor(RpptROIPtr roiTensorPtrSrc, rpp::Handle& handle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented out codes if it is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Cleaned up this rppi_validate file for just some formatting changes, tabs to spaces, and removal of unwanted commented code.


__device__ __forceinline__ float rpp_hip_unpack0(int src)
{
return (float)(signed char)(src & 0xFF);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the casting to (signed char) is needed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I found this to be the only way to do the unpack correctly. It takes src, gets the first byte by masking with 0xFF, then we explicitly tell the compiler to interpret the MSB as a signed bit, then convert to float. If I directly convert to float, the most significant bit was interpreted not as a sign, and as part of an 8 bit number.

@AryanSalmanpour
Copy link
Member

@r-abishek please take a look at my latest comments and make the necessary changes.

@rrawther
Copy link
Contributor

@paveltc please run rali and rpp unit-tests on this PR so we can merge it

@paveltc
Copy link
Contributor

paveltc commented Sep 22, 2021

@rrawther @asalmanp This PR passes the unit tests.

@r-abishek
Copy link
Member Author

@rrawther @asalmanp This PR passes the unit tests.

@asalmanp @kiritigowda Could we merge the PR#70 too? The unit tests have been verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants