Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HIP] fix host min/max in header #82956

Merged
merged 1 commit into from
Feb 27, 2024
Merged

[HIP] fix host min/max in header #82956

merged 1 commit into from
Feb 27, 2024

Conversation

yxsamliu
Copy link
Collaborator

CUDA defines min/max functions for host in global namespace. HIP header needs to define them too to be compatible. Currently only min/max(int, int) is defined. This causes wrong result for arguments that are out of range for int. This patch defines host min/max functions to be compatible with CUDA.

Fixes: SWDEV-446564

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Feb 26, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 26, 2024

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

CUDA defines min/max functions for host in global namespace. HIP header needs to define them too to be compatible. Currently only min/max(int, int) is defined. This causes wrong result for arguments that are out of range for int. This patch defines host min/max functions to be compatible with CUDA.

Fixes: SWDEV-446564


Full diff: https://github.com/llvm/llvm-project/pull/82956.diff

1 Files Affected:

  • (modified) clang/lib/Headers/__clang_hip_math.h (+42-6)
diff --git a/clang/lib/Headers/__clang_hip_math.h b/clang/lib/Headers/__clang_hip_math.h
index 11e1e7d032586f..e87070e41b0ffc 100644
--- a/clang/lib/Headers/__clang_hip_math.h
+++ b/clang/lib/Headers/__clang_hip_math.h
@@ -1306,14 +1306,50 @@ float min(float __x, float __y) { return __builtin_fminf(__x, __y); }
 __DEVICE__
 double min(double __x, double __y) { return __builtin_fmin(__x, __y); }
 
+// Define host min/max functions.
+
 #if !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
-__host__ inline static int min(int __arg1, int __arg2) {
-  return __arg1 < __arg2 ? __arg1 : __arg2;
-}
 
-__host__ inline static int max(int __arg1, int __arg2) {
-  return __arg1 > __arg2 ? __arg1 : __arg2;
-}
+#pragma push_macro("DEFINE_MIN_MAX_FUNCTIONS")
+#define DEFINE_MIN_MAX_FUNCTIONS(type1, type2) \
+static inline auto min(const type1 __a, const type2 __b) \
+  -> typename std::remove_reference<decltype(__a < __b ? __a : __b)>::type { \
+  return (__a < __b) ? __a : __b; \
+} \
+static inline auto max(const type1 __a, const type2 __b) \
+  -> typename std::remove_reference<decltype(__a > __b ? __a : __b)>::type { \
+  return (__a > __b) ? __a : __b; \
+}
+
+// Define min and max functions for same type comparisons
+DEFINE_MIN_MAX_FUNCTIONS(int, int)
+DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int)
+DEFINE_MIN_MAX_FUNCTIONS(long, long)
+DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long)
+DEFINE_MIN_MAX_FUNCTIONS(long long, long long)
+DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long)
+
+// Define min and max functions for mixed type comparisons
+DEFINE_MIN_MAX_FUNCTIONS(int, unsigned int)
+DEFINE_MIN_MAX_FUNCTIONS(unsigned int, int)
+DEFINE_MIN_MAX_FUNCTIONS(long, unsigned long)
+DEFINE_MIN_MAX_FUNCTIONS(unsigned long, long)
+DEFINE_MIN_MAX_FUNCTIONS(long long, unsigned long long)
+DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, long long)
+
+// Floating-point comparisons using built-in functions
+static inline float min(float const __a, float const __b) { return __builtin_fminf(__a, __b); }
+static inline double min(double const __a, double const __b) { return __builtin_fmin(__a, __b); }
+static inline double min(float const __a, double const __b) { return __builtin_fmin(__a, __b); }
+static inline double min(double const __a, float const __b) { return __builtin_fmin(__a, __b); }
+
+static inline float max(float const __a, float const __b) { return __builtin_fmaxf(__a, __b); }
+static inline double max(double const __a, double const __b) { return __builtin_fmax(__a, __b); }
+static inline double max(float const __a, double const __b) { return __builtin_fmax(__a, __b); }
+static inline double max(double const __a, float const __b) { return __builtin_fmax(__a, __b); }
+
+#pragma pop_macro("DEFINE_MIN_MAX_FUNCTIONS")
+
 #endif // !defined(__HIPCC_RTC__) && !defined(__OPENMP_AMDGCN__)
 #endif
 

Copy link

github-actions bot commented Feb 26, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

}

// Define min and max functions for same type comparisons
DEFINE_MIN_MAX_FUNCTIONS(int, int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not do something like this w/ the appropriate static assertion? Or is there an important restriction on the specific types for this function.

template <typename T, typename U>
static inline auto min(const T &__a, const U &__b) {
  return (__a < __b) ? __a : __b;
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will return reference of __a or __b. that won't work since it is reference to stack var

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I was wrong about the reference.

The issue with the template approach is that it has subtle differences regarding overloading resolution compared with CUDA, which could lead to incompatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we not do stuff like static_assert where we check if both types are the same modulo unsigned or const? I'm assuming that would be similar. I'd just prefer to avoid macro magic if possible, but it we really need it for compatibility reasons we can use it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not just limiting types. template function has lower precedence than functions in overloading resolution. If users using std::min we will get different behavior than CUDA

@yxsamliu yxsamliu force-pushed the fix-minmax branch 2 times, most recently from aa50cad to bd87c56 Compare February 26, 2024 15:33
Comment on lines +1333 to +1344
DEFINE_MIN_MAX_FUNCTIONS(unsigned int, int, unsigned int)
DEFINE_MIN_MAX_FUNCTIONS(unsigned int, unsigned int, int)
DEFINE_MIN_MAX_FUNCTIONS(unsigned long, long, unsigned long)
DEFINE_MIN_MAX_FUNCTIONS(unsigned long, unsigned long, long)
DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, long long, unsigned long long)
DEFINE_MIN_MAX_FUNCTIONS(unsigned long long, unsigned long long, long long)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these are needed in order to avoid errors about ambiguous overload resolution when we pass signed/unsigned arguments.

Normally, if we were to use std::min() function, the user would have to explicitly cast arguments or use std::min<unsigned>() to resolve the issue.

Implicitly converting int->unsigned under the hood is probably not a good idea here as we do not know what the user needs/wants and whether it's a WAI or an error. For min/max converting a negative argument into an unsigned would probably be an error. I think we do need to force users to use one of the all-signed or all-unsigned variants here, too, same as with std::min/max.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume these are needed in order to avoid errors about ambiguous overload resolution when we pass signed/unsigned arguments.

Normally, if we were to use std::min() function, the user would have to explicitly cast arguments or use std::min<unsigned>() to resolve the issue.

Implicitly converting int->unsigned under the hood is probably not a good idea here as we do not know what the user needs/wants and whether it's a WAI or an error. For min/max converting a negative argument into an unsigned would probably be an error. I think we do need to force users to use one of the all-signed or all-unsigned variants here, too, same as with std::min/max.

You are right about that std::min/max does not allow mixed signed/unsigned arguments.

Unfortunately, that is how CUDA defines its host min/max functions, e.g. https://godbolt.org/z/Wxxq9dv91

Not following this could cause incompatibility and regressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not everything CUDA does is the right model to follow. This may be one of the cases where we should improve things, if we can, instead of just copying the broken behavior. Not adding problematic things is easier than removing them later, when they are used, intentionally or not.

Considering that HIP currently does not have those functions, it would suggest that there is probably no existing HIP code depending on them. Existing cuda code which may need those functions will need some amount of porting to HIP, anyway, so fixing the source code could be done as part of the porting effort.

We could put those mixed min/max functions under some preprocessor guard, which would keep them disabled by default. If someone desperately needs them, they would have to specify -DPLEASE_ENABLE_BROKEN_MINMAX_ON_MIXED_SIGNED_UNSIGNED_TYPES.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. will define them only if __HIP_DEFINE_MIXED_HOST_MIN_MAX__ is defined.

@yxsamliu yxsamliu force-pushed the fix-minmax branch 2 times, most recently from c8331bf to f747130 Compare February 26, 2024 21:45
Comment on lines 1332 to 1335
// CUDA defines host min/max functions with mixed signed/unsgined integer
// parameters where signed integers are casted to unsigned integers. However,
// this may not be users' intention. Therefore do not define them by default
// unless users specify -D__HIP_DEFINE_MIXED_HOST_MIN_MAX__.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: signed integers are implicitly promoted to unsigned ones due to the integer promotion rules. Cast would imply intentional cast and we're not doing that.

I'd rephrase it a bit along the lines of:

The routines below will perform unsigned comparison, which may produce invalid results if a signed integer was passed unintentionally. We do not want it happen silently, and do not provide these overloads by default. However for compatibility with CUDA, we allow them, if explicitly requested by the user by defining __HIP_DEFINE_MIXED_HOST_MIN_MAX__.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this PR due to regression in hipCUB:

https://github.com/ROCm/hipCUB/blob/develop/hipcub/include/hipcub/backend/rocprim/device/device_spmv.hpp#L142

hipCUB/hipcub/include/hipcub/backend/rocprim/device/device_spmv.hpp:142:33: error: call to 'min' is ambiguous

it is a call of min(int, unsigned int). The code is likely ported from CUDA.

Probably I need to define those functions with mixed args by default to avoid regressions.

CUDA defines min/max functions for host in global namespace.
HIP header needs to define them too to be compatible.
Currently only min/max(int, int) is defined. This causes
wrong result for arguments that are out of range for int.
This patch defines host min/max functions to be compatible
with CUDA.

Also allows users to define
`__HIP_NO_HOST_MIN_MAX_IN_GLOBAL_NAMESPACE__` to disable
host max/min in global namespace.

min/max functions with mixed signed/unsigned integer
parameters are not defined unless
`__HIP_DEFINE_MIXED_HOST_MIN_MAX__` is defined.

Fixes: SWDEV-446564
@yxsamliu yxsamliu merged commit 55783bd into llvm:main Feb 27, 2024
4 checks passed
yxsamliu added a commit that referenced this pull request Feb 28, 2024
This reverts commit 55783bd.

Due to regressions in hipCUB.

hipCUB/hipcub/include/hipcub/backend/rocprim/device/device_spmv.hpp:142:33: error: call to 'min' is ambiguous

https://github.com/ROCm/hipCUB/blob/develop/hipcub/include/hipcub/backend/rocprim/device/device_spmv.hpp#L142

The ambuguity seems due to missing min(int, unsigned int).

Previously, there is only min(int, int). After the change,
there are min(int, int) and min(unsigned int, unsigned int),
therefore there is ambiguity.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Feb 28, 2024
Reverts due to  build break of hipCUB
commit 55783bd [HIP] fix host min/max in header (llvm#82956)

Change-Id: I76c002a4ac9ccd2df5f40bba1ffa66509835dc33
@Artem-B
Copy link
Member

Artem-B commented Feb 28, 2024

Probably I need to define those functions with mixed args by default to avoid regressions.

Are there any other regressions? Can hupCUB be fixed instead? While their use case is probably benign, I'd rather fix the user code, than propagate CUDA bugs into HIP.

@yxsamliu
Copy link
Collaborator Author

Probably I need to define those functions with mixed args by default to avoid regressions.

Are there any other regressions? Can hupCUB be fixed instead? While their use case is probably benign, I'd rather fix the user code, than propagate CUDA bugs into HIP.

So far we only found this issue in our internal CI. I will try asking hipCUB to fix it on their side.

@yxsamliu
Copy link
Collaborator Author

Probably I need to define those functions with mixed args by default to avoid regressions.

Are there any other regressions? Can hupCUB be fixed instead? While their use case is probably benign, I'd rather fix the user code, than propagate CUDA bugs into HIP.

So far we only found this issue in our internal CI. I will try asking hipCUB to fix it on their side.

hipCUB issue opened ROCm/hipCUB#343

@yxsamliu
Copy link
Collaborator Author

found another library using mixed min: ROCm/Tensile#1977

@yxsamliu
Copy link
Collaborator Author

yet another usage of mixed signed/unsigned min ROCm/hipBLASLt#1227

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants