-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding float to string kernel #1508
Conversation
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far. Some minor comments.
// Range of numbers here is for normalizing the value. | ||
// If the value is above or below the following limits, the output is converted to | ||
// scientific notation in order to show (at most) the number of significant digits. | ||
static constexpr double upper_limit = 10000000; // max is 1x10^7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark's max?
Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>
Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
…k-rapids-jni into thirtiseven-float_to_string
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
…k-rapids-jni into thirtiseven-float_to_string
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good now 👍 . Please run compute-sanitizer
to make sure there is no hidden issue. And do so if there are Spark integration tests too. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question about layout and one include issue.
if (d_chars != nullptr) { | ||
float_to_string(idx); | ||
} else { | ||
d_offsets[idx] = compute_output_size(d_floats.element<FloatType>(idx)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd to pass the index into format_float and then pass the float into compute_output_size. Why not pass the index into both for some symmetry and add auto const value = d_floats.element(idx); to compute_output_size? This is a stylistic thing and has no bearing on the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, done.
* limitations under the License. | ||
*/ | ||
|
||
#include "cast_string.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why cast_string.hpp
is quoted here. There isn't a cast_string.hpp
in this directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
build |
Tested with compute-sanitizer and plugin integration tests, merging it. Thanks all for review! |
This PR adds a kernel for casting float to string, to make spark-rapids produce closer results when doing such casting.
Supporting this is a necessary part of
format_number
kernel, so I split it out as a 'subtask'.This PR uses Ryū: fast float-to-string conversion (PLDI'18) as the solution for casting float/double to string. The results differ from the output of Spark's in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output (e.g., see ulfjack/ryu#83).
In most cases, the result will match Spark's results, and in the cases where it does not, the values will match when we cast them back to float. Tested in plugin PR below.
The logic part is based on ryu's C and Java implementation. I'm leaning towards keeping it consistent with ryu's codebase rather than making it more Cuda style to make it easier to apply upstream changes, but I'm not sure if that's a good practice.
Related plugin PR: NVIDIA/spark-rapids#9470