Adding float to string kernel #1508

thirtiseven · 2023-10-18T10:49:48Z

This PR adds a kernel for casting float to string so that spark-rapids produces results closer to Spark's for this cast.

Supporting this cast is a prerequisite for the format_number kernel, so I split it out as a subtask.

This PR uses Ryū ("Ryū: fast float-to-string conversion", PLDI'18) to cast float/double to string. The results differ from Spark's output in some cases: sometimes the output is shorter (which is arguably more accurate), and sometimes individual digits differ (e.g., see ulfjack/ryu#83).

In most cases the result matches Spark's, and where it does not, the values still match when they are cast back to float. This was tested in the plugin PR below.
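To illustrate the round-trip property described above, here is a minimal host-side sketch in plain C++. It is not the kernel from this PR and does not use Ryu: `to_string_17` and `round_trips` are hypothetical helper names, and the 17-digit `snprintf` format is only a stand-in for a "longer" rendering of the same value. It shows that a shorter and a longer decimal string can both parse back to the same double.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: format with 17 significant digits. This is always
// enough for a double to round-trip, but often longer than necessary.
std::string to_string_17(double value)
{
  char buf[64];
  int const n = std::snprintf(buf, sizeof buf, "%.17g", value);
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  return std::string(buf, n);
}

// Hypothetical helper: true when parsing `text` yields exactly `value`.
bool round_trips(std::string const& text, double value)
{
  return std::strtod(text.c_str(), nullptr) == value;
}
```

With these helpers, `round_trips("0.1", 0.1)` and `round_trips(to_string_17(0.1), 0.1)` both hold even though the two strings have different lengths, which is the sense in which differing outputs can still be equivalent.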

The logic is based on Ryu's C and Java implementations. I'm leaning towards keeping it consistent with Ryu's codebase rather than making it more CUDA-style, so that upstream changes are easier to apply, but I'm not sure whether that's good practice.

Related plugin PR: NVIDIA/spark-rapids#9470

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

hyperbolic2346

Looking good so far. Some minor comments.

hyperbolic2346 · 2023-10-19T03:22:49Z

src/main/cpp/src/cast_float_to_string.cu

+  // Range of numbers here is for normalizing the value.
+  // If the value is above or below the following limits, the output is converted to
+  // scientific notation in order to show (at most) the number of significant digits.
+  static constexpr double upper_limit = 10000000;  // max is 1x10^7


Spark's max?
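For context on the limit being discussed, a small sketch of the notation switch follows. The upper bound of 10^7 comes from the snippet under review. The lower bound of 10^-3 is an assumption taken from the documented contract of Java's `Double.toString`, which Spark's cast is expected to follow. The function name `uses_scientific_notation` is hypothetical and does not appear in the PR.

```cpp
#include <cassert>
#include <cmath>

constexpr double upper_limit = 1e7;   // 10^7, as in the snippet under review
constexpr double lower_limit = 1e-3;  // 10^-3, assumed from Java's Double.toString

// Hypothetical helper: true when a finite value would be printed in
// scientific notation, i.e. its magnitude is outside [10^-3, 10^7).
bool uses_scientific_notation(double value)
{
  assert(std::isfinite(value));  // NaN and infinities are formatted separately
  double const magnitude = std::fabs(value);
  if (magnitude == 0.0) { return false; }  // zero is printed as plain 0.0
  return magnitude < lower_limit || magnitude >= upper_limit;
}
```

Under these assumptions 9999999.0 stays in plain decimal form, while 1e7 and 0.0001 switch to scientific notation.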


ttnghia

This looks good now 👍. Please run compute-sanitizer to make sure there are no hidden issues, and run the Spark integration tests too if there are any. Thanks.

hyperbolic2346

One question about layout and one include issue.

hyperbolic2346 · 2023-12-06T18:52:45Z

src/main/cpp/src/cast_float_to_string.cu

+    if (d_chars != nullptr) {
+      float_to_string(idx);
+    } else {
+      d_offsets[idx] = compute_output_size(d_floats.element<FloatType>(idx));


It seems odd to pass the index into `float_to_string` and then pass the float into `compute_output_size`. Why not pass the index into both for some symmetry and add `auto const value = d_floats.element<FloatType>(idx);` to `compute_output_size`? This is a stylistic thing and has no bearing on the output.

thirtiseven · Dec 7, 2023

Good idea, done.
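The symmetric layout suggested above can be sketched on the host as follows. This is a plain C++ analogue of the two-pass sizing and writing pattern, not the CUDA functor from the PR. `convert_floats_to_strings` is a hypothetical driver, `std::vector` stands in for the device column, and `%g` formatting stands in for the Ryu conversion. Both helpers take the row index and load the value themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical host-side analogue of the functor under review.
struct float_to_string_fn {
  std::vector<double> const& d_floats;
  std::vector<int>& d_offsets;  // sizes after pass 1, start offsets before pass 2
  char* d_chars;                // nullptr during the sizing pass

  int compute_output_size(std::size_t idx) const
  {
    auto const value = d_floats[idx];
    return std::snprintf(nullptr, 0, "%g", value);  // length only, nothing written
  }

  void float_to_string(std::size_t idx) const
  {
    auto const value = d_floats[idx];
    char tmp[64];  // scratch buffer so the terminating NUL never reaches d_chars
    int const n = std::snprintf(tmp, sizeof tmp, "%g", value);
    assert(n == d_offsets[idx + 1] - d_offsets[idx]);
    std::memcpy(d_chars + d_offsets[idx], tmp, static_cast<std::size_t>(n));
  }

  void operator()(std::size_t idx) const
  {
    if (d_chars != nullptr) {
      float_to_string(idx);
    } else {
      d_offsets[idx] = compute_output_size(idx);
    }
  }
};

// Hypothetical driver: pass 1 sizes each row, a scan turns sizes into
// offsets, pass 2 writes the characters into one packed buffer.
std::string convert_floats_to_strings(std::vector<double> const& floats,
                                      std::vector<int>& offsets)
{
  std::size_t const n = floats.size();
  offsets.assign(n + 1, 0);
  float_to_string_fn const size_pass{floats, offsets, nullptr};
  for (std::size_t i = 0; i < n; ++i) { size_pass(i); }

  int running = 0;  // exclusive scan of the sizes
  for (std::size_t i = 0; i < n; ++i) {
    int const size = offsets[i];
    offsets[i]     = running;
    running += size;
  }
  offsets[n] = running;

  std::string chars(static_cast<std::size_t>(running), '\0');
  float_to_string_fn const write_pass{floats, offsets, &chars[0]};
  for (std::size_t i = 0; i < n; ++i) { write_pass(i); }
  return chars;
}
```

Passing the index to both helpers keeps the dispatch in `operator()` uniform, which is the stylistic point of the review comment. It does not change the output.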

hyperbolic2346 · 2023-12-06T18:54:51Z

src/main/cpp/tests/cast_float_to_string.cpp

+ * limitations under the License.
+ */
+
+#include "cast_string.hpp"


I don't understand why cast_string.hpp is quoted here. There isn't a cast_string.hpp in this directory.


thirtiseven · 2023-12-08T01:20:32Z

build

thirtiseven · 2023-12-08T05:22:58Z

Tested with compute-sanitizer and the plugin integration tests; merging it. Thanks, all, for the review!

thirtiseven and others added 4 commits October 13, 2023 14:04

wip

0e7485c

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

wip

2c04fff

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Merge branch 'NVIDIA:branch-23.12' into float_to_string

6883988

Add float to string kernel

cbce724

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

thirtiseven mentioned this pull request Oct 18, 2023

Use float to string kernel NVIDIA/spark-rapids#9470

Merged

hyperbolic2346 reviewed Oct 19, 2023


thirtiseven and others added 3 commits October 19, 2023 15:57

Update src/main/cpp/src/cast_float_to_string.cu

8d7ead2

Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>

Update src/main/cpp/src/cast_float_to_string.cu

9ab2089

Co-authored-by: Mike Wilson <hyperbolic2346@users.noreply.github.com>

address comments and use different precision for float

c3b3d64

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

thirtiseven mentioned this pull request Oct 30, 2023

[FEA] Support format_number NVIDIA/spark-rapids#9173

Closed

thirtiseven added 2 commits November 6, 2023 15:58

rewrite the solution with ryu

007cf5e

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

update license

1264317

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

thirtiseven marked this pull request as ready for review November 6, 2023 10:29

thirtiseven changed the title ~~[WIP] Adding float to string kernel~~ Adding float to string kernel Nov 6, 2023

thirtiseven requested a review from hyperbolic2346 November 6, 2023 10:42

thirtiseven added 10 commits November 7, 2023 17:02

clean up

a87a403

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Split ftos_converter out

979dc39

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

clean up

4c75bc7

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Merge branch 'float_to_string' of https://github.com/thirtiseven/spark-rapids-jni into thirtiseven-float_to_string

744d0df

resolve cudf conflicts

f1c11e6

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

resolve cudf conflicts

760799b

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

resolve cudf conflicts

bfba655

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

resolve cudf conflicts

ad27fee

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Merge branch 'float_to_string' of https://github.com/thirtiseven/spark-rapids-jni into thirtiseven-float_to_string

77841d9

Merge branch 'thirtiseven-float_to_string' into float_to_string

6728170

jlowe reviewed Nov 14, 2023

thirdparty/cudf (outdated, resolved)

thirtiseven and others added 3 commits November 14, 2023 23:17

remove cudf changes

40a4cb8

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

remove cudf changes

05f5517

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Merge branch 'NVIDIA:branch-23.12' into float_to_string

07c961e

thirtiseven self-assigned this Nov 15, 2023

thirtiseven dismissed jlowe’s stale review via ced33b6 November 22, 2023 07:00

thirtiseven changed the base branch from branch-23.12 to branch-24.02 November 22, 2023 13:39

ttnghia reviewed Nov 22, 2023

src/main/cpp/src/cast_float_to_string.cu (outdated, resolved)

thirtiseven and others added 4 commits November 23, 2023 16:23

Merge remote-tracking branch 'upstream/branch-24.02' into float_to_string

199e1db

cudf conflict

131e48c

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Update src/main/cpp/src/cast_float_to_string.cu

3c09c49

Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>

addressed comments

b78e3b3

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

ttnghia reviewed Nov 30, 2023

src/main/cpp/src/CastStringJni.cpp (outdated, resolved)

ttnghia reviewed Dec 1, 2023

src/main/cpp/src/CastStringJni.cpp (outdated, resolved)

ttnghia reviewed Dec 1, 2023

src/main/cpp/src/cast_float_to_string.cu (outdated, resolved)

ttnghia reviewed Dec 1, 2023

src/main/cpp/src/cast_float_to_string.cu (outdated, resolved)

ttnghia reviewed Dec 1, 2023

src/main/cpp/src/cast_float_to_string.cu (resolved)

ttnghia reviewed Dec 1, 2023

src/main/cpp/tests/cast_float_to_string.cpp (outdated, resolved)

ttnghia reviewed Dec 1, 2023

src/main/cpp/tests/cast_float_to_string.cpp (outdated, resolved)

thirtiseven and others added 5 commits December 4, 2023 17:07

Merge branch 'NVIDIA:branch-24.02' into float_to_string

62aa3ba

clang format

04d1c4f

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Address comments

388cb50

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

Address comments

54fa73c

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

sync

683e73f

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

ttnghia previously approved these changes Dec 4, 2023


thirtiseven requested a review from hyperbolic2346 December 5, 2023 01:57

Merge branch 'branch-24.02' into float_to_string

944863b

hyperbolic2346 requested changes Dec 6, 2023


address comments

97e64af

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>

thirtiseven dismissed ttnghia’s stale review via 97e64af December 7, 2023 09:13

hyperbolic2346 approved these changes Dec 7, 2023


thirtiseven merged commit 4c20e3a into NVIDIA:branch-24.02 Dec 8, 2023
3 checks passed

sameerz added the bug label Dec 10, 2023
