Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix floating point window range extents. #13606

Merged
merged 1 commit into from
Jun 26, 2023

Conversation

mythrocks
Copy link
Contributor

Description

This commit fixes the window range calculations for floating-point order by columns.

Window range calculations involve comparing the delta value (preceding/following) with the current row value, and capping current_row - delta at numeric_limits::min().

It turns out that for float/double values, numeric_limits::min() returns FLT_MIN which is the lowest positive finite float value. This causes the erstwhile logic to produce incorrect results when the order-by column contains negative float values.

The fix involves replacing numeric_limits::min() with numeric_limits::lowest() which returns the true min float value.

Reference:

  1. https://en.cppreference.com/w/cpp/types/numeric_limits/min

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

This commit fixes the window range calculations for floating-point
order by columns.

Window range calculations involve comparing the `delta` value (preceding/following)
with the current row value, and capping `current_row - delta` at
`numeric_limits::min()`.

It turns out that for `float`/`double` values, `numeric_limits::min()` returns
`FLT_MIN` which is the lowest positive finite float value. This causes the
erstwhile logic to produce incorrect results when the order-by column contains
negative float values.

The fix involves replacing `numeric_limits::min()` with `numeric_limits::lowest()`
which returns the true min float value.

Reference:
1. https://en.cppreference.com/w/cpp/types/numeric_limits/min
@mythrocks mythrocks added bug Something isn't working improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 22, 2023
@mythrocks mythrocks requested a review from a team as a code owner June 22, 2023 20:38
@mythrocks mythrocks self-assigned this Jun 22, 2023
@mythrocks mythrocks requested review from harrism and vyasr June 22, 2023 20:38
@mythrocks mythrocks removed the improvement Improvement / enhancement to an existing function label Jun 22, 2023
@github-actions github-actions bot added libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function labels Jun 22, 2023
@mythrocks mythrocks removed the improvement Improvement / enhancement to an existing function label Jun 22, 2023
@mythrocks
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 9a3f3a9 into rapidsai:branch-23.08 Jun 26, 2023
51 checks passed
mythrocks added a commit to mythrocks/cudf that referenced this pull request Jun 28, 2023
This is a follow-up to rapidsai#13512 (which added support for floating point
order-by columns in window functions), and rapidsai#13606 (which fixed how
negative values are handled for floating point order-by).

This commit fixes how `NaN` and `+/- Infinity` values are handled
for floating point.

Prior to this commit, the calculations for range window extents depended
on the behaviour of `thrust::less<float>` and `thrust::greater<float>`,
as well as addition/subtraction on `+/- Infinity`. This produced some
unexpected results:
1. `thrust::less`/`greater` on `NaN` does not produce strict ordering.
2. Addition/Subtraction on the numerical values of `Infinity` could
   produce finite values that interfere with window extent calculations.
   Ideally, the results should have remained infinite.

This commit adds custom comparators with `NaN` awareness, to better
handle columns that contain `NaN`s. It also fixes range calculations
where `Infinity` is involved.

Tests have been added to cover ASC/DESC order sorting on `FLOAT`,
with `NaN` and `Infinity` values.
rapids-bot bot pushed a commit that referenced this pull request Jul 5, 2023
This is a follow-up to #13512 (which added support for floating point order-by columns in window functions), and #13606 (which fixed how negative values are handled for floating point order-by).

This commit fixes how `NaN` and `+/- Infinity` values are handled for floating point.

Prior to this commit, the calculations for range window extents depended on the behaviour of `thrust::less<float>` and `thrust::greater<float>`, as well as addition/subtraction on `+/- Infinity`. This produced some unexpected results:
1. `thrust::less`/`greater` on `NaN` does not produce strict ordering.
2. Addition/Subtraction on the numerical values of `Infinity` could produce finite values that interfere with window extent calculations. Ideally, the results should have remained infinite.

This commit adds custom comparators with `NaN` awareness, to better handle columns that contain `NaN`s. It also fixes range calculations where `Infinity` is involved.

Tests have been added to cover ASC/DESC order sorting on `FLOAT`, with `NaN` and `Infinity` values.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - https://github.com/nvdbaranec

URL: #13635
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants