Optimize map_normalize #22211

Merged
merged 1 commit into from
Mar 15, 2024

Conversation

kaikalur
Contributor

@kaikalur kaikalur commented Mar 14, 2024

Description

Optimize the `map_normalize` function so that it no longer calls `reduce` for every element.

Motivation and Context

`map_normalize` is implemented as a SQL function. It calls a (nested) `reduce` on the values array for every element, and common subexpression elimination (CSE) does not currently optimize this for nested lambdas (#22214). This change pulls the `reduce` out so that it runs only once, which improves performance.
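
The idea can be sketched outside of SQL. The following is a minimal Python illustration, not the actual function definition from this PR: the names `map_normalize_naive` and `map_normalize_hoisted` are hypothetical, and a plain `sum` stands in for the `reduce` over the values array.

```python
from typing import Dict


def map_normalize_naive(m: Dict[str, float]) -> Dict[str, float]:
    # Hypothetical "before" shape: the total is recomputed inside the
    # per-entry transform, so the work grows quadratically with map size.
    return {k: v / sum(m.values()) for k, v in m.items()}


def map_normalize_hoisted(m: Dict[str, float]) -> Dict[str, float]:
    # Hypothetical "after" shape: the total is computed once up front
    # and reused for every entry, so the work grows linearly.
    total = sum(m.values())
    return {k: v / total for k, v in m.items()}
```

Both versions return the same result for the same input; only the number of times the total is computed differs.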

Impact

Improved UDF performance

Test Plan

Existing tests cover this function. This PR also adds a couple of tests for NaN, Infinity, and null results.
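
As a rough illustration of where NaN and Infinity results can come from, the sketch below assumes the normalization divides each value by the total using IEEE 754 double semantics. That is an assumption for illustration, not something confirmed in this thread. The helpers `ieee_div` and `normalize` are hypothetical, and null handling is not shown because it is not described here.

```python
import math
from typing import Dict


def ieee_div(x: float, y: float) -> float:
    # Hypothetical helper: mimic IEEE 754 double division, since Python
    # raises ZeroDivisionError instead of returning NaN or Infinity.
    if y == 0.0:
        if x == 0.0 or math.isnan(x):
            return math.nan
        sign = math.copysign(1.0, x) * math.copysign(1.0, y)
        return math.copysign(math.inf, sign)
    return x / y


def normalize(m: Dict[str, float]) -> Dict[str, float]:
    # Compute the total once, then divide every value by it.
    total = sum(m.values())
    return {k: ieee_div(v, total) for k, v in m.items()}
```

Under that assumption, a map whose values are all zero normalizes to NaN for every key, and a map whose non-zero values cancel out to a zero total normalizes to positive or negative Infinity.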

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Optimized `map_normalize` builtin SQL UDF to avoid repeated reduce computation

@kaikalur kaikalur requested a review from a team as a code owner March 14, 2024 22:12
@kaikalur kaikalur force-pushed the optimize_map_normalize branch 2 times, most recently from be7d9d7 to 1b1f005 Compare March 14, 2024 22:26
@mbasmanova mbasmanova changed the title Optimize map normalize Optimize map_normalize Mar 14, 2024
mbasmanova previously approved these changes Mar 14, 2024
Contributor

@mbasmanova mbasmanova left a comment

Thanks. Please update the commit message to use map_normalize and add details about the optimization.

CC: @rschlussel @amitkdutta

@rschlussel
Contributor

Can you also add tests for all the cases that @mbasmanova was trying in #22209, and update the documentation accordingly?

Also add a release note for improving performance of map_normalize

Contributor

@rschlussel rschlussel left a comment

Looks good. The doc updates about the behavior when the values sum to 0 are missing, but I'm not sure that's important if we're also deprecating this function and moving it to internal.

@kaikalur kaikalur merged commit b23ba09 into prestodb:master Mar 15, 2024
56 checks passed
@wanglinsong wanglinsong mentioned this pull request May 1, 2024
48 tasks
3 participants