Add longest_common_prefix implementation, documentation and tests #24651

Open
wants to merge 10 commits into
base: master


@Leziak Leziak commented Feb 28, 2025

Description

Add longest_common_prefix function

Motivation and Context

Feature requested by myself (😃). I'm currently doing some data analysis at Meta and building a pipeline, which involves numerous string comparisons between FB names and IG usernames. I found the built-in levenshtein_distance function really useful for this, and noticed there is also a hamming_distance function. For my analysis, I also need functions for finding the longest common prefix, substring, and suffix, and ideally the Jaro-Winkler distance as well. I thought it would be handy to have these available as built-ins, just like the Levenshtein and Hamming distances, so I went for it. This diff specifically contains the longest_common_prefix function, plus tests.

Impact

Added a longest_common_prefix function
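
For readers unfamiliar with the operation, here is a minimal Python sketch of what a longest common prefix computes. It is not the Java implementation in this PR. The edge-case behavior (comparing by Unicode code point, returning an empty string when nothing matches) is an assumption for illustration, and the sample inputs are made up.

```python
def longest_common_prefix(left: str, right: str) -> str:
    """Return the longest prefix shared by both strings (illustrative only)."""
    length = 0
    # zip stops at the shorter input, so the prefix never overruns either string.
    # Python compares str elements by code point, so non-ASCII input needs no
    # special handling here; the PR's actual semantics may differ.
    for a, b in zip(left, right):
        if a != b:
            break
        length += 1
    return left[:length]


if __name__ == "__main__":
    # Hypothetical name/username pair, not data from the PR.
    print(longest_common_prefix("john.smith", "john_smith84"))  # john
    print(repr(longest_common_prefix("abc", "xyz")))  # ''
```

The loop is linear in the length of the shorter input, which is the natural cost for this comparison.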


Test Plan

Tested manually against the TPCH catalog via presto-cli, then wrote tests that include non-ASCII characters (modeled on the levenshtein_distance function tests).

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

@Leziak Leziak requested review from steveburnett, elharo and a team as code owners February 28, 2025 11:57
@Leziak Leziak requested a review from presto-oss February 28, 2025 11:57

linux-foundation-easycla bot commented Feb 28, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

Contributor

@steveburnett steveburnett left a comment


LGTM! (docs)

Pull branch, local doc build, looks good.

@steveburnett
Contributor

  • Please sign the Presto CLA as linked in this comment.

  • Please squash merge commits.
