Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url_extract_path should return NULL if URI is malformed #7038

Closed
mbasmanova opened this issue Oct 13, 2023 · 1 comment
Closed

url_extract_path should return NULL if URI is malformed #7038

mbasmanova opened this issue Oct 13, 2023 · 1 comment
Assignees
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@mbasmanova
Copy link
Contributor

mbasmanova commented Oct 13, 2023

Bug description

In Presto, url_extract_path returns NULL if URL is malformed, url_decode throws.

Velox currently throws in both functions. Need to fix to match Presto.

presto:di>  select url_extract_path('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%');
 _col0
-------
 NULL
(1 row)

presto:di> select url_decode('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%');
Query 20231013_105704_33253_divi2 failed: URLDecoder: Illegal hex characters in escape (%) pattern - Error at index 0 in: "%U"

CC: @zacw7 @amitkdutta @aditi-pandit @kgpai

System information

n/a

Relevant logs

No response

@mbasmanova mbasmanova added bug Something isn't working triage Newly created issue that needs attention. labels Oct 13, 2023
@kgpai
Copy link
Contributor

kgpai commented Nov 18, 2023

One more thing currently :

presto:tiny> select URL_EXTRACT_PATH('www.facebook.com '), URL_EXTRACT_PATH('foo'), URL_EXTRACT_PATH('https://www.ucu.edu.uy/agenda/evento/%%UCUrlCompartir%%');
 _col0 | _col1 | _col2 
-------+-------+-------
 NULL  | foo   | NULL  

However running above on Velox gives NULL | NULL | Exception..
Key thing seems to be presto java uses java.net.uri (https://docs.oracle.com/javase/7/docs/api/java/net/URI.html) and there 'foo' matches their implementation ([scheme:]scheme-specific-part[#fragment]) whereas it doesnt pass our implementation (https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/URLFunctions.h#L32) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants