-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CAST(varchar as <integral type>) doesn't trim leading and trailing whitespaces, but fails #21315
Comments
CC: @amitkdutta |
Velox currently allows leading and trailing whitespace in cast(varchar as ). We are going to keep this behavior. It would be nice to change Presto to match as well. |
Interesting! I am just curious what do other systems like MySQL, duckdb etc do here ? This should be easily fixable in presto Java without breaking any client assumptions. |
I believe this case should be easily fixable by making Presto's behavior compiling with ANSI SQL. I doubt any client/user would expect and depend on a throw for varchar -> number conversion because of some spaces (although this is not impossible, plenty of weird cases out there). ANSI SQL says it very clear, thanks @gggrace14 for digging this out! DubkDB should ignore whitespaces too - Velox uses it in the unit tests. |
Related to #21001 (another blank padding bug) This works fine in Postgres. My guess is this is because in Postgres, it's not a conversion from varchar to bigint, but char(3) to bigint. The extra space is treated as blank padding, Postgres is consistent in ignoring the blank padding for comparisons, and also apparently when converting between types. Even this query succeeds, and I suspect it's because there's an implicit coercion from varchar to char for the sake of the cast: In Presto, it's actually a conversion from varchar to bigint. This causes an issue here, but also in the issue above. For varchar, I don't think blank padding semantics apply. |
FYI, there will be result differences between Presto and Velox for try_cast, e.g. NULL in Presto vs. 12 in Velox:
|
Yes, this is true. I should've also included in the description. So a common case we can predict is the result will have more records if the user has filter like try_cast() IS NOT NULL. If we want to proceed with the alignment, we should make it into a major Presto release. |
Note, that Presto's behavior is also inconsistent.
But throws an error for integers. Parses the string argument as a signed decimal long. The characters in the string must all be decimal digits, except that the first character may be an ASCII minus sign '-' (\u002D') to indicate a negative value or an ASCII plus sign '+' ('\u002B') to indicate a positive value. The resulting long value is returned, exactly as if the argument and the radix 10 were given as arguments to the parseLong(java.lang.String, int) method. |
Submitting #22284 to change the behavior. |
Current behavior of casting varchar to integral types is to throw exception if the varchar is valid but contains leading or trailing whitespaces. For example,
throws the following exception
This is different from the behavior described in ANSI SQL
Accordingly, TRY(CAST(varchar AS ) and TRY_CAST(varchar AS ) return NULL rather than the actual valid integral number.
Should Presto change the above CAST behavior to accept and trim leading and trailing whitespaces and be compliant with ANSI SQL?
The exception is thrown because the implementation of casting varchar to integral types directly delegates to Java, ex.
presto/presto-main/src/main/java/com/facebook/presto/type/VarcharOperators.java
Lines 176 to 187 in 04138bc
And Java does not accept whitespaces or any character other than decimal digits and sign character.
https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html#parseLong-java.lang.String-
Your Environment
Expected Behavior
Casting of varchar trims leading and trailing whitespaces.
Current Behavior
Casting of varchar to integral types throws exception if the varchar contains leading or trailing whitespaces
Possible Solution
Casting of varchar trims leading and trailing whitespaces before delegating to Java
Long.parseLong()
.Steps to Reproduce
Screenshots (if appropriate)
Context
Throwing exception makes Presto have a different semantics of above CAST from ANSI SQL and other mainstream engines or libraries that compliant with ANSI SQL, ex., MySQL, folly, etc.
The text was updated successfully, but these errors were encountered: