Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blank padding semantics in comparisons and predicates #21001

Open
dqmdev opened this issue Sep 29, 2023 · 7 comments
Open

Blank padding semantics in comparisons and predicates #21001

dqmdev opened this issue Sep 29, 2023 · 7 comments
Labels

Comments

@dqmdev
Copy link

dqmdev commented Sep 29, 2023

Following can be seen with current (0.279+) versions of Presto

Is this design intent or a defect?

If design intent, please improve the documentation to state as such which users can be referenced to.

Presto does not return a row when a predicate or comparison has terms such as shown

select * from T where 'a' = 'a '

While various database systems (Db2, ORACLE, Snowflake etc) will return rows.

If the literal values are defined as shown rows will be returned.

select * from T where 'a' = 'a'

As will cases such as

select * from T where cast ('a' as char(1)) = 'a '

Comparison such as follows returns rows
select * from "T" where 'a' <> 'a '

ISO SQL specification discusses treatment of blank padding

When values of unequal length are compared, if the collation for
the comparison has the NO PAD characteristic and the shorter value is equal to some prefix of the longer value,
then the shorter value is considered less than the longer value. If the collation for the comparison has the PAD
SPACE characteristic, for the purposes of the comparison, the shorter value is effectively extended to the length
of the longer by concatenation of s on the right.

See related issues in Trino project trinodb/trino#9031

@dqmdev dqmdev added the bug label Sep 29, 2023
@tdcmeehan
Copy link
Contributor

It appears this is a genuine issue. CC: @mbasmanova how does Velox handle this?

@mbasmanova
Copy link
Contributor

@tdcmeehan The issue description is too cryptic. I don't think I understand. If it helps, Velox doesn't support char(n) or varchar(n) types.

@tdcmeehan
Copy link
Contributor

@mbasmanova, concisely, I think the problem is, in queries such as select * from T where 'a' = 'a ', most engines will treat this as a comparison between two char(1), but Presto treats this as a cast between two varchars. This appears to be inconsistent with the SQL spec and also inconsistent with other engines as noted above.

@mbasmanova
Copy link
Contributor

CC: @kaikalur

@mbasmanova
Copy link
Contributor

@tdcmeehan Tim, thank you for clarifying. This helps a lot. Do you happen to know if this behavior is decided at the SQL Parser level?

@tdcmeehan
Copy link
Contributor

It's at parsing/analysis. Parser sees a string literal, then during analysis it's assigned the varchar type. In theory, it should be possible to change this behavior, but we need to think deeply about the implications.

Looking ahead at Prestissimo, I wonder if the following example would work: select * from T where cast ('a' as char(1)) = 'a '. My guess is this wouldn't work--it seems like it would be comparing two varchars under the hood, since my understanding is only varchar is supported.

@aditi-pandit
Copy link
Contributor

@tdcmeehan : Interestingly select * from T where cast ('a' as char(1)) = 'a ' works on Prestissimo because the cast expression is evaluated at the co-ordinator, so the query is planned as a TableScan of T :)

With the lockdown features we've been talking about cast ('a' as char(1)) = 'a ' should be rejected before constant folding since char is not supported in Velox.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 📋 Prioritized Backlog
Development

No branches or pull requests

4 participants