The function upper produces error for some special characters #32488

Closed
lcwangchao opened this issue Feb 21, 2022 · 6 comments · Fixed by #32505
Assignees
Labels
affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. may-affects-4.0 This bug maybe affects 4.0.x versions. severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@lcwangchao
Collaborator

lcwangchao commented Feb 21, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

With new_collation_enabled set to True:

create table t(a varchar(32)) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
insert into t values('ʞ');
select binary upper('ʞ');
select binary upper(a) from t;
select distinct upper(a) from t; 

2. What did you expect to see? (Required)

mysql> select binary upper('ʞ');
+----------------------------------------+
| binary upper('ʞ')                      |
+----------------------------------------+
| 0xEA9EB0                               |
+----------------------------------------+
1 row in set (0.03 sec)

mysql>  select binary upper(a) from t;
+----------------------------------+
| binary upper(a)                  |
+----------------------------------+
| 0xEA9EB0                         |
+----------------------------------+
1 row in set (0.03 sec)

mysql> select distinct upper(a) from t;
+----------+
| upper(a) |
+----------+
| Ʞ        |
+----------+
1 row in set (1.76 sec)

3. What did you see instead (Required)

mysql> select binary upper('ʞ');
+----------------------------------------+
| binary upper('ʞ')                      |
+----------------------------------------+
| 0xEA9EB0                               |
+----------------------------------------+
1 row in set (0.03 sec)

mysql>  select binary upper(a) from t;
+----------------------------------+
| binary upper(a)                  |
+----------------------------------+
| 0xEA9E                           |
+----------------------------------+
1 row in set (0.03 sec)

mysql> select distinct upper(a) from t;
ERROR 1105 (HY000): runtime error: index out of range [2] with length 2

4. What is your TiDB version? (Required)

master, but I think it affects all TiDB versions.

mysql> SELECT tidb_version();
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_version() |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Release Version: None
Edition: Community
Git Commit Hash: None
Git Branch: None
UTC Build Time: None
GoVersion: go1.16.3
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

@lcwangchao lcwangchao added type/bug The issue is confirmed as a bug. sig/sql-infra SIG: SQL Infra severity/major labels Feb 21, 2022
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. labels Feb 21, 2022
@lcwangchao
Collaborator Author

The root cause is here:

func (b *builtinUpperUTF8Sig) vecEvalString(input *chunk.Chunk, result *chunk.Column) error {
	if err := b.args[0].VecEvalString(b.ctx, input, result); err != nil {
		return err
	}
	enc := charset.FindEncoding(b.args[0].GetType().Charset)
	for i := 0; i < input.NumRows(); i++ {
		result.SetRaw(i, []byte(enc.ToUpper(result.GetString(i))))
	}
	return nil
}

In line 147, we reserve space according to the original string, but its upper-case form may be longer.

@Defined2014
Contributor

/assign

@Mini256
Member

Mini256 commented Feb 21, 2022

Similar problems may also occur in the lower() function.

select binary 'İ'; -- 0xC4B0
select binary lower('İ');  -- 0x69

@Defined2014
Contributor

Defined2014 commented Feb 21, 2022

Some results of upper() and lower() differ between TiDB and MySQL, because Golang's results are based on Unicode 13.0.0.
Maybe we can add a SpecialCase table for them, or just fix the panic.

Some of results:

...
rune: ꮑ, unicode: 43921, mysql upper: 0xEAAE91, tidb upper: 0xE18F81
rune: ꮒ, unicode: 43922, mysql upper: 0xEAAE92, tidb upper: 0xE18F82
rune: ꮓ, unicode: 43923, mysql upper: 0xEAAE93, tidb upper: 0xE18F83
rune: ꮔ, unicode: 43924, mysql upper: 0xEAAE94, tidb upper: 0xE18F84
rune: ꮕ, unicode: 43925, mysql upper: 0xEAAE95, tidb upper: 0xE18F85
rune: ꮖ, unicode: 43926, mysql upper: 0xEAAE96, tidb upper: 0xE18F86
rune: ꮗ, unicode: 43927, mysql upper: 0xEAAE97, tidb upper: 0xE18F87
...
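
The Go side of this listing can be reproduced with the standard unicode package, as sketched below. The MySQL column cannot be derived this way. The exception table keepAsIs and the helper upperWithExceptions are hypothetical illustrations of the "special-case table" idea, not TiDB code, and the table holds only the seven code points shown above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepAsIs is a HYPOTHETICAL exception table: runes that should be left
// unchanged by upper-casing. It contains only the code points listed above.
var keepAsIs = map[rune]bool{
	43921: true, 43922: true, 43923: true, 43924: true,
	43925: true, 43926: true, 43927: true,
}

// upperHex formats the UTF-8 bytes of r's upper-case form, as Go computes it,
// in the same style as the listing.
func upperHex(r rune) string {
	return fmt.Sprintf("0x%X", []byte(string(unicode.ToUpper(r))))
}

// upperWithExceptions upper-cases s rune by rune and leaves any rune present
// in skip untouched.
func upperWithExceptions(s string, skip map[rune]bool) string {
	return strings.Map(func(r rune) rune {
		if skip[r] {
			return r
		}
		return unicode.ToUpper(r)
	}, s)
}

func main() {
	for r := rune(43921); r <= 43927; r++ {
		fmt.Printf("rune: %c, unicode: %d, go upper: %s\n", r, r, upperHex(r))
	}
	fmt.Println(upperWithExceptions("ꮑa", keepAsIs))
}
```

A lookup map keeps the override explicit and easy to audit. Whether such a table is wanted at all depends on which MySQL collation behavior TiDB intends to match.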

@lcwangchao
Collaborator Author

In my local MySQL (version 8.0.23), it returns this result:

mysql> select binary upper('\U+AB91');
+------------------------------------------+
| binary upper('ꮑ')                        |
+------------------------------------------+
| 0xE18F81                                 |
+------------------------------------------+
1 row in set (0.00 sec)

The behavior is the same as TiDB's.

@Defined2014
Contributor

Defined2014 commented Feb 21, 2022

My MySQL version is 8.0.28, and I also tried it in MySQL 8.0.23. I think the result depends on the charset and collation.

MySQL:
mysql> set names utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.00 sec)

mysql> select binary upper('\U+AB91');
+---------------------+
| binary upper('')   |
+---------------------+
| ꮑ                   |
+---------------------+
1 row in set, 1 warning (0.00 sec)

mysql> set names utf8mb4 collate utf8mb4_0900_ai_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> select binary upper('\U+AB91');
+---------------------+
| binary upper('')   |
+---------------------+
| Ꮑ                   |
+---------------------+
1 row in set, 1 warning (0.00 sec)

@jebter jebter added affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. labels Feb 24, 2022
@ti-chi-bot ti-chi-bot removed may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. labels Feb 24, 2022
@jebter jebter added affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. affects-5.2 This bug affects 5.2.x versions. labels Feb 24, 2022
@ti-chi-bot ti-chi-bot removed may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. labels Feb 24, 2022
@ti-chi-bot ti-chi-bot removed the may-affects-5.2 This bug maybe affects 5.2.x versions. label Feb 24, 2022