Wrong behavior when handling strcmp with default collation #5366

Closed
solotzg opened this issue Jul 14, 2022 · 0 comments · Fixed by #5375 or #5429
Labels
severity/moderate type/bug The issue is confirmed as a bug.

Contributor

solotzg commented Jul 14, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

drop table if exists t;
create table t (a varchar(100)) CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
alter table t set tiflash replica 1;

insert into t values('1   '), ('1\n'), ('1');
select hex(min(a)) from t;

MySQL version

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.29    |
+-----------+
1 row in set (0.00 sec)

2. What did you expect to see? (Required)

MySQL [test]> select hex(min(a)) from t;
+-------------+
| hex(min(a)) |
+-------------+
| 31202020    |
+-------------+

3. What did you see instead (Required)

MySQL [test]> select hex(min(a)) from t;
+-------------+
| hex(min(a)) |
+-------------+
| 31          |
+-------------+

4. What is your TiFlash version? (Required)

master

TiDB behavior

For MySQL:

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.00 sec)

mysql> select strcmp('1\0', '1');
+--------------------+
| strcmp('1\0', '1') |
+--------------------+
|                 -1 |
+--------------------+
1 row in set (0.00 sec)

For TiDB:

MySQL [test]> SET NAMES utf8mb4 COLLATE utf8mb4_bin;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> select strcmp('1\0', '1');
+--------------------+
| strcmp('1\0', '1') |
+--------------------+
|                  1 |
+--------------------+
1 row in set (0.00 sec)

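How MySQL compares '1\0' with '1' (utf8mb4_bin is a PAD SPACE collation in MySQL):

  • the shorter of the two lengths is 1;
  • the first characters, '1' and '1', are equal;
  • the remaining part of '1\0' is '\0', which is compared with the space ' ' that pads the shorter string;
  • comparing '\0' with ' ' means comparing 0x00 with 0x20, so the result is -1.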
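The steps above can be sketched as a byte-wise comparison. This is a minimal C++ sketch of the PAD SPACE rule, not MySQL's actual implementation; `padSpaceCompare` is a hypothetical name. It works on raw bytes, which for valid UTF-8 gives the same order as comparing code points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: PAD SPACE comparison as described in the steps above.
// Returns -1, 0 or 1.
int padSpaceCompare(std::string_view a, std::string_view b) {
    const std::size_t common = std::min(a.size(), b.size());
    // Compare the common prefix byte by byte, as unsigned bytes.
    for (std::size_t i = 0; i < common; ++i) {
        const auto ca = static_cast<unsigned char>(a[i]);
        const auto cb = static_cast<unsigned char>(b[i]);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    // Compare the rest of the longer string with ' ' (0x20), as if the
    // shorter string were padded with spaces.
    const bool aLonger = a.size() > b.size();
    const std::string_view rest = aLonger ? a.substr(common) : b.substr(common);
    for (char c : rest) {
        const auto uc = static_cast<unsigned char>(c);
        if (uc != 0x20) {
            // Sign from the longer string's point of view.
            const int sign = uc < 0x20 ? -1 : 1;
            return aLonger ? sign : -sign;
        }
    }
    return 0;  // equal up to trailing spaces
}
```

For example, `padSpaceCompare(std::string_view("1\0", 2), "1")` yields -1, matching the MySQL output above; the explicit length is needed because a plain `"1\0"` literal would stop at the embedded '\0'.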

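According to https://docs.pingcap.com/tidb/dev/character-set-and-collation, TiDB instead removes trailing spaces before comparing. '1\0' has no trailing spaces, so it is compared byte by byte with '1': the common prefix is equal and the longer string is greater, giving 1.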
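The TiDB rule can be sketched the same way: trim trailing spaces, then do a plain binary comparison. This is a minimal sketch based only on the behavior described in the linked docs; `trimTrailingSpaces` and `trimSpaceCompare` are hypothetical names, not TiDB functions.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: drops trailing ' ' (0x20) bytes only; '\0', '\n' etc. are kept.
std::string_view trimTrailingSpaces(std::string_view s) {
    while (!s.empty() && s.back() == ' ') s.remove_suffix(1);
    return s;
}

// Hypothetical helper: strip trailing spaces, then compare bytes. On an equal
// prefix the longer string is greater. Returns -1, 0 or 1.
int trimSpaceCompare(std::string_view a, std::string_view b) {
    // std::string_view::compare orders char as unsigned char.
    const int r = trimTrailingSpaces(a).compare(trimTrailingSpaces(b));
    return (r > 0) - (r < 0);
}
```

With this rule '1   ' and '1' are equal, while '1\0' and '1\n' both sort after '1', which is why the two systems disagree only for characters below 0x20.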

Bug in TiFlash

reinterpret_cast<const char *>(&parent.chars[parent.offsetAt(lhs)]), parent.sizeAt(lhs),
reinterpret_cast<const char *>(&parent.chars[parent.offsetAt(rhs)]), parent.sizeAt(rhs));

TiFlash should remove the trailing '\0' terminator from each value before comparing.
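A minimal sketch of how the repro can produce 31, under one assumption: as in the ClickHouse `ColumnString` that TiFlash's column is derived from, `sizeAt()` counts each value's terminating '\0'. If that length is passed to the collator unchanged, the value '1   ' is seen as '1   \0', so its trailing spaces are never trimmed. `MiniColumnString`, `collatorCompare` and `minRow` are hypothetical simplifications, not TiFlash code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified string column: values are concatenated in `chars`,
// each followed by '\0', and offsets[i] is the end of row i (terminator included).
struct MiniColumnString {
    std::string chars;
    std::vector<std::size_t> offsets;

    void insert(std::string_view v) {
        chars.append(v);
        chars.push_back('\0');
        offsets.push_back(chars.size());
    }
    std::size_t offsetAt(std::size_t i) const { return i == 0 ? 0 : offsets[i - 1]; }
    // The returned length counts the trailing '\0' terminator.
    std::size_t sizeAt(std::size_t i) const { return offsets[i] - offsetAt(i); }
};

// Removes trailing ' ' bytes only; a trailing '\0' stops the trimming.
std::string_view trimTrailingSpaces(std::string_view s) {
    while (!s.empty() && s.back() == ' ') s.remove_suffix(1);
    return s;
}

// Hypothetical stand-in for the collator: trim trailing spaces, then compare bytes.
int collatorCompare(std::string_view a, std::string_view b) {
    const int r = trimTrailingSpaces(a).compare(trimTrailingSpaces(b));
    return (r > 0) - (r < 0);
}

// Returns the row index of the minimum value, keeping the first row on ties.
// strip_terminator == false passes sizeAt() unchanged (the suspected bug);
// strip_terminator == true drops the '\0' terminator first (the suggested fix).
std::size_t minRow(const MiniColumnString & col, bool strip_terminator) {
    assert(!col.offsets.empty());
    auto value = [&](std::size_t i) {
        const std::size_t size = col.sizeAt(i) - (strip_terminator ? 1 : 0);
        return std::string_view(col.chars.data() + col.offsetAt(i), size);
    };
    std::size_t best = 0;
    for (std::size_t i = 1; i < col.offsets.size(); ++i)
        if (collatorCompare(value(i), value(best)) < 0)
            best = i;
    return best;
}
```

Inserting the repro's rows '1   ', '1\n' and '1' in that order, `minRow(col, false)` returns row 2 ('1', hex 31, the wrong result), while `minRow(col, true)` returns row 0 ('1   ', hex 31202020, the expected result).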
