tidb: add description about GB18030 #18662
Changes from 4 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
--- | ||
title: GB18030 | ||
summary: This document describes TiDB's support for the GB18030 character set. | ||
--- | ||
|
||
# GB18030 | ||
|
||
TiDB supports the GB18030 character set starting from v8.4.0. This document describes TiDB's support for and compatibility with the GB18030 character set. | ||
|
||
```sql | ||
SHOW CHARACTER SET WHERE CHARSET = 'gb18030'; | ||
+---------+---------------------------------+--------------------+--------+ | ||
| Charset | Description | Default collation | Maxlen | | ||
+---------+---------------------------------+--------------------+--------+ | ||
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | | ||
+---------+---------------------------------+--------------------+--------+ | ||
1 row in set (0.01 sec) | ||
|
||
SHOW COLLATION WHERE CHARSET = 'gb18030'; | ||
+-------------+---------+-----+---------+----------+---------+---------------+ | ||
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | | ||
+-------------+---------+-----+---------+----------+---------+---------------+ | ||
| gb18030_bin | gb18030 | 249 | Yes | Yes | 1 | PAD SPACE | | ||
+-------------+---------+-----+---------+----------+---------+---------------+ | ||
Review comment: @CbcWestwolf Is the `gb18030_chinese_ci` collation missing from this result? |
||
1 row in set (0.00 sec) | ||
``` | ||
|
||
## Compatibility with MySQL | ||
|
||
This section describes how the GB18030 character set in TiDB is compatible with MySQL. | ||
|
||
### Collation compatibility | ||
|
||
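In MySQL, the default collation of the GB18030 character set is `gb18030_chinese_ci`. Unlike MySQL, the default collation of the GB18030 character set in TiDB is `gb18030_bin`. In addition, the `gb18030_bin` collation in TiDB differs from the `gb18030_bin` collation in MySQL: TiDB converts GB18030 to UTF8MB4 and then sorts in binary order. | ||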
CbcWestwolf marked this conversation as resolved.
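
The effect of "convert to UTF8MB4, then sort in binary order" can be illustrated outside the database. The following is a minimal Python sketch, not TiDB code. It assumes that a native binary collation would compare the raw GB18030 bytes directly, which the text above does not state. The two sample characters are chosen only for illustration.

```python
# Illustration only: compare two byte-wise orderings of the same strings.
words = ["一", "啊"]

# Order by the raw GB18030 bytes (assumed behavior of a native binary comparison).
by_gb18030 = sorted(words, key=lambda s: s.encode("gb18030"))

# Order by the UTF-8 bytes: convert first, then compare bytes, mirroring the
# "convert to UTF8MB4, then sort in binary order" description.
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))

print(by_gb18030)  # ['啊', '一']
print(by_utf8)     # ['一', '啊']
```

The two orderings differ because GB18030 places `啊` (bytes `B0 A1`) before `一` (bytes `D2 BB`), while in UTF-8 `一` (`E4 B8 80`) sorts before `啊` (`E5 95 8A`).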
|
||
|
||
To make TiDB compatible with the MySQL collations for the GB18030 character set, set the TiDB configuration item [`new_collations_enabled_on_first_bootstrap`](/tidb-configuration-file.md#new_collations_enabled_on_first_bootstrap) to `true` when you first initialize the TiDB cluster. This enables the [new framework for collations](/character-set-and-collation.md#新框架下的排序规则支持). | ||
CbcWestwolf marked this conversation as resolved.
|
||
|
||
After you enable the new framework for collations, check the collations of the GB18030 character set. The default collation of GB18030 in TiDB has switched to `gb18030_chinese_ci`. | ||
|
||
```sql | ||
SHOW CHARACTER SET WHERE CHARSET = 'gb18030'; | ||
+---------+---------------------------------+--------------------+--------+ | ||
| Charset | Description | Default collation | Maxlen | | ||
+---------+---------------------------------+--------------------+--------+ | ||
| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 | | ||
+---------+---------------------------------+--------------------+--------+ | ||
1 row in set (0.01 sec) | ||
|
||
SHOW COLLATION WHERE CHARSET = 'gb18030'; | ||
+--------------------+---------+-----+---------+----------+---------+---------------+ | ||
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | | ||
+--------------------+---------+-----+---------+----------+---------+---------------+ | ||
| gb18030_bin | gb18030 | 249 | | Yes | 1 | PAD SPACE | | ||
| gb18030_chinese_ci | gb18030 | 248 | Yes | Yes | 1 | PAD SPACE | | ||
+--------------------+---------+-----+---------+----------+---------+---------------+ | ||
2 rows in set (0.00 sec) | ||
``` | ||
|
||
### Illegal character compatibility | ||
|
||
* When the system variables [`character_set_client`](/system-variables.md#character_set_client) and [`character_set_connection`](/system-variables.md#character_set_connection) are not both set to `gb18030`, TiDB handles illegal characters in the same way as MySQL. | ||
* When `character_set_client` and `character_set_connection` are both set to `gb18030`, TiDB handles illegal characters differently from MySQL in the following ways: | ||
CbcWestwolf marked this conversation as resolved.
|
||
|
||
- MySQL handles illegal GB18030 characters differently for read and write operations. | ||
Review comment: Do we need to describe how each case is handled? Or provide a link to the relevant MySQL content. |
||
- TiDB handles illegal GB18030 characters in the same way for read and write operations. In strict mode, TiDB reports an error when reading or writing illegal GB18030 characters. In non-strict mode, TiDB replaces illegal GB18030 characters with `?` when reading or writing them. |
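
The strict versus non-strict policy described for TiDB can be mimicked with Python's built-in `gb18030` codec. This is only an analogy under stated assumptions: the helper `decode_gb18030` and the error handler name `gb18030_question_mark` are hypothetical, and the sketch does not reproduce TiDB's implementation or its exact definition of an illegal character.

```python
import codecs


def _question_mark(exc):
    """Hypothetical error handler: replace each undecodable span with '?'."""
    if isinstance(exc, UnicodeDecodeError):
        return ("?", exc.end)
    raise exc


# Register the handler under a made-up name used only in this sketch.
codecs.register_error("gb18030_question_mark", _question_mark)


def decode_gb18030(data: bytes, strict: bool) -> str:
    """Mimic the described policy: error in strict mode, '?' otherwise."""
    if strict:
        # Raises UnicodeDecodeError on any invalid byte sequence.
        return data.decode("gb18030", errors="strict")
    return data.decode("gb18030", errors="gb18030_question_mark")


bad = b"a\xffb"  # 0xFF is not a valid GB18030 lead byte

lenient = decode_gb18030(bad, strict=False)
print(lenient)

try:
    decode_gb18030(bad, strict=True)
except UnicodeDecodeError as err:
    print("strict mode rejected the input:", err.reason)
```

A dedicated error handler is used instead of replacing U+FFFD after decoding, because U+FFFD is itself a character that GB18030 can encode legitimately.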
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,12 +17,12 @@ SHOW CHARACTER SET WHERE CHARSET = 'gbk'; | |
1 row in set (0.00 sec) | ||
|
||
SHOW COLLATION WHERE CHARSET = 'gbk'; | ||
+----------------+---------+------+---------+----------+---------+ | ||
| Collation | Charset | Id | Default | Compiled | Sortlen | | ||
+----------------+---------+------+---------+----------+---------+ | ||
| gbk_bin | gbk | 87 | | Yes | 1 | | ||
+----------------+---------+------+---------+----------+---------+ | ||
1 rows in set (0.00 sec) | ||
+-----------+---------+----+---------+----------+---------+---------------+ | ||
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | | ||
+-----------+---------+----+---------+----------+---------+---------------+ | ||
| gbk_bin | gbk | 87 | Yes | Yes | 1 | PAD SPACE | | ||
+-----------+---------+----+---------+----------+---------+---------------+ | ||
Review comment on lines +20 to +24: Is the `gbk_chinese_ci` collation also missing from the returned result here? |
||
1 row in set (0.00 sec) | ||
``` | ||
|
||
## Compatibility with MySQL | ||
|
@@ -47,12 +47,12 @@ SHOW CHARACTER SET WHERE CHARSET = 'gbk'; | |
1 row in set (0.00 sec) | ||
|
||
SHOW COLLATION WHERE CHARSET = 'gbk'; | ||
+----------------+---------+------+---------+----------+---------+ | ||
| Collation | Charset | Id | Default | Compiled | Sortlen | | ||
+----------------+---------+------+---------+----------+---------+ | ||
| gbk_bin | gbk | 87 | | Yes | 1 | | ||
| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | | ||
+----------------+---------+------+---------+----------+---------+ | ||
+----------------+---------+----+---------+----------+---------+---------------+ | ||
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | | ||
+----------------+---------+----+---------+----------+---------+---------------+ | ||
| gbk_bin | gbk | 87 | | Yes | 1 | PAD SPACE | | ||
| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 | PAD SPACE | | ||
+----------------+---------+----+---------+----------+---------+---------------+ | ||
2 rows in set (0.00 sec) | ||
``` | ||
|
||
|
Review comment: Does the priority ordering at L580 need to include `gbk_chinese_ci` and `gb18030_chinese_ci`?