Commit

[DOCS]: Suggest using utf8mb4_bin for collation
For MySQL, suggest using `utf8mb4_bin` for collation rather than
`utf8mb4_general_ci`, because the latter is case-insensitive and can
break assumptions in various parts of Gitea. Branch names are one
example: they are case-sensitive in git, but they are also stored in
the database, so under a case-insensitive collation, repositories with
branch names that differ only in case can lead to internal errors.
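To illustrate the difference, the MySQL statements below compare two branch names that differ only in case under each collation, then show a unique index on a case-insensitive column rejecting the second name. The `branch_demo` table and its index are hypothetical and do not reflect Gitea's actual schema. A duplicate-key error is one plausible way such a collision could surface, not necessarily the exact failure Gitea runs into.

```sql
-- Compare two branch names that differ only in case.
SELECT _utf8mb4'main' COLLATE utf8mb4_general_ci = _utf8mb4'Main' AS general_ci_equal, -- returns 1
       _utf8mb4'main' COLLATE utf8mb4_bin = _utf8mb4'Main' AS bin_equal;               -- returns 0

-- Hypothetical demo table: unique index on a case-insensitive column.
CREATE TABLE branch_demo (
  name VARCHAR(100) NOT NULL,
  UNIQUE KEY uq_branch_demo_name (name)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

INSERT INTO branch_demo (name) VALUES ('main');
-- Rejected with a duplicate-key error: 'Main' equals 'main' under utf8mb4_general_ci.
INSERT INTO branch_demo (name) VALUES ('Main');

-- After switching to the binary collation, both names can coexist.
ALTER TABLE branch_demo CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
INSERT INTO branch_demo (name) VALUES ('Main');
```

Run the statements one at a time in a scratch database, since the second `INSERT` is expected to fail; `DROP TABLE branch_demo;` cleans up afterwards.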

This small change updates only the documentation, as a first step
toward addressing the problem.

Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
algernon committed Dec 28, 2023
1 parent 921df1c commit 8e3691b
Showing 4 changed files with 8 additions and 8 deletions.
4 changes: 2 additions & 2 deletions docs/content/help/faq.en-us.md
@@ -385,8 +385,8 @@ Unfortunately MySQL's `utf8` charset does not completely allow all possible UTF-
They created a new charset and collation called `utf8mb4` that allows for emoji to be stored but tables which use
the `utf8` charset, and connections which use the `utf8` charset will not use this.

-Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;`
-for the database_name and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;`
+Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
+for the database_name and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
for each table in the database.

## Why are Emoji displaying only as placeholders or in monochrome
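The FAQ's manual route can be scripted rather than typed table by table. The sketch below assumes the database is named `giteadb`, as in the installation guide. `ALTER DATABASE` only changes the default used for tables created afterwards, which is why every existing table still needs its own `ALTER TABLE ... CONVERT TO`. The query only generates those statements for review; it does not execute them, and it is not a description of what `gitea doctor convert` does internally.

```sql
-- Change the database default; this affects only tables created later.
ALTER DATABASE giteadb CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

-- Generate one conversion statement per existing base table (views excluded).
-- Review the output, then run the generated statements.
SELECT CONCAT('ALTER TABLE `', TABLE_NAME,
              '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;') AS stmt
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'giteadb'
  AND TABLE_TYPE = 'BASE TABLE';
```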
4 changes: 2 additions & 2 deletions docs/content/help/faq.zh-cn.md
@@ -389,8 +389,8 @@ SET GLOBAL innodb_large_prefix=1;
They created a new charset and collation called `utf8mb4` that allows Emoji to be stored, but
tables and connections that use the utf8 charset will not use it.

-Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;` on the database,
-and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;` on each table.
+Please run `gitea doctor convert`, or run `ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;` on the database,
+and run `ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;` on each table.

You also need to set the database charset in the `app.ini` file to `CHARSET=utf8mb4`.

4 changes: 2 additions & 2 deletions docs/content/installation/database-preparation.en-us.md
@@ -61,10 +61,10 @@ Note: All steps below requires that the database engine of your choice is instal

Replace username and password above as appropriate.

-4. Create database with UTF-8 charset and collation. Make sure to use `utf8mb4` charset instead of `utf8` as the former supports all Unicode characters (including emojis) beyond _Basic Multilingual Plane_. Also, collation chosen depending on your expected content. When in doubt, use either `unicode_ci` or `general_ci`.
+4. Create database with UTF-8 charset and collation. Make sure to use `utf8mb4` charset instead of `utf8` as the former supports all Unicode characters (including emojis) beyond _Basic Multilingual Plane_. Also, collation chosen depending on your expected content. When in doubt, use `utf8mb4_bin`.

```sql
-CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
+CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_bin';
```

Replace database name as appropriate.
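After creating the database, its defaults, and any text columns that still use another collation, can be checked through `information_schema`. This is a minimal sketch, again assuming the database name `giteadb`. For a database created with the `CREATE DATABASE` statement in the installation step, the first query should report `utf8mb4` and `utf8mb4_bin`.

```sql
-- Default character set and collation of the database itself.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'giteadb';

-- Text columns that still use a collation other than utf8mb4_bin.
-- Non-text columns have a NULL collation and are skipped.
SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'giteadb'
  AND COLLATION_NAME IS NOT NULL
  AND COLLATION_NAME <> 'utf8mb4_bin';
```

An empty result from the second query means every text column in the database already uses `utf8mb4_bin`.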
4 changes: 2 additions & 2 deletions docs/content/installation/database-preparation.zh-cn.md
@@ -59,10 +59,10 @@ menu:

根据需要替换上述用户名和密码。

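-4. Create the database with a UTF-8 charset and collation. Make sure to use the `**utf8mb4**` charset rather than `utf8`, because the former supports all Unicode characters beyond the _Basic Multilingual Plane_ (including emoji). Choose the collation according to your expected content. If unsure, you can use `unicode_ci` or `general_ci`.
+4. Create the database with a UTF-8 charset and collation. Make sure to use the `**utf8mb4**` charset rather than `utf8`, because the former supports all Unicode characters beyond the _Basic Multilingual Plane_ (including emoji). Choose the collation according to your expected content. If unsure, you can use `utf8mb4_bin`.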

```sql
-CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_ci';
+CREATE DATABASE giteadb CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_bin';
```

根据需要替换数据库名称。
