[DOCS]: Suggest using utf8mb4_bin for collation #28627

Closed


algernon
Contributor

For MySQL, suggest using utf8mb4_bin for collation rather than utf8mb4_general_ci, because the latter is case insensitive and can break assumptions in various parts of Gitea. Branch names are one example: they are case sensitive in git, but when they are stored in a database that uses a case-insensitive collation, repositories whose branch names differ only in case can trigger internal errors.

This small change updates the documentation only, as a first step toward addressing the problem.
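
The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not Gitea code: it uses SQLite's built-in `NOCASE` and `BINARY` collations as stand-ins for `utf8mb4_general_ci` and `utf8mb4_bin` (an assumption made only so the example runs without a MySQL server), and the `branch` table is a hypothetical, simplified schema.

```python
from __future__ import annotations

import sqlite3


def try_insert_branches(collation: str, names: list[str]) -> list[str]:
    """Insert branch names into a table whose name column uses `collation`.

    Returns the names rejected by the UNIQUE constraint.
    """
    if collation not in ("NOCASE", "BINARY"):
        raise ValueError("unsupported collation: " + collation)

    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table; this is NOT Gitea's real schema.
    conn.execute(
        f"CREATE TABLE branch (repo_id INTEGER, name TEXT COLLATE {collation}, "
        "UNIQUE (repo_id, name))"
    )

    rejected = []
    for name in names:
        try:
            conn.execute(
                "INSERT INTO branch (repo_id, name) VALUES (1, ?)", (name,)
            )
        except sqlite3.IntegrityError:
            # The database considers this name equal to one already stored.
            rejected.append(name)
    conn.close()
    return rejected


if __name__ == "__main__":
    names = ["main", "Main", "feature/x"]
    print("case-insensitive rejected:", try_insert_branches("NOCASE", names))
    print("binary rejected:", try_insert_branches("BINARY", names))
```

With the case-insensitive collation the second of two names that differ only in case is refused, while git itself would happily keep both branches; with the binary collation both are stored.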

Signed-off-by: Gergely Nagy <forgejo@gergo.csillger.hu>
@GiteaBot added the lgtm/need 2 label (This PR needs two approvals by maintainers to be considered for merging.) on Dec 28, 2023
@pull-request-size bot added the size/S label (Denotes a PR that changes 10-29 lines, ignoring generated files.) on Dec 28, 2023
@algernon
Contributor Author

I have a proper fix in the works too, but that's going to take a bit longer, and I wanted to get the documentation update out ASAP.

@wxiaoguang
Contributor

This change is far from ideal. For example, `gitea doctor convert` doesn't match the changed SQL.

This is also a longstanding known problem; see:

Convert branch table name column to a new collation for mysql/mssql to support case sensitive because branch names are case sensitive #25623

I guess maybe it just needs more time to mature.

@algernon
Contributor Author

Yes, I intentionally did not update doctor convert, because that'd be part of a proper fix (including a migration). This one focuses on the documentation only, so that new installations are not broken by default.

As for #25623: I think it is too limited, as it only changes the branch name column. Since other databases (Postgres & SQLite) are case sensitive by default, I think a better solution would be to change every collation to utf8mb4_bin (and whatever MSSQL needs), lest the same problem come back later on in a different form.

@wxiaoguang
Contributor

wxiaoguang commented Dec 28, 2023

> Yes, I intentionally did not update doctor convert, because that'd be part of a proper fix (including a migration). This one focuses on the documentation only, so that new installations are not broken by default.

That's the problem. Since it is incomplete, it doesn't really help. To resolve the problem, it should be fixed fundamentally, e.g. by using case-sensitive column types by default.

If only the documentation is changed, these problems remain:

  1. Inconsistency from the gitea doctor command.
  2. New instances still suffer the problem.
    • A lot of people are using Docker or an existing database; they still suffer the problem.
  3. Few people really read documents.

So, it needs a full fix.

@algernon
Contributor Author

> > Yes, I intentionally did not update doctor convert, because that'd be part of a proper fix (including a migration). This one focuses on the documentation only, so that new installations are not broken by default.
>
> That's the problem. Since it is incomplete, it doesn't really help.

It helps new installations, by making them use a case-sensitive collation (if following the docs). My point is that helping new installations and helping existing ones can be treated separately, and the former doesn't need a full fix.

> 1. Inconsistency from the `gitea doctor` command.

Fair enough. I can adjust the PR to touch only the database preparation parts, and then the docs and gitea doctor convert will be consistent. The FAQ can be fixed once a migration is developed too.

> 2. New instances still suffer the problem.

Not if they follow the docs and set the correct collation when creating the database. If you create the database with collation set to utf8mb4_bin, tables and their columns will inherit that. Updating the database prep docs helps with that.

>    * A lot of people are using Docker or an existing database; they still suffer the problem.

Again, existing users are not a target for this PR. This PR's only aim is to help new instances by not recommending the wrong collation in the docs. That's all.

> 3. Few people really read documents.

So let's remove the docs then? o.O

> So, it needs a full fix.

Okay.
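
For the point above about new databases passing their collation on to tables and columns, here is a rough sketch of the SQL involved, wrapped in small Python helpers so the statements can be generated and checked. The database name `giteadb` is only a placeholder, and both helper functions are hypothetical; the statements use standard MySQL syntax (`CREATE DATABASE ... COLLATE` and the `information_schema.COLUMNS` view) and are not taken from Gitea's code or from `gitea doctor`.

```python
import re

# Only plain identifiers are accepted, since the values are interpolated
# directly into SQL text.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")


def _check(*values: str) -> None:
    for value in values:
        if not _IDENT.match(value):
            raise ValueError("not a plain identifier: " + repr(value))


def create_database_sql(db_name: str, collation: str = "utf8mb4_bin") -> str:
    """Build a CREATE DATABASE statement with an explicit default collation.

    Tables created later without their own COLLATE clause use this default.
    """
    _check(db_name, collation)
    return (
        f"CREATE DATABASE `{db_name}` "
        f"CHARACTER SET utf8mb4 COLLATE {collation};"
    )


def mismatched_columns_sql(db_name: str, collation: str = "utf8mb4_bin") -> str:
    """Build a query listing text columns whose collation differs from `collation`."""
    _check(db_name, collation)
    return (
        "SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME "
        "FROM information_schema.COLUMNS "
        f"WHERE TABLE_SCHEMA = '{db_name}' "
        "AND COLLATION_NAME IS NOT NULL "
        f"AND COLLATION_NAME <> '{collation}';"
    )


if __name__ == "__main__":
    print(create_database_sql("giteadb"))
    print(mismatched_columns_sql("giteadb"))
```

Running the second statement against a freshly created database should return no rows if every column inherited the database default; any rows it does return are columns that would still compare case-insensitively.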

@algernon closed this on Dec 28, 2023
@algernon deleted the docs/gitea/mysql-collation branch on December 28, 2023 10:02
@wxiaoguang
Contributor

> Again, existing users are not a target for this PR. This PR's only aim is to help new instances by not recommending the wrong collation in the docs. That's all.

Actually, many (maybe most) new Gitea instances are created with Docker. So it still needs a code-level fix, either partial or full.

However, if we only say "Please run `gitea doctor convert`, or run `ALTER ... COLLATE utf8mb4_bin;`", it would heavily confuse end users. I guess most users would choose `gitea doctor convert`, and then they would be surprised to find that the problem is still there.

For a quick patch, it could be like this: Make table column branch.name case sensitive #28633

I also agree that it's better to make all database columns case-sensitive by default.
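
For existing instances, one way to gauge exposure before any migration is to look for branch names that differ only in case. The sketch below is a standalone illustration, not part of Gitea: the function name is hypothetical, the input is assumed to be a plain list of branch names for a single repository, and Python's `str.casefold` is only an approximation of how a case-insensitive MySQL collation compares strings.

```python
from collections import defaultdict


def case_only_collisions(branch_names):
    """Group branch names that are distinct in git but equal after case folding.

    Returns a list of groups, each a sorted list of the clashing names,
    ordered by their case-folded form.
    """
    groups = defaultdict(set)
    for name in branch_names:
        # A set drops exact duplicates, so only real case variants remain.
        groups[name.casefold()].add(name)
    return [
        sorted(groups[key])
        for key in sorted(groups)
        if len(groups[key]) > 1
    ]


if __name__ == "__main__":
    names = ["main", "Main", "dev", "feature/A", "feature/a", "dev"]
    for group in case_only_collisions(names):
        print("clash:", group)
```

Each reported group is a set of branches that git treats as different but that a case-insensitive column would treat as the same value.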

@github-actions bot locked as resolved and limited conversation to collaborators on Mar 13, 2024
Labels: lgtm/need 2 (This PR needs two approvals by maintainers to be considered for merging.), modifies/docs, size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.)