Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs/design: add proposal for full collations support #14574

Merged
merged 8 commits into from
Oct 20, 2020

Conversation

bb7133
Copy link
Member

@bb7133 bb7133 commented Jan 24, 2020

What problem does this PR solve?

This PR adds a proposal for full collations support in TiDB.

An issue has been created for the discussions on this proposal:

#14573

What is changed and how it works?

NA

Check List

NA

Release Note

  • No release note

@bb7133 bb7133 added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. status/WIP proposal component/docs and removed help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jan 24, 2020
@bb7133 bb7133 force-pushed the bb7133/add_collations_rfc branch from c452457 to 9506ce5 Compare January 24, 2020 15:12
const (
FlagIgnoreCase uint64=1 << iota
FlagIgnoreAccent
FlagPadding
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't ignore accent, what is the defined order of the accents - byte order? The sorting order may be defined (for example Canadian French and French sort accents in the opposite order.. but there is likely a more useful example where this matters).

UCA implements this by having a Character Weight, Accent Weight, Case Weight, and Punctuation Weight.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't ignore accent, what is the defined order of the accents - byte order?

I think the order can be defined by the raw bytes/codepoint/accent weight of UCA, it depends on the demands.

Value: indexedColumnsValue
```

Pros: Extra table lookup is avoided and no change to the TIDB optimizer.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also has the pro that it is able to scan just the index in order to upgrade the on-disk storage to fix bugs in collations, because it has the sortkey and the value in the index. But this is not the method that MySQL uses (Option 3)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your point, thanks!

But this is not the method that MySQL uses (Option 3)

For MySQL, the on-disk storage is about the engine. So far as I know, InnoDB stores collations in metadata only since data is organized in pages, while MyRocks face the similar encoding issues like TiDB, and they have limitations on collations of index:

To support memcomparable format, MyRocks restricts collations on indexed columns -- only binary, latin1_bin and utf8_bin collations are supported.

3. Users of TiDB can distinguish the tables with old collation and new collations easily, if they both exist in a TiDB cluster.
4. All replications/backups/recovers through TiDB tools like Binlog-related mechanism, CDC and BR should not encounter unexpected errors caused from the changes of collations in TiDB. For example,
- When using BR, if backup data from old cluster is applied to the new cluster, the new cluster should know that the data is binary collated without padding, or otherwise 'Duplicate entry' error may be reported for primary key/unique key columns.
- If we don't allow both old collations and new collations exist in a TiDB cluster(the Option 4, see below), trying to make replications/backup & recovery between old/new clusters will break this prerequisite.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a requirement that you can fix bugs in a collation. Because indexed values affect sort order, MySQL is not able to fix a collation ever. They instead need to create a new collation, and migration is painful. You should expect that there are bugs in the implementation, and have a plan to be able to scan through and upgrade incorrect values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I think this can be resolved by marking an internal version in TiDB's collation metadata.

I will try to add some descriptions about this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus 1 on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be resolved by marking an internal version in TiDB's collation metadata.

The real difficulty here is how to cope with index, should we need to regenerate index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, index reorgnization seems to be the only way to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can fix bug by this, why we still introduce concept 0900? We can almost use the lastest UCA(now is 10.0), and take the upgradation as a big bug fix. For example, we can use utf8mb4_ai_ci and don't expose the version to user.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, we can use utf8mb4_ai_ci and don't expose the version to user.

The key part of this is the upgrading: when we upgrade the internal version of utf8mb4_ai_ci, the existing on-disk data must be updated too. I've updated the RFC on the implementation of this, PTAL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bb7133 Then what‘s the problem with utf8mb4_ai_ci without the version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean that we may have bugs in the implementation and write the wrong data on disk.

Comment on lines 209 to 217
All strings in MySQL are with implicit collations, if they're not explicitly specified, so we may need to check/update all codes related to strings comparison/lookup:

- Encoding/Decoding: update codec related functions in `codec`/`tablecodec` package.
- SQL Runtime: update like/regex/string comparison related functions in `expression` package.
- SQL Optimizer: If option 1 for encodings is chosen, the optimizer should build the extra table lookup plans.
- DDL/Schema: check if the collation is old/new from the versions of tables/columns.
- Misc
+ update string comparison related functions in `util` package.
+ check if range functions defined in table partitions work with collations.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this explicitly described which collation features will be supported by the framework. For example:

  • Primary Weight i.e. character (yes)
  • Secondary Weight i.e. accent (current proposal says a shortcut will be implemented, as a boolean for sensitivity)
  • Tertiary Weight, i.e. case (current proposal says a shortcut will be implemented, as a boolean for sensitivity)
  • Quaternary Weight, i.e. punctuation weight (not supported)
  • PAD / NOPAD (maybe)
  • Reorder and inheritance rules (??)

The MySQL blog posts give a good overview of what these features are:
http://mysqlserverteam.com/new-collations-in-mysql-8-0-0/
http://mysqlserverteam.com/mysql-8-0-1-accent-and-case-sensitive-collations-for-utf8mb4/
http://mysqlserverteam.com/mysql-8-0-collations-the-devil-is-in-the-details/

Copy link
Member Author

@bb7133 bb7133 Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I will try to add those features in this proposal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PAD / NOPAD

This will be supported, we've some issues related to it already, like #13664

Reorder and inheritance rules (??)

Currently, we don't have a plan for this.

In MySQL 8, the default collation of `utf8mb4` has been changed to `utf8mb4_900_ai_ci`, in which the Unicode version is upgraded to 9.0, with full UCA implementation.

Should TiDB aim to support all collation of MySQL?
* If TiDB supports the latest collations only(for example, support `utf8mb4_900_ai_ci` only and treat `utf8mb4_general_ci` as one of its alias). This seems to be 'standard-correct' but may lead to potential compatibility issues.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend this path. Keep in mind why MySQL can not do this:

  • The collation affects the on-disk format. It can't just change meta data to upgrade users to a new version, it needs to upgrade the tables (collations are effectively a data type on indexed values).
  • The upgrade can not be guaranteed to be unattended. Due to both bugs, and differences unique key violations may occur.

These will make migrations of edge-cases harder, but applications rarely depend on the behaviour of the edge-cases. Bug-for-bug compatibility in datatypes is really hardwork, so I suspect even with best intentions (and all collations implemented) this case will still exist where unique constraints errors will be triggered during migrations incorrectly.

An automated way for the DM framwork to handle this, would be to defer creating unique indexes, and first scan and report any violations. They are usually small enough that an operator can fix them - and may want to in the source database.

Copy link
Member Author

@bb7133 bb7133 Jan 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the recommendations, especially the idea of uniqueness-detect for DM framework.

My only concern is that some behaviors may not be detected by the on-storage data scans. For example, the potential changes of application queries, but that may be be fine...as you mentioned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since many of TiDB's users expect to migrate from MySQL without any upper-level change, the basic principle is that TiDB provides as much compatibility as possible with MySQL, which is, most of the time we have to pay the price for the historical burden of MySQL(like utf8 and utf8mb4).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want general_ci, and after reading the doc of Oracle I think we could do better than MySQL.
The reason that I don't want general_ci is:
https://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
and
https://stackoverflow.com/questions/1036454/what-are-the-differences-between-utf8-general-ci-and-utf8-unicode-ci

@wwar
Copy link

wwar commented Jan 24, 2020

I have an alternative proposal to full collations support, which is to first implement weight_string for utf8mb4_900_ai_ci + functional indexes.

Here is an example in MySQL 8.0.19 which has both:

CREATE TABLE pets (
 id INT NOT NULL PRIMARY KEY auto_increment,
 animaltype VARCHAR(50),
 INDEX ((WEIGHT_STRING(animaltype)))
);

INSERT INTO pets (animaltype) VALUES ('doG'), ('CAT'), ('FrOG'), ('ROCK');
SELECT * FROM pets WHERE WEIGHT_STRING(animaltype)=weight_string('dog');
EXPLAIN SELECT * FROM pets WHERE WEIGHT_STRING(animaltype)=weight_string('dog');

..

mysql8019> SELECT * FROM pets WHERE WEIGHT_STRING(animaltype)=weight_string('dog');
+----+------------+
| id | animaltype |
+----+------------+
|  1 | doG        |
+----+------------+
1 row in set (0.00 sec)

mysql8019> EXPLAIN SELECT * FROM pets WHERE WEIGHT_STRING(animaltype)=weight_string('dog');
+----+-------------+-------+------------+------+------------------+------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys    | key              | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+------------------+------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | pets  | NULL       | ref  | functional_index | functional_index | 203     | const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+------------------+------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql8019> SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                          |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `test`.`pets`.`id` AS `id`,`test`.`pets`.`animaltype` AS `animaltype` from `test`.`pets` where (weight_string(`test`.`pets`.`animaltype`) = <cache>(weight_string('dog'))) |
+-------+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

This is not native support, but weight_string is the more correct implementation of sortKey as written in this proposal. Implementing it first will allow easier debugging, since strings can be compared against MySQL for correctness.

- Full UCA implementation for some Unicode collations, for example: `utf8mb4_900_ai_ci`.

## Proposal

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before diving into the details, can we first have a summary of the changes we want to make? This doesn't have to be long, just a few sentences talking about changes we want to make and why. The intention of this comment is to make the proposal easier to pick up by community developers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, and this is the reason you advised putting the 'Implementation' chapter ahead.

3. Users of TiDB can distinguish the tables with old collation and new collations easily, if they both exist in a TiDB cluster.
4. All replications/backups/recovers through TiDB tools like Binlog-related mechanism, CDC and BR should not encounter unexpected errors caused from the changes of collations in TiDB. For example,
- When using BR, if backup data from old cluster is applied to the new cluster, the new cluster should know that the data is binary collated without padding, or otherwise 'Duplicate entry' error may be reported for primary key/unique key columns.
- If we don't allow both old collations and new collations exist in a TiDB cluster(the Option 4, see below), trying to make replications/backup & recovery between old/new clusters will break this prerequisite.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus 1 on this.


The encoding layout of TiDB has been described in our [previous article](https://pingcap.com/blog/2017-07-11-tidbinternal2/#map). The row format should be changed to make it memory comparable, this is important to the index lookup. Basic principle is that all keys encoded should use the `sortKeys` result from `Key()`/`KeyFromString()` function. However, most of the `sortKeys` calculations are not reversible.

#### Option 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are some decision criteria for choosing option 1 vs option 2? Here are a few points I can think of

  1. Long term roadmap for collation
  2. Engineering effort
  3. Bug fixing difficulty
  4. Migration effort
  5. Upgrade effort

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I think it's about performance and engineering. Opinion 2 is a little bit easier to implement while requires a new encoding layout. Both options have potential performance regressions, but looks that the issue of Option 1 is more serious.

For the points you listed, most of them(1, 3, 4 and 5) are the same for both options.


Cons: Since existing TiDB cluster can't get new collations enabled, requirement 3 is eliminated; Requirement 4 is not met.

## Implementation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we move this section to the beginning of the proposal section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's better to keep the alternatives firstly, and then implementation details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the change, there will be a TODO list for progress tracking as well as the guidance for the community.

@bb7133
Copy link
Member Author

bb7133 commented Jan 28, 2020

/run-all-tests

StringFromKey(str []byte) string
}
```
The interface is quite similar to the Go [collate package](https://godoc.org/golang.org/x/text/collate), it is possible to implement some of the MySQL collation by simply mapping it to an existing Go collation.
Copy link
Member

@wjhuang2016 wjhuang2016 Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: The implement of collate in GO use UCA 620, and MySQL newest's collations use UCA 900.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can we use go collate package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UCA framework doesn't change between the versions, so differences can be resolved by forking the collate package and change the lookup table.

The issue is only some of the MySQL collations are implemented by standard UCA. For example, utf8mb4_general_ci is not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UCA framework doesn't change between the versions

According to https://unicode.org/reports/tr10/#Migration:

From 620 to 630:

The IgnoreSP option for variable weighted characters has been removed. Implementers of this option may instead refer to CLDR Shifted behavior.

I think the implement may change between versions.

In MySQL 8, the default collation of `utf8mb4` has been changed to `utf8mb4_900_ai_ci`, in which the Unicode version is upgraded to 9.0, with full UCA implementation.

Should TiDB aim to support all collation of MySQL?
* If TiDB supports the latest collations only(for example, support `utf8mb4_900_ai_ci` only and treat `utf8mb4_general_ci` as one of its alias). This seems to be 'standard-correct' but may lead to potential compatibility issues.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want general_ci, and after reading the doc of Oracle I think we could do better than MySQL.
The reason that I don't want general_ci is:
https://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci
and
https://stackoverflow.com/questions/1036454/what-are-the-differences-between-utf8-general-ci-and-utf8-unicode-ci

@bb7133 bb7133 force-pushed the bb7133/add_collations_rfc branch from 1af88f8 to 274775d Compare January 30, 2020 08:23

Pros: Requirement 1.a, 1.b, 2 and 3 are met.

Cons: Requirement 1.c and 4 are not met, the namings are also confusing: collations that are named from MySQL are different from MySQL, but the ones named different from MySQL behave the same as MySQL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requirement 2 is not met, since new users need to add tidb_ prefix to actually use specified collations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

* Primary Weight i.e. character
* Secondary Weight i.e. accent
* Tertiary Weight i.e. case
* PAD / NOPAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://godoc.org/golang.org/x/text/collate#Option there is no option for PAD / NOPAD. Do we need to implement the padding behaviour it by our own?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.


We have an opportunity to do a much better job than MySQL has done.

A collation, logically, should not be a property of a character set. Since all character sets are subsets of Unicode, collations can apply to Unicode codepoints without needing to be tied to individual character sets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not be true. For example, users may want to store data in GBK encoding while be able to compare by Chinese characters. GBK encoding lead to less storage space compared to Unicode encoding when storing Chinese characters.

@wjhuang2016
Copy link
Member

wjhuang2016 commented Feb 3, 2020

Here is my proposal for linguistic collation(corresponding to binary collation), is not completed yet:
We support collations like:
(1) utf8mb4_locale_strength_caseLevel_...., see also mongo and Oracle and DB2
Or
(2) utf8mb4_locale_ai_ci
For option 1, we can provide more powerful collation, but it's more complex to implement.
For option 2, it can use Go's collate package in TiDB, but still may need to build the corresponding in TiKV(icu-binding?).
Note:
There is not an option "version", We try to automatically upgrade it. See also Semantic Versioning.

@wjhuang2016
Copy link
Member

We have a internal meeting afternoon, and here is the meeting notes. Please feel free to comment on it.

The implementation of utf8mb4_general_ci and utf8_general_ci in TiDB (compatible with MySQL or standard UCA)?

Discussion:
The general_ ci is not a standard unicode collation implementation. In some cases that UCA may think that some are equal where general ci may give the a different result.
Migrating from MySQL to TiDB may cause applications to throw errors for users.
Needs to meet the timeline of one of TiDB user and we need to prioritize that.
The existing go collate library does not support UCA 9.0, with full UCA implementation.(Still need to investigate)

Decision:
Compatible with MySQL general_ci implementation and add full collation later.(That is, support general_ci in 4.0)

Which KV encoding options should be used?

Discussion:
Performance vs engineering difficulty.
Needs to replace Value with SortKey in the indexed key.
Option 2: keep the original index value in the value.
Option 2 might eliminate one optimization in TiKV. Doesn’t prefer to add value to key.
The encoding should support version and extensible to add more column values to support covering index. Add rowid after the values.
As for virtual generated columns, we use index to store its data so that we don’t need to compute anymore. Option 1 eliminates this optimization.
In option 2, where to place the column value? Key or Value?

Decision:
Choose option 2
Place the column value in Value.
AI:
@coocood , @bb7133 to provide detailed encoding for option2 to support extensions.

Compatibility Options

Discussion:
Not all the compatibility options can be satisfied.
Requirement 3 must be met.
We can support option 4 first. Existing customers can upgrade to 4.0 but might not support new collation.

Decision:
Choose option 4. Use a config to control whether to use the new collation.
Keep the default behavior as the old TiDB behavior.
Later we can implement option 1 or option 2 or option 3 to make the migration smooth.
Also add label “/* AS_BINARY_COLLATION */” for old tables. (In 4.0)

Resource Allocation
TiKV: @breeswish
TiDB: @wjhuang2016 , @qw4990 or someone familiar with expression.

@tiancaiamao
Copy link
Contributor

/run-all-tests

1 similar comment
@tiancaiamao
Copy link
Contributor

/run-all-tests

2. Users of MySQL who requires collations can move to TiDB without modifying their scripts/applications.
3. Users of TiDB can distinguish the tables with old collation and new collations easily, if they both exist in a TiDB cluster.
4. All replications/backups/recovers through TiDB tools like Binlog-related mechanism, CDC and BR should not encounter unexpected errors caused from the changes of collations in TiDB. For example,
- When using BR, if backup data from old cluster is applied to the new cluster, the new cluster should know that the data is binary collated without padding, or otherwise 'Duplicate entry' error may be reported for primary key/unique key columns.
Copy link
Contributor

@kennytm kennytm May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BR works at KV level. There is no "duplicate entry" error when working at KV level.

In fact BR does not decode the keys and values beyond the table ID, so by changing the index encoding we can be sure that a table restored through BR with different new_collation_enabled setting would have broken index.

BR plans to reject restore with when new_collation_enabled is different, until the "Bug-fixing" part is implemented (ColumnInfoVersion & ALTER TABLE TIDB UPGRADE COLLATION). Is this acceptable @bb7133?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BR plans to reject restore with when new_collation_enabled is different, until the "Bug-fixing" part is implemented (ColumnInfoVersion & ALTER TABLE TIDB UPGRADE COLLATION). Is this acceptable ?

Yes, of course...

- b. After upgrade, the behavior of newly-created tables remains unchanged.
- c. If a TiDB cluster is replicating the data to a MySQL instance through Binlog, all replications remain unchanged.
2. Users of MySQL who requires collations can move to TiDB without modifying their scripts/applications.
3. Users of TiDB can distinguish the tables with old collation and new collations easily, if they both exist in a TiDB cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to make it possible to have tables with "old collation" and "new collation" coexist on the same cluster?

The current implementation reads the collation info on bootstrap from the mysql.tidb table, is that value supposed to be mutable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to make it possible to have tables with "old collation" and "new collation" coexist on the same cluster?

Yes, but currently we don't have a detailed roadmap for it.

@SunRunAway SunRunAway modified the milestones: v4.0.0-beta.1, v4.0.1 May 28, 2020
@zz-jason zz-jason removed this from the v4.0.1 milestone Jun 5, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Sep 2, 2020


## Open issues

- https://github.com/pingcap/tidb/issues/222
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to change here.

@bb7133 bb7133 force-pushed the bb7133/add_collations_rfc branch from 45d0969 to db3ee98 Compare October 14, 2020 04:13
@sre-bot
Copy link
Contributor

sre-bot commented Oct 14, 2020

No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md

tidb> insert into t values ('a');
Query OK, 1 row affected // Should report error "Duplicate entry 'a'"
```
In the case above, the user creates a column with a case-insensitive collation `utf8mb4_general_ci`. Since the 'real' collation of TiDB is always `utf8mb4_bin`, inserting data into the primary key with "A" and "a" does not lead to an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the 'real' collation of TiDB is always utf8mb4_bin,
s/utf8mb4_bin/binary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@bb7133 bb7133 force-pushed the bb7133/add_collations_rfc branch from db3ee98 to 7d015d5 Compare October 14, 2020 05:26
@sre-bot
Copy link
Contributor

sre-bot commented Oct 14, 2020

Copy link
Member

@wjhuang2016 wjhuang2016 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 15, 2020
@xiongjiwei
Copy link
Contributor

LGTM

@pingcap pingcap deleted a comment from ti-srebot Oct 20, 2020
@bb7133 bb7133 added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 20, 2020
@bb7133
Copy link
Member Author

bb7133 commented Oct 20, 2020

/merge

@bb7133 bb7133 merged commit 0522f65 into pingcap:master Oct 20, 2020
@bb7133 bb7133 deleted the bb7133/add_collations_rfc branch December 29, 2023 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/docs proposal status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.