[#1127] YSQL: Collation Support (part 5)
Summary: This diff addresses a number of known issues with the recently added YSQL collation support, and also enables YSQL collation support by default.

1. Disallow altering a column's collation unless the old and new collations match exactly. For example:

```
create table foo(id text);
CREATE TABLE
insert into foo values ('aaaa');
INSERT 0 1
alter table foo alter column id set data type text collate "en_US.utf8";
ysqlsh:altc1.sql:5: ERROR: This ALTER TABLE command is not yet supported.
```

In the example, column `id` has the default collation, but the ALTER statement tries to change it to "en_US.utf8", which is different. An ALTER on a text column can succeed only when the collation does not change. For example, one can change the column type from varchar(10) to varchar(20):

```
create table foo(id varchar(10) collate "en_US.utf8");
CREATE TABLE
create index foo_id_idx on foo(id collate "C" desc);
CREATE INDEX
insert into foo values ('aaaa');
INSERT 0 1
alter table foo alter column id set data type varchar(20) collate "en_US.utf8";
ALTER TABLE
```

In YSQL, a collation change can imply an on-disk change because of collation encoding: in DocDB, character data is stored together with a collation sort key, and different collations can produce different sort keys for the same character data. Currently YSQL supports only a very limited set of ALTER COLUMN commands, namely those for which no on-disk change is possible. Given that, we also simply disallow column collation changes.

2. Disallow creating a database with any non-C collation. In my previous change, I accidentally enabled the CREATE DATABASE command to create a database with a non-C collation. This was not intended, because we still assume that the default collation is C. In addition, any non-C collation can imply a performance/storage cost, so for now we continue to disallow it. We can enhance this in the future if needed.
For example, the following command continues to fail:

```
yugabyte=# create database db LC_COLLATE = "en_US.utf8" TEMPLATE template0;
create database db LC_COLLATE = "en_US.utf8" TEMPLATE template0;
ERROR: Value other than 'C' for lc_collate option is not yet supported
LINE 1: create database db LC_COLLATE = "en_US.utf8" TEMPLATE templa...
^
HINT: Please report the issue on https://github.com/YugaByte/yugabyte-db/issues
```

3. Disallow text_pattern_ops/bpchar_pattern_ops/varchar_pattern_ops in index creation unless the indexed column has the "C" collation. For example:

```
create table foo(id char(10) collate "en_US.utf8");
CREATE TABLE
create index foo_id_idx on foo(id bpchar_pattern_ops asc);
ysqlsh:pat3.sql:14: ERROR: could not use operator class "bpchar_pattern_ops" with column collation "en_US.utf8"
HINT: Use the COLLATE clause to set "C" collation explicitly.
```

The semantics of bpchar_pattern_ops is to create an index whose keys are no longer sorted according to the base table column collation "en_US.utf8"; instead, the index keys are sorted as if the collation were "C". However, Postgres does not change the index column collation to "C". Instead, it relies on bpchar_pattern_ops to select a custom pattern comparator (e.g. bttext_pattern_cmp) to do the comparison. Current YSQL collation support, on the other hand, detects that the index column has the "en_US.utf8" collation, so the index keys would still be sorted according to "en_US.utf8", not "C". Therefore this combination is disallowed, and the user is given a hint to use a workaround such as:

```
create index foo_id_idx on foo(id collate "C" asc);
```

Historically, Postgres supported only a database collation, decided at initdb time via the OS environment variable LC_COLLATE. As a result, a normal index was sorted according to the collation determined by LC_COLLATE, and such an index is not usable by certain operators such as LIKE (unless LC_COLLATE is C).
As a workaround, Postgres provided the *_pattern_ops operator classes to build an index that ignores LC_COLLATE (i.e., compares keys as if the index used the C collation). Now that Postgres supports column collations, which can override the database collation, the *_pattern_ops classes are no longer needed.

4. Changed the type of the QLValue string_value field from `string` to `bytes` to suppress an error message. A collation sort key is a null-terminated byte sequence (with no embedded \0 byte) rather than text, so a collation-encoded string may contain byte sequences that are not valid UTF-8. However, it is stored in the QLValue string_value field in place of the original string value, which is UTF-8. Protobuf reports invalid UTF-8 in a `string` field as an ERROR, even though the collation-encoded string is still sent across the wire without any loss.

5. Added upgrade support.

Test Plan:
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressTypesString'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressExtension'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPartitions'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressDml'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPlpgsql'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressFeature'
./yb_build.sh release --java-test 'org.yb.pgsql.TestYsqlUpgrade'

Reviewers: mihnea, alex, dmitry
Reviewed By: alex, dmitry
Subscribers: ksreenivasan, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D13363
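To make the reasoning behind item 1 concrete, here is a minimal Python sketch of why a collation change is an on-disk change. It assumes two hypothetical encoders, `c_sort_key` (raw UTF-8 bytes, standing in for "C") and `toy_ci_sort_key` (a made-up case-insensitive key, standing in for a locale collation such as "en_US.utf8"); neither is DocDB's actual encoding nor ICU's sort-key algorithm. The same value yields different stored bytes, and values can sort in a different order, under the two collations.

```python
def c_sort_key(value: str) -> bytes:
    # "C"-like collation: strings compare by their raw UTF-8 bytes.
    return value.encode("utf-8")


def toy_ci_sort_key(value: str) -> bytes:
    # Hypothetical case-insensitive collation: the case-folded text is the
    # primary key; a \0 terminator plus the original bytes break ties.
    return value.casefold().encode("utf-8") + b"\x00" + value.encode("utf-8")


# Hypothetical registry of collation encoders, keyed by collation name.
ENCODERS = {"C": c_sort_key, "toy_ci": toy_ci_sort_key}


def stored_key(value: str, collation: str) -> bytes:
    # Model of what gets persisted for a value under a given collation.
    return ENCODERS[collation](value)


def alter_collation_allowed(old_collation: str, new_collation: str) -> bool:
    # Mirrors item 1: allow the ALTER only if the collation is unchanged,
    # because otherwise the stored keys (and their order) may differ.
    return old_collation == new_collation


print(stored_key("aaaa", "C"))       # b'aaaa'
print(stored_key("aaaa", "toy_ci"))  # b'aaaa\x00aaaa'
print(sorted(["B", "a"], key=c_sort_key))       # ['B', 'a']
print(sorted(["B", "a"], key=toy_ci_sort_key))  # ['a', 'B']
```

`alter_collation_allowed` only mirrors the collation part of item 1's rule; it says nothing about which type changes (such as varchar(10) to varchar(20)) are otherwise permitted.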
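The LIKE limitation mentioned under item 3 can be illustrated with a small model: a single B-tree range scan for `LIKE 'ab%'` only works if every matching key is contiguous in index order. The Python sketch below compares byte-wise lexicographic order (how the "C" collation compares strings) with a hypothetical case-insensitive key `toy_ci_key` that stands in for a locale-aware collation; it is an illustration under those assumptions, not Postgres's planner logic.

```python
def c_key(s: str) -> bytes:
    # "C" collation order: byte-wise lexicographic comparison of UTF-8.
    return s.encode("utf-8")


def toy_ci_key(s: str) -> bytes:
    # Hypothetical case-insensitive collation: case-folded text first,
    # then a \0 terminator and the original bytes as a tie-breaker.
    return s.casefold().encode("utf-8") + b"\x00" + s.encode("utf-8")


def prefix_matches_contiguous(values, key, prefix: str) -> bool:
    # Sort the values the way an index with this key order would, then check
    # whether all case-sensitive matches of LIKE 'prefix%' form one run.
    ordered = sorted(values, key=key)
    hits = [i for i, v in enumerate(ordered) if v.startswith(prefix)]
    return not hits or hits[-1] - hits[0] + 1 == len(hits)


values = ["ab1", "aB2", "ab3"]
print(sorted(values, key=c_key))       # ['aB2', 'ab1', 'ab3']
print(sorted(values, key=toy_ci_key))  # ['ab1', 'aB2', 'ab3']
print(prefix_matches_contiguous(values, c_key, "ab"))       # True
print(prefix_matches_contiguous(values, toy_ci_key, "ab"))  # False
```

Under byte order, the non-matching "aB2" sorts before every "ab..." key, so the matches form one range; under the toy collation it lands between them, which is why an index sorted by a non-C collation cannot serve this LIKE with a simple range scan.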
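Item 4 can be illustrated without protobuf at all. The Python sketch below builds a null-terminated sort key from hypothetical key bytes (not output from any real collation library) and shows that a byte sequence with no embedded \0 can still be invalid UTF-8, which is the kind of data that triggers the complaint about a `string` field. It also shows that forcing such bytes through a text decode is lossy, one reason a `bytes` field is a better fit.

```python
def is_valid_utf8(data: bytes) -> bool:
    # Return True if the bytes decode as strict UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def null_terminate(sort_key: bytes) -> bytes:
    # Models the property described in item 4: a sort key has no embedded \0,
    # so one trailing \0 unambiguously marks its end.
    if b"\x00" in sort_key:
        raise ValueError("sort key must not contain embedded NUL bytes")
    return sort_key + b"\x00"


# Hypothetical sort-key bytes; 0xC0 can never appear in valid UTF-8.
encoded = null_terminate(bytes([0x2A, 0xC0, 0x05, 0x01]))

print(is_valid_utf8(encoded))  # False
print(is_valid_utf8(b"aaaa"))  # True
# Treating the bytes as text is lossy: the invalid byte becomes U+FFFD.
lossy = encoded.decode("utf-8", errors="replace").encode("utf-8")
print(lossy == encoded)        # False
```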
22 changed files with 2,821 additions and 103 deletions.