Skip to content

Commit

Permalink
[#1127] YSQL: Collation Support (part 5)
Browse files Browse the repository at this point in the history
Summary:
This diff addresses a number of known issues with recently added YSQL collation
support and also enable YSQL collation support by default.

1. Disallow alter column collation unless the old and new collation match
exactly. For example:
```
create table foo(id text);
CREATE TABLE
insert into foo values ('aaaa');
INSERT 0 1
alter table foo alter column id set data type text collate "en_US.utf8";
ysqlsh:altc1.sql:5: ERROR:  This ALTER TABLE command is not yet supported.
```
In the example, column 'id' has default collation, but the alter statement tries
to change it to "en_US.utf8" which is different. An alter text column command
can succeed only when collations do not change. For example: one can change
column type from varchar(10) to varchar(20):

```
create table foo(id varchar(10) collate "en_US.utf8");
CREATE TABLE
create index foo_id_idx on foo(id collate "C" desc);
CREATE INDEX
insert into foo values ('aaaa');
INSERT 0 1
alter table foo alter column id set data type varchar(20) collate "en_US.utf8";
ALTER TABLE
```

In YSQL, collation change can imply on-disk change due to collation-encoding: in
docdb a character data is stored with a collation sort key. Different collation
can have different sort key for the same character data. Currently YSQL only
supports very limited alter column command when no on-disk change is possible.
Given that, we also simply disallow column collation change.

2. Disallow creating database with any non-C collation.
In my previous change, I accidently enabled create database command to create a
database with a non-C collation. This isn't intended as we still assume default
collation is C. In addition, any non-C collation can imply perf/storage cost. So
at this time we should continue to disallow this. We can enhance it in the
future if needed. For example, the following command continues to fail:

```
yugabyte=# create database db LC_COLLATE = "en_US.utf8" TEMPLATE template0;
create database db LC_COLLATE = "en_US.utf8" TEMPLATE template0;
ERROR:  Value other than 'C' for lc_collate option is not yet supported
LINE 1: create database db LC_COLLATE = "en_US.utf8" TEMPLATE templa...
                           ^
HINT:  Please report the issue on https://github.com/YugaByte/yugabyte-db/issues
```

3. Disallow text_pattern_ops/bpchar_pattern_ops/varchar_pattern_ops in index
creation unless the indexed column has "C" collation. For example,

```
create table foo(id char(10) collate "en_US.utf8");
CREATE TABLE
create index foo_id_idx on foo(id bpchar_pattern_ops asc);
ysqlsh:pat3.sql:14: ERROR:  could not use operator class "bpchar_pattern_ops" with column collation
"en_US.utf8"
HINT:  Use the COLLATE clause to set "C" collation explicitly.
```
The semantics of bpchar_pattern_ops is to create an index such that the index
keys are no longer sorted according to the base table column collation
"en_US.utf8". Instead it sorts the index keys as if the collation is "C".
However postgres does not change the index column collation to "C". Instead, it
relies upon bpchar_pattern_ops to select a custom comparator bttext_pattern_cmp
to do the comparison. That's why currently in YSQL collation support YB will
detect that the index column has "en_US.utf8" collation and the index keys will
still be sorted according to "en_US.utf8" not "C" collation. Therefore it is
disallowed and a hint is given to user to use a work around such as:
```
create index foo_id_idx on foo(id collate "C" asc);
```

Historically, postgres only supported database collation that is decided at
initdb time via OS env variable LC_COLLATE. As a result, normal index
will be sorted according to collation determined by LC_COLLATE. Such an index is
not usable by certain operators such as LIKE. As a work around, postgres
provided these *_pattern_ops as work around to build an index that ignores
LC_COLLATE (i.e., use C collation for the index). Now that postgres supports
column collation which can override the database collation, *_pattern_ops are
no longer needed.

4. Changed QLValue string_value field from 'string' to 'bytes' to suppress a
error message. A collation sort key is a null-terminated byte sequence (without
any embedded \0 byte). As a result the collation encoded string may contain
invalid UTF-8 characters. However it is set as a QLValue string_value field in
place of the original string value which is UTF-8. Protobuf reports invalid
UTF-8 as an ERROR even though the collation-encoded string is still sent across
the wire without any loss.

5. Added upgrade support

Test Plan:
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressTypesString'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressExtension'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPartitions'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressDml'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressPlpgsql'
./yb_build.sh release --java-test 'org.yb.pgsql.TestPgRegressFeature'
./yb_build.sh release --java-test 'org.yb.pgsql.TestYsqlUpgrade'

Reviewers: mihnea, alex, dmitry

Reviewed By: alex, dmitry

Subscribers: ksreenivasan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D13363
  • Loading branch information
myang2021 committed Nov 3, 2021
1 parent a9f463f commit 1e0330b
Show file tree
Hide file tree
Showing 22 changed files with 2,821 additions and 103 deletions.
8 changes: 4 additions & 4 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/TestYsqlUpgrade.java
Original file line number Diff line number Diff line change
Expand Up @@ -1202,9 +1202,9 @@ private void assertMigrationsWorked(
assertEquals(ver, migrationRow.getInt(0).intValue());
assertEquals(0, migrationRow.getInt(1).intValue());
assertTrue(migrationRow.getString(2).startsWith("V" + ver + "__"));
assertTrue("Expected migration timestamp to be at most 5 mins old!",
assertTrue("Expected migration timestamp to be at most 10 mins old!",
migrationRow.getLong(3) != null &&
System.currentTimeMillis() - migrationRow.getLong(3) < 5 * 60 * 1000);
System.currentTimeMillis() - migrationRow.getLong(3) < 10 * 60 * 1000);
}
}

Expand Down Expand Up @@ -1421,8 +1421,8 @@ private List<Row> copyWithResolvedOids(
}

private void runMigrations() throws Exception {
// Set migrations timeout to 3 min, adjusted
long timeoutMs = BuildTypeUtil.adjustTimeout(180 * 1000);
// Set migrations timeout to 6 min, adjusted
long timeoutMs = BuildTypeUtil.adjustTimeout(360 * 1000);
List<String> lines = runProcess(
TestUtils.findBinary("yb-admin"),
"--master_addresses",
Expand Down
3 changes: 2 additions & 1 deletion src/postgres/src/backend/commands/dbcommands.c
Original file line number Diff line number Diff line change
Expand Up @@ -415,7 +415,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
"https://github.com/yugabyte/yugabyte-db/issues"),
parser_errposition(pstate, dencoding->location)));

if (!YBIsCollationEnabled() && dcollate && dbcollate && strcmp(dbcollate, "C") != 0)
if (!(YBIsCollationEnabled() && kTestOnlyUseOSDefaultCollation) && dcollate &&
dbcollate && strcmp(dbcollate, "C") != 0)
ereport(YBUnsupportedFeatureSignalLevel(),
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("Value other than 'C' for lc_collate "
Expand Down
50 changes: 50 additions & 0 deletions src/postgres/src/backend/commands/indexcmds.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include "catalog/indexing.h"
#include "catalog/partition.h"
#include "catalog/pg_am.h"
#include "catalog/pg_collation.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_inherits.h"
#include "catalog/pg_opclass.h"
Expand Down Expand Up @@ -1477,6 +1478,46 @@ CheckPredicate(Expr *predicate)
errmsg("functions in index predicate must be marked IMMUTABLE")));
}

/*
* YbCheckCollationRestrictions
* Checks that the given partial-index predicate is valid.
* Disallow some built-in operator classes if the column has non-C collation.
* We already accept them if the column has C collation so continue to allow that.
*/
static void
YbCheckCollationRestrictions(Oid attcollation, Oid opclassoid)
{
HeapTuple classtup;
Form_pg_opclass classform;
char *opclassname;
HeapTuple collationtup;
Form_pg_collation collform;
char *collname;

classtup = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclassoid));
if (!HeapTupleIsValid(classtup))
elog(ERROR, "cache lookup failed for operator class %u", opclassoid);
classform = (Form_pg_opclass) GETSTRUCT(classtup);
opclassname = NameStr(classform->opcname);
if (strcasecmp(opclassname, "bpchar_pattern_ops") == 0 ||
strcasecmp(opclassname, "text_pattern_ops") == 0 ||
strcasecmp(opclassname, "varchar_pattern_ops") == 0)
{
collationtup = SearchSysCache1(COLLOID, ObjectIdGetDatum(attcollation));
if (!HeapTupleIsValid(collationtup))
elog(ERROR, "cache lookup failed for collation %u", attcollation);
collform = (Form_pg_collation) GETSTRUCT(collationtup);
collname = NameStr(collform->collname);
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("could not use operator class \"%s\" with column collation \"%s\"",
opclassname, collname),
errhint("Use the COLLATE clause to set \"C\" collation explicitly.")));
ReleaseSysCache(collationtup);
}
ReleaseSysCache(classtup);
}

/*
* Compute per-index-column information, including indexed column numbers
* or index expressions, opclasses, and indoptions. Note, all output vectors
Expand Down Expand Up @@ -1757,6 +1798,15 @@ ComputeIndexAttrs(IndexInfo *indexInfo,
accessMethodName,
accessMethodId);

/*
* In Yugabyte mode, disallow some built-in operator classes if the column has non-C
* collation.
*/
if (IsYugaByteEnabled() &&
YBIsCollationValidNonC(attcollation) &&
!kTestOnlyUseOSDefaultCollation)
YbCheckCollationRestrictions(attcollation, classOidP[attn]);

/*
* Identify the exclusion operator, if any.
*/
Expand Down
9 changes: 9 additions & 0 deletions src/postgres/src/backend/commands/ybccmds.c
Original file line number Diff line number Diff line change
Expand Up @@ -967,6 +967,15 @@ YBCPrepareAlterTableCmd(AlterTableCmd* cmd, Relation rel, YBCPgStatement handle,
errmsg("This ALTER TABLE command is not yet supported.")));
}
}
/*
* Do not allow collation update because that requires different collation
* encoding and therefore can cause on-disk changes.
*/
Oid cur_collation_id = attTup->attcollation;
Oid new_collation_id = GetColumnDefCollation(NULL, colDef, newTypId);
if (cur_collation_id != new_collation_id)
ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("This ALTER TABLE command is not yet supported.")));
break;
}

Expand Down
2 changes: 1 addition & 1 deletion src/postgres/src/common/pg_yb_common.c
Original file line number Diff line number Diff line change
Expand Up @@ -162,7 +162,7 @@ YBIsCollationEnabled()
* The default value must be in sync with that of FLAGS_TEST_pg_collation_enabled.
*/
cached_value = YBCIsEnvVarTrueWithDefault("FLAGS_TEST_pg_collation_enabled",
false /* default_value */);
true /* default_value */);
}
return cached_value;
#else
Expand Down
2 changes: 1 addition & 1 deletion src/postgres/src/include/catalog/pg_yb_migration.dat
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,6 @@

[

{ major => '11', minor => '0', name => '<baseline>', time_applied => '_null_' }
{ major => '12', minor => '0', name => '<baseline>', time_applied => '_null_' }

]
24 changes: 12 additions & 12 deletions src/postgres/src/test/regress/expected/yb_extensions.out
Original file line number Diff line number Diff line change
Expand Up @@ -57,30 +57,30 @@ select pg_stat_statements_reset();
(1 row)

select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements;
userid | dbid | queryid | query | calls | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | blk_read_time
--------+-------+--------------------+-----------------------------------+-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------
12462 | 12463 | 304651465921150967 | select pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
userid | dbid | queryid | query | calls | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | blk_read_time
--------+-------+---------------------+-----------------------------------+-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------
13247 | 13248 | 8978530904538012215 | select pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(1 row)

create table test(a int, b float);
insert into test(a,b) values (5,10);
select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements;
userid | dbid | queryid | query | calls | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | blk_read_time
--------+-------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------
12462 | 12463 | 304651465921150967 | select pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12462 | 12463 | -1399701522274914821 | insert into test(a,b) values ($1,$2) | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12462 | 12463 | -6731041317835482454 | select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12462 | 12463 | 8223779431833792722 | create table test(a int, b float) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | -2113262558775307896 | select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | -1399701522274914821 | insert into test(a,b) values ($1,$2) | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | 8978530904538012215 | select pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | 8223779431833792722 | create table test(a int, b float) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(4 rows)

insert into test(a,b) values (15,20);
select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements;
userid | dbid | queryid | query | calls | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | blk_read_time
--------+-------+----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------
12462 | 12463 | 304651465921150967 | select pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12462 | 12463 | -1399701522274914821 | insert into test(a,b) values ($1,$2) | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12462 | 12463 | -6731041317835482454 | select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements | 2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12462 | 12463 | 8223779431833792722 | create table test(a int, b float) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | -2113262558775307896 | select userid,dbid,queryid,query,calls,rows,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written,local_blks_hit,local_blks_read,local_blks_dirtied,local_blks_written,temp_blks_read,temp_blks_written,blk_read_time from pg_stat_statements | 2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | -1399701522274914821 | insert into test(a,b) values ($1,$2) | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | 8978530904538012215 | select pg_stat_statements_reset() | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13247 | 13248 | 8223779431833792722 | create table test(a int, b float) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(4 rows)

drop table test;
Expand All @@ -97,4 +97,4 @@ select uuid_generate_v1() != uuid_generate_v4();
?column?
----------
t
(1 row)
(1 row)
4 changes: 2 additions & 2 deletions src/postgres/src/test/regress/expected/yb_feature_temp.out
Original file line number Diff line number Diff line change
Expand Up @@ -117,15 +117,15 @@ SELECT * FROM temptest;
COMMIT;
ERROR: Illegal state: Transaction for catalog table write operation 'pg_type' not found
SELECT * FROM temptest;
ERROR: could not open file "base/12463/t2_16950": No such file or directory
ERROR: could not open file "base/13248/t2_16950": No such file or directory
BEGIN;
CREATE TEMP TABLE temptest(col) ON COMMIT DROP AS SELECT 1;
ERROR: relation "temptest" already exists
SELECT * FROM temptest;
ERROR: current transaction is aborted, commands ignored until end of transaction block
COMMIT;
SELECT * FROM temptest;
ERROR: could not open file "base/12463/t2_16950": No such file or directory
ERROR: could not open file "base/13248/t2_16950": No such file or directory
-- ON COMMIT is only allowed for TEMP
CREATE TABLE temptest(col int) ON COMMIT DELETE ROWS;
ERROR: ON COMMIT can only be used on temporary tables
Expand Down
Loading

0 comments on commit 1e0330b

Please sign in to comment.