#665: [YSQL] Enable support for sequences
Summary:
This diff adds support for:
- CREATE SEQUENCE
- DROP SEQUENCE
- nextval
- currval
- lastval
- Create a table with columns of serial types

Our implementation of sequences uses one replicated user-level table to store all sequence data. This table has four columns: db_oid, rel_oid, last_val, and is_called. The first two columns uniquely identify a sequence; last_val and is_called are the data needed to determine the next value in the sequence. Our implementation differs from the Postgres implementation in that it stores this data as one row in a special shared table, whereas Postgres uses a separate one-row table for each sequence.
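
A minimal C sketch of that row layout, as an in-memory stand-in rather than the actual table schema or storage code. The struct name `SeqRow` and the helper `seq_find` are hypothetical; only the four column names come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one row of the shared sequences table. */
typedef struct SeqRow {
    uint32_t db_oid;    /* key part 1: OID of the database */
    uint32_t rel_oid;   /* key part 2: OID of the sequence relation */
    int64_t  last_val;  /* last value stored for the sequence */
    bool     is_called; /* false until last_val has been handed out */
} SeqRow;

/* Linear lookup by the (db_oid, rel_oid) key; returns NULL when no row matches. */
static SeqRow *seq_find(SeqRow *rows, size_t n, uint32_t db_oid, uint32_t rel_oid)
{
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (rows[i].db_oid == db_oid && rows[i].rel_oid == rel_oid)
            return &rows[i];
    }
    return NULL;
}
```

Two sequences with the same rel_oid in different databases map to different rows, which is why both OIDs are part of the key.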

When a sequence is created, an RPC to insert a new row is sent to the tserver that leads the tablet that will store the row. last_val is initially set to the start value (default 1) and is_called to false. is_called stays false until last_val has been handed out. In other words, when nextval() reads the sequence data, it returns last_val if is_called is false, and last_val + increment otherwise.
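
The is_called rule can be written as a small function. This is an illustrative sketch, not the code in sequence.c; the name `seq_next_value` is made up here, and min/max bounds checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Value that nextval() would hand out for the stored state.
 * Hypothetical helper: bounds checks and caching are deliberately omitted.
 */
static int64_t seq_next_value(int64_t last_val, bool is_called, int64_t increment)
{
    assert(increment != 0); /* a sequence increment is never zero */
    return is_called ? last_val + increment : last_val;
}
```

For a freshly created default sequence (last_val 1, is_called false) the first call yields 1; once is_called is true, the same stored state yields 2.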

The Postgres implementation of nextval locks the sequence table, reads the data, checks whether incrementing (possibly by a negative value) would violate any constraints, writes the new values for last_val and is_called, and then unlocks the table. In YugaByte's implementation we cannot lock the data table because it is shared among all sequences. Instead, a nextval() call sends a read RPC to fetch the current values of `last_val` and `is_called`, evaluates the constraints against these values, and, if no error occurs, performs a conditional update (the row is updated only if the values haven't changed since we read them). If the conditional update fails, we retry the whole operation: read data, check constraints, update data. Because our implementation uses two RPCs each time we increment `last_val` or change `is_called`, a default sequence (with `CACHE` set to 1) will perform much worse than a similar sequence in Postgres. To reduce this cost, the user should choose a cache large enough to avoid issuing two RPCs for every value requested through nextval(). The disadvantage of this approach is that once a block of cached numbers has been reserved, any numbers in it that go unused are lost forever.
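
The read, check, conditional-update loop can be sketched in plain C against an in-memory row. Everything here is a simulation under stated assumptions: `SeqState`, `seq_read`, `seq_update_if`, `seq_nextval`, and `seq_competing_session` are hypothetical names, the two "RPCs" are ordinary function calls, only an ascending sequence with an upper bound is handled, and `CACHE` is treated as 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the stored row of one sequence. */
typedef struct SeqState {
    int64_t last_val;
    bool    is_called;
} SeqState;

/* Simulated read RPC: copy out the current values. */
static void seq_read(const SeqState *row, int64_t *last_val, bool *is_called)
{
    *last_val = row->last_val;
    *is_called = row->is_called;
}

/*
 * Simulated conditional-update RPC: write only if the row still holds the
 * values we read earlier. Returns false when the update was skipped.
 */
static bool seq_update_if(SeqState *row, int64_t expected_last_val,
                          bool expected_is_called, int64_t new_last_val)
{
    if (row->last_val != expected_last_val || row->is_called != expected_is_called)
        return false;
    row->last_val = new_last_val;
    row->is_called = true;
    return true;
}

/* Simulates another session taking one value (increment 1) behind our back. */
static void seq_competing_session(SeqState *row)
{
    if (row->is_called)
        row->last_val += 1;
    row->is_called = true;
}

/*
 * Ascending nextval with retry. `race` (may be NULL) runs once between the
 * read and the update to simulate a conflict. Returns false if the next
 * value would exceed maxv; otherwise stores it in *result.
 * *attempts is incremented once per read.
 */
static bool seq_nextval(SeqState *row, int64_t increment, int64_t maxv,
                        void (*race)(SeqState *), int64_t *result, int *attempts)
{
    assert(increment > 0);
    for (;;) {
        int64_t last_val;
        bool    is_called;

        seq_read(row, &last_val, &is_called);
        (*attempts)++;

        int64_t next = is_called ? last_val + increment : last_val;
        if (next > maxv)
            return false;           /* constraint violated: report an error */

        if (race != NULL) {
            race(row);              /* conflicting write lands here */
            race = NULL;
        }
        if (seq_update_if(row, last_val, is_called, next)) {
            *result = next;
            return true;
        }
        /* Update skipped: someone else changed the row, so start over. */
    }
}
```

Without a conflict each call costs one read and one update. When the simulated competing session changes the row between the two, the update is skipped and the loop reads again, so the caller still receives a value no other session was given.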

Pending:
- Support for CYCLE option
- ALTER SEQUENCE
- setval

Test Plan:
Manual for now. Tests coming soon:
```
postgres=# create sequence s1 increment 3 start 100 cache 1000 ;
CREATE SEQUENCE
postgres=# select nextval('s1');
 nextval
---------
     100
(1 row)

postgres=# select nextval('s1');
 nextval
---------
     103
(1 row)

postgres=# ^D\q
dog.local:~/code/yugabyte [postgres_sequence ↓·2↑·1|✚ 5⚑ 7]
14:22 $ ./bin/yb-ctl destroy; ./bin/yb-ctl create --enable_postgres; ./bin/yb-ctl status; ./bin/yb-ctl setup_pg_sequences_table; ./bin/psql -p 5433 -U postgres -h localhost^C
dog.local:~/code/yugabyte [postgres_sequence ↓·2↑·1|✚ 5⚑ 7]
14:22 $ ./bin/psql -p 5433 -U postgres -h localhost
psql (10.4)
Type "help" for help.

postgres=# select nextval('s1');
 nextval
---------
    3100
(1 row)

postgres=# ^D\q
dog.local:~/code/yugabyte [postgres_sequence ↓·2↑·1|✚ 5⚑ 7]
14:23 $ ./bin/psql -p 5433 -U postgres -h localhost
psql (10.4)
Type "help" for help.

postgres=# select nextval('s1');
 nextval
---------
    6100
(1 row)
```
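
The jumps from 103 to 3100 and then to 6100 in the session above follow from the cache size: with `increment 3` and `cache 1000`, each reserved block spans 1000 × 3 = 3000. A small sketch of that arithmetic, assuming every new session reserves a full block and the previous session's unused values are discarded, as the transcript suggests. The helper names are hypothetical, and bounds and CYCLE are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* First value of the k-th reserved block (k counts from 0). */
static int64_t seq_block_first(int64_t start, int64_t increment, int64_t cache, int64_t k)
{
    assert(cache >= 1 && k >= 0);
    return start + k * cache * increment;
}

/* last_val stored after reserving the k-th block: the last value in that block. */
static int64_t seq_block_last(int64_t start, int64_t increment, int64_t cache, int64_t k)
{
    return seq_block_first(start, increment, cache, k) + (cache - 1) * increment;
}
```

With start 100, increment 3, and cache 1000, blocks 0, 1, and 2 begin at 100, 3100, and 6100, matching the three sessions. The first session used only 100 and 103, so the rest of its block (106 through 3097) was never handed out.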

```
psql (10.4)
Type "help" for help.

postgres=# create table t(k serial primary key, v int);
CREATE TABLE
postgres=# insert into t(v) values (100);
INSERT 0 1
postgres=# insert into t(v) values (101);
INSERT 0 1
postgres=# insert into t(v) values (102);
INSERT 0 1
postgres=# select * from t;
 k |  v
---+-----
 1 | 100
 2 | 101
 3 | 102
(3 rows)

postgres=# \d t;
                            Table "public.t"
 Column |  Type   | Collation | Nullable |           Default
--------+---------+-----------+----------+------------------------------
 k      | integer |           | not null | nextval('t_k_seq'::regclass)
 v      | integer |           |          |

postgres=# select nextval('t_k_seq');
 nextval
---------
       4
(1 row)

postgres=# select currval('t_k_seq');
 currval
---------
       4
(1 row)

postgres=# select lastval();
 lastval
---------
       4
(1 row)

```

```
postgres=# SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
 relname
---------
(0 rows)

postgres=# create sequence s1;
CREATE SEQUENCE
postgres=# create table t4(k serial, v int);
CREATE TABLE
postgres=# SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
 relname
----------
 s1
 t4_k_seq
(2 rows)

postgres=# drop sequence s1;
DROP SEQUENCE
postgres=# drop table t4;
DROP TABLE
postgres=# SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
 relname
----------
 t4_k_seq
(1 row)

postgres=# drop sequence t4_k_seq;
DROP SEQUENCE
postgres=# SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
 relname
---------
(0 rows)
```

Reviewers: neil, karthik, mihnea, robert

Reviewed By: robert

Subscribers: kannan, bogdan, neha, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D6128
hectorgcr committed Mar 12, 2019
1 parent bd14a88 commit c7310c1
Showing 21 changed files with 1,434 additions and 84 deletions.
615 changes: 615 additions & 0 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/TestPgSequences.java

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions src/postgres/src/backend/catalog/heap.c
@@ -2147,9 +2147,9 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
 */
 adbin = nodeToString(expr);

-/*
-* Also deparse it to form the mostly-obsolete adsrc field.
-*/
+/*
+* Also deparse it to form the mostly-obsolete adsrc field.
+*/
 adsrc = deparse_expression(expr,
 deparse_context_for(RelationGetRelationName(rel),
 RelationGetRelid(rel)),
142 changes: 127 additions & 15 deletions src/postgres/src/backend/commands/sequence.c
@@ -45,6 +45,8 @@
 #include "utils/syscache.h"
 #include "utils/varlena.h"

+/* YB includes. */
+#include "pg_yb_utils.h"

 /*
 * We don't want to log each fetching of a value from a sequence,
@@ -218,9 +220,20 @@ DefineSequence(ParseState *pstate, CreateSeqStmt *seq)
 rel = heap_open(seqoid, AccessExclusiveLock);
 tupDesc = RelationGetDescr(rel);

-/* now initialize the sequence's data */
-tuple = heap_form_tuple(tupDesc, value, null);
-fill_seq_with_data(rel, tuple);
+if (IsYugaByteEnabled())
+{
+HandleYBStatus(YBCInsertSequenceTuple(ybc_pg_session,
+MyDatabaseId,
+ObjectIdGetDatum(seqoid),
+Int64GetDatumFast(seqdataform.last_value),
+false /* is_called */));
+}
+else
+{
+/* now initialize the sequence's data */
+tuple = heap_form_tuple(tupDesc, value, null);
+fill_seq_with_data(rel, tuple);
+}

 /* process OWNED BY if given */
 if (owned_by)
@@ -421,6 +434,7 @@ AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)
 HeapTupleData datatuple;
 Form_pg_sequence seqform;
 Form_pg_sequence_data newdataform;
+FormData_pg_sequence_data seq_data;
 bool need_seq_rewrite;
 List *owned_by;
 ObjectAddress address;
@@ -453,14 +467,27 @@ AlterSequence(ParseState *pstate, AlterSeqStmt *stmt)

 seqform = (Form_pg_sequence) GETSTRUCT(seqtuple);

-/* lock page's buffer and read tuple into new sequence structure */
-(void) read_seq_tuple(seqrel, &buf, &datatuple);
+if (IsYugaByteEnabled())
+{
+int64_t last_val = 0;
+HandleYBStatus(YBCReadSequenceTuple(ybc_pg_session, MyDatabaseId, ObjectIdGetDatum(relid),
+&last_val, &seq_data.is_called));
+
+seq_data.last_value = last_val;
+seq_data.log_cnt = 0;
+newdataform = &seq_data;
+}
+else
+{
+/* lock page's buffer and read tuple into new sequence structure */
+(void) read_seq_tuple(seqrel, &buf, &datatuple);

-/* copy the existing sequence data tuple, so it can be modified locally */
-newdatatuple = heap_copytuple(&datatuple);
-newdataform = (Form_pg_sequence_data) GETSTRUCT(newdatatuple);
+/* copy the existing sequence data tuple, so it can be modified locally */
+newdatatuple = heap_copytuple(&datatuple);
+newdataform = (Form_pg_sequence_data) GETSTRUCT(newdatatuple);

-UnlockReleaseBuffer(buf);
+UnlockReleaseBuffer(buf);
+}

 /* Check and set new values */
 init_params(pstate, stmt->options, stmt->for_identity, false,
@@ -522,6 +549,12 @@ DeleteSequenceTuple(Oid relid)
 if (!HeapTupleIsValid(tuple))
 elog(ERROR, "cache lookup failed for sequence %u", relid);

+if (IsYugaByteEnabled())
+{
+HandleYBStatus(
+YBCDeleteSequenceTuple(ybc_pg_session, MyDatabaseId, ObjectIdGetDatum(relid)));
+}
+
 CatalogTupleDelete(rel, tuple);

 ReleaseSysCache(tuple);
@@ -573,6 +606,7 @@ nextval_internal(Oid relid, bool check_permissions)
 HeapTuple pgstuple;
 Form_pg_sequence pgsform;
 HeapTupleData seqdatatuple;
+FormData_pg_sequence_data seq_data;
 Form_pg_sequence_data seq;
 int64 incby,
 maxv,
@@ -630,9 +664,27 @@ nextval_internal(Oid relid, bool check_permissions)
 cycle = pgsform->seqcycle;
 ReleaseSysCache(pgstuple);

-/* lock page' buffer and read tuple */
-seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
-page = BufferGetPage(buf);
+retry:
+if (IsYugaByteEnabled())
+{
+int64_t last_val;
+bool is_called;
+HandleYBStatus(YBCReadSequenceTuple(ybc_pg_session,
+MyDatabaseId,
+ObjectIdGetDatum(relid),
+&last_val,
+&is_called));
+seq_data.last_value = last_val;
+seq_data.is_called = is_called;
+seq_data.log_cnt = 0;
+seq = &seq_data;
+}
+else
+{
+/* lock page' buffer and read tuple */
+seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
+page = BufferGetPage(buf);
+}

 elm->increment = incby;
 last = next = result = seq->last_value;
@@ -645,6 +697,13 @@ nextval_internal(Oid relid, bool check_permissions)
 fetch--;
 }

+
+/*
+* We don't use the WAL log record. The value has already been updated and there is no way
+* to rollback to another sequence number.
+*/
+if (IsYugaByteEnabled())
+goto check_bounds;
 /*
 * Decide whether we should emit a WAL log record. If so, force up the
 * fetch count to grab SEQ_LOG_VALS more values than we actually need to
@@ -673,6 +732,7 @@ nextval_internal(Oid relid, bool check_permissions)
 }
 }

+check_bounds:
 while (fetch) /* try to fetch cache [+ log ] numbers */
 {
 /*
@@ -737,7 +797,8 @@ nextval_internal(Oid relid, bool check_permissions)
 }

 log -= fetch; /* adjust for any unfetched numbers */
-Assert(log >= 0);
+if (!IsYugaByteEnabled())
+Assert(log >= 0);

 /* save info in local cache */
 elm->last = result; /* last returned number */
@@ -746,6 +807,34 @@ nextval_internal(Oid relid, bool check_permissions)

 last_used_seq = elm;

+/*
+* YugaByte doesn't use the WAL, and we don't need to free the buffer because we didn't allocate
+* memory for it. So close the relation and return the result now.
+*/
+if (IsYugaByteEnabled())
+{
+bool skipped = false;
+/*
+* We do a conditional update here to detect write conflicts with other sessions. If the
+* update fails, we retry again by reading the last_val and is_called values and going
+* through the whole process again.
+*/
+HandleYBStatus(YBCUpdateSequenceTuple(ybc_pg_session,
+MyDatabaseId,
+ObjectIdGetDatum(relid),
+last /* last_val */,
+true /* is_called */,
+seq->last_value /* expected_last_val */,
+seq->is_called /* expected_is_called */,
+&skipped));
+if (skipped)
+{
+goto retry;
+}
+relation_close(seqrel, NoLock);
+return result;
+}
+
 /*
 * If something needs to be WAL logged, acquire an xid, so this
 * transaction's commit will trigger a WAL flush and wait for syncrep.
@@ -934,8 +1023,15 @@ do_setval(Oid relid, int64 next, bool iscalled)
 */
 PreventCommandIfParallelMode("setval()");

-/* lock page' buffer and read tuple */
-seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
+/*
+* TODO(hector): Finish the implementation for setval(). For now, we only skip this part of the
+* code to avoid errors.
+*/
+if (!IsYugaByteEnabled())
+{
+/* lock page' buffer and read tuple */
+seq = read_seq_tuple(seqrel, &buf, &seqdatatuple);
+}

 if ((next < minv) || (next > maxv))
 {
@@ -963,6 +1059,16 @@ do_setval(Oid relid, int64 next, bool iscalled)
 /* In any case, forget any future cached numbers */
 elm->cached = elm->last;

+/*
+* TODO(hector): Finish the implementation for setval(). YugaByte doesn't use the WAL, and we
+* didn't allocate memory for buffer, so no need to free it.
+*/
+if (IsYugaByteEnabled())
+{
+relation_close(seqrel, NoLock);
+return;
+}
+
 /* check the comment above nextval_internal()'s equivalent call. */
 if (RelationNeedsWAL(seqrel))
 GetTopTransactionId();
@@ -1852,6 +1958,12 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 errmsg("permission denied for sequence %s",
 RelationGetRelationName(seqrel))));

+if (IsYugaByteEnabled())
+{
+/* TODO(hector): Read the sequence's data. For now return null. */
+relation_close(seqrel, NoLock);
+PG_RETURN_NULL();
+}
 seq = read_seq_tuple(seqrel, &buf, &seqtuple);

 is_called = seq->is_called;
2 changes: 1 addition & 1 deletion src/postgres/src/backend/commands/ybccmds.c
@@ -200,7 +200,7 @@ YBCCreateTable(CreateStmt *stmt, char relkind, TupleDesc desc, Oid relationId, O
 relationId,
 false, /* is_shared_table */
 false, /* if_not_exists */
-primary_key == NULL /* add_primary_key */,
+primary_key == NULL /* add_primary_key */,
 &handle));

 /*
9 changes: 4 additions & 5 deletions src/postgres/src/backend/parser/gram.y
@@ -838,6 +838,7 @@ stmt :
 | CreateAsStmt
 | CopyStmt
 | CreateSchemaStmt
+| CreateSeqStmt
 | CreateStmt
 | CreateUserStmt
 | CreatedbStmt
@@ -915,7 +916,6 @@ stmt :
 | AlterOpFamilyStmt { parser_ybc_not_support(@1, "This statement"); }
 | CreatePolicyStmt { parser_ybc_not_support(@1, "This statement"); }
 | CreatePLangStmt { parser_ybc_not_support(@1, "This statement"); }
-| CreateSeqStmt { parser_ybc_not_support(@1, "This statement"); }
 | CreateSubscriptionStmt { parser_ybc_not_support(@1, "This statement"); }
 | CreateStatsStmt { parser_ybc_not_support(@1, "This statement"); }
 | CreateTableSpaceStmt { parser_ybc_not_support(@1, "This statement"); }
@@ -2693,7 +2693,7 @@ alter_column_default:
 ;

 opt_drop_behavior:
-CASCADE { $$ = DROP_CASCADE; parser_ybc_not_support(@1, "CASCADE"); }
+CASCADE { $$ = DROP_CASCADE; }
 | RESTRICT { $$ = DROP_RESTRICT; }
 | /* EMPTY */ { $$ = DROP_RESTRICT; /* default */ }
 ;
@@ -4376,7 +4376,6 @@ RefreshMatViewStmt:
 CreateSeqStmt:
 CREATE OptTemp SEQUENCE qualified_name OptSeqOptList
 {
-parser_ybc_not_support(@1, "CREATE SEQUENCE");
 CreateSeqStmt *n = makeNode(CreateSeqStmt);
 $4->relpersistence = $2;
 n->sequence = $4;
@@ -4387,7 +4386,6 @@ CreateSeqStmt:
 }
 | CREATE OptTemp SEQUENCE IF_P NOT EXISTS qualified_name OptSeqOptList
 {
-parser_ybc_not_support(@1, "CREATE SEQUENCE");
 CreateSeqStmt *n = makeNode(CreateSeqStmt);
 $7->relpersistence = $2;
 n->sequence = $7;
@@ -4442,6 +4440,7 @@ SeqOptElem: AS SimpleTypename
 }
 | CYCLE
 {
+parser_ybc_not_support(@1, "CYCLE");
 $$ = makeDefElem("cycle", (Node *)makeInteger(true), @1);
 }
 | NO CYCLE
@@ -6662,7 +6661,7 @@ DropStmt: DROP drop_type_any_name IF_P EXISTS any_name_list opt_drop_behavior
 /* object types taking any_name_list */
 drop_type_any_name:
 TABLE { $$ = OBJECT_TABLE; }
-| SEQUENCE { parser_ybc_not_support(@1, "DROP SEQUENCE"); $$ = OBJECT_SEQUENCE; }
+| SEQUENCE { $$ = OBJECT_SEQUENCE; }
 | VIEW { $$ = OBJECT_VIEW; }
 | MATERIALIZED VIEW
 {
3 changes: 2 additions & 1 deletion src/postgres/src/backend/utils/misc/pg_yb_utils.c
@@ -82,7 +82,8 @@ IsYBRelationById(Oid relid)
 }

 bool
-IsYBRelationByKind(char relKind){
+IsYBRelationByKind(char relKind)
+{
 return (relKind == RELKIND_RELATION || relKind == RELKIND_INDEX);
 }

3 changes: 0 additions & 3 deletions src/postgres/src/test/regress/expected/yb_feature_types.out
@@ -12,11 +12,8 @@ CREATE TABLE feature_tab_double_precision (feature_col DOUBLE PRECISION);
 CREATE TABLE feature_tab_decimal (feature_col DECIMAL);
 CREATE TABLE feature_tab_numeric (feature_col NUMERIC);
 CREATE TABLE feature_tab_smallserial (feature_col SMALLSERIAL);
-ERROR: could not open file "base/18373/0": No such file or directory
 CREATE TABLE feature_tab_serial (feature_col SERIAL);
-ERROR: could not open file "base/18373/0": No such file or directory
 CREATE TABLE feature_tab_bigserial (feature_col BIGSERIAL);
-ERROR: could not open file "base/18373/0": No such file or directory
 --
 -- Monetary Types
 CREATE TABLE feature_tab_money (feature_col MONEY);