[YCQL] Table stuck in the keyspace after deletion #3032

Closed
smalyshev opened this issue Nov 27, 2019 · 9 comments

@smalyshev

I created a new keyspace and then created a table, sessions, in it, with a couple of indexes. After that, I tried to delete the table, but something strange happened: it is stuck in a half-deleted state, and I can neither use nor delete it. This is what I get:

desc tables still lists it:

> desc tables;

sessions

I can describe it:

> desc table sessions;

CREATE TABLE test_space.sessions (
    session text,
    ts timestamp,
    PRIMARY KEY (session, ts)
) WITH CLUSTERING ORDER BY (ts ASC)
    AND default_time_to_live = 0;

If I try to drop it, it says it doesn't exist:

> drop table sessions;

InvalidRequest: Error from server: code=2200 [Invalid query] message="Object Not Found. The object does not exist: table_name: "sessions"
namespace {
  name: "test_space"
}

drop table sessions;
           ^^^^^^^^
 (error -301)"

If I try to query it, it doesn't exist:

> select * from sessions;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Object Not Found
select * from sessions;
              ^^^^^^^^
 (error -301)"

If I try to create the same table again, it creates it!

> create table sessions (
                 ...     session text,
                 ...     ts timestamp,
                 ...     PRIMARY KEY (session, ts)
                 ... ) WITH CLUSTERING ORDER BY (ts ASC)
                 ...     AND default_time_to_live = 0;

No error! But the keys in the table description are now duplicated:

> desc table sessions;
CREATE TABLE test_space.sessions (
    session text,
    ts timestamp,
    PRIMARY KEY ((session, session), ts, ts)
) WITH CLUSTERING ORDER BY (ts ASC, ts ASC)
    AND default_time_to_live = 0;

And I can query and drop this duplicate table. But once I drop it, I'm back to the phantom table. I also cannot drop the keyspace:

> drop keyspace test_space;
ServerError: Server Error. Cannot delete namespace which has table: sessions [id=de3137bc9950419ea82c77af2583e395]: namespace {
  name: "test_space"
}

drop keyspace test_space;
              ^^^^^^^^^^^^^
 (error -2)
@yugabyte-ci yugabyte-ci added the community/request Issues created by external users label Nov 27, 2019
@smalyshev
Author

If I query system_schema.tables, I get:

 keyspace_name | table_name | bloom_filter_fp_chance | caching | cdc  | comment | compaction | compression | crc_check_chance | dclocal_read_repair_chance | default_time_to_live | extensions | flags        | gc_grace_seconds | id                                   | max_index_interval | memtable_flush_period_in_ms | min_index_interval | read_repair_chance | speculative_retry | transactions
---------------+------------+------------------------+---------+------+---------+------------+-------------+------------------+----------------------------+----------------------+------------+--------------+------------------+--------------------------------------+--------------------+-----------------------------+--------------------+--------------------+-------------------+---------------------
    test_space |   sessions |                   null |    null | null |    null |       null |        null |             null |                       null |                    0 |       null | {'compound'} |             null | 95e38325-af77-2ca8-9e41-5099bc3731de |               null |                        null |               null |               null |              null | {'enabled': 'true'}

Interestingly enough, in another keyspace where I had deleted tables I also have duplicate records with the same name (but different ids). Is this normal?

@OlegLoginov OlegLoginov changed the title Table stuck in the keyspace after deletion [YCQL] Table stuck in the keyspace after deletion Nov 28, 2019
@OlegLoginov OlegLoginov self-assigned this Nov 28, 2019
@OlegLoginov
Contributor

OlegLoginov commented Nov 28, 2019

It's correct to have tables with the same names in different keyspaces.

@smalyshev, you are talking about duplicate records:

in another keyspace where I had deleted tables I also have duplicate records with the same name (but different ids). Is this normal?

If there are several table records with the same name (and different ids) in the same keyspace, that looks strange. Could you please provide a log showing how you see that?

@smalyshev
Author

For example:

cqlsh:system_schema> select * from tables where keyspace_name='test';

 keyspace_name | table_name | bloom_filter_fp_chance | caching | cdc  | comment | compaction | compression | crc_check_chance | dclocal_read_repair_chance | default_time_to_live | extensions | flags        | gc_grace_seconds | id                                   | max_index_interval | memtable_flush_period_in_ms | min_index_interval | read_repair_chance | speculative_retry | transactions
---------------+------------+------------------------+---------+------+---------+------------+-------------+------------------+----------------------------+----------------------+------------+--------------+------------------+--------------------------------------+--------------------+-----------------------------+--------------------+--------------------+-------------------+----------------------
          test |   sessions |                   null |    null | null |    null |       null |        null |             null |                       null |                    0 |       null | {'compound'} |             null | c5108580-e2f3-5597-0c44-d3ae6a1d0340 |               null |                        null |               null |               null |              null | {'enabled': 'false'}
          test |   sessions |                   null |    null | null |    null |       null |        null |             null |                       null |                    0 |       null | {'compound'} |             null | 1d960be8-4790-55a6-f149-cab8aaaecfff |               null |                        null |               null |               null |              null | {'enabled': 'false'}
          test |   ip_stats |                   null |    null | null |    null |       null |        null |             null |                       null |                    0 |       null | {'compound'} |             null | 9a562c29-f56c-15a0-b845-1e741fd65115 |               null |                        null |               null |               null |              null | {'enabled': 'false'}

As you can see, there are two tables named sessions. I can now reproduce the original problem in this keyspace too: after I tried to delete these tables, the keyspace got stuck as well. The tables can't be deleted and can't be used. So now I have two keyspaces in this state.
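
A quick way to spot such duplicates, as a minimal sketch: it assumes the rows of system_schema.tables have already been fetched into plain dictionaries (how they are fetched is left out), and the helper name find_duplicate_tables is hypothetical. The sample rows reuse the names and ids from the output above.

```python
from collections import defaultdict


def find_duplicate_tables(rows):
    """Return {(keyspace, table): [ids]} for names that map to more than one id.

    `rows` is assumed to be an iterable of dicts with the keys
    "keyspace_name", "table_name" and "id" (hypothetical input shape).
    """
    ids_by_name = defaultdict(list)
    for row in rows:
        key = (row["keyspace_name"], row["table_name"])
        ids_by_name[key].append(row["id"])
    # Keep only the names that appear with several distinct table ids.
    return {key: ids for key, ids in ids_by_name.items() if len(set(ids)) > 1}


# Sample rows copied from the query output in this comment.
rows = [
    {"keyspace_name": "test", "table_name": "sessions",
     "id": "c5108580-e2f3-5597-0c44-d3ae6a1d0340"},
    {"keyspace_name": "test", "table_name": "sessions",
     "id": "1d960be8-4790-55a6-f149-cab8aaaecfff"},
    {"keyspace_name": "test", "table_name": "ip_stats",
     "id": "9a562c29-f56c-15a0-b845-1e741fd65115"},
]

duplicates = find_duplicate_tables(rows)
for (keyspace, table), ids in duplicates.items():
    print(f"{keyspace}.{table} has {len(ids)} ids: {ids}")
```

Any name reported here has more than one catalog entry in the same keyspace, which is the symptom described in this issue.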

@ndeodhar
Contributor

ndeodhar commented Dec 6, 2019

The problem is in CreateTable. Consider this scenario:

create table foo; // This creates table with ID 100
drop table foo;
create table foo; // This creates table with ID 200
create index foo_idx on foo;

In this example, when foo with ID 100 is dropped, we set the table's state to DELETING and start deleting its tablets. On large clusters with a large number of tablets, deleting all tablets can take up to a minute. Once all tablets are deleted, we mark the table as DELETED.
Now, it's possible to create a new table foo while the older table is DELETING but not yet DELETED. In the next step, when we create an index on foo, the CQL server does a lookup to find the indexed (base) table. Here, it finds foo with ID 100 in its cache and sends a request to the master to create an index on foo(id=100).
The master goes ahead, creates the index, and alters the base table to add the index table to its list of indexes. This changes the table's state to ALTERING and finally to RUNNING once the alter command completes, so the dropped table comes back to life!

There are 2 fixes needed:

  1. CreateTable should ensure that the indexed table is not in the DELETING or DELETED state.
  2. AlterTable should ensure that the table being altered is not in the DELETING or DELETED state.
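
The scenario and the two guards can be illustrated with a toy model. This is only a sketch of the state transitions described above, not the actual CatalogManager code; the class and function names (Table, add_index_unchecked, add_index_checked) are hypothetical.

```python
from enum import Enum


class TableState(Enum):
    RUNNING = "RUNNING"
    ALTERING = "ALTERING"
    DELETING = "DELETING"
    DELETED = "DELETED"


class Table:
    """Toy stand-in for a catalog entry (hypothetical, for illustration only)."""

    def __init__(self, table_id, name):
        self.table_id = table_id
        self.name = name
        self.state = TableState.RUNNING
        self.indexes = []


def add_index_unchecked(table, index_name):
    """Problematic path: the alter runs regardless of the table's state."""
    table.indexes.append(index_name)
    table.state = TableState.ALTERING
    table.state = TableState.RUNNING  # the alter finishes and the table is "back"


def add_index_checked(table, index_name):
    """Guarded path: refuse to touch a table that is being or has been deleted."""
    if table.state in (TableState.DELETING, TableState.DELETED):
        raise LookupError(
            f"table {table.name} (id={table.table_id}) is {table.state.value}"
        )
    add_index_unchecked(table, index_name)


# Without the guard: a stale reference to foo(id=100) is used while it is DELETING.
stale = Table(100, "foo")
stale.state = TableState.DELETING
add_index_unchecked(stale, "foo_idx")
resurrected_state = stale.state

# With the guard: the same request is rejected and the state is left alone.
guarded = Table(100, "foo")
guarded.state = TableState.DELETING
try:
    add_index_checked(guarded, "foo_idx")
    rejected = False
except LookupError as err:
    rejected = True
    print("rejected:", err)

print("unchecked path ends in state:", resurrected_state.value)
```

In the unchecked path the table that was being deleted ends up RUNNING again with an index attached. In the guarded path it stays DELETING and no index is added.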

@ndeodhar ndeodhar added the priority/high High Priority label Dec 6, 2019
@smalyshev
Author

The scenario above sounds pretty close to what I did - I had a table for which I've forgotten to create some fields and indexes, so I've deleted it and started to recreate it immediately (I wasn't aware of the complications above then), so I think it matches what is described above.

ndeodhar added a commit that referenced this issue Dec 7, 2019
Summary:
Since YSQL DDLs are not transactional yet, it's possible to end up in a scenario where a namespace is present in YB metadata but not in postgres. This can happen if the creation only partly succeeds, i.e. the namespace is created in YB, but the operation is terminated before the namespace is created in the postgres system tables (due to an intermittent network issue or a node failure, for example).

In this scenario, the namespace is unusable since postgres system tables are unaware of its existence and it just lies around in YB metadata. We should add a yb-admin command to delete namespace that can help recover in such situations.

Another example is #3032.

We need a way to clean up and remove tables, namespaces, and indexes in case of DDL consistency issues.

Usage:
Delete namespace by name:
```
yb-admin delete_namespace ysql.namespace_name
yb-admin delete_namespace ycql.namespace_name
```

Delete namespace by ID
```
yb-admin delete_namespace_by_id <id>
```

Delete table by name
```
yb-admin delete_table ysql.namespace_name table_name
yb-admin delete_table ycql.namespace_name table_name
```

Delete table by ID
```
yb-admin delete_table_by_id <id>
```

Delete index by name
```
yb-admin delete_index ysql.namespace_name index_name
yb-admin delete_index ycql.namespace_name index_name
```

Delete index by ID
```
yb-admin delete_index_by_id <id>
```
Note that for YSQL, these commands will only delete data from master and not from postgres. These are only meant to be used to clean up master state in case of cluster errors or inconsistencies.

Test Plan:
Tested manually.

Created inconsistent state:
```
yugabyte=# create database nehatest;
ERROR:  Already present: Keyspace 'nehatest' already exists
yugabyte=# \l
                                   List of databases
      Name       |  Owner   | Encoding | Collate |    Ctype    |   Access privileges
-----------------+----------+----------+---------+-------------+-----------------------
 postgres        | postgres | UTF8     | C       | en_US.UTF-8 |
 system_platform | postgres | UTF8     | C       | en_US.UTF-8 |
 template0       | postgres | UTF8     | C       | en_US.UTF-8 | =c/postgres          +
                 |          |          |         |             | postgres=CTc/postgres
 template1       | postgres | UTF8     | C       | en_US.UTF-8 | =c/postgres          +
                 |          |          |         |             | postgres=CTc/postgres
 yugabyte        | postgres | UTF8     | C       | en_US.UTF-8 |
(5 rows)

yugabyte=# create database nehatest;
ERROR:  Already present: Keyspace 'nehatest' already exists
yugabyte=# drop database nehatest;
ERROR:  database "nehatest" does not exist
yugabyte=# create database nehatest;
ERROR:  Already present: Keyspace 'nehatest' already exists
```

Ran `yb-admin delete_namespace ysql.nehatest` and namespace was successfully deleted.

Also, tested delete table and delete index.

Reviewers: bogdan, mihnea

Reviewed By: mihnea

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7657
@OlegLoginov
Contributor

OlegLoginov commented Dec 11, 2019

Reproduced the main issue - stuck keyspace:

    // Create test table.
    session.execute("create table test_drop (h1 int primary key, " + // TS-1
                    "c1 int, c2 int, c3 int, c4 int, c5 int) " +
                    "with transactions = {'enabled' : true};");
    // Create test indexes.
    session.execute("create index i1 on test_drop (c1);");           // TS-2
    session.execute("create index i2 on test_drop (c2);");           // TS-3
    session.execute("select * from test_drop;");                     // TS-1
    // Drop test table.
    session.execute("drop table test_drop;");                        // TS-2
    // Create test table again.
    session.execute("create table test_drop (h1 int primary key, " + // TS-3
                    "c1 int, a2 int, a3 int, a4 int, a5 int) " +
                    "with transactions = {'enabled' : true};");
    // Create index.
    session.execute("create index i6 on test_drop (c1);");           // TS-1
    session.execute("drop table test_drop;");                        // TS-2
    session.execute("drop keyspace test_ks;");                       // TS-3

Error:

	at org.yb.cql.TestIndex.testDropTableTimeout(TestIndex.java:296)
Caused by: com.datastax.driver.core.exceptions.ServerError: 
An unexpected error occurred server side on /127.204.240.183:9042: Server Error. Cannot delete namespace which has index: i6 [id=36f277befd6243fea6361a9c037e827e]: namespace {
  name: "test_ks"
}
drop keyspace test_ks;
              ^^^^^^^
 (ql error -2)

[ERROR] testDropTableTimeout(org.yb.cql.TestIndex)  Time elapsed: 7.446 s  <<< ERROR!
com.datastax.driver.core.exceptions.InvalidQueryException: 
Object Not Found. The object does not exist: table_name: "test_drop"
namespace {
  name: "test_ks"
  database_type: YQL_DATABASE_CQL
}
DROP TABLE test_ks.test_drop;
                   ^^^^^^^^^
 (ql error -301)

@OlegLoginov
Contributor

OlegLoginov commented Dec 11, 2019

Due to race conditions between TServers, the test failure can reference either the wrong index (above) or the table:

Caused by: com.datastax.driver.core.exceptions.ServerError: 
An unexpected error occurred server side on /127.159.161.216:9042: Server Error. Cannot delete namespace which has table: test_drop [id=32c6deec55254066895075c62355e462]: namespace {
  name: "test_ks"
}

drop keyspace test_ks;
              ^^^^^^^
 (ql error -2)

[ERROR] testDropTableTimeout(org.yb.cql.TestIndex)  Time elapsed: 11.544 s  <<< ERROR!
com.datastax.driver.core.exceptions.InvalidQueryException: 
Object Not Found. The object does not exist: table_name: "test_drop"
namespace {
  name: "test_ks"
  database_type: YQL_DATABASE_CQL
}

DROP TABLE test_ks.test_drop;
                   ^^^^^^^^^
 (ql error -301)

In this case we have a race between the (slow) DeleteTable from TS-1 and the DeleteNamespace from TS-2.

@OlegLoginov
Contributor

The unexpected 'Object Not Found' error from DROP TABLE mentioned above is tracked in #3133.

OlegLoginov added a commit that referenced this issue Dec 24, 2019
Summary:
- Fixed the 'Object not found' issue in YBClient::Data::DeleteTable().
- Prevented an index from being attached to a deleted table (in CatalogManager::CreateTable).
- Prevented a table from being restored to the RUNNING state after deletion (in CatalogManager::AddIndexInfoToTable).
- Added a new point in the CQL Server Executor for local table cache clean-up.
- Improved 'yb-admin dump_masters_state' so that data from SysCatalog can be written to a file or to the console. (The current implementation does not allow this because the dump length is limited by the maximum LOG line length, which is not enough for even 1 table.)
- Updated the 'yb-admin delete_index' output.

Test Plan:
ybd --java-test org.yb.cql.TestBigNumShards#testDropTableTimeout
ybd --java-test org.yb.cql.TestWithMasterLatency#testDropTableTimeout
ybd --java-test org.yb.cql.TestIndex#testRecreateTable

ybd --cxx-test yb-admin-test --gtest_filter AdminCliTest.TestDeleteIndex

Reviewers: bogdan, mihnea, hector, neha, mikhail

Reviewed By: mikhail

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7670
@OlegLoginov
Contributor

Fixed by the commit above.

OlegLoginov added a commit that referenced this issue Dec 31, 2019
…ion.

Summary:
The new java test  org.yb.cql.TestBigNumShards#testDropTableTimeout was introduced in the fix for #3032 (D7670):
a70ab64

The test creates a large number of shards, so it was disabled for the TSAN configuration.
This diff disables the test for ASAN too, because it fails under ASAN due to timeouts with the error: com.datastax.driver.core.exceptions.TransportException: [/127.230.226.34:9042] Connection has been closed

Test Plan:
ybd asan --java-test org.yb.cql.TestBigNumShards#testDropTableTimeout
ybd tsan --java-test org.yb.cql.TestBigNumShards#testDropTableTimeout
ybd --java-test org.yb.cql.TestBigNumShards#testDropTableTimeout

Reviewers: mikhail

Reviewed By: mikhail

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7747
mbautin added a commit to mbautin/yugabyte-db that referenced this issue Jan 29, 2020
…r stuck table/keyspace.

Summary:
This is the backport of this commit by @OlegLoginov in master to the 2.0.5 branch: yugabyte@a70ab64


Test Plan: Jenkins: skip

Reviewers: mihnea

Subscribers: yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7869
mbautin pushed a commit that referenced this issue Jan 31, 2020
…table/keyspace.

Summary:
This is the backport of the following commit by @OlegLoginov from master to the 2.0.5 branch: a70ab64
Original revision in master: https://phabricator.dev.yugabyte.com/D7670


Test Plan: Jenkins: skip

Reviewers: oleg, mihnea

Reviewed By: mihnea

Subscribers: jenkins-bot, yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7869
carlos-username pushed a commit to carlos-username/yugabyte-db that referenced this issue Mar 11, 2020
…r stuck table/keyspace.

Summary:
This is the backport of the following commit by @OlegLoginov from master to the 2.0.5 branch: yugabyte@a70ab64
Original revision in master: https://phabricator.dev.yugabyte.com/D7670


Test Plan: Jenkins: skip

Reviewers: oleg, mihnea

Reviewed By: mihnea

Subscribers: jenkins-bot, yql, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D7869
OlegLoginov added a commit that referenced this issue Sep 5, 2021
…stDropTableTimeout.

Summary:
The test `org.yb.cql.TestBigNumShards#testDropTableTimeout` sets `NumShardsPerTServer` == 32.
For 1 table + 5 indexes, it created 192 tablets. With this many tablets, Jenkins can occasionally fail with a timeout.

The large number of tablets was a way to reproduce cases where DROP TABLE happens before CREATE TABLE/INDEX has finished.
See for details:
- GH: #3032
- Diff: https://phabricator.dev.yugabyte.com/D7670
- Commit: a70ab64

The current fix keeps the test for a large number of shards, but reduces the number of tables (and indexes) to 2.
The slow CREATE TABLE test case is re-implemented via `TEST_simulate_slow_table_create_secs`: `org.yb.cql.TestMasterLatency#testSlowCreateDropIndex`.

Test Plan:
ybd --java-test org.yb.cql.TestBigNumShards#testCreateDropTable --tp 1 -n 10
ybd --java-test org.yb.cql.TestMasterLatency#testSlowCreateDropTable --tp 1 -n 10
ybd --java-test org.yb.cql.TestMasterLatency#testSlowCreateDropIndex --tp 1 -n 10
ybd --java-test org.yb.cql.TestSlowCreateTable#testCreateTableTimeout --tp 1 -n 10

Reviewers: timur, amitanand

Reviewed By: amitanand

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D12427