[SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE #23905
Conversation
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
On the call to drop the database, we can get an error. For example, it is not allowed to drop the default database. In that case, even though we are not going to go ahead with the drop database action, we will still end up refreshing all the tables inside it. Is that expected?
Force-pushed from e6598ba to adfb0a6.
CC @HyukjinKwon

  catalog.listTables(databaseName).foreach { t =>
    catalog.refreshTable(t)
  }
}
sparkSession.sessionState.catalog.dropDatabase(databaseName, ifExists, cascade)
Can this logic move under the catalog.dropDatabase function? The reason is that we have the following code in dropDatabase:

val dbName = formatDatabaseName(db)
if (dbName == DEFAULT_DATABASE) {
  throw new AnalysisException(s"Can not drop default database")
}

If we keep it here, then we would need an identical check. In this case formatDatabaseName handles case sensitivity in the database name, and we should use DEFAULT_DATABASE instead of hard-coding it here.
Secondly, can either of the calls, 1) listTables or 2) refreshTable, throw an exception? In that case, do we want to fail the drop operation or allow the drop with a warning? I don't know the answer; I'm just listing the questions here so others can help answer.
cc @cloud-fan @gatorsmile for their input.
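The case-sensitivity concern above can be illustrated with a minimal stand-in (not Spark's actual SessionCatalog): the database name must be normalized via formatDatabaseName before comparing against DEFAULT_DATABASE, or a name like "Default" would slip past the check in a case-insensitive catalog. Everything here is a simplified sketch; Spark's real check lives in SessionCatalog.dropDatabase.

```scala
// Minimal sketch, not Spark's actual classes. It shows why the default-database
// check must normalize the name first.
object CatalogSketch {
  val DEFAULT_DATABASE = "default"

  // Stand-in for SessionCatalog.formatDatabaseName: lower-case the name,
  // as a case-insensitive catalog would.
  def formatDatabaseName(db: String): String = db.toLowerCase

  def dropDatabase(db: String): Unit = {
    val dbName = formatDatabaseName(db)
    if (dbName == DEFAULT_DATABASE) {
      throw new IllegalArgumentException("Can not drop default database")
    }
    // ... invalidate the database's tables and perform the actual drop here ...
  }
}
```

With this normalization, dropDatabase("Default") and dropDatabase("default") are rejected identically.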
Yes, I have moved that code into the catalog.dropDatabase function; I also think that makes more sense.
Force-pushed from adfb0a6 to f483c62.
I'd be a bit worried about this approach. Shouldn't this be left up to the user rather than done automatically?
Can we do it on behalf of the user? We already follow this approach in dropTable, where the table is refreshed, so these changes also make it consistent with the dropTable behaviour.
cc @gatorsmile @dongjoon-hyun, can you please review this?
Thank you for your first contribution, @Udbhav30.
I left a few comments. Also, could you move this test case to DDLSuite.scala? If you put it there, InMemoryCatalogedDDLSuite and HiveCatalogedDDLSuite will both test it.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
Force-pushed from 6accf35 to 188ae92.
@dongjoon-hyun I have made the suggested changes.
ok to test
Test build #102917 has finished for PR 23905 at commit
@Udbhav30, two test case failures look relevant to this PR. Could you check them by running locally? Please update this PR after you pass them locally. Thanks.
Yes, I will check and update.
Test build #102995 has finished for PR 23905 at commit
Test build #103001 has finished for PR 23905 at commit
@dongjoon-hyun I have addressed all the suggestions; please let me know if there are any more.
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
@@ -218,6 +218,11 @@ class SessionCatalog(
    if (dbName == DEFAULT_DATABASE) {
      throw new AnalysisException(s"Can not drop default database")
    }
+   if (cascade && databaseExists(dbName)) {
+     listTables(dbName).foreach { t =>
+       refreshTable(t)
The goal of this PR is invalidating the tables, so let's use a narrower API here instead of refreshTable:

- refreshTable(t)
+ invalidateCachedTable(QualifiedTableName(dbName, t.table))
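The distinction the reviewer draws — invalidating a cached entry versus fully refreshing a table — can be sketched with a toy plan cache keyed by qualified name. This is a hedged stand-in, not Spark's implementation: QualifiedName mimics Spark's QualifiedTableName, and a plain map stands in for the Guava-backed table-relation cache.

```scala
import scala.collection.mutable

// Stand-in for Spark's QualifiedTableName: a (database, table) key.
case class QualifiedName(database: String, table: String)

// Toy analogue of the session's cached-plan store.
class PlanCache {
  private val cache = mutable.Map.empty[QualifiedName, String]

  def cachePlan(key: QualifiedName, plan: String): Unit = cache(key) = plan
  def cachedPlan(key: QualifiedName): Option[String] = cache.get(key)

  // Analogue of invalidateCachedTable: forget one table's cached plan,
  // without touching anything else about the table.
  def invalidateCachedTable(key: QualifiedName): Unit = cache -= key
}
```

Dropping only the cache entry is cheaper and more targeted than a full refresh, which is why the narrower call fits a destructive operation like DROP DATABASE CASCADE.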
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
@@ -218,6 +218,11 @@ class SessionCatalog(
    if (dbName == DEFAULT_DATABASE) {
      throw new AnalysisException(s"Can not drop default database")
    }
+   if (cascade && databaseExists(dbName)) {
+     listTables(dbName).foreach { t =>
@dongjoon-hyun Do we need to worry about recurring exceptions from this call, like the following?
http://discuss.itversity.com/t/unable-to-perform-listtables-on-spark-catalog-class/15888
Should we fail the drop database, or warn and proceed?
The best behavior is to stay compatible with the old Spark behavior, so that would be "warn and proceed".
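The "warn and proceed" policy discussed here can be sketched as a best-effort loop: a failure while listing or invalidating tables is logged and swallowed so that the DROP DATABASE itself still goes through. All names below are illustrative stand-ins; the merged change itself only guards with databaseExists and does not add this try/catch, so this is just one way the policy could look.

```scala
// Best-effort invalidation: log failures instead of failing the drop.
// listTables, invalidate, and logWarning are injected so the sketch
// stays self-contained (they stand in for SessionCatalog internals).
def invalidateTablesBestEffort(
    db: String,
    listTables: String => Seq[String],
    invalidate: String => Unit,
    logWarning: String => Unit): Unit = {
  try {
    listTables(db).foreach { t =>
      try invalidate(t)
      catch {
        case e: Exception =>
          logWarning(s"Failed to invalidate $db.$t: ${e.getMessage}")
      }
    }
  } catch {
    case e: Exception =>
      logWarning(s"Failed to list tables in $db: ${e.getMessage}")
  }
}
```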
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @dongjoon-hyun
Force-pushed from c4e3214 to eab40d6.
Test build #103082 has finished for PR 23905 at commit
Test build #103083 has finished for PR 23905 at commit
Test build #103081 has finished for PR 23905 at commit
retest this please
Test build #103093 has finished for PR 23905 at commit
Thank you for your first contribution, @Udbhav30. And thank you, @SongYadong, @dilipbiswal, @felixcheung, @cloud-fan.
+1, LGTM, too. Merged to master/2.4/2.3.
For the MetaException, we can ignore it for now since it seems that Spark doesn't handle it except in the partition-handling cases. We can revisit that if needed later.
## What changes were proposed in this pull request?
Before dropping a database, refresh the tables of that database, so as to refresh all cached entries associated with those tables. We follow the same approach when dropping a table. A UT is added.
Closes #23905 from Udbhav30/SPARK-24669.
Authored-by: Udbhav30 <u.agrawal30@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 9bddf71)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
Before dropping a database, refresh the tables of that database, so as to refresh all cached entries associated with those tables. We follow the same approach when dropping a table.
How was this patch tested?
A UT is added.
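The overall behavior this PR adds can be modeled with a tiny toy catalog (deliberately not Spark): dropping a database with CASCADE also invalidates cached entries for every table in that database, mirroring what dropping a single table already did. All names and the caching scheme below are illustrative assumptions.

```scala
import scala.collection.mutable

// Toy catalog: databases map to table names, and a set of (db, table)
// pairs stands in for the session's cache of analyzed table plans.
class TinyCatalog {
  private val tables = mutable.Map.empty[String, mutable.Set[String]]
  val cache = mutable.Set.empty[(String, String)]

  def createTable(db: String, table: String): Unit = {
    tables.getOrElseUpdate(db, mutable.Set.empty) += table
    cache += ((db, table)) // assume reading the table caches its plan
  }

  def dropDatabase(db: String, cascade: Boolean): Unit = {
    if (cascade && tables.contains(db)) {
      // The fix: invalidate each table's cached entry before the drop,
      // so no stale entries outlive the database.
      tables(db).foreach(t => cache -= ((db, t)))
    }
    tables -= db
  }
}
```

After dropDatabase("db1", cascade = true), no stale cache entry for db1's tables remains, which is exactly the symptom SPARK-24669 reported.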