[SPARK-2917] [SQL] Avoid table creation in logical plan analyzing for CTAS #1846
Conversation
Hello, Jenkins, test this please.
QA tests have started for PR 1846. This patch merges cleanly.
QA results for PR 1846:
Failed in
@@ -110,18 +110,11 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
    */
   object CreateTables extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan transform {
-      case InsertIntoCreatedTable(db, tableName, child) =>
+      case InsertIntoCreatedTable(db, tableName, child) if (db == None) =>
I do not think we want to check whether db is None or not. We already have val databaseName = dbName.getOrElse(hive.sessionState.getCurrentDatabase) right on line 115.
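To illustrate the point, here is a tiny standalone sketch (DatabaseFallbackSketch and resolveDatabase are made-up names, and currentDatabase stands in for hive.sessionState.getCurrentDatabase): Option.getOrElse already covers the None case, so an extra if (db == None) guard in the pattern match adds nothing.

object DatabaseFallbackSketch extends App {
  // getOrElse falls back to the current database when no database name was given.
  def resolveDatabase(dbName: Option[String], currentDatabase: String): String =
    dbName.getOrElse(currentDatabase)

  assert(resolveDatabase(None, "default") == "default")        // falls back to the current database
  assert(resolveDatabase(Some("sales"), "default") == "sales") // an explicit database wins
}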
Oh, yes, thank you for noticing this.
Can you explain your intention? Why do we want to add a new physical operator to take care of CTAS queries and put all of the actual operations into it?
Thank you @yhuai for reviewing this. For example, in #1847 I was trying to "explain" the CTAS query. I think we'd better move the table creation from logical plan analysis to execution, so I created a new physical operator, CreateTableAsSelect.
QA tests have started for PR 1846. This patch merges cleanly.
QA tests have started for PR 1846. This patch merges cleanly.
QA results for PR 1846:
QA results for PR 1846:
    database: String,
    tableName: String,
    child: SparkPlan,
    insert: MetastoreRelation => InsertIntoHiveTable)(@transient sc: HiveContext)
The tests are failing because you have parameters that are in a second parameter list but not in otherCopyArgs. However, you shouldn't need to pass the HiveContext in manually at all. It is available automatically through the sqlContext method in SparkPlan.
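For context, a hedged sketch of the failing shape, written against Spark 1.x internal APIs rather than the code in this PR; the operator name SomeHiveOperator and its trivial body are placeholders, and it would only compile inside Spark's sql/hive module:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.expressions.Row
import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}
import org.apache.spark.sql.hive.HiveContext

// A physical operator that carries the HiveContext in a second parameter list has to
// hand that argument back through otherCopyArgs; otherwise copying the node during
// planning fails with a constructor-arity error, which is the kind of test failure
// described above.
case class SomeHiveOperator(child: SparkPlan)(@transient sc: HiveContext)
  extends UnaryNode {

  override def otherCopyArgs = sc :: Nil

  override def output = child.output
  override def execute(): RDD[Row] = child.execute()
}

The simpler shape suggested above is to drop the second parameter list entirely and reach the context through the sqlContext field that SparkPlan already provides (for example sqlContext.asInstanceOf[HiveContext]), in which case no otherCopyArgs override is needed.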
QA tests have started for PR 1846. This patch merges cleanly.
QA results for PR 1846:
Is this my fault? Jenkins, retest this please.
QA tests have started for PR 1846. This patch merges cleanly.
QA results for PR 1846:
Seems it failed due to some other reason.
Failed in different places. Jenkins, retest this please.
@marmbrus it seems Jenkins isn't answering me, can you also check the unit test failure for me? It seems to have failed on something else.
test this please |
QA tests have started for PR 1846. This patch merges cleanly.
QA results for PR 1846:
Still failed in pyspark, but it passes locally for me. Any suggestions?
test this please |
QA tests have started for PR 1846 at commit
test this please |
QA tests have started for PR 1846 at commit
    tableName: String,
    subquery: SparkPlan,
    insert: MetastoreRelation => InsertIntoHiveTable)
  extends LeafNode with Command {
Why is this a leaf node? I think that is going to break things like the addition of exchange operators.
Yes, it's a leaf node; the child would actually be InsertIntoHiveTable. However, InsertIntoHiveTable requires the table to already exist during logical plan analysis, hence I pass in the MetastoreRelation => InsertIntoHiveTable function as a parameter of CreateTableAsSelect.
You're right that the name subquery is quite confusing; I will change it. The logical plan it represents is still transformed properly in HiveStrategies, see case logical.InsertIntoCreatedTable(database, tableName, child) in HiveStrategies.
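To illustrate the deferral being described, here is a standalone sketch with made-up types (Table, Insert and DeferredInsertSketch are not Spark classes): the insert can only be constructed against an existing table, so its construction is wrapped in a function that is applied at execution time, right after the table has been created.

object DeferredInsertSketch extends App {
  case class Table(name: String)
  case class Insert(table: Table, query: String)

  // buildInsert plays the role of the MetastoreRelation => InsertIntoHiveTable parameter.
  case class CreateTableAsSelect(
      tableName: String,
      query: String,
      buildInsert: Table => Insert) {
    // Nothing touches the catalog until run() is called.
    def run(createTable: String => Table): Insert = {
      val table = createTable(tableName) // the table is created here, at execution time
      buildInsert(table)                 // ...and only then is the insert constructed
    }
  }

  val query = "SELECT key, value FROM src"
  val ctas  = CreateTableAsSelect("t1", query, table => Insert(table, query))
  assert(ctas.run(name => Table(name)) == Insert(Table("t1"), query))
}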
QA tests have started for PR 1846 at commit
QA tests have finished for PR 1846 at commit
retest this please |
QA tests have started for PR 1846 at commit
QA tests have finished for PR 1846 at commit
This is a pretty big change; I will investigate further after the 1.1 release.
OK, sounds good to me.
@chenghao-intel, I'd like to get this merged before I begin on my next big refactoring. One small thing: do you think you could add a test to make sure we don't regress and start creating tables for EXPLAIN queries?
Thank you @marmbrus, I will add a unit test for this soon.
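A hedged sketch of what such a regression test could look like, assuming the SQLQuerySuite context shown in the diff further down; the table name and the coarse intercept[Exception] are assumptions, not the test actually added in this PR:

test("EXPLAIN CTAS does not create the table") {
  // EXPLAIN of a CTAS must not create the table as a side effect.
  sql("EXPLAIN CREATE TABLE explain_ctas_should_not_exist AS SELECT key, value FROM src").collect()

  // If analysis had eagerly created the table, this SELECT would succeed; instead it
  // should fail because the table was never created.
  intercept[Exception] {
    sql("SELECT * FROM explain_ctas_should_not_exist").collect()
  }
}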
Force-pushed from ae1e072 to 6521ce8.
test this please.
QA tests have started for PR 1846 at commit
QA tests have finished for PR 1846 at commit
@@ -47,4 +47,16 @@ class SQLQuerySuite extends QueryTest {
       GROUP BY key, value
       ORDER BY value) a""").collect().toSeq)
   }

+  test("test CTAS") {
+    sql("CREATE TABLE test_ctas_123 AS SELECT key, value FROM src").schema
This actually isn't the semantic we want. All DDL commands should act as actions and be executed immediately; it is only when there is an EXPLAIN that we should not run the command eagerly. I think InsertIntoCreatedTable should inherit from Command so that this happens automatically. I'd also consider renaming it CreateTableAsSelect as you did with the physical operator.
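To make the requested semantics concrete, here is a standalone sketch with illustrative names (none of these are Spark classes): the bare command takes effect as soon as it is executed, while wrapping it in EXPLAIN only renders the plan and leaves the catalog untouched.

object EagerCommandSketch extends App {
  sealed trait LogicalNode
  case class Select(source: String) extends LogicalNode
  case class CreateTableAsSelect(table: String, query: Select) extends LogicalNode
  case class Explain(child: LogicalNode) extends LogicalNode

  class Catalog {
    private var tables = Set.empty[String]
    def create(name: String): Unit = tables += name
    def exists(name: String): Boolean = tables.contains(name)
  }

  // DDL acts as an action and runs immediately; EXPLAIN only renders the plan.
  def execute(plan: LogicalNode, catalog: Catalog): String = plan match {
    case CreateTableAsSelect(table, _) => catalog.create(table); s"OK: created $table"
    case Explain(child)                => s"== Logical Plan ==\n$child" // no side effect
    case Select(source)                => s"rows from $source"
  }

  val catalog = new Catalog
  execute(Explain(CreateTableAsSelect("t1", Select("src"))), catalog)
  assert(!catalog.exists("t1")) // EXPLAIN must not create the table
  execute(CreateTableAsSelect("t1", Select("src")), catalog)
  assert(catalog.exists("t1"))  // the bare command creates it eagerly
}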
Oh, I understand. But I'd like to make that two independent PRs:
- Revert SchemaRDDLike.scala and SQLQuerySuite.scala, and only keep the logic that moves table creation from HiveMetastoreCatalog.scala to the physical operator CreateTableAsSelect.scala.
- Update [SPARK-2918] [SQL] [WIP] Support the extended & native command for EXPLAIN #1847 to support EXPLAIN for commands (e.g. ctas, InsertIntoTable, WriteToFile, etc.).

What do you think about this?
Yeah, as long as you don't remove InsertIntoCreatedTable from SchemaRDDLike.scala, I'm okay with pushing the other changes to another PR.
Force-pushed from 6521ce8 to 9a57abc.
QA tests have started for PR 1846 at commit
QA tests have finished for PR 1846 at commit
Thanks! Merged to master.