-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-25425][SQL] Extra options should override session options in DataSource V2 #22413
Conversation
@cloud-fan Please, take a look. |
@MaxGekk . This looks worth for JIRA issue. Could you file a JIRA issue for this? |
@@ -202,7 +202,7 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { | |||
DataSourceOptions.PATHS_KEY -> objectMapper.writeValueAsString(paths.toArray) | |||
} | |||
Dataset.ofRows(sparkSession, DataSourceV2Relation.create( | |||
ds, extraOptions.toMap ++ sessionOptions + pathsOption, | |||
ds, sessionOptions ++ extraOptions.toMap + pathsOption, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, cc @rdblue since this is introduced at aadf953#diff-f70bda59304588cc3abfa3a9840653f4R197 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun The commit didn't change semantic actually. Before it was:
(extraOptions ++
DataSourceV2Utils.extractSessionConfigs(
ds = ds.asInstanceOf[DataSourceV2],
conf = sparkSession.sessionState.conf))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. It has more history. Thanks, @MaxGekk . Could you trace down when it started? We need to mark the affected version correctly in order to know the backport candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please add a test case for this. If this has a long history than SPARK-23203 (fixed at 2.4.0), we need to verify this during backporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes were added in #19861
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Then, it was 2.3.0.
Test build #96043 has finished for PR 22413 at commit
|
|
||
val relation = DataSourceV2Relation.create(source, options.toMap) | ||
val relation = DataSourceV2Relation.create(source, options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both read/write-side tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a test for read path since I was able to grab options propagated to DataSource but I have no idea so far for write path, only mocking probably. Does it make sense to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For write path, I think we can use the traditional way instead of introducing mocking. Let me try.
Test build #96051 has finished for PR 22413 at commit
|
retest this please |
Test build #96057 has finished for PR 22413 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good but let me leave it to @cloud-fan
good catch! I believe this was a mistake. LGTM |
Test build #96078 has finished for PR 22413 at commit
|
@@ -385,6 +400,9 @@ class SimpleDataSourceV2 extends DataSourceV2 with BatchReadSupportProvider { | |||
} | |||
} | |||
|
|||
class DataSourceWithSessionConfV2 extends SimpleDataSourceV2 with SessionConfigSupport { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataSourceWithSessionConfV2
-> DataSourceV2WithSessionConfig
?
@MaxGekk . Could you specify |
Test build #96091 has finished for PR 22413 at commit
|
@@ -385,6 +400,9 @@ class SimpleDataSourceV2 extends DataSourceV2 with BatchReadSupportProvider { | |||
} | |||
} | |||
|
|||
class DataSourceV2WithSessionConfigs extends SimpleDataSourceV2 with SessionConfigSupport { | |||
def keyPrefix(): String = "test" | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ur, it seems that we can use the existing one.
Add a test case for write path. Thank you very much, @dongjoon-hyun
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM pending Jenkins tests.
Test build #96096 has finished for PR 22413 at commit
|
Test build #96098 has finished for PR 22413 at commit
|
Thank you, @MaxGekk . Merged to master. There is some conflict. Please make backporting PRs to 2.4 and 2.3. |
+1, thanks for fixing this, @dongjoon-hyun! |
Sure, I will do that. |
Thank you so much, @MaxGekk . It would be great if we can have that in Spark 2.4.0 RC2. |
Thanks @MaxGekk, sorry for the original omission! |
…ataSource V2 In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. Entries from seconds map overwrites entries with the same key from the first map, for example: ```Scala scala> Map("option" -> false) ++ Map("option" -> true) res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true) ``` Added a test for checking which option is propagated to a data source in `load()`. Closes apache#22413 from MaxGekk/session-options. Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun Here is the PR for 2.4: #22474 . I will check does it have conflicts in 2.3 and if so, I will backport it to 2.3 |
@dongjoon-hyun It cannot be merged to 2.3 easily because |
I see. Thank you, @MaxGekk ! |
…ataSource V2 In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via `.option()`, and should overwrite more generic session options. Entries from seconds map overwrites entries with the same key from the first map, for example: ```Scala scala> Map("option" -> false) ++ Map("option" -> true) res0: scala.collection.immutable.Map[String,Boolean] = Map(option -> true) ``` Added a test for checking which option is propagated to a data source in `load()`. Closes apache#22413 from MaxGekk/session-options. Lead-authored-by: Maxim Gekk <maxim.gekk@databricks.com> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Co-authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
In the PR, I propose overriding session options by extra options in DataSource V2. Extra options are more specific and set via
.option()
, and should overwrite more generic session options. Entries from seconds map overwrites entries with the same key from the first map, for example:How was this patch tested?
Added a test for checking which option is propagated to a data source in
load()
.