[SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] Support custom encoding for json files #21254

gatorsmile · 2018-05-07T03:39:40Z

What changes were proposed in this pull request?

This PR adds a test case that checks the behavior when users write JSON in a specified UTF-16/UTF-32 encoding with multiline off.

How was this patch tested?

N/A

gatorsmile · 2018-05-07T03:40:22Z

cc @MaxGekk @HyukjinKwon Do we have any behavior change after the previous PR: #20937?

HyukjinKwon · 2018-05-07T03:52:36Z

Nope, I am quite sure that we don't have any kind of hidden behaviour change. Both the lineSep and encoding options are new. Also, they are currently more restrictive than actually needed. Writing can actually work, and #21247 tries to allow it; however, I left a comment asking him to focus on getting rid of the restrictions on both the read and write sides, which I believe is his final goal in 2.4.0 (or 3.0.0).

HyukjinKwon

Adding the test seems OK if you feel that way. It might have to be removed soon, within the next release, since we should allow this case anyway.

HyukjinKwon · 2018-05-07T03:55:35Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala

+      withTempPath { path =>
+        val ds = spark.createDataset(Seq(
+          ("a", 1), ("b", 2), ("c", 3))
+        ).repartition(2)


We don't have to repartition, though.

SparkQA · 2018-05-07T07:05:02Z

Test build #90292 has finished for PR 21254 at commit d4c290e.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

MaxGekk · 2018-05-07T09:35:52Z

> Do we have any behavior change after the previous PR: #20937?

The PR brought the encoding (and charset) option, but we didn't change the behavior when encoding is not specified.

As @HyukjinKwon wrote above, PR #21247 eliminates the restrictions on write, but the restrictions don't break the previous behavior (before #20937) in any case.
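
A minimal sketch, not taken from this PR, of one reason line-delimited JSON needs care in UTF-16/UTF-32: the newline is no longer the single byte 0x0A, and a 0x0A byte can appear inside an unrelated character. That this is what motivates the restrictions discussed above is an assumption; `LineSepBytes` and `hex` are hypothetical names, and the JVM is assumed to provide the UTF-32 charsets.

```scala
import java.nio.charset.Charset

object LineSepBytes {
  // Encode a string in the named charset and render its bytes as hex.
  def hex(s: String, charsetName: String): String =
    s.getBytes(Charset.forName(charsetName))
      .map(b => f"${b & 0xff}%02X")
      .mkString(" ")

  def main(args: Array[String]): Unit = {
    // BOM-free charset variants, so the output shows only the newline itself.
    Seq("UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE").foreach { cs =>
      println(s"$cs newline bytes: ${hex("\n", cs)}")
    }
    // U+010A is a letter, yet its UTF-16BE encoding contains the byte 0x0A,
    // so splitting raw bytes on 0x0A would cut this character in half.
    println(s"U+010A in UTF-16BE: ${hex("\u010A", "UTF-16BE")}")
  }
}
```

The newline occupies one byte in UTF-8, two in UTF-16, and four in UTF-32, so a reader or writer that works line by line has to use a separator encoded in the same charset as the data.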

HyukjinKwon · 2018-05-08T00:09:42Z

retest this please

SparkQA · 2018-05-08T03:57:42Z

Test build #90347 has finished for PR 21254 at commit d4c290e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-05-08T13:24:29Z

Merged to master.

gatorsmile added 2 commits May 6, 2018 20:35

test case

07de099

name

d4c290e

HyukjinKwon approved these changes May 7, 2018

HyukjinKwon reviewed May 7, 2018

asfgit closed this in 2f6fe7d May 8, 2018
