-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roundtrip null values of any type #147
Changes from 4 commits
cc2c9c0
ef89f78
678814f
877bb61
8b2de88
48d7aa6
6785285
01fa79c
176d4e8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,8 +35,9 @@ object TypeCast { | |
* @param datum string value | ||
* @param castType SparkSQL type | ||
*/ | ||
private[csv] def castTo(datum: String, castType: DataType, nullable: Boolean = true): Any = { | ||
if (datum == "" && nullable && !castType.isInstanceOf[StringType]){ | ||
private[csv] def castTo(datum: String, castType: DataType, nullable: Boolean = true, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. style:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here, I think the binary incompatibility problem is that we don't have something like Spark's MiMa exclude generator to handle the |
||
treatEmptyValuesAsNulls: Boolean = false): Any = { | ||
if (datum == "" && nullable && (!castType.isInstanceOf[StringType] || treatEmptyValuesAsNulls)){ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would you please elaborate why it is a good idea to have null for StringType? I am not against, it but I want to see a clear usecase in which empty string should be parsed as null. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, just saw this. Parsing the empty strings as nulls is compatible with the default behavior on multiple hadoop based systems, including the one we use the most (Scalding). It also allows us to indicate missing values (a null for missing values is much clearer than "" and consistent with other datatypes). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. Now let's get it passed style checker and I will merge this. |
||
null | ||
} else { | ||
castType match { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit:
if (treatEmptyValuesAsNulls ...
Leave a space between the parenthesis and if here and everywhere else in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yes. I prefer the same, but I was keeping it consistent with the rest of the file's style. Would you prefer I add the space here, add it everywhere else in the file as well, or leave it unchanged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we just merged a PR that unifies the style across the code and adds Scala style checks.