-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
report the error as well as the line that caused it #258
base: master
Are you sure you want to change the base?
report the error as well as the line that caused it #258
Conversation
@@ -170,7 +173,7 @@ case class CsvRelation protected[spark] ( | |||
val requiredSize = requiredFields.length | |||
tokenRdd(schemaFields.map(_.name)).flatMap { tokens => | |||
if (dropMalformed && schemaFields.length != tokens.size) { | |||
logger.warn(s"Dropping malformed line: ${tokens.mkString(delimiter.toString)}") | |||
logger.warn(s"Dropping malformed line (wrong length): ${tokens.mkString(delimiter.toString)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check travis. [error] /home/travis/build/databricks/spark-csv/src/main/scala/com/databricks/spark/csv/CsvRelation.scala:176: File line length exceeds 100 characters
For me I think this is reasonable because I sometimes see some users have some difficulties to find out why it was dropped or malformed. |
Current coverage is
|
Thanks Hyukjin, I've fixed that overlong line |
_: IllegalArgumentException if dropMalformed => | ||
logger.warn("Number format exception. " + | ||
case e: java.lang.NumberFormatException if dropMalformed => | ||
logger.warn("Number format exception (" + e + ")." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal opinion: since this is an warning message, shouldn't we maybe just print out the message only by e.getMessage()
excluding the class name of the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, you can write s"Number format exception (${a.getMessage})."
instead of concatenating them with +
which is not used generally in this library and Spark codes as well.
@@ -205,7 +205,7 @@ abstract class AbstractCsvSuite extends FunSuite with BeforeAndAfterAll { | |||
.collect() | |||
} | |||
|
|||
assert(exception.getMessage.contains("Malformed line in FAILFAST mode: 2015,Chevy,Volt")) | |||
assert(exception.getMessage.contains("Malformed line in FAILFAST mode (expected 5 tokens but received 3 tokens): 2015,Chevy,Volt")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line exceeds 100 characters as well.
I don't know Scala (at all!) so there's almost certainly cleaner ways - my apologies. The logging at the moment is sometimes unhelpful as it's hard to see the real issue - with DROPMALFORMED you see the line, with another parsing mode you get the error but not the line.