Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report the error as well as the line that caused it #258

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

abridgett
Copy link

I don't know Scala (at all!) so there's almost certainly cleaner ways - my apologies. The logging at the moment is sometimes unhelpful as it's hard to see the real issue - with DROPMALFORMED you see the line, with another parsing mode you get the error but not the line.

@@ -170,7 +173,7 @@ case class CsvRelation protected[spark] (
val requiredSize = requiredFields.length
tokenRdd(schemaFields.map(_.name)).flatMap { tokens =>
if (dropMalformed && schemaFields.length != tokens.size) {
logger.warn(s"Dropping malformed line: ${tokens.mkString(delimiter.toString)}")
logger.warn(s"Dropping malformed line (wrong length): ${tokens.mkString(delimiter.toString)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check travis. [error] /home/travis/build/databricks/spark-csv/src/main/scala/com/databricks/spark/csv/CsvRelation.scala:176: File line length exceeds 100 characters

@HyukjinKwon
Copy link
Member

For me I think this is reasonable because I sometimes see some users have some difficulties to find out why it was dropped or malformed.

@codecov-io
Copy link

Current coverage is 85.86%

Merging #258 into master will decrease coverage by -0.21% as of 26a1447

@@            master    #258   diff @@
======================================
  Files           12      12       
  Stmts          517     552    +35
  Branches       149     160    +11
  Methods          0       0       
======================================
+ Hit            445     474    +29
  Partial          0       0       
- Missed          72      78     +6

Review entire Coverage Diff as of 26a1447

Powered by Codecov. Updated on successful CI builds.

@abridgett
Copy link
Author

Thanks Hyukjin, I've fixed that overlong line

_: IllegalArgumentException if dropMalformed =>
logger.warn("Number format exception. " +
case e: java.lang.NumberFormatException if dropMalformed =>
logger.warn("Number format exception (" + e + ")." +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal opinion: since this is an warning message, shouldn't we maybe just print out the message only by e.getMessage() excluding the class name of the exception?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, you can write s"Number format exception (${a.getMessage})." instead of concatenating them with + which is not used generally in this library and Spark codes as well.

@@ -205,7 +205,7 @@ abstract class AbstractCsvSuite extends FunSuite with BeforeAndAfterAll {
.collect()
}

assert(exception.getMessage.contains("Malformed line in FAILFAST mode: 2015,Chevy,Volt"))
assert(exception.getMessage.contains("Malformed line in FAILFAST mode (expected 5 tokens but received 3 tokens): 2015,Chevy,Volt"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line exceeds 100 characters as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants