Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: fix csv parser #9005

Merged
merged 11 commits into from
Jan 15, 2019
Merged

executor: fix csv parser #9005

merged 11 commits into from
Jan 15, 2019

Conversation

imtbkcat
Copy link

@imtbkcat imtbkcat commented Jan 10, 2019

What problem does this PR solve?

TiDB will panic when encounter irregular csv format, like this.

\N
"\N",
"NULL",
NULL

Test SQL:

create table test (i int default null);
load data local infile 'abc.csv' into table test FIELDS TERMINATED BY ',' enclosed by '"';

This is because csv file above mixed enclosed words and unenclosed words.

another case will cause panic is shown below.

abc,123,
"def",456
hij,"789",

TiDB:

mysql> create table test (str varchar(10) default null, i int default null);
Query OK, 0 rows affected (0.01 sec)

mysql> load data local infile '~/test.csv' into table test FIELDS TERMINATED BY ',' enclosed by '"';
ERROR 2013 (HY000): Lost connection to MySQL server during query

MySQL:

mysql> create table test (str varchar(10) default null, i int default null);
Query OK, 0 rows affected (0.02 sec)

mysql> load data local infile '~/test.csv' into table test FIELDS TERMINATED BY ',' enclosed by '"';
Query OK, 3 rows affected (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

mysql> select * from test;
+------+------+
| str  | i    |
+------+------+
| abc  |  123 |
| def  |  456 |
| hij  |  789 |
+------+------+
3 rows in set (0.00 sec)

What is changed and how it works?

Restructuring getFieldsFromLine function.

Check List

Tests

  • Unit test
  • Manual test

Code changes

  • Has exported function/method change

Side effects

  • Increased code complexity

Related changes

@imtbkcat imtbkcat added type/bugfix This PR fixes a bug. sig/execution SIG execution labels Jan 10, 2019
@imtbkcat
Copy link
Author

/run-all-tests

@codecov-io
Copy link

codecov-io commented Jan 10, 2019

Codecov Report

Merging #9005 into master will increase coverage by 0.01%.
The diff coverage is 82.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9005      +/-   ##
==========================================
+ Coverage   67.16%   67.17%   +0.01%     
==========================================
  Files         371      371              
  Lines       76393    76486      +93     
==========================================
+ Hits        51311    51383      +72     
- Misses      20494    20511      +17     
- Partials     4588     4592       +4
Impacted Files Coverage Δ
executor/load_data.go 81.48% <82.72%> (-0.88%) ⬇️
executor/join.go 77.92% <0%> (-0.52%) ⬇️
executor/distsql.go 72.53% <0%> (-0.47%) ⬇️
executor/executor.go 67.08% <0%> (+0.14%) ⬆️
expression/schema.go 94.95% <0%> (+0.84%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dca815c...0f0b27b. Read the comment docs.

Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to add the test cases which this PR could fix.

sep = append(sep, e.FieldsInfo.Enclosed)
sep = append(sep, e.FieldsInfo.Terminated...)
sep = append(sep, e.FieldsInfo.Enclosed)
var (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are so many lines in this function. Please split it into pieces.

@imtbkcat
Copy link
Author

/run-all-tests

1 similar comment
@imtbkcat
Copy link
Author

/run-all-tests

@imtbkcat
Copy link
Author

/run-unit-test

Copy link
Contributor

@tiancaiamao tiancaiamao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

fp, err = os.Create(path)
c.Assert(err, IsNil)
c.Assert(fp, NotNil)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can os.Remove(path) here immediately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next test case still use this file.

}

func (w *fieldWriter) GetField() (bool, field) {
// the bool return value implies whether is at the end of line.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the/The

whether ?? is at the end of line

w.rollback()
}
}
for {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it's complex enough and more difficult to maintain.
If we meet some error next time, I'll consider use some more general method instead of hard written those things.

w.term = term
}

func (w *fieldWriter) rollback() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to change the function name.

@@ -198,6 +198,11 @@ func (s *testExecSuite) TestGetFieldsFromLine(c *C) {
`"\0\b\n\r\t\Z\\\ \c\'\""`,
[]string{string([]byte{0, '\b', '\n', '\r', '\t', 26, '\\', ' ', ' ', 'c', '\'', '"'})},
},
// Test mixed.
{
`"123",456,"\t7890",abcd`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to add more test cases to guarantee the behavior we support.

Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rest LGTM

@imtbkcat
Copy link
Author

/run-all-tests

Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jackysp jackysp merged commit 33b4c3e into pingcap:master Jan 15, 2019
imtbkcat pushed a commit to imtbkcat/tidb that referenced this pull request Apr 25, 2019
jackysp pushed a commit that referenced this pull request Apr 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/execution SIG execution type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants