Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

table: use evalBuffer to improve performance of locatePartition #18818

Merged
merged 16 commits into from
Sep 1, 2020

Conversation

imtbkcat
Copy link

What problem does this PR solve?

Issue Number: close #16667

Problem Summary: MutRowFromDatums is so heavy that cause locatePartition performance worse than expected. Especially in LOAD DATA, partition table need nearly twice time than general table.

What is changed and how it works?

What's Changed:

  • For single column partition key, like partition by hash(col) or partition by range(col), get integer value directly.
  • For expression as partition key, use evalBuffer to avoid MutRowFromDatums repeatly.

How it Works:
This two method will avoid using MutRowFromDatums many times. improve the write performance of partition table.

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test (will be added later)
  • Manual test (add detailed scripts or steps below)

Side effects

  • None

Release note

  • table: improve the write performance of partition table

@imtbkcat imtbkcat added type/enhancement The issue or PR belongs to an enhancement. sig/transaction SIG:Transaction needs-cherry-pick-3.0 labels Jul 28, 2020
@imtbkcat
Copy link
Author

Here is a simple LOAD DATA benchmark result:

Before this fix:

range partition table:

MySQL [test]> LOAD DATA LOCAL INFILE '2017Q3-capitalbikeshare-tripdata.csv' INTO TABLE trips_range FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
Query OK, 1191585 rows affected (25.168 sec)
Records: 1191585  Deleted: 0  Skipped: 0  Warnings: 0

hash partition table:

MySQL [test]> LOAD DATA LOCAL INFILE '2017Q3-capitalbikeshare-tripdata.csv' INTO TABLE trips_hash FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
Query OK, 1191585 rows affected (15.620 sec)
Records: 1191585  Deleted: 0  Skipped: 0  Warnings: 0

general table:

MySQL [test]> LOAD DATA LOCAL INFILE '2017Q3-capitalbikeshare-tripdata.csv' INTO TABLE trips_general FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
Query OK, 1191585 rows affected (9.432 sec)
Records: 1191585  Deleted: 0  Skipped: 0  Warnings: 0

After using optimize:

range:
MySQL [test]> LOAD DATA LOCAL INFILE '2017Q3-capitalbikeshare-tripdata.csv' INTO TABLE trips_range FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
Query OK, 1191585 rows affected (9.566 sec)
Records: 1191585  Deleted: 0  Skipped: 0  Warnings: 0

hash:
MySQL [test]> LOAD DATA LOCAL INFILE '2017Q3-capitalbikeshare-tripdata.csv' INTO TABLE trips_hash FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (duration, start_date, end_date, start_station_number, start_station, end_station_number, end_station, bike_number, member_type);
Query OK, 1191585 rows affected (10.437 sec)
Records: 1191585  Deleted: 0  Skipped: 0  Warnings: 0

The performance improvement is very clear in LOAD DATA cases.

@codecov
Copy link

codecov bot commented Jul 28, 2020

Codecov Report

Merging #18818 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #18818   +/-   ##
===========================================
  Coverage   79.1417%   79.1417%           
===========================================
  Files           550        550           
  Lines        149749     149749           
===========================================
  Hits         118514     118514           
  Misses        21689      21689           
  Partials       9546       9546           

func (t *partitionedTable) locateRangePartition(ctx sessionctx.Context, pi *model.PartitionInfo, r []types.Datum) (int, error) {
var ret int64
if col, ok := t.partitionExpr.Expr.(*expression.Column); ok {
ret = r[col.Index].GetInt64()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the column be null?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add some test to cover these cases, PTAL @tiancaiamao

// TODO: supports linear hashing
func (t *partitionedTable) locateHashPartition(ctx sessionctx.Context, pi *model.PartitionInfo, r []types.Datum) (int, error) {
ret, isNull, err := t.partitionExpr.Expr.EvalInt(ctx, chunk.MutRowFromDatums(r).ToRow())
if col, ok := t.partitionExpr.Expr.(*expression.Column); ok {
ret := r[col.Index].GetInt64()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r[col.Index] is null?

@tiancaiamao
Copy link
Contributor

Please add more test to cover the new code.
Especially the +-1 case and null value.

@tiancaiamao
Copy link
Contributor

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 18, 2020
Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

table/tables/partition.go Show resolved Hide resolved
@imtbkcat imtbkcat requested a review from a team as a code owner August 25, 2020 11:01
@imtbkcat imtbkcat requested review from lzmhhh123 and removed request for a team August 25, 2020 11:01
@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Aug 27, 2020
ti-srebot
ti-srebot previously approved these changes Aug 27, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Aug 27, 2020
@lysu
Copy link
Contributor

lysu commented Aug 27, 2020

/merge

@ti-srebot
Copy link
Contributor

Your auto merge job has been accepted, waiting for:

  • 19527
  • 19528
  • 19485

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Aug 27, 2020
@ti-srebot
Copy link
Contributor

/run-all-tests

@imtbkcat
Copy link
Author

imtbkcat commented Sep 1, 2020

/run-all-tests

Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@imtbkcat
Copy link
Author

imtbkcat commented Sep 1, 2020

/merge

@ti-srebot
Copy link
Contributor

/run-all-tests

@ti-srebot ti-srebot merged commit 349adf8 into pingcap:master Sep 1, 2020
ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 1, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-3.0 in PR #19647

ti-srebot pushed a commit to ti-srebot/tidb that referenced this pull request Sep 1, 2020
Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Copy link
Contributor

cherry pick to release-4.0 in PR #19649

ti-srebot added a commit that referenced this pull request Sep 5, 2020
…) (#19647)

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/transaction SIG:Transaction status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize locatePartition() function and improve the write performance on partitioned table
5 participants