tables: fix prefix index, when the charset is utf8, truncate it from runes #7109

Merged on Jul 19, 2018 (6 commits)
Conversation

@winkyao (Contributor) commented Jul 19, 2018

What have you changed? (mandatory)

Fixes #7104. Before this PR, the prefix index length was counted in bytes. When the charset is utf8 or utf8mb4, the length should be counted in runes (characters) instead. This PR truncates the indexed value by runes for those charsets.

What is the type of the changes? (mandatory)

  • Bug fix (non-breaking change which fixes an issue)

How has this PR been tested? (mandatory)

Unit tests (UT)

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

NO

Does this PR affect tidb-ansible update? (mandatory)

NO

Does this PR need to be added to the release notes? (mandatory)

Release note:

Fix prefix indexes: when the charset is utf8 or utf8mb4, truncate the indexed value by runes instead of bytes.

Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

@winkyao added the type/bugfix and release-note labels Jul 19, 2018

@zimulala (Contributor) left a comment

LGTM

@zimulala (Contributor) commented

/run-all-tests

@zimulala added the status/LGT1 label Jul 19, 2018

@crazycs520 (Contributor) previously approved these changes Jul 19, 2018

LGTM

@crazycs520 added the status/LGT2 label and removed the status/LGT1 label Jul 19, 2018

rs := bytes.Runes(val)
truncateStr := string(rs[:ic.Length])
// truncate value and limit its length
v.SetBytes([]byte(truncateStr))

Review comment (Member):

SetString can save a memory allocation.
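The suggestion above can be sketched as follows. This is a hedged illustration only: `datum` is a hypothetical stand-in with two setters, not TiDB's `types.Datum`, and how the real `SetString` stores its argument is not shown in this conversation. The point is that converting the truncated string back with `[]byte(...)` typically makes one more copy than passing the string directly.

```go
package main

import (
	"bytes"
	"fmt"
)

// datum is a hypothetical value container used only for this sketch.
type datum struct {
	b []byte
	s string
}

func (d *datum) SetBytes(b []byte)  { d.b, d.s = b, "" }
func (d *datum) SetString(s string) { d.s, d.b = s, nil }

// text returns whichever representation is currently set.
func (d *datum) text() string {
	if d.b != nil {
		return string(d.b)
	}
	return d.s
}

// truncateViaBytes mirrors the reviewed snippet: runes -> string -> []byte.
func truncateViaBytes(d *datum, val []byte, n int) {
	rs := bytes.Runes(val)
	if n > len(rs) {
		n = len(rs)
	}
	truncated := string(rs[:n])  // builds a new string from the runes
	d.SetBytes([]byte(truncated)) // copies that string into a new byte slice
}

// truncateViaString follows the suggestion: runes -> string, with no
// further conversion to []byte.
func truncateViaString(d *datum, val []byte, n int) {
	rs := bytes.Runes(val)
	if n > len(rs) {
		n = len(rs)
	}
	d.SetString(string(rs[:n]))
}

func main() {
	var a, b datum
	truncateViaBytes(&a, []byte("héllo"), 2)
	truncateViaString(&b, []byte("héllo"), 2)
	fmt.Println(a.text(), b.text())
}
```

Both paths store the same text; the second simply skips the string-to-bytes conversion.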

ic := c.idxInfo.Columns[i]
if ic.Tp.Charset == charset.CharsetUTF8 || ic.Tp.Charset == charset.CharsetUTF8MB4 {
val := v.GetBytes()
if ic.Length != types.UnspecifiedLength && utf8.RuneCount(val) > ic.Length {

@birdstorm (Contributor) Jul 19, 2018

I think we can use utf8.RuneCountInString() instead, and thus eliminate the usage of val.

@winkyao (Contributor Author) Jul 19, 2018

But RuneCountInString would require converting the bytes to a string first, which isn't worth it.

Reply (Contributor):

I see.
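For readers following the exchange above, a small sketch of the two ways to count characters in a byte slice. The function names are hypothetical and exist only for this example. Whether the `string(val)` conversion actually copies depends on what the Go compiler can prove, so treat the second form as potentially more expensive rather than always so.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// exceedsPrefix counts runes directly on the byte slice, as in the PR.
func exceedsPrefix(val []byte, length int) bool {
	return utf8.RuneCount(val) > length
}

// exceedsPrefixViaString gives the same answer, but has to convert the
// bytes to a string before counting.
func exceedsPrefixViaString(val []byte, length int) bool {
	return utf8.RuneCountInString(string(val)) > length
}

func main() {
	val := []byte("你好世界") // 4 characters
	fmt.Println(exceedsPrefix(val, 3), exceedsPrefixViaString(val, 3)) // true true
	fmt.Println(exceedsPrefix(val, 4), exceedsPrefixViaString(val, 4)) // false false
}
```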

@winkyao (Contributor Author) commented Jul 19, 2018

@coocood @birdstorm PTAL

@birdstorm (Contributor) left a comment

LGTM

@coocood (Member) commented Jul 19, 2018

LGTM

Labels
release-note, status/LGT2, type/bugfix
Development

Successfully merging this pull request may close these issues.

Prefix index implementation for utf8 string is incorrect
5 participants