
Frequently updating non-index column causing too many Lock records in the unique index key #25659

Closed
MyonKeminta opened this issue Jun 22, 2021 · 7 comments
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.0 This bug affects 5.0.x versions. affects-5.1 This bug affects 5.1.x versions. severity/major sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.

Comments

@MyonKeminta
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

This problem occurs if a table has:

  • Never / rarely updated unique index
  • Another column(s) that's frequently updated with point get in pessimistic transaction.

Example table definition (without a clustered index, so that the primary key is a separate unique index):

create table t (id varchar(128), v int, primary key (id));

And there are frequent update operations like:

begin pessimistic;
update t set v = ? where id = ?;
commit;

2. What did you expect to see? (Required)

Everything goes well

3. What did you see instead (Required)

The performance decreases, and it can be seen that the next operation count is very high for some requests such as acquire_pessimistic_lock and kv_get.

Since each such transaction locks the unique index key, it produces a WriteType::Lock record in the write CF on that key. When reading the value of such a key, TiKV needs to iterate over all these Lock records to find the latest Put or Delete record, and every Lock record it iterates over costs one next operation. In bad cases, reading a single key may take more than 100k next operations.
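To make the cost concrete, here is a minimal, self-contained Go sketch of the skip loop described above. The WriteRecord type and latestValue function are hypothetical stand-ins for TiKV's MVCC reader (which is actually written in Rust); the only point is that every buried Lock record costs one extra next call before the real Put is found.

package main

import "fmt"

// Simplified stand-ins for the record types TiKV keeps in the write CF.
// The real definitions live in TiKV's Rust code base; this is only a sketch.
type WriteType int

const (
	Put WriteType = iota
	Delete
	Lock
	Rollback
)

// WriteRecord is a hypothetical representation of one version of a key in
// the write CF, ordered from newest to oldest.
type WriteRecord struct {
	CommitTS uint64
	Type     WriteType
	Value    []byte
}

// latestValue scans versions from newest to oldest and counts how many
// "next" steps are needed before a Put or Delete is reached. Every
// pessimistic UPDATE that only locked the index key leaves one more Lock
// record to skip, which is where the 100k+ next operations come from.
func latestValue(versions []WriteRecord) (value []byte, nextCalls int) {
	for _, w := range versions {
		switch w.Type {
		case Put:
			return w.Value, nextCalls
		case Delete:
			return nil, nextCalls
		default: // Lock or Rollback: not a real version, keep scanning
			nextCalls++
		}
	}
	return nil, nextCalls
}

func main() {
	// One old Put buried under 100,000 Lock records.
	versions := make([]WriteRecord, 0, 100001)
	for i := 0; i < 100000; i++ {
		versions = append(versions, WriteRecord{CommitTS: uint64(200000 - i), Type: Lock})
	}
	versions = append(versions, WriteRecord{CommitTS: 1, Type: Put, Value: []byte("row-1")})

	v, skips := latestValue(versions)
	fmt.Printf("value=%q after %d next calls\n", v, skips)
}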

4. What is your TiDB version? (Required)

4.0.10+, 5.x, master

@MyonKeminta MyonKeminta added type/bug The issue is confirmed as a bug. sig/transaction SIG:Transaction severity/major labels Jun 22, 2021
@cfzjywxk
Contributor

Since #21229, the unique index key is pessimistically locked. This change makes the locking behaviour more reasonable, but it also increases the chance of hitting the performance issue caused by LOCK records on the TiKV side.
So we may need to:

  • Prepare a switch controlling the lock behaviour for the unique index read; a hotfix patch is also needed in case some users encounter the performance issue on old versions.
  • Optimize the handling of LOCK records; there are already some plans for this.

@sticnarf
Contributor

@cfzjywxk To solve exactly the problem introduced by #21229, I think we can just use PUT instead of LOCK to write the unchanged index.

The index record should be small, so putting it again does not bring too much overhead. With a clustered index the record can be larger, but typically it won't be.
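For illustration, here is a rough Go sketch of what this could look like when building the commit mutations. The Mutation type and helper below are invented for this sketch and are not TiDB's actual two-phase-commit code; the idea is simply to turn a LOCK on an unchanged unique index key into a PUT of the same small index value.

package sketch

// Mutation and Op are invented for this sketch; TiDB's real mutation
// builder looks different.
type Op int

const (
	OpPut Op = iota
	OpDel
	OpLock
)

type Mutation struct {
	Key           []byte
	Op            Op
	Value         []byte
	IsUniqueIndex bool
}

// reputUnchangedUniqueIndex rewrites LOCK mutations on unique index keys
// into PUTs of the current (unchanged) index value, so that committing the
// transaction writes a fresh Put record instead of piling another Lock
// record onto the index key.
func reputUnchangedUniqueIndex(muts []Mutation, indexValue func(key []byte) []byte) []Mutation {
	for i, m := range muts {
		if m.Op == OpLock && m.IsUniqueIndex {
			muts[i].Op = OpPut
			muts[i].Value = indexValue(m.Key) // the index record is small, so re-putting it is cheap
		}
	}
	return muts
}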

@cfzjywxk
Contributor

@cfzjywxk To solve exactly the problem introduced by #21229, I think we can just use PUT instead of LOCK to write the unchanged index.

The index record should be small, so putting it again does not bring too much overhead. With a clustered index the record can be larger, but typically it won't be.

Agreed, this seems to be the plan with the least risk of change. The TODOs are:

  • On the TiDB side, change the commit operation into a PUT when generating mutations if it would otherwise be a LOCK on a unique index.
  • As @sticnarf suggested in the earlier document, change the prewrite logic on the TiKV side: if the prewrite operation is a LOCK, call next over a few of the nearest records for this key; if they form a sequence of LOCK records, change the current operation into a PUT and use the latest value for it.

@sticnarf @MyonKeminta
What do you think?

BTW, I'm working on the hotfix patch, as some customers are already encountering this issue.

@sticnarf
Contributor

On the TiDB side, change the commit operation into a PUT when generating mutations if it would otherwise be a LOCK on a unique index.

We can include this in the next patch.

As @sticnarf suggested in the earlier document, change the prewrite logic on the TiKV side: if the prewrite operation is a LOCK, call next over a few of the nearest records for this key; if they form a sequence of LOCK records, change the current operation into a PUT and use the latest value for it.

Currently, I don't have a very clear idea of how to achieve it in detail. I have come up with some different approaches, but each of them has its own drawbacks.

Approach 1

Read until the latest PUT or DELETE record in prewrite. Then we know how many LOCK records need to be skipped, so we can do an automatic re-put.

But currently prewrite does not read the latest valid record unless CDC is enabled. Reading the valid record every time introduces overhead on the hot path.
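A rough Go sketch of Approach 1 (the types and threshold are invented for illustration; the real logic would live in TiKV's Rust prewrite path): while prewriting a LOCK, scan down to the latest Put and, if too many Lock/Rollback records sit on top of it, write a PUT of that latest value instead, collapsing the chain.

package sketch

// Simplified write-CF record, as in the earlier sketch; the real types are
// in TiKV's Rust code base.
type WriteType int

const (
	Put WriteType = iota
	Delete
	Lock
	Rollback
)

type WriteRecord struct {
	Type  WriteType
	Value []byte
}

// reputThreshold is an assumed tuning knob, not a real TiKV option.
const reputThreshold = 16

// prewriteOpForLock decides what a prewrite that would normally write a LOCK
// should do: if the latest Put is buried under at least reputThreshold
// Lock/Rollback records, re-put its value so later reads stop at the new Put.
func prewriteOpForLock(versions []WriteRecord) (WriteType, []byte) {
	skipped := 0
	for _, w := range versions { // newest first
		switch w.Type {
		case Put:
			if skipped >= reputThreshold {
				return Put, w.Value
			}
			return Lock, nil
		case Delete:
			return Lock, nil // nothing meaningful to re-put
		default:
			skipped++
		}
	}
	return Lock, nil
}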

Approach 2

To optimize approach 1, we can record in the latest LOCK or ROLLBACK record how many contiguous records need to be skipped.

But it is a problem to decide where to store this number. If it is added as a separate field in the Write, the plan cannot be cherry-picked to TiDB 4.0: older versions of TiKV 4.0 panic when meeting an unknown field.

To keep backward compatibility, it must be put in the short_value of the LOCK record. This is like what we did for "protected rollback", but it looks tricky and ugly.
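Purely as an illustration of the short_value trick (the flag byte and layout below are invented for this sketch, not TiKV's actual on-disk format), the skip count could be encoded and decoded roughly like this, in the same spirit as the protected-rollback flag:

package sketch

import "encoding/binary"

// skippedCountFlag marks a short_value that carries a skip-count hint.
// The value is arbitrary here; a real implementation would have to pick
// something old TiKV versions already tolerate.
const skippedCountFlag = 'S'

// encodeSkippedCount packs the number of contiguous Lock/Rollback records
// to skip into a short_value payload: 1 flag byte + 8-byte little-endian count.
func encodeSkippedCount(n uint64) []byte {
	buf := make([]byte, 9)
	buf[0] = skippedCountFlag
	binary.LittleEndian.PutUint64(buf[1:], n)
	return buf
}

// decodeSkippedCount returns the hint if present; old-format records simply
// yield no hint instead of an error, keeping reads backward compatible.
func decodeSkippedCount(shortValue []byte) (uint64, bool) {
	if len(shortValue) != 9 || shortValue[0] != skippedCountFlag {
		return 0, false
	}
	return binary.LittleEndian.Uint64(shortValue[1:]), true
}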

Approach 3

A more conservative way is to let TiDB decide the re-put. TiKV only provides hints of how many useless records were skipped in PointGet or AcquirePessimisticLock (when values are needed) responses.

This approach does not modify the data format or introduce much overhead, but it seems complex and has limited use cases: we would only re-put after a PointGet or an AcquirePessimisticLock, and supporting scans would make it more complex.
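A minimal sketch of Approach 3 from the client's point of view (the field and threshold names are invented; nothing like this exists in kvproto today): TiKV would attach a skipped-records counter to PointGet / AcquirePessimisticLock responses, and TiDB would use it to decide whether to turn the commit-time LOCK on that key into a PUT.

package sketch

// lockResponseHint is a hypothetical piece of response metadata: how many
// useless (Lock/Rollback) records TiKV had to skip while reading this key.
type lockResponseHint struct {
	Key                   []byte
	SkippedUselessRecords int
}

// reputHintThreshold is an assumed client-side knob.
const reputHintThreshold = 16

// shouldReputIndexKey lets TiDB decide, per key, whether to re-put the
// unchanged index record at commit time instead of writing another LOCK.
func shouldReputIndexKey(h lockResponseHint) bool {
	return h.SkippedUselessRecords >= reputHintThreshold
}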

@cfzjywxk
Contributor

@MyonKeminta @sticnarf
As reading the latest value on the TiKV side is expensive and changing the data format is risky, could we instead work around this issue by doing the for-update point get / batch point get read by unique index like this (https://github.com/pingcap/tidb/pull/25730/files) in the release branch, and let GC do the whole cleanup work?

@cfzjywxk
Contributor

Fixed in #25730 and related cherry-picks.

@github-actions

Please check whether the issue should be labeled with 'affects-x.y' or 'backport-x.y.z',
and then remove 'needs-more-info' label.
