
server: fix show problem for kill tidb connection (#24031) #29212

Merged
merged 11 commits into from
Nov 12, 2021


yiwen92
Contributor

@yiwen92 yiwen92 commented Oct 28, 2021

What problem does this PR solve?

Issue Number: close #24031

Problem Summary:
Users are not notified when a session has been killed: the killed session still stays in the processlist and is shown as a normal session.

What is changed and how it works?

If a session has been killed by another session, it is no longer shown in the output of SHOW PROCESSLIST.

Check List

Tests

After fix:
[screenshot]

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix the issue that killed TiDB connections are still shown as normal sessions in SHOW PROCESSLIST.

@ti-chi-bot
Member

ti-chi-bot commented Oct 28, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • morgo
  • tiancaiamao

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Oct 28, 2021
@CLAassistant

CLAassistant commented Oct 28, 2021

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Oct 28, 2021
@yiwen92 yiwen92 marked this pull request as ready for review October 28, 2021 09:44
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 28, 2021
@yiwen92
Contributor Author

yiwen92 commented Oct 28, 2021

I changed two fields in the process info: 'Info' is shown as 'this session is being killed', and 'Command' is shown as 'Kill'. Since 'Kill' in the Command field has been deprecated (https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html), I am not sure whether we should change this field.

@yiwen92
Contributor Author

yiwen92 commented Oct 28, 2021

Feel free to give me your comments @morgo @tiancaiamao. Thanks!

server/server.go Outdated
@@ -685,6 +685,8 @@ func (s *Server) getTLSConfig() *tls.Config {
func killConn(conn *clientConn) {
sessVars := conn.ctx.GetSessionVars()
atomic.StoreUint32(&sessVars.Killed, 1)
// 'Kill' status can be showed in Command/Info field when show processlist
conn.ctx.SetProcessInfo("this session is being killed", time.Now(), mysql.ComProcessKill, 0)

Contributor

Ideally, for MySQL compatibility, the Info column should not be overwritten, since it helps you see what is being actively killed. The State should also say "Killed" instead of autocommit.

For #24031, there are really two issues:

  1. Update the status for connections which have been "killed" so that it is more accurate.
  2. Change the wait_timeout to be a loop, so that the connection which is being killed can clean up faster.

This fixes issue (1), but we should also fix (2) to consider #24031 resolved.

Contributor Author

Thanks for your review. Do we need a design doc for fix (2)?

@ti-chi-bot ti-chi-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 8, 2021
@yiwen92
Contributor Author

yiwen92 commented Nov 8, 2021

/run-check_dev_2

@ti-chi-bot ti-chi-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 8, 2021
@yiwen92 yiwen92 changed the title server: explicit show kill status in process list info (#24031) server: fix show problem for kill tidb connection (#24031) Nov 8, 2021
@yiwen92
Contributor Author

yiwen92 commented Nov 8, 2021

@bb7133 @tiancaiamao Please review, thanks.

@yiwen92 yiwen92 requested a review from morgo November 8, 2021 10:55
Contributor

@morgo morgo left a comment

There is some risk that "killed connections" that are not shown could still be consuming a fair amount of memory, which makes the cause less transparent to users. But this is also an existing problem, since background threads are not well instrumented. It is not typical for a user to have a very high count of killed connections; if we receive bug reports, we can advise them to lower the wait_timeout.

So there is some risk, but overall the change LGTM so I think we should proceed with it.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Nov 8, 2021
@morgo morgo requested a review from dveeden November 8, 2021 16:51
@yiwen92
Contributor Author

yiwen92 commented Nov 11, 2021

Quoting the request: "@yiwen92 Please add a test case for your change."

An integration test was added in https://github.com/pingcap/automated-tests/pull/881.

@yiwen92
Contributor Author

yiwen92 commented Nov 11, 2021

/cc @bb7133 @tiancaiamao

@tiancaiamao
Contributor

Quoting: "Could this lead to a situation where a killed connection still owns locks or other things that might block other sessions?"

Quoting: "Yes I think it could happen. This sounds like a deal breaker too :("

Well, this is a bug if it really happens. If we find such a bug, or a user reports one to us, we need to figure out why and fix it.

Our promise to the user is: he kills a session, and then he no longer sees it, because it has been killed, right? Even if the resources are not released immediately, or the query continues for a while, it will (and should) finish eventually... If a user observes that the kill does not take effect (the query keeps running for a long time...), then it is our bug.

Some common situations are: a query eats up memory, a query uses up TiKV coprocessor resources, or a query just takes too long, and the user wants to kill it.
If we display something like "the killing is still in process", it doesn't really solve the problem
... it's not what the user really wants. He would be confused and report another issue ("I killed the query, so why is TiDB still OOM?"), and we would still need to fix the bug...

So my point is: the user doesn't really care about the internal state or want to know why, and displaying "the killing is still in process" is not what he expects... We should deliver what he wants and ... fix bugs :trollface:

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Nov 11, 2021
@morgo
Contributor

morgo commented Nov 12, 2021

I've discussed this with everyone: we will approve it for now because the goal is to cherry-pick it to 5.3. We will work on an additional fix for master to clean up killed connections faster.

@morgo
Contributor

morgo commented Nov 12, 2021

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 7b0eecf

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Nov 12, 2021
@ti-chi-bot
Member

@yiwen92: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot merged commit 52c6890 into pingcap:master Nov 12, 2021
bb7133 added a commit to bb7133/tidb that referenced this pull request Mar 3, 2022
ti-chi-bot pushed a commit that referenced this pull request Mar 15, 2022
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

Killing connections needs cooperation from the to-be-killed connection
7 participants