Add mutex to protect exchange receiver's async client #5008

Merged
merged 7 commits into from
May 26, 2022

yibin87
Contributor

@yibin87 yibin87 commented May 26, 2022

What problem does this PR solve?

Issue Number: close #4977

Problem Summary:

What is changed and how it works?

In ExchangeReceiver's async client mode, an AsyncRequestHandler instance handles the async client's requests and responses, and it is shared between the grpc thread and the reactor thread. However, while a handler is being created, there is a small chance that the grpc thread gets the instance before its creation has completed, which can lead to #4977.
So far this has only been observed when makeAsyncReader fails and is retried.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
@ti-chi-bot
Member

ti-chi-bot commented May 26, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • LittleFall
  • windtalker

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 26, 2022
Signed-off-by: yibin <huyibin@pingcap.com>
Signed-off-by: yibin <huyibin@pingcap.com>
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label May 26, 2022
@@ -393,10 +400,10 @@ void ExchangeReceiverBase<RPCContext>::reactor(const std::vector<Request> & asyn
 MPMCQueue<AsyncHandler *> ready_requests(alive_async_connections * 2);
 std::vector<AsyncHandler *> waiting_for_retry_requests;

-std::vector<AsyncRequestHandler<RPCContext>> handlers;
+std::vector<std::unique_ptr<AsyncHandler>> handlers;
Contributor

I don't understand what would happen without this change. Could you explain?

Contributor Author

@yibin87 yibin87 May 26, 2022

In AsyncGrpcExchangePacketReader's init function, reader = cluster->rpc_client->sendStreamRequestAsync(xxx), the right-hand side can execute first, so the grpc thread can see the reader before the assignment has completed.
I started from the core dump stack in the issue, so this suspicious spot could be the problem. I can't tell the whole story of what happened either; this is just the one suspicious place I found, and the fixed binary doesn't core dump.

Contributor

Yes, great job; I now understand what happened with this bug.

The only remaining question is why you changed the element type of handlers from a value type to a unique_ptr.

It's OK, but why?

Contributor Author

The mutex field does not support copy construction...

@yibin87 yibin87 requested a review from fuzhe1989 May 26, 2022 11:54
Contributor

@windtalker windtalker left a comment

LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. do-not-merge/needs-triage-completed needs-cherry-pick-release-6.0 Type: Need cherry pick to release-6.0 needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. and removed status/LGT1 Indicates that a PR has LGTM 1. do-not-merge/needs-triage-completed labels May 26, 2022
Signed-off-by: yibin <huyibin@pingcap.com>
@yibin87
Contributor Author

yibin87 commented May 26, 2022

/merge

@ti-chi-bot
Member

@yibin87: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 606b704

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label May 26, 2022
@ti-chi-bot
Member

@yibin87: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@windtalker
Contributor

/run-unit-test

@yibin87
Contributor Author

yibin87 commented May 26, 2022

/run-unit-test

@sre-bot
Collaborator

sre-bot commented May 26, 2022

Coverage for changed files

Filename                            Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Flash/Mpp/ExchangeReceiver.cpp          320               307     4.06%          32                27    15.62%         486               453     6.79%         192               188     2.08%
Functions/FunctionsDateTime.cpp          68                31    54.41%          14                 3    78.57%         197                58    70.56%          48                25    47.92%
Functions/FunctionsDateTime.h          1061               628    40.81%         328               211    35.67%        2005              1196    40.35%         516               327    36.63%
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                  1449               966    33.33%         374               241    35.56%        2688              1707    36.50%         756               540    28.57%

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18277      9786             46.46%    204784  98120        52.09%

full coverage report (for internal network access only)

@yibin87 yibin87 merged commit f10b6d2 into pingcap:master May 26, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request May 26, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5009.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #5010.

Labels
needs-cherry-pick-release-6.0 Type: Need cherry pick to release-6.0 needs-cherry-pick-release-6.1 Should cherry pick this PR to release-6.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Development

Successfully merging this pull request may close these issues.

Random segment fault in grpc functions
6 participants