Fix rpc channel leak due to concurrent operation #16021

Merged
rickchengx merged 3 commits into apache:dev from dev_wenjun_fixChannelLeak
May 20, 2024

Member

@ruanwenjun ruanwenjun commented May 17, 2024

Purpose of the pull request

related issue: #15983

Brief change log

  • Use a lock to ensure that only one channel is created at a time.
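
A minimal sketch of this kind of lock-guarded creation, using only the JDK. The names `LockedChannelCache` and `FakeChannel` are hypothetical stand-ins for illustration and are not the actual `NettyRemotingClient` code changed in this pull request. The cache check is repeated under the lock so that concurrent callers reuse one channel instead of each creating their own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a network channel; this is not the Netty Channel type. */
final class FakeChannel {
    private volatile boolean active = true;

    boolean isActive() {
        return active;
    }

    void close() {
        active = false;
    }
}

/** Hypothetical channel cache (assumption: one cached channel per host string). */
final class LockedChannelCache {
    private final Map<String, FakeChannel> channels = new ConcurrentHashMap<>();
    private final ReentrantLock createLock = new ReentrantLock();
    private final AtomicInteger created = new AtomicInteger();

    FakeChannel getChannel(String host) {
        // Fast path: reuse an active cached channel without taking the lock.
        FakeChannel cached = channels.get(host);
        if (cached != null && cached.isActive()) {
            return cached;
        }
        createLock.lock();
        try {
            // Re-check under the lock: another thread may have created the channel already.
            FakeChannel existing = channels.get(host);
            if (existing != null && existing.isActive()) {
                return existing;
            }
            if (existing != null) {
                existing.close(); // release the stale channel before replacing it
            }
            FakeChannel fresh = new FakeChannel();
            created.incrementAndGet();
            channels.put(host, fresh);
            return fresh;
        } finally {
            createLock.unlock();
        }
    }

    int createdCount() {
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LockedChannelCache cache = new LockedChannelCache();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < 8; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all callers at once to provoke contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.getChannel("worker-1:1234");
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("channels created: " + cache.createdCount());
    }
}
```

Without the re-check under the lock, several callers that miss the cache at the same moment could each create a channel, and every channel except the last one stored would be left open with no reference to it.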

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

If your pull request contains an incompatible change, you should also add it to docs/docs/en/guide/upgrede/incompatible.md

@ruanwenjun ruanwenjun requested a review from caishunfeng as a code owner May 17, 2024 06:48
@ruanwenjun ruanwenjun force-pushed the dev_wenjun_fixChannelLeak branch from 949a4ea to 9f5cba2 Compare May 17, 2024 06:50
@ruanwenjun ruanwenjun added the bug, priority:high, and 3.2.2 labels May 17, 2024
@ruanwenjun ruanwenjun added this to the 3.2.2 milestone May 17, 2024
@ruanwenjun ruanwenjun self-assigned this May 17, 2024
caishunfeng
caishunfeng previously approved these changes May 17, 2024
Contributor

@caishunfeng caishunfeng left a comment

LGTM

codecov-commenter commented May 17, 2024

Codecov Report

Attention: Patch coverage is 63.63636%, with 8 lines in your changes missing coverage. Please review.

Project coverage is 40.59%. Comparing base (7b7a333) to head (c5feb69).

Current head c5feb69 differs from pull request most recent head 27223e7

Please upload reports for the commit 27223e7 to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| ...duler/extract/base/client/NettyRemotingClient.java | 63.63% | 6 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##                dev   #16021   +/-   ##
=========================================
  Coverage     40.58%   40.59%           
  Complexity     5216     5216           
=========================================
  Files          1380     1380           
  Lines         46056    46062    +6     
  Branches       4916     4916           
=========================================
+ Hits          18693    18699    +6     
  Misses        25437    25437           
  Partials       1926     1926           


Radeity
Radeity previously approved these changes May 17, 2024
Member

@Radeity Radeity left a comment

LGTM

@ruanwenjun ruanwenjun dismissed stale reviews from Radeity and caishunfeng via ab60cfd May 17, 2024 09:25
@ruanwenjun ruanwenjun force-pushed the dev_wenjun_fixChannelLeak branch from 1da9f07 to eb50e67 Compare May 18, 2024 05:11
Contributor

@rickchengx rickchengx left a comment

LGTM

@rickchengx rickchengx merged commit 5463d02 into apache:dev May 20, 2024
62 of 63 checks passed
wangxj3 added a commit that referenced this pull request May 25, 2024
…ASF release (#16042)

* add version of 3.2.2

* Fix rpc channel leak due to concurrent operation (#16021)

* Fix rpc channel leak due to concurrent operation

* Throw channel create failed exception

---------

Co-authored-by: Rick Cheng <rickchengx@gmail.com>

* Fix WorkerTaskExecutorThreadPool#isOverload is incorrect (#16027)

* cp: Reduce the size of tarball to continue ASF release (#15004) (#15540)

* Reduce the size of tarball to continue ASF release

for more detail you can see https://lists.apache.org/thread/rmp7fghlj0n7h9y2v3p8gkw9f9qbo6qt
# Conflicts:
#	tools/dependencies/known-dependencies.txt

* Increase block-until-connected in ZookeeperRegistryTestCase (#16041)

* [Fix][CI] fix the ci error of Values.datasource.profile (#16031)

Co-authored-by: Eric Gao <ericgao.apache@gmail.com>

* Correction of Typos in the Chinese Document Appendix for Task Parameters (#16033)

Co-authored-by: Rick Cheng <rickchengx@gmail.com>

* [DSIP-39][parameter] Improvement startup parameters/global parameters/project parameters data type (#15967)

* [Improvement][parameter] New data types and type filtering

* [Improvement][parameter] Improvement startup parameters/global parameters data type

* fix api interfaces compatible

* add project parameter data type default value

* [Improvement][parameter] New data types and type filtering

* [Improvement][parameter] Improvement startup parameters/global parameters data type

* fix api interfaces compatible

* add project parameter data type default value

* improvement project code

* remove useless imports

* remove method onClearSearchTaskType

* add parameter doc

* optimisation logic

* code conflict resolution

* code conflict resolution

* [Improvement][Monitor] Show master && worker Busy Or Normal Status and Show Commands table list (#15978)

* update

* test

* add monitor enhance ui

* update

* update

* update doc

* fix spotless

* update

* update

* Update dolphinscheduler-api/src/main/java/org/apache/dolphinscheduler/api/controller/DataAnalysisController.java

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* Update dolphinscheduler-api/src/main/java/org/apache/dolphinscheduler/api/controller/DataAnalysisController.java

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* Update dolphinscheduler-dao/src/main/java/org/apache/dolphinscheduler/dao/mapper/ErrorCommandMapper.java

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* Update dolphinscheduler-dao/src/main/resources/org/apache/dolphinscheduler/dao/mapper/ErrorCommandMapper.xml

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* Update dolphinscheduler-dao/src/main/java/org/apache/dolphinscheduler/dao/mapper/CommandMapper.java

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* Update dolphinscheduler-dao/src/main/resources/org/apache/dolphinscheduler/dao/mapper/ErrorCommandMapper.xml

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* update

* fix spotless

* update

---------

Co-authored-by: Wenjun Ruan <wenjun@apache.org>

* [Improvement][Monitor] Add UT for montor (#15998)

* formatting Code

* add pom

* Fix NettyRemotingClient might throw IllegalMonitorStateException (#16038)

* Removed unused StateEventHandleFailure (#16052)

* Remove OmitStackTraceInFastThrow in start.sh (#16054)

* Update Chart.yaml Rollback APPversion

* modify log msg (#16062)

Co-authored-by: xiangzihao <460888207@qq.com>

* [DS-16046][fix] Set PreparedStatement parameter type (#16050)

Fix the PreparedStatement parameter is TIME, set it to java.sql.Time

Co-authored-by: Aaron Wang <wangweirao16@gmail.com>
Co-authored-by: Rick Cheng <rickchengx@gmail.com>

* Update pom.xml

* [DSIP-42] Add dolphinscheduler-aws-authentication module  (#16043)

---------

Co-authored-by: wangxj <wangxj31>
Co-authored-by: Wenjun Ruan <wenjun@apache.org>
Co-authored-by: Rick Cheng <rickchengx@gmail.com>
Co-authored-by: Eric Gao <ericgao.apache@gmail.com>
Co-authored-by: TianXinCoord <tianxincoord@163.com>
Co-authored-by: 小可耐 <46134044+sdhzwc@users.noreply.github.com>
Co-authored-by: sleo <97011595+alei1206@users.noreply.github.com>
Co-authored-by: xiangzihao <460888207@qq.com>
Co-authored-by: yinxiaolog <43245392+yinxiaolog@users.noreply.github.com>
Co-authored-by: Aaron Wang <wangweirao16@gmail.com>
@Musknine

Musknine commented May 27, 2024

[image]
Does it need a container to store the channel that is about to be deleted, like below?
[image]

@Musknine

Musknine commented May 27, 2024

Another thing I thought about: do we need to rethink the design pattern? First, we should discuss the following questions or scenarios:
1. Why do we need to cache the channel? Is it so that the channel can be reused? If so, when and in which scenarios do we call the close function?
2. If two (or more) threads are using the same channel and one thread calls the close function, what happens to the other thread that is still using that channel?

In my opinion, would it be better to close the channel in userEventTriggered when there is no real data transfer on the channel? (How long to wait before closing needs discussion; I am not familiar with the codebase.)
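
The idle-close idea above could be sketched roughly as follows, using plain JDK types rather than Netty. `IdleChannelRegistry`, `TrackedChannel`, and the timeout value are hypothetical and not taken from the DolphinScheduler codebase; in Netty the idle signal would typically come from an `IdleStateHandler` firing an `IdleStateEvent` into `userEventTriggered`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical registry that closes channels which have been idle for too long. */
final class IdleChannelRegistry {

    /** Hypothetical channel stand-in that only records its last activity time. */
    static final class TrackedChannel {
        volatile long lastActiveMillis;
        volatile boolean open = true;

        TrackedChannel(long now) {
            this.lastActiveMillis = now;
        }
    }

    private final Map<String, TrackedChannel> channels = new ConcurrentHashMap<>();
    private final LongSupplier clock;       // injected so the sketch is deterministic
    private final long idleTimeoutMillis;   // assumption: the right value needs discussion

    IdleChannelRegistry(LongSupplier clock, long idleTimeoutMillis) {
        this.clock = clock;
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Returns the open channel for the host, creating one if needed, and marks it active. */
    TrackedChannel touch(String host) {
        long now = clock.getAsLong();
        return channels.compute(host, (h, ch) -> {
            if (ch == null || !ch.open) {
                return new TrackedChannel(now);
            }
            ch.lastActiveMillis = now;
            return ch;
        });
    }

    /** Closes and removes every channel idle for at least the timeout; returns how many. */
    int closeIdle() {
        long now = clock.getAsLong();
        int[] closed = {0};
        for (String host : channels.keySet()) {
            // computeIfPresent is atomic per key, so it cannot interleave with touch().
            channels.computeIfPresent(host, (h, ch) -> {
                if (now - ch.lastActiveMillis < idleTimeoutMillis) {
                    return ch;
                }
                ch.open = false;
                closed[0]++;
                return null; // returning null removes the entry
            });
        }
        return closed[0];
    }

    int size() {
        return channels.size();
    }

    public static void main(String[] args) {
        long[] now = {0L};
        IdleChannelRegistry registry = new IdleChannelRegistry(() -> now[0], 60_000L);
        registry.touch("worker-1:1234");
        now[0] = 30_000L;
        registry.touch("worker-2:1234");
        now[0] = 70_000L;
        System.out.println("closed: " + registry.closeIdle() + ", remaining: " + registry.size());
    }
}
```

In this sketch a thread that still holds a reference to a closed channel sees `open == false` and has to call `touch` again to get a replacement, which is one possible answer to the second question and not necessarily how the project handles it.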

@ruanwenjun ruanwenjun deleted the dev_wenjun_fixChannelLeak branch May 31, 2024 13:22
6 participants