Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix thread blocked problem of etcd watcher #2991

Merged
merged 2 commits into from
Dec 23, 2022
Merged

Conversation

Yan1025
Copy link
Contributor

@Yan1025 Yan1025 commented Dec 7, 2022

in the previous version of EtcdDataSource, when the flowRules defined in the etcd change and trigger etcd watcher, there will be a following error: 2022-12-07 14:36:03.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2336 ms, time limit is 2000 ms 2022-12-07 14:36:04.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3337 ms, time limit is 2000 ms 2022-12-07 14:36:05.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4337 ms, time limit is 2000 ms 2022-12-07 14:36:06.595 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] - Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5337 ms, time limit is 2000 ms io.vertx.core.VertxException: Thread blocked
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3334)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.readSource(EtcdDataSource.java:92)
at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.readSource(EtcdDataSource.java:23)
at com.alibaba.csp.sentinel.datasource.AbstractDataSource.loadConfig(AbstractDataSource.java:44)
at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.lambda$initWatcher$0(EtcdDataSource.java:74)
at com.htsc.fst.saas.infrastructure.component.EtcdDataSource$$Lambda$480/1375681611.accept(Unknown Source)
at io.etcd.jetcd.Watch$1.onNext(Watch.java:183)
at io.etcd.jetcd.impl.WatchImpl$WatcherImpl.onNext(WatchImpl.java:310)
at io.etcd.jetcd.impl.WatchImpl$WatcherImpl$$Lambda$488/2088661957.handle(Unknown Source

I guess the reason is : when the watcher hold etcd Client, you can't use etcd Client in the Watch's consumer function. Besides the problem above, use getKeyValue() in the watchEvent can reduce once invoke with etcd server to getting the latest flowRules :)

Describe what this PR does / why we need it

Does this pull request fix one issue?

Describe how you did it

Describe how to verify it

Special notes for reviews

in the previous version of EtcdDataSource, when the flowRules defined in the etcd change and trigger etcd watcher, there will be a following error:
2022-12-07 14:36:03.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2336 ms, time limit is 2000 ms
2022-12-07 14:36:04.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3337 ms, time limit is 2000 ms
2022-12-07 14:36:05.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4337 ms, time limit is 2000 ms
2022-12-07 14:36:06.595 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5337 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
 at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3334)
 at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.readSource(EtcdDataSource.java:92)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.readSource(EtcdDataSource.java:23)
 at com.alibaba.csp.sentinel.datasource.AbstractDataSource.loadConfig(AbstractDataSource.java:44)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.lambda$initWatcher$0(EtcdDataSource.java:74)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource$$Lambda$480/1375681611.accept(Unknown Source)
 at io.etcd.jetcd.Watch$1.onNext(Watch.java:183)
 at io.etcd.jetcd.impl.WatchImpl$WatcherImpl.onNext(WatchImpl.java:310)
 at io.etcd.jetcd.impl.WatchImpl$WatcherImpl$$Lambda$488/2088661957.handle(Unknown Source

I guess the reason is : when the watcher hold etcd Client, you can't use etcd Client in the Watch's consumer function. 
Besides the problem above, use getKeyValue() in the watchEvent can reduce once invoke with etcd server to getting the latest flowRules :)
@CLAassistant
Copy link

CLAassistant commented Dec 7, 2022

CLA assistant check
All committers have signed the CLA.

@Yan1025
Copy link
Contributor Author

Yan1025 commented Dec 7, 2022

Describe what this PR does / why we need it: it's a bug and a waste of invoke
Does this pull request fix one issue?: no, i found it and commit the bug fix immediately
Describe how you did it: the code committed can describe it
Describe how to verify it: it's a must-appear bug when flowRules in the etcd server change
Special notes for reviews: nothing more

@sczyh30 sczyh30 added the area/data-source Issues or PRs related to data-source extension label Dec 7, 2022
@sczyh30
Copy link
Member

sczyh30 commented Dec 7, 2022

Could you please sign the CLA here: https://cla-assistant.io/alibaba/Sentinel?pullRequest=2991

@Yan1025
Copy link
Contributor Author

Yan1025 commented Dec 7, 2022

Now i have signed the CLA and send the PDF to opensource@alibaba-inc.com

@sczyh30
Copy link
Member

sczyh30 commented Dec 23, 2022

cc @LProDreamAll Any suggestions?

@LProDreamAll
Copy link

LProDreamAll commented Dec 23, 2022

@sczyh30 @Yan1025

com.alibaba.csp.sentinel.datasource.etcd.EtcdDataSource#initWatcher

private void initWatcher() {
watcher = client.getWatchClient().watch(ByteSequence.from(key, charset), (watchResponse) -> {
for (WatchEvent watchEvent : watchResponse.getEvents()) {
WatchEvent.EventType eventType = watchEvent.getEventType();
if (eventType == WatchEvent.EventType.PUT) {
try {
T newValue = loadConfig();
getProperty().updateValue(newValue);
} catch (Exception e) {
RecordLog.warn("[EtcdDataSource] Failed to update config", e);
}
} else if (eventType == WatchEvent.EventType.DELETE) {
RecordLog.info("[EtcdDataSource] Cleaning config for key <{}>", key);
getProperty().updateValue(null);
}
}
});
}

Solution

private void initWatcher() {
watcher = client.getWatchClient().watch(ByteSequence.from(key, charset), (watchResponse) -> {
for (WatchEvent watchEvent : watchResponse.getEvents()) {
WatchEvent.EventType eventType = watchEvent.getEventType();

if (eventType == WatchEvent.EventType.PUT) {
try {
String newValueJson = watchEvent.getKeyValue().getValue().toString(charset);
T newValue = parser.convert(newValueJson);
} catch (Exception e) {
RecordLog.warn("[EtcdDataSource] Failed to update config", e);
}
} else if (eventType == WatchEvent.EventType.DELETE) {
RecordLog.info("[EtcdDataSource] Cleaning config for key <{}>", key);
getProperty().updateValue(null);
}
}
});
}

@LProDreamAll
Copy link

@Yan1025 @sczyh30
toString() should be toString(charset)

Add charset adaptation -- accept the suggestion from @LProDreamAll
@Yan1025
Copy link
Contributor Author

Yan1025 commented Dec 23, 2022

@Yan1025 @sczyh30 toString() should be toString(charset)

Thanks for your pertinent suggestion and i have optimized my pull request

@sczyh30 sczyh30 added the kind/bug Category issues or prs related to bug. label Dec 23, 2022
@sczyh30 sczyh30 added this to the v2.0.0 milestone Dec 23, 2022
Copy link
Member

@sczyh30 sczyh30 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sczyh30 sczyh30 merged commit a7cd8a1 into alibaba:master Dec 23, 2022
@sczyh30
Copy link
Member

sczyh30 commented Dec 23, 2022

Thanks for contributing!

LearningGp pushed a commit that referenced this pull request Dec 28, 2023
* fix thread blocked problem of etcd watcher

in the previous version of EtcdDataSource, when the flowRules defined in the etcd change and trigger etcd watcher, there will be a following error:
2022-12-07 14:36:03.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 2336 ms, time limit is 2000 ms
2022-12-07 14:36:04.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 3337 ms, time limit is 2000 ms
2022-12-07 14:36:05.592 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 4337 ms, time limit is 2000 ms
2022-12-07 14:36:06.595 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker [?:?] -  Thread Thread[vert.x-eventloop-thread-0,5,main] has been blocked for 5337 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
 at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3334)
 at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
 at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.readSource(EtcdDataSource.java:92)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.readSource(EtcdDataSource.java:23)
 at com.alibaba.csp.sentinel.datasource.AbstractDataSource.loadConfig(AbstractDataSource.java:44)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource.lambda$initWatcher$0(EtcdDataSource.java:74)
 at com.htsc.fst.saas.infrastructure.component.EtcdDataSource$$Lambda$480/1375681611.accept(Unknown Source)
 at io.etcd.jetcd.Watch$1.onNext(Watch.java:183)
 at io.etcd.jetcd.impl.WatchImpl$WatcherImpl.onNext(WatchImpl.java:310)
 at io.etcd.jetcd.impl.WatchImpl$WatcherImpl$$Lambda$488/2088661957.handle(Unknown Source

I guess the reason is : when the watcher hold etcd Client, you can't use etcd Client in the Watch's consumer function. 
Besides the problem above, use getKeyValue() in the watchEvent can reduce once invoke with etcd server to getting the latest flowRules :)

* Update EtcdDataSource.java

Add charset adaptation -- accept the suggestion from @LProDreamAll
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/data-source Issues or PRs related to data-source extension kind/bug Category issues or prs related to bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants