[Bug] ZooKeeper client is disconnected and can not recover #20976

Closed
1 of 2 tasks
dongzhonghua opened this issue Aug 11, 2023 · 2 comments
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments


dongzhonghua commented Aug 11, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Version

Linux 4.18.0-2.4.3.3.kwai.x86_64 #1 SMP Wed Apr 6 06:31:51 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Pulsar 2.11.0

Minimal reproduce step

Run a pressure test with 20 topics and 128 partitions per topic. I guess it's because there are too many lookup requests.
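
For a sense of scale, here is a minimal sketch of how many per-partition topics this test implies. It assumes each partition is addressed as `<topic>-partition-<n>`, which is the naming visible in the log lines further down. The base name `dzh_pressure_test_<i>` is borrowed from those logs, and the numbering range used here is a guess, not the reporter's actual setup.

```python
NAMESPACE = "persistent://public/default"
NUM_TOPICS = 20            # from the reproduce step
PARTITIONS_PER_TOPIC = 128  # from the reproduce step


def partition_names(num_topics=NUM_TOPICS, partitions=PARTITIONS_PER_TOPIC):
    """Yield the per-partition topic names a client would need to look up."""
    for t in range(num_topics):
        # Hypothetical numbering: the real test may use different indices.
        base = f"{NAMESPACE}/dzh_pressure_test_{t}"
        for p in range(partitions):
            yield f"{base}-partition-{p}"


names = list(partition_names())
print(len(names))  # 20 * 128 = 2560
```

Every producer or consumer on these topics needs a lookup per partition, so reconnect storms multiply this number by the client count.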

What did you expect to see?

No exceptions.

What did you see instead?

First, this warning appears:

2023-08-11T14:59:17,487+0800 [metadata-store-zk-session-watcher-11-1] WARN org.apache.pulsar.metadata.impl.ZKSessionWatcher.checkState(ZKSessionWatcher.java:152) - ZooKeeper client is disconnected. Waiting to reconnect, time remaining = 25.0 seconds

Then there is an endless stream of exceptions like:

2023-08-11T15:15:33,332+0800 [pulsar-io-4-45] WARN org.apache.pulsar.client.impl.BinaryProtoLookupService.lambda$findBroker$1(BinaryProtoLookupService.java:122) - [persistent://public/default/__change_events] failed to send lookup request : Disconnected from server at infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650
2023-08-11T15:15:33,332+0800 [pulsar-io-4-47] WARN org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1286) - [id: 0xbd641ae4, L:/10.88.166.178:41344 ! R:infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650] Lookup request timeout {'durationMs': '30000', 'reqId':'2001116152093582614', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.178:41344'}
2023-08-11T15:15:33,332+0800 [pulsar-io-4-45] WARN org.apache.pulsar.client.impl.ConnectionHandler.handleConnectionError(ConnectionHandler.java:88) - [persistent://public/default/__change_events] [null] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException$ConnectException: Disconnected from server at infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650

2023-08-11T15:25:52,265+0800 [pulsar-io-4-9] WARN org.apache.pulsar.client.impl.ConnectionHandler.handleConnectionError(ConnectionHandler.java:88) - [persistent://public/default/dzh_pressure_test_113-partition-10] [bop_reco_common_model_1-16-8] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: Lookup request timeout {'durationMs': '30000', 'reqId':'2610730697763911561', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.177:53932'}
2023-08-11T15:25:52,265+0800 [pulsar-io-4-9] WARN org.apache.pulsar.client.impl.ConnectionHandler.reconnectLater(ConnectionHandler.java:114) - [persistent://public/default/dzh_pressure_test_113-partition-10] [bop_reco_common_model_1-16-8] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: Lookup request timeout {'durationMs': '30000', 'reqId':'2610730697763911561', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.177:53932'} -- Will try again in 54.183 s
2023-08-11T15:25:52,265+0800 [pulsar-io-4-9] WARN org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1286) - [id: 0xc724bacf, L:/10.88.166.177:53932 - R:infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650] Lookup request timeout {'durationMs': '30000', 'reqId':'2610730697763911561', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.177:53932'}
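
When triaging a log like this, it can help to collapse the repeated warnings into one record per request. The following is a rough sketch that assumes the `Lookup request timeout {...}` payload always has the shape shown above. The function names are made up for illustration, and the sample lines are the payloads from this issue with their timestamp and thread prefixes trimmed.

```python
import re
from collections import Counter

# Matches the payload of the "Lookup request timeout {...}" warnings.
TIMEOUT_RE = re.compile(
    r"Lookup request timeout \{"
    r"'durationMs':\s*'(?P<duration_ms>\d+)',\s*"
    r"'reqId':\s*'(?P<req_id>\d+)',\s*"
    r"'remote':\s*'(?P<remote>[^']+)',\s*"
    r"'local':\s*'(?P<local>[^']+)'\}"
)


def parse_timeouts(lines):
    """Return one dict per distinct reqId found in the given log lines."""
    seen = {}
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        fields = match.groupdict()
        fields["duration_ms"] = int(fields["duration_ms"])
        # The same request is logged by several classes; keep the first sighting.
        seen.setdefault(fields["req_id"], fields)
    return list(seen.values())


def count_by_local_host(timeouts):
    """Count timed-out requests per client host."""
    # 'local' looks like '/10.88.166.177:53932'; drop the slash and the port.
    return Counter(t["local"].lstrip("/").rsplit(":", 1)[0] for t in timeouts)


# Payloads copied from the warnings above (prefixes trimmed for brevity).
SAMPLE_LINES = [
    "ClientCnx.checkRequestTimeout - Lookup request timeout {'durationMs': '30000', 'reqId':'2001116152093582614', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.178:41344'}",
    "ConnectionHandler.handleConnectionError - Lookup request timeout {'durationMs': '30000', 'reqId':'2610730697763911561', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.177:53932'}",
    "ClientCnx.checkRequestTimeout - Lookup request timeout {'durationMs': '30000', 'reqId':'2610730697763911561', 'remote':'infra-bjmt-rs-21.idchb1az4.yz.kwaidc.com/10.88.166.178:6650', 'local':'/10.88.166.177:53932'}",
]

timeouts = parse_timeouts(SAMPLE_LINES)
per_host = count_by_local_host(timeouts)
print(len(timeouts), dict(per_host))
```

In the excerpts above, one of the timed-out connections has the same IP for local and remote (10.88.166.178), so at least some of the failing lookups originate on the broker host itself.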

Anything else?

Once the "ZooKeeper client is disconnected" message appears, it is followed by an endless stream of error messages.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@dongzhonghua dongzhonghua added the type/bug The PR fixed a bug or issue reported a bug label Aug 11, 2023
@dongzhonghua dongzhonghua changed the title [Bug] [Bug] ZooKeeper client is disconnected and can not recover Aug 11, 2023
@codelipenghui
Contributor

@dongzhonghua It would help to take a thread dump of the broker to see whether any threads are deadlocked or blocked somewhere.
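
A thread dump is typically captured with the JDK's `jstack <pid>` against the broker process. As a rough sketch of what to look for in the output, the script below groups threads by state and flags the deadlock banner that `jstack` prints when it detects one. It assumes the usual HotSpot dump layout (a quoted thread name line followed by a `java.lang.Thread.State:` line). The sample dump is synthetic, with hypothetical thread and class names, and is not taken from this issue.

```python
import re
from collections import defaultdict

THREAD_HEADER_RE = re.compile(r'^"(?P<name>[^"]+)"')
STATE_RE = re.compile(r"^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)")


def summarize_thread_dump(text):
    """Return ({state: [thread names]}, deadlock_reported) for a jstack dump."""
    threads_by_state = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER_RE.match(line)
        if header:
            current = header.group("name")
            continue
        state = STATE_RE.match(line)
        if state and current is not None:
            threads_by_state[state.group("state")].append(current)
            current = None
    # jstack prints a "Found one Java-level deadlock:" banner when it finds one.
    return dict(threads_by_state), "Java-level deadlock" in text


# Synthetic example input; names and addresses are placeholders.
SAMPLE_DUMP = "\n".join([
    '"example-worker-1" #21 prio=5 os_prio=0 tid=0x0000000000000001 nid=0x1 waiting for monitor entry [0x0000000000000000]',
    "   java.lang.Thread.State: BLOCKED (on object monitor)",
    "\tat com.example.Hypothetical.methodA(Hypothetical.java:10)",
    "",
    '"example-worker-2" #22 prio=5 os_prio=0 tid=0x0000000000000002 nid=0x2 runnable [0x0000000000000000]',
    "   java.lang.Thread.State: RUNNABLE",
    "\tat com.example.Hypothetical.methodB(Hypothetical.java:20)",
])

states, deadlock_reported = summarize_thread_dump(SAMPLE_DUMP)
print(states, deadlock_reported)
```

For a broker dump, the interesting entries would be `BLOCKED` or long-`WAITING` threads in the metadata store and `pulsar-io` pools, since those are the thread names that appear in the warnings above.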

@codelipenghui codelipenghui self-assigned this Aug 16, 2023
@codelipenghui
Contributor

@dongzhonghua I will close this issue for now since there have been no updates for two weeks. Feel free to reopen it if you have a thread dump of the broker, so that we can investigate further.
