distributed ddl hang #7507
Comments
The DDL hang problem occurred today on one node of a cluster. Before that, the machine had rebooted several times for an unknown reason.
Later I restarted clickhouse-server. The following log indicates that ClickHouse then executed the DDL that it had previously refused to run.
Other distributed DDLs also work after the restart.
I reproduced the hang by rebooting a machine that runs a very light ClickHouse instance (loading tables takes only ~3 seconds).
It looks like ClickHouse starts so fast that it fails to get its own IP address.
I added some debug code to dbms/src/Common/isLocalAddress.cpp to dump the addresses of all interfaces.
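For readers who want to run a similar check outside the server, here is a minimal standalone sketch of dumping every interface address. It uses the POSIX/Linux `getifaddrs` call, which is an assumption made for illustration and not necessarily the API that isLocalAddress.cpp uses; the function names `listInterfaceAddresses` and `dumpInterfaceAddresses` are hypothetical.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (interface name, textual address) pairs for
// every IPv4/IPv6 address currently assigned to the host.
// Returns an empty list if the enumeration itself fails.
std::vector<std::pair<std::string, std::string>> listInterfaceAddresses()
{
    std::vector<std::pair<std::string, std::string>> result;

    ifaddrs * head = nullptr;
    if (getifaddrs(&head) != 0)
        return result;

    for (const ifaddrs * it = head; it != nullptr; it = it->ifa_next)
    {
        if (it->ifa_addr == nullptr)
            continue;  // interface without an address assigned

        assert(it->ifa_name != nullptr);

        const int family = it->ifa_addr->sa_family;
        const void * raw = nullptr;
        if (family == AF_INET)
            raw = &reinterpret_cast<const sockaddr_in *>(it->ifa_addr)->sin_addr;
        else if (family == AF_INET6)
            raw = &reinterpret_cast<const sockaddr_in6 *>(it->ifa_addr)->sin6_addr;
        else
            continue;  // skip non-IP families such as AF_PACKET

        char buf[INET6_ADDRSTRLEN] = {0};
        if (inet_ntop(family, raw, buf, sizeof(buf)) != nullptr)
            result.emplace_back(it->ifa_name, buf);
    }

    freeifaddrs(head);
    return result;
}

// Hypothetical helper: prints one "name address" line per entry.
void dumpInterfaceAddresses(std::ostream & out)
{
    for (const auto & entry : listInterfaceAddresses())
        out << entry.first << " " << entry.second << "\n";
}
```

Calling `dumpInterfaceAddresses(std::cerr)` once at startup and again a few seconds later would show whether the address list changes after the process has started.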
After
In summary, the interface list cached at dbms/src/Common/isLocalAddress.cpp#L15 caused the problem. If ClickHouse starts very fast, the cached list can end up containing no IP address.
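The effect of such a cache can be shown with a small simulation. This is a simplified sketch and not the actual ClickHouse code: `AddressLister`, `CachedLocalAddressChecker`, `FreshLocalAddressChecker` and `SimulatedHost` are hypothetical names, the address 192.0.2.10 is made up, and re-querying on every call is only one possible alternative, not necessarily the fix that was adopted.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the OS query that lists the host's addresses.
using AddressLister = std::function<std::vector<std::string>()>;

// Models the reported behavior: the list is read once and reused forever.
class CachedLocalAddressChecker
{
public:
    explicit CachedLocalAddressChecker(AddressLister lister) : lister_(std::move(lister))
    {
        assert(static_cast<bool>(lister_));
    }

    bool isLocal(const std::string & address)
    {
        if (!loaded_)
        {
            cached_ = lister_();  // snapshot taken on first use only
            loaded_ = true;
        }
        return std::find(cached_.begin(), cached_.end(), address) != cached_.end();
    }

private:
    AddressLister lister_;
    std::vector<std::string> cached_;
    bool loaded_ = false;
};

// One possible alternative: query the current list on every check.
class FreshLocalAddressChecker
{
public:
    explicit FreshLocalAddressChecker(AddressLister lister) : lister_(std::move(lister))
    {
        assert(static_cast<bool>(lister_));
    }

    bool isLocal(const std::string & address) const
    {
        const std::vector<std::string> current = lister_();
        return std::find(current.begin(), current.end(), address) != current.end();
    }

private:
    AddressLister lister_;
};

// Simulated host: the non-loopback address appears only once the network is up.
struct SimulatedHost
{
    bool network_up = false;

    std::vector<std::string> addresses() const
    {
        std::vector<std::string> result{"127.0.0.1"};
        if (network_up)
            result.push_back("192.0.2.10");  // made-up documentation-range address
        return result;
    }
};
```

If the first `isLocal` call happens while `network_up` is still false, the cached checker keeps answering "not local" for 192.0.2.10 even after the address appears, while the fresh checker recovers. Assuming the server uses this check to decide whether a distributed DDL task is addressed to it, a stale cache would make the node ignore such tasks until it is restarted, which matches the behavior described in this thread.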
Describe the bug or unexpected behaviour
I have a four-node ClickHouse cluster. Today I noticed that all distributed DDL statements hang.
When I run the following CREATE statement on each node, I always get a response from only one node (10.126.144.142). RENAME also hangs.
Then I restarted clickhouse-server on all nodes. After that, the same statement worked as expected.
The cluster had been up for about one week, and this is the first time I have noticed this problem.
#5295 inspired me to restart clickhouse-server.
How to reproduce
v19.15.3.6-stable
CLI