Takes too long time for a cluster to recover from leader crash #2866
Comments
In one reproduction, Raft took 15 min to do a rollback, and the graph client appeared stuck.
@Aiee Go client, please check it.
Raft: #2903
This could be quite a complicated case. Frankly speaking, I have no idea what happens when SIGSTOP is sent to the thrift server; it may involve the system futex. We can investigate later.
SIGSTOP only stops the process from being scheduled on a CPU temporarily, exactly as when you attach to the process with gdb. If the process's call to futex() blocks, I'm 100% sure SIGSTOP is not the cause. I'll add more detail later.
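To illustrate the SIGSTOP point above, here is a minimal Python sketch (POSIX only). It freezes a short-lived child process with SIGSTOP, confirms via `waitpid` that the child is stopped rather than exited, then resumes it with SIGCONT and lets it finish normally. The child command and the name `stop_and_resume` are illustrative assumptions; this does not reproduce the thrift server or any futex behavior from this issue.

```python
import os
import signal
import subprocess
import sys
import time


def stop_and_resume(pause_seconds=0.2):
    """Freeze a child with SIGSTOP, then resume it with SIGCONT.

    Returns (stopped_seen, exit_code).
    """
    # Hypothetical stand-in for a server process: sleeps briefly, exits 0.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.5)"]
    )

    # SIGSTOP cannot be caught or ignored; the kernel stops scheduling the process.
    os.kill(child.pid, signal.SIGSTOP)

    # WUNTRACED makes waitpid report a stopped (not yet exited) child.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    stopped_seen = os.WIFSTOPPED(status)

    # While stopped, the child makes no progress.
    time.sleep(pause_seconds)

    # SIGCONT makes the process runnable again; it continues where it left off.
    os.kill(child.pid, signal.SIGCONT)
    exit_code = child.wait()
    return stopped_seen, exit_code


if __name__ == "__main__":
    print(stop_and_resume())
```

The child exits cleanly after being resumed, which matches the claim that SIGSTOP by itself only pauses scheduling. Anything that goes wrong during the pause (timeouts on peers, for example) would come from the rest of the system reacting to the silence.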
Closed by #3435.
Describe the bug (must be provided)
The cluster takes a very long time to recover from a leader crash.
Your Environments (must be provided)
uname -a
g++ --version or clang++ --version
lscpu
(a3ffc7d8) https://github.com/liuyu85cn/nebula.git 730a39d
How To Reproduce (must be provided)
Steps to reproduce the behavior:
(show hosts in nebula-console)
Although Nebula storage elects a new leader very quickly, it takes nearly 5 min for the cluster to get back to normal.
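To put a number on "back to normal", one option is to poll a health check and record how long it takes to pass. The sketch below is only an assumption about how that could be measured: `measure_recovery` and the `is_healthy` callback are hypothetical names, and the caller would have to supply the actual check (for example, something that runs `show hosts` through a client and inspects the result).

```python
import time


def measure_recovery(is_healthy, timeout=600.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_healthy() until it returns True.

    Returns the elapsed seconds, or None if the timeout is reached first.
    clock and sleep are injectable so the helper can be tested without waiting.
    """
    start = clock()
    while True:
        if is_healthy():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Starting the timer at the moment the leader is killed or stopped gives the end-to-end recovery time as a client would see it, which can then be compared with the leader election time reported in the storage logs.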
Expected behavior
A clear and concise description of what you expected to happen.
Additional context
Provide logs and configs, or any other context to trace the problem.