Please answer these questions before submitting your issue. Thanks!
What did you do?
The slogs are corrupted on more than 3 replica servers in a standard Pegasus cluster.
(If the slog is corrupted in a single node cluster, it would cause the same issue)
The slog corruption was caused by an earlier crash of the replica server, which hit a large number of EAGAIN errors from io_submit(). That is a separate issue that we will handle (io_submit() has been replaced by pwrite() since 2.2).
What did you expect to see?
The replicas are kept in their normal directories, so we can recover the table manually.
What did you see instead?
For some partitions, all replicas are moved to "<app_id>.<partition_id>.pegasus..err". The cluster cannot recover automatically, and we have to find and move the replicas back one by one.
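For reference, the manual "find and move back" step could be scripted roughly as below. This is only a sketch, not a verified recovery procedure: it assumes a normal replica directory is named "<app_id>.<partition_id>.pegasus", that a moved one keeps that prefix and ends in ".err", and that all of them sit directly under one data directory. The names `plan_restores` and `restore` are hypothetical helpers, not part of Pegasus.

```python
import re
from pathlib import Path

# Assumption: moved replicas look like "<app_id>.<partition_id>.pegasus.<anything>.err"
# and the normal directory name is "<app_id>.<partition_id>.pegasus".
ERR_DIR = re.compile(r"^(\d+)\.(\d+)\.pegasus\..*\.err$")


def plan_restores(data_dir):
    """Return (src, dst) pairs for ".err" replica directories that can be moved back."""
    plans = []
    claimed = set()
    for entry in sorted(Path(data_dir).iterdir()):
        match = ERR_DIR.match(entry.name)
        if match is None or not entry.is_dir():
            continue
        target = entry.with_name(f"{match.group(1)}.{match.group(2)}.pegasus")
        # Never overwrite an existing replica directory, and restore at most
        # one ".err" directory per replica.
        if target.exists() or target in claimed:
            continue
        claimed.add(target)
        plans.append((entry, target))
    return plans


def restore(data_dir, dry_run=True):
    """Rename the planned directories; with dry_run=True only report the plan."""
    plans = plan_restores(data_dir)
    for src, dst in plans:
        if not dry_run:
            src.rename(dst)
    return plans
```

Calling `restore(path)` with the default `dry_run=True` only returns the planned renames, so the list can be reviewed before anything is moved. `path` here stands for whichever data directory holds the replicas on that node.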
#1572
Add an option that makes the process exit when opening the slog fails,
leaving the corrupted slog and replicas to be handled by the
administrator.
Bug Report
What version of Pegasus are you using?
2.0 (2.4 has the same issue)