docs: Explain what "quorum size" means exactly in "consul leave" docs #11975

Closed
nh2 opened this issue Jan 8, 2022 · 5 comments · Fixed by #17910
Labels
event/consul-docs-day A day focused on improving Consul's documentation. Date: Jan 10, 2022. type/docs Documentation needs to be created/updated/clarified

Comments

@nh2

nh2 commented Jan 8, 2022

https://www.consul.io/commands/leave (pinned version) says:

Running consul leave on a server explicitly will reduce the quorum size. Even if the cluster used bootstrap_expect to set a quorum size initially, issuing consul leave on a server will reconfigure the cluster to have fewer servers. This means you could end up with just one server that is still able to commit writes because quorum is only 1

To the operator, it is unclear what "quorum size" means exactly here.

Assuming I ran consul leave, how can I check what the new decreased quorum size is?

  • Is it the number of entries that have Type = server in consul members?
  • Is it the known_servers field in consul info (which is presumably the same as the above)?

Given that the docs warn that the quorum size may drop to 1 and that that is bad, they should say how to check what the quorum size currently is.
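One possible way to answer the question above is to count the voting servers in the Raft configuration and derive the quorum size from that count. The sketch below assumes a JSON document shaped like the response of Consul's operator Raft configuration endpoint (a `Servers` list whose entries carry a boolean `Voter` field); those field names, the sample data, and the helper names `voter_count` and `quorum_size` are assumptions for illustration and are not taken from this thread.

```python
from typing import Any


def voter_count(raft_config: dict[str, Any]) -> int:
    """Count the servers marked as voters in a Raft configuration document."""
    servers = raft_config.get("Servers", [])
    return sum(1 for server in servers if server.get("Voter"))


def quorum_size(voters: int) -> int:
    """Majority of the peer set: floor(N / 2) + 1."""
    return voters // 2 + 1


# Hypothetical sample data; the field names are an assumption about the
# shape of the Raft configuration and should be checked against your version.
sample_config = {
    "Servers": [
        {"Node": "alice", "Leader": False, "Voter": True},
        {"Node": "bob", "Leader": True, "Voter": True},
        {"Node": "carol", "Leader": False, "Voter": True},
    ]
}

voters = voter_count(sample_config)
print(f"voters={voters} quorum={quorum_size(voters)}")
```

Run against the real configuration after a `consul leave`, a calculation like this would show whether the quorum size actually changed.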

@blake blake added event/consul-docs-day A day focused on improving Consul's documentation. Date: Jan 10, 2022. type/docs Documentation needs to be created/updated/clarified labels Jan 8, 2022
@blake
Member

blake commented Jan 8, 2022

Sorry you found this to be confusing. Thanks for suggesting how we could improve this page.

> To the operator, it is unclear what "quorum size" means exactly here.

This information is documented on the Consensus Protocol page, specifically under the Raft Protocol Overview and Raft in Consul sections.

  • Peer set - The peer set is the set of all members participating in log replication. For Consul's purposes, all server nodes are in the peer set of the local datacenter.

  • Quorum - A quorum is a majority of members from a peer set: for a set of size N, quorum requires at least (N/2)+1 members. For example, if there are 5 members in the peer set, we would need 3 nodes to form a quorum. If a quorum of nodes is unavailable for any reason, the cluster becomes unavailable and no new logs can be committed.

Only Consul server nodes participate in Raft and are part of the peer set. All client nodes forward requests to servers. Part of the reason for this design is that, as more members are added to the peer set, the size of the quorum also increases. This introduces performance problems as you may be waiting for hundreds of machines to agree on an entry instead of a handful.
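The quorum definition quoted above can be written out as a small calculation. This is a minimal sketch that reads (N/2)+1 with integer division; the function names `quorum_size` and `failure_tolerance` are illustrative and do not come from Consul.

```python
def quorum_size(peers: int) -> int:
    """Majority of a peer set of the given size: floor(N / 2) + 1."""
    if peers < 1:
        raise ValueError("a peer set needs at least one member")
    return peers // 2 + 1


def failure_tolerance(peers: int) -> int:
    """Number of servers that can be lost while a quorum remains."""
    return peers - quorum_size(peers)


# Print the quorum size and failure tolerance for small peer sets.
for n in range(1, 8):
    print(f"servers={n} quorum={quorum_size(n)} tolerance={failure_tolerance(n)}")
```

For 5 servers this gives a quorum of 3, matching the example in the quoted definition.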

We can either link "quorum size" on the page to a section of the Consensus Protocol page, or add a callout that briefly states what quorum size means and links to the aforementioned page for more info.

@nh2
Author

nh2 commented Jan 8, 2022

@blake Thanks for the swift reply!

Looking at the table https://www.consul.io/docs/architecture/consensus#deployment_table, isn't the statement from the leave docs also incorrect in some cases?

Running consul leave on a server explicitly will reduce the quorum size.

According to the table, when the Server count is reduced from 7 to 6, the Quorum Size stays at 4, so it is not reduced in this case. The wording should therefore be "can" rather than "will" (which implies a strict "-1", and added to my confusion about whether the table's column is meant or not).

If that is correct, I suggest improving the wording as follows:
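The point about the table can be checked by walking a cluster down one server at a time and recording whether each leave changes the quorum size. This is a sketch under the majority formula floor(N/2)+1; the helper name `leave_effects` is hypothetical.

```python
def quorum_size(servers: int) -> int:
    """Majority of the peer set: floor(N / 2) + 1."""
    return servers // 2 + 1


def leave_effects(start: int) -> list[tuple[int, int, bool]]:
    """For each single leave from `start` servers down to 1, return
    (servers_before, servers_after, quorum_changed)."""
    effects = []
    for before in range(start, 1, -1):
        after = before - 1
        changed = quorum_size(before) != quorum_size(after)
        effects.append((before, after, changed))
    return effects


for before, after, changed in leave_effects(7):
    verdict = "quorum reduced" if changed else "quorum unchanged"
    print(f"{before} -> {after} servers: {verdict}")
```

Under this formula only every second leave lowers the quorum size, which is why "can" fits better than "will".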

Depending on how many Consul servers are running, running consul leave on a server can reduce the quorum size (which is derived from the number of Consul servers, see table https://www.consul.io/docs/architecture/consensus#deployment_table). Even if the cluster used bootstrap_expect to set a number of servers and thus quorum size initially, issuing consul leave on a server will reconfigure the cluster to have fewer servers. This means you could end up with just one server that is still able to commit writes because the quorum size for 1-server setups is only 1.

@dnephin
Contributor

dnephin commented Jan 10, 2022

Ya, I think you are correct. Sometimes we say "quorum size" when we should really say "cluster size". The quorum size is derived from the cluster size, but it won't always change on a leave, as you point out.

@wangxinyi7 wangxinyi7 self-assigned this Jun 23, 2023
@wangxinyi7 wangxinyi7 mentioned this issue Jun 27, 2023
@wangxinyi7
Member

Thanks for your suggestion @nh2
Opened the PR to update the doc.

@david-yu david-yu linked a pull request Jun 30, 2023 that will close this issue
@nh2
Author

nh2 commented Jun 30, 2023

Much appreciated!
