Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-17048; Add docs for KIP-853 #17076

Merged
merged 7 commits into from
Sep 18, 2024

Conversation

jsancio
Copy link
Member

@jsancio jsancio commented Sep 3, 2024

Update the quickstart, ops section, zk to kraft migration section and security section to document the KIP-853 changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jsancio jsancio added kraft KIP-853 KRaft Controller Membership Changes labels Sep 3, 2024

############################# Socket Server Settings #############################

# The address the socket server listens on.
# Note that only the controller listeners are allowed here when `process.roles=controller`, and this listener should be consistent with `controller.quorum.voters` value.
# Note that only the controller listeners are allowed here when `process.roles=controller`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this listener still need to be consistent with what controller.quorum.bootstrap.servers contains?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases yes but it is not required. The best example is that the controller.quorum.boostrap.servers could be a domain name that resolves to any of the controller hosts.

Copy link
Contributor

@ahuang98 ahuang98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the documentation changes!

docs/ops.html Outdated
@@ -3776,25 +3776,71 @@ <h5 class="anchor-heading"><a id="kraft_voter" class="anchor-link"></a><a href="

<p>A Kafka admin will typically select 3 or 5 servers for this role, depending on factors like cost and the number of concurrent failures your system should withstand without availability impact. A majority of the controllers must be alive in order to maintain availability. With 3 controllers, the cluster can tolerate 1 controller failure; with 5 controllers, the cluster can tolerate 2 controller failures.</p>

<p>All of the servers in a Kafka cluster discover the quorum voters using the <code>controller.quorum.voters</code> property. This identifies the quorum controller servers that should be used. All the controllers must be enumerated. Each controller is identified with their <code>id</code>, <code>host</code> and <code>port</code> information. For example:</p>
<p>All of the servers in a Kafka cluster discover the active controller using the <code>controller.quorum.bootrtrap.servers</code> property. All the controllers should be enumerated in this property. Each controller is identified with their <code>host</code> and <code>port</code> information. For example:</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo bootstrap.servers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/ops.html Outdated
<p></p>
The <code>kafka-storage.sh random-uuid</code> command can be used to generate a cluster ID for your new cluster. This cluster ID must be used when formatting each server in the cluster with the <code>kafka-storage.sh format</code> command.

<p>This is different from how Kafka has operated in the past. Previously, Kafka would format blank storage directories automatically, and also generate a new cluster ID automatically. One reason for the change is that auto-formatting can sometimes obscure an error condition. This is particularly important for the metadata log maintained by the controller and broker servers. If a majority of the controllers were able to start with an empty log directory, a leader might be able to be elected with missing committed data.</p>

<h5 class="anchor-heading"><a id="kraft_storage_standalone" class="anchor-link"></a><a href="#kraft_storage_standalone">Bootstrap a Standalone Controller</a></h5>
The recommended method for creating a new KRaft controller cluster is to bootstrap it with one voter and dyncamically <a href="#kraft_reconfig_add">add the rest of the controllers</a>. This can be done with the following CLI command:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: dynamically

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/ops.html Outdated

This command will 1) create a meta.properties file in metadata.log.dir with a randomly generated directory.id, 2) create a snapshot at 00000000000000000000-0000000000.checkpoint with the necessary control records (KRaftVersionRecord and VotersRecord) to make this Kafka node the only voter for the quorum.

<h5 class="anchor-heading"><a id="kraft_storage_voters" class="anchor-link"></a><a href="#kraft_storage_voters">Bootstrap with Multiple Controller</a></h5>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Controllers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/ops.html Outdated
<h5 class="anchor-heading"><a id="kraft_storage_voters" class="anchor-link"></a><a href="#kraft_storage_voters">Bootstrap with Multiple Controller</a></h5>
The KRaft cluster metadata partition can also be bootstrapped with more than one voter. This can be done by using the --initial-controllers flag:

<pre><code class="language-bash">kafka-storage format --cluster-id <cluster-id> --initial-controller 0@controller-0:1234:3Db5QLSqSZieL3rJBUUegA,1@controller-1:1234:L3rJBUUegA3Db5QLSqSZie,2@controller-2:1234:UegA3Db5QLSqSZieL3rJBU --config controller.properties</code></pre>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment says --initial-controllers and the command says --initial-controller

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/ops.html Outdated

<pre><code class="language-bash">kafka-storage format --cluster-id <cluster-id> --initial-controller 0@controller-0:1234:3Db5QLSqSZieL3rJBUUegA,1@controller-1:1234:L3rJBUUegA3Db5QLSqSZie,2@controller-2:1234:UegA3Db5QLSqSZieL3rJBU --config controller.properties</code></pre>

This command is similar to the standalone version but the snapshot at 00000000000000000000-0000000000.checkpoint will instead contain a VotersRecord that includes information for all of the controllers specified in --initial-controllers. Just like the controller.quorum.voters properties, it is important that the set of voters is equal in all of the controllers with the same cluster id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean controller.quorum.bootstrap.servers instead of controller.quorum.voters?

if you did mean controller.quorum.voters, it might not make sense to readers since it's not mentioned prior to this line anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I meant controller.quorum.voters but like you said it is better to not mention that property since we are deprecating it and we have not mentioned in the past. I reworded the sentence to not mention the property.

docs/ops.html Outdated
<h5 class="anchor-heading"><a id="kraft_storage_standalone" class="anchor-link"></a><a href="#kraft_storage_standalone">Bootstrap a Standalone Controller</a></h5>
The recommended method for creating a new KRaft controller cluster is to bootstrap it with one voter and dyncamically <a href="#kraft_reconfig_add">add the rest of the controllers</a>. This can be done with the following CLI command:

<pre><code class="language-bash">kafka-storage format --cluster-id <cluster-id> --standalone --config controller.properties</code></pre>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to talk at all about what information must be in controller.properties? Perhaps in a separate section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a "Configuration / Controllers" section. In there we discuss the kraft configuration. I updated the example to be a bit more complete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked <cluster-id> renders fine? We may need to use &lt;/&gt;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mimaison Thanks! I filed #17235 to fix this.

docs/ops.html Outdated
<h5 class="anchor-heading"><a id="kraft_reconfig_add" class="anchor-link"></a><a href="#kraft_reconfig_add">Add New Controller</a></h5>
If the KRaft Controller cluster already exist, the cluster can be expanded by first provisioning a new controller using the <a href="#kraft_storage_observers">kafka-storage tool</a> and starting the controller.

After sarting the controller, the replication to the new controller can be monitored using the <code>kafka-metadata-quorum describe --replication</code> command. Once the new controller has caught up to the active controller, it can be added to the cluster using the <code>kafka-metadata-quorum add-controller</code> command.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: starting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/ops.html Outdated

After sarting the controller, the replication to the new controller can be monitored using the <code>kafka-metadata-quorum describe --replication</code> command. Once the new controller has caught up to the active controller, it can be added to the cluster using the <code>kafka-metadata-quorum add-controller</code> command.

<pre><code class="language-bash">kafka-metadata-quorum --command-config controller.properties --bootstrap-server localhost:9092 add-controller</code></pre>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be run from a broker, should we give the bootstrap-controller alternative command as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

docs/ops.html Outdated
<pre><code class="language-bash">kafka-metadata-quorum --command-config controller.properties --bootstrap-server localhost:9092 add-controller</code></pre>

<h5 class="anchor-heading"><a id="kraft_reconfig_remove" class="anchor-link"></a><a href="#kraft_reconfig_remove">Remove Controller</a></h5>
If the KRaft Controller cluster already exist, the cluster can be shrunk using the <code>kafka-metadata-quorum remove-controller</code> command. Until KIP-996: Pre-vote has been implemented and release, it is recommended to shutdown the controller that will be removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion to make this extremely explicit: it is recommended to shutdown the controller that will be removed before running the remove-controller command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

defined by <code>controller.quorum.voters</code> must then also use port 9093,
as it does here.</p>
<p>It is a requirement for the host and port defined in <code>controller.quorum.bootstrap.servers</code>
is routed to the exposed controller listeners. For example, here the <code>CONTROLLER</code>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a requirement that the... is routed to

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

@ahuang98 ahuang98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the detailed docs!

@@ -32,8 +32,8 @@ <h4 class="anchor-heading">
the latest Kafka release and extract it:
</p>

<pre><code class="language-bash">$ tar -xzf kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz
$ cd kafka_{{scalaVersion}}-{{fullDotVersion}}</code></pre>
<pre><code class="language-bash">tar -xzf kafka_{{scalaVersion}}-{{fullDotVersion}}.tgz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing the $ prompts? We use them everywhere in the documentation. It's useful to distinguish the commands from their outputs.

Copy link
Contributor

@cmccabe cmccabe Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsancio : I agree, let's not make this change removing the $ prompts.

(Or if you really want to do it, let's create a separate PR where we can discuss further, and do it for all docs.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I'll revert these changes. I may do it in another PR.

I made these changes for two reasons. It assumes that the CLI prompt is $ which is not the case in all terminals and shells. The user cannot copy and paste command and instead has to edit the begin the command before executing it.

<p></p>
The <code>kafka-storage.sh random-uuid</code> command can be used to generate a cluster ID for your new cluster. This cluster ID must be used when formatting each server in the cluster with the <code>kafka-storage.sh format</code> command.

<p>This is different from how Kafka has operated in the past. Previously, Kafka would format blank storage directories automatically, and also generate a new cluster ID automatically. One reason for the change is that auto-formatting can sometimes obscure an error condition. This is particularly important for the metadata log maintained by the controller and broker servers. If a majority of the controllers were able to start with an empty log directory, a leader might be able to be elected with missing committed data.</p>

<h5 class="anchor-heading"><a id="kraft_storage_standalone" class="anchor-link"></a><a href="#kraft_storage_standalone">Bootstrap a Standalone Controller</a></h5>
The recommended method for creating a new KRaft controller cluster is to bootstrap it with one voter and dynamically <a href="#kraft_reconfig_add">add the rest of the controllers</a>. Bootstrapping the first controller can be done with the following CLI command:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason this is the recommended method over the alternative below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This is the safest option and easiest to get correct by the user.

With the --standalone option the user just needs to execute the command once on one the controller and the controller cluster will be consistent.

The --initial-controllers option assumes that the user is executing the same command across all of the controllers and that the replica id and replica directory id matches in all of the controllers. The KRaft controllers don't have a way to verify this and it is possible that the user will bootstrap a controller cluster incorrectly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to add this reasoning to the documentation so that users can make an informed decision, and know what the requirements are if choosing to go against the recommendation for some reason.

docs/ops.html Outdated Show resolved Hide resolved
docs/ops.html Outdated Show resolved Hide resolved
@@ -0,0 +1,132 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear why we're adding a new file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsancio we should use the same approach here in server.properties as in broker.properties and controller.properties

As I commented above, the simplest approach is just to have a line for controller.quorum.voters, but have it be commented out, with an explanation about how to use the old style if desired. That would avoid the need for a separate file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear why we're adding a new file

This file is reference in Dockerfile and a few other places. I haven't had a chance to fix all of those places to use kraft.version 0. I instead created this jira: https://issues.apache.org/jira/browse/KAFKA-17576

rg "kraft\/server.properties"
docker/native/Dockerfile
66:COPY --chown=appuser:root --from=build-native-image /app/kafka/config/kraft/server.properties /etc/kafka/docker/

docker/docker_official_images/3.7.0/jvm/Dockerfile
81:    cp /opt/kafka/config/kraft/server.properties /etc/kafka/docker/server.properties; \

docker/docker_official_images/3.7.0/jvm/jsa_launch
20:KAFKA_JVM_PERFORMANCE_OPTS="-XX:ArchiveClassesAtExit=storage.jsa" opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c opt/kafka/config/kraft/server.properties
22:KAFKA_JVM_PERFORMANCE_OPTS="-XX:ArchiveClassesAtExit=kafka.jsa" opt/kafka/bin/kafka-server-start.sh opt/kafka/config/kraft/server.properties &

docker/jvm/Dockerfile
81:    cp /opt/kafka/config/kraft/server.properties /etc/kafka/docker/server.properties; \

docker/jvm/jsa_launch
20:KAFKA_JVM_PERFORMANCE_OPTS="-XX:ArchiveClassesAtExit=storage.jsa" opt/kafka/bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c opt/kafka/config/kraft/server.properties
22:KAFKA_JVM_PERFORMANCE_OPTS="-XX:ArchiveClassesAtExit=kafka.jsa" opt/kafka/bin/kafka-server-start.sh opt/kafka/config/kraft/server.properties &

docs/streams/quickstart.html
133:<pre><code class="language-bash">$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties</code></pre>
139:<pre><code class="language-bash">$ bin/kafka-server-start.sh config/kraft/server.properties</code></pre>

README.md
98:    ./bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
99:    ./bin/kafka-server-start.sh config/kraft/server.properties

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I commented above, the simplest approach is just to have a line for controller.quorum.voters, but have it be commented out, with an explanation about how to use the old style if desired. That would avoid the need for a separate file.

See my other comment. I am not sure if we need to document all of the options. Kafka and KRaft support a lot of configurations that are not enumerated in kraft/server.properties.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I guess we can add a new file, although it's not ideal.

docs/quickstart.html Outdated Show resolved Hide resolved
Copy link
Contributor

@cmccabe cmccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

Copy link
Contributor

@cmccabe cmccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cmccabe cmccabe merged commit aee44ef into apache:trunk Sep 18, 2024
6 of 7 checks passed
cmccabe pushed a commit that referenced this pull request Sep 18, 2024
Change the configurations under config/kraft to use controller.quorum.bootstrap.servers instead of controller.quorum.voters. Add comments explaining how to use the older static quorum configuration where appropriate.

In docs/ops.html, remove the reference to "tentative timelines for ZooKeeper removal" and "Tiered storage is considered as an early access feature" since they are no longer up-to-date. Add KIP-853 information.

In docs/quickstart.html, move the ZK instructions to be after the KRaft instructions. Update the KRaft instructions to use KIP-853.

In docs/security.html, add an explanation of --bootstrap-controller and document controller.quorum.bootstrap.servers instead of controller.quorum.voters.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alyssa Huang <ahuang@confluent.io>, Colin P. McCabe <cmccabe@apache.org>
@cmccabe
Copy link
Contributor

cmccabe commented Sep 18, 2024

Thanks, everyone. I committed what we have so far. If there are any gaps please raise another PR. Hopefully in time for RC1, so that everyone can kick the tires :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
KIP-853 KRaft Controller Membership Changes kraft
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants