-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cruise Control deployment #2773
Conversation
) | ||
@JsonInclude(JsonInclude.Include.NON_NULL) | ||
@JsonPropertyOrder({ | ||
"deployment", "pod", "apiService", "podDisruptionBudget", "cruiseControlcontainer", "tlsSidecarContainer"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*Container needs capitalization in cruiseControlContainer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #2779 for resolving this sort of thing once and for all.
|
||
// Check Service | ||
Service svc = cc.generateService(); | ||
assertThat(svc.getMetadata().getLabels().entrySet().containsAll(svcLabels.entrySet()), is(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor niggle, but from a failing test point of view might the check
assertThat(svc.getMetadata().getLabels(), is(svcLabels));
give a more informative error message on failure? This applys to all of the assertions in this file with containsAll
calls
try { | ||
assertThat(cc.generateService(), is(nullValue())); | ||
} catch (Throwable expected) { | ||
assertEquals(NullPointerException.class, expected.getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be a little more consistent with the rest of the tests in Strimzi would you opposed to using
assertThrows(NullPointerException.class, () -> cc.generateService());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And let's be clear that the code above it wrong anyway. If you're expecting cc.generateService()
to throw there's no point in having the assertThat
and the cc.generateService()
statement should be followed by a fail()
, otherwise in the case of no NPE the test will erroneously pass.
CruiseControl cc = CruiseControl.fromCrd(resource, VERSIONS); | ||
|
||
List<EnvVar> kafkaEnvVars = cc.getEnvVars(); | ||
assertThat(kafkaEnvVars.stream().filter(var -> testEnvOneKey.equals(var.getName())).map(EnvVar::getValue).findFirst().get(), is(testEnvOneValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertThat(kafkaEnvVars, hasEntry(testEnvOneKey, testEnvOneValue)
suggested improvement
And maybe have same for envar2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since kafkaEnvVars
is a list instead of a map I used hasItem()
instead. How does this look:
List<EnvVar> envVarList = cc.getEnvVars();
assertThat(envVarList, hasItems(new EnvVar(testEnvOneKey, testEnvOneValue, null)));
assertThat(envVarList, hasItems(new EnvVar(testEnvTwoKey, testEnvTwoValue, null)));
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, though since hasItems takes ...items
you could achieve the same with
assertThat(envVarList, hasItems(new EnvVar(testEnvOneKey, testEnvOneValue, null),
new EnvVar(testEnvTwoKey, testEnvTwoValue, null)));
But this is only a suggestion , not a required change at all! :P
CruiseControl cc = CruiseControl.fromCrd(resource, VERSIONS); | ||
|
||
List<EnvVar> kafkaEnvVars = cc.getEnvVars(); | ||
assertThat(kafkaEnvVars.stream().filter(var -> testEnvOneKey.equals(var.getName())).map(EnvVar::getValue).findFirst().get(), is(testEnvOneValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertThat(kafkaEnvVars, hasEntry(testEnvOneKey, testEnvOneValue)
suggested improvement
And maybe have same for envar2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
|
||
private final CruiseControl cc = CruiseControl.fromCrd(resource, VERSIONS); | ||
|
||
public void checkOwnerReference(OwnerReference ownerRef, HasMetadata resource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as an aside, this might be a good function to contribute to test utils as I can see all of the tests needing to call this method!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please! Good spot @samuel-hawker!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll save this for another PR if that is alright with you two, the PR already changes a lot of files!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyguy for sure updating ALL the other files should be done in a different PR but here you could start to move the method in the test utils and using it. Then, again, reusing the method for ALL the other tests has to be done in a different PR. Wdyt?
public class CruiseControlSpec implements UnknownPropertyPreserving, Serializable { | ||
private static final long serialVersionUID = 1L; | ||
|
||
public static final String FORBIDDEN_PREFIXES = "bootstrap.servers, zookeeper., ssl., security., failed.brokers.zk.path," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kyguy is there a link to the documentation about all these configs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added this link[1] above the field
[1] https://github.com/linkedin/cruise-control/wiki/Configurations
this.tlsSidecar = tlsSidecar; | ||
} | ||
|
||
@Description("The Cruise Control broker capacity config.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Description("The Cruise Control broker capacity config.") | |
@Description("The Cruise Control broker capacity configuration.") |
|
||
@Description("The Cruise Control broker capacity config.") | ||
@JsonInclude(JsonInclude.Include.NON_NULL) | ||
public CruiseControlBrokerCapacity getCapacity() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class is called ...BokerCapacity, so should the property be called that too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored
this.capacity = capacity; | ||
} | ||
|
||
@Description("The Cruise Control config. Properties with the following prefixes cannot be set: " + FORBIDDEN_PREFIXES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include a link here to the place where the user can learn about what the valid configs are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this link [1] to the description
[1] https://github.com/linkedin/cruise-control/wiki/Configurations
} | ||
|
||
@JsonInclude(JsonInclude.Include.NON_EMPTY) | ||
@Description("JVM Options for pods") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the cruise control pod (singular)? If so, please say so in the Description to the reader of the docs is left in no doubt about what they're configuring.
documentation/modules/cruise-control/con-cruise-control-overview.adoc
Outdated
Show resolved
Hide resolved
<1> Number of Cruise Control application instances; set to `1` to run or `0` to pause the application. | ||
<2> Specifies the Cruise Control capacity configuration. For more information, see xref:capacity_configuration[capacity configuration]. | ||
<3> Specifies the Cruise Control configuration. You can provide any xref:ref-cruise-control-configuration-{context}[standard configuration option] apart from those managed directly by {ProductName}. | ||
<4> CPU and memory resources reserved for Cruise Control. For more information, see xref:assembly-resource-limits-and-requests-deployment-configuration-kafka[]. | ||
<5> Logging configuration, to log messages with a given severity (debug, info, warn, error, fatal) or above. | ||
<6> xref:assembly-customizing-deployments-str[Customization of deployment templates and pods]. | ||
<7> xref:assembly-healthchecks-deployment-configuration-kafka[Healthcheck readiness probes]. | ||
<8> xref:assembly-healthchecks-deployment-configuration-kafka[Healthcheck liveness probes]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this belong in a procedure, rather than a concept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be separated into a reference but have been shuffling these around. Let me get the final word from @laidan6000
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On balance, I think it should stay here because the example configuration is helpful within the context of the task. In other words, it doesn't seem to stand alone as a concept or reference module.
The `config` property in `Kafka.spec.cruiseControl` contains configuration options as keys with values as one of the following JSON types: | ||
|
||
* String | ||
* Number | ||
* Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies more generally to the rest of the docs too, but here would be a good place for a NOTE
to say that if your String happens to look like YAML or JSON it will need to be explicitly quoted.
---- | ||
== Capacity configuration | ||
|
||
Cruise Control uses capacity limits to determine if goals related to broker resources are being violated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which begs the question: What happens if the limits are violated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some text:
A violation of a goal can cause a optimization to fail, preventing that optimization from being used to generate a partition rebalance proposal.
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal
* `networkIn` - Network inbound throughput in KB/s | ||
* `networkOut` - Newtork outbound throughput in KB/s | ||
|
||
Since {ProductName} brokers are homogenous, Cruise Control will apply the same capacity limits to every broker it is monitoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since {ProductName} brokers are homogenous, Cruise Control will apply the same capacity limits to every broker it is monitoring. | |
Since {ProductName} brokers are homogeneous, Cruise Control will apply the same capacity limits to every broker it is monitoring. |
this.capacity = capacity; | ||
} | ||
|
||
@Description("The Cruise Control config. Properties with the following prefixes cannot be set: " + FORBIDDEN_PREFIXES) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as Tom suggested above, config
should also be configuration
here
25ae97a
to
dfeccf9
Compare
public static final String FORBIDDEN_PREFIXES = "bootstrap.servers, zookeeper., ssl., security., failed.brokers.zk.path," | ||
+ "webserver.http.port, webserver.http.address, webserver.api.urlprefix, metric.reporter.sampler.bootstrap.servers, metric.reporter.topic, metric.reporter.topic.pattern," | ||
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic" | ||
+ "cruise.control.metrics.reporter.bootstrap.servers, sasl. interceptor.classes"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ "cruise.control.metrics.reporter.bootstrap.servers, sasl. interceptor.classes"; | |
+ "cruise.control.metrics.reporter.bootstrap.servers, sasl.interceptor.classes"; |
api/src/main/java/io/strimzi/api/kafka/model/CruiseControlSpec.java
Outdated
Show resolved
Hide resolved
// For the full configuration list refer to https://github.com/linkedin/cruise-control/wiki/Configurations | ||
public static final String FORBIDDEN_PREFIXES = "bootstrap.servers, zookeeper., ssl., security., failed.brokers.zk.path," | ||
+ "webserver.http.port, webserver.http.address, webserver.api.urlprefix, metric.reporter.sampler.bootstrap.servers, metric.reporter.topic, metric.reporter.topic.pattern," | ||
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic" | |
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic," |
this.tlsSidecar = tlsSidecar; | ||
} | ||
|
||
@Description("The Cruise Control brokerCapacity configuration.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Description("The Cruise Control brokerCapacity configuration.") | |
@Description("The Cruise Control `brokerCapacity` configuration.") |
} | ||
|
||
@Description("The Cruise Control configuration. For a full list of configuration options refer to" + | ||
" https://github.com/linkedin/cruise-control/wiki/Configurations Note that properties " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
" https://github.com/linkedin/cruise-control/wiki/Configurations Note that properties " + | |
" https://github.com/linkedin/cruise-control/wiki/Configurations. Note that properties " + |
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaCluster.java
Show resolved
Hide resolved
if (cruiseControlSpec != null) { | ||
metricReporterList.add(CRUISE_CONTROL_METRIC_REPORTER); | ||
} else { | ||
metricReporterList.remove(CRUISE_CONTROL_METRIC_REPORTER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it they added CRUISE_CONTROL_METRIC_REPORTER
explicitly themselves we remove it unless they also specified a CruiseControlSpec
? I think this is more or less a moot point at the moment, since there's no mechanism for people to add plugin classes to our Kafka images (apart from them building their own images, which perhaps some people do). But I would none the less argue that this is not the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it they added CRUISE_CONTROL_METRIC_REPORTER explicitly themselves we remove it unless they also specified a CruiseControlSpec?
Yes, but the main intention of this code is to stop the Cruise Control metric reporters after the Cruise Control application has been deleted. Otherwise, the Cruise Control metric reporters will continue to run and cost resources even if they are not being used.
cluster-operator/src/main/java/io/strimzi/operator/cluster/model/StorageUtils.java
Show resolved
Hide resolved
List<SingleVolumeStorage> volumeList = ((JbodStorage) storage).getVolumes(); | ||
long size = 0; | ||
for (SingleVolumeStorage volume : volumeList) { | ||
size += generateDiskCapacity(volume); | ||
} | ||
return size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does this mean the CC doesn't really distinguish the disks as having individual sizes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cruise Control does distinguish the disks as having individual sizes using the adminClient.
However, for balancing partitions between disks of the same broker, Cruise Control needs us to explicitly state the capacity of each disk here, which I have not done yet. This will require some extra logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So will that extra logic be in this PR, or a future one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A future one
documentation/modules/cruise-control/ref-cruise-control-configuration.adoc
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Peer reviewed. This is looking great!
---- | ||
|
||
.Additional resources | ||
For information, refer to the xref:type-CruiseControlBrokerCapacity-reference[Capacity schema reference]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI - this cross-reference is currently broken.
[id='con-cruise-control-overview-{context}'] | ||
= Cruise Control overview | ||
|
||
Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions. | |
Brokers in a Kafka cluster can become unevenly loaded for many reasons. For example, partitions that handle large amounts of message traffic can be unevenly distributed across the available brokers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next, I would explain that partition reassignment is usually a manual task. Perhaps you could add a sentence like the following:
"Monitoring broker load and reassigning busy partitions to brokers with spare capacity is time consuming and adds to the overhead of managing a Kafka cluster"
= Cruise Control overview | ||
|
||
Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions. | ||
Cruise Control is a tool for simplifying the monitoring and rebalancing of partitions across a Kafka cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cruise Control is a tool for simplifying the monitoring and rebalancing of partitions across a Kafka cluster. | |
Cruise Control is a tool for automating the monitoring and rebalancing of partitions across a Kafka cluster. |
|
||
Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions. | ||
Cruise Control is a tool for simplifying the monitoring and rebalancing of partitions across a Kafka cluster. | ||
You deploy Cruise Control alongside a Kafka cluster to: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make CC the subject of this sentence and use different terminology. Please rephrase to something like the following:
When deployed alongside a Kafka cluster, Cruise Control can perform the following cluster balancing tasks:
- Monitoring...
- Making optimization proposals: balanced partition assignments based on configurable optimization goals.
- Executing partition reassignments based on optimization proposals.
[id='proc-deploying-cruise-control-{context}'] | ||
= Deploying Cruise Control | ||
|
||
Deploy Cruise Control to your {ProductName} cluster through the `cruiseControl` property in the `Kafka` resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deploy Cruise Control to your {ProductName} cluster through the `cruiseControl` property in the `Kafka` resource. | |
Cruise Control is configured using the `cruiseControl` property in the `Kafka` resource. Once configured, you can deploy a Cruise Control instance to your {ProductName} cluster by creating or updating the `Kafka` resource. |
|
||
Cruise Control uses capacity limits to determine if goals related to broker resources are being violated. | ||
A violation of a goal can cause a optimization to fail, preventing that optimization from being used to generate a partition rebalance proposal. | ||
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal. | |
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used if the `CpuCapacityGoal` is set as a hard goal. |
A violation of a goal can cause a optimization to fail, preventing that optimization from being used to generate a partition rebalance proposal. | ||
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal. | ||
|
||
Capacity limits for broker resources can be specified in the `brokerCapacity` property in `Kafka.spec.cruiseControl`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capacity limits for broker resources can be specified in the `brokerCapacity` property in `Kafka.spec.cruiseControl`. | |
You specify capacity limits for broker resources in the `brokerCapacity` property in `Kafka.spec.cruiseControl` . |
* `diskMiB` - Disk storage in MiB | ||
* `cpuUtilization` - CPU utilization as a percent (0-100) | ||
* `inboundNetworkKiBPerSecond` - Network inbound throughput in KiB/s | ||
* `outboundNetworkKiBPerSecond` - Newtork outbound throughput in KiB/s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `outboundNetworkKiBPerSecond` - Newtork outbound throughput in KiB/s | |
* `outboundNetworkKiBPerSecond` - Network outbound throughput in KiB/s |
* `inboundNetworkKiBPerSecond` - Network inbound throughput in KiB/s | ||
* `outboundNetworkKiBPerSecond` - Newtork outbound throughput in KiB/s | ||
|
||
Since {ProductName} brokers are homogeneous, Cruise Control will apply the same capacity limits to every broker it is monitoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since {ProductName} brokers are homogeneous, Cruise Control will apply the same capacity limits to every broker it is monitoring. | |
Because Kafka brokers are homogeneous, Cruise Control applies the same capacity limits to every broker it is monitoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to {ProductName} Kafka brokers
since Kafka brokers are not necessarily homogeneous but Kafka brokers managed by Strimzi are homogeneous
---- | ||
|
||
.Additional resources | ||
For information, refer to the xref:type-CruiseControlBrokerCapacity-reference[Capacity schema reference]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For information, refer to the xref:type-CruiseControlBrokerCapacity-reference[Capacity schema reference]. | |
For more information, refer to the xref:type-CruiseControlBrokerCapacity-reference[]. |
be06b74
to
cf09d68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits, but I'm happy for this to be merged.
private Map<String, Object> additionalProperties = new HashMap<>(0); | ||
|
||
@JsonInclude(JsonInclude.Include.NON_DEFAULT) | ||
@Pattern("^[0-9]+([KMGTPE]i?)?$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should allow stuff like 1.5T and exponential notation like 2e6 (which Kube does for memory, for example).
@Pattern("^[0-9]+([KMGTPE]i?)?$") | |
@Pattern("^[0-9]+([.][0-9]*)?([KMGTPE]i?|e[0-9]+)?$") |
And I guess to be consistent with the network settings we should allow the B suffix in there too. It's difficult because we're trying to be consistent with Kube, but what Kube does (omitting the actual unit) doesn't really work for the network rates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to include .
and exponential notation.
I think we should leave the B suffix out of the disk
capacity values so that it is consistent with the storage values we use for Strimzi (which follow the K8s memory format). I do still think the B suffix is very important for the network rates though! What do you think?
|
||
@JsonInclude(JsonInclude.Include.NON_NULL) | ||
@Pattern("[0-9]+([KMG]i?)?B/s") | ||
@Description("Broker capacity for network inbound throughput, for example, 10000KB/s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we explicitly say that this is in bytes per second, just to avoid people mistakenly thinking it might be in bits per second? Same for the outbound.
private Double brokerInboundNetworkKiBPerSecondCapacity; | ||
private Double brokerOuboundNetworkKiBPerSecondCapacity; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need these double empty lines (here and the following two).
List<ServicePort> ports = new ArrayList<>(1); | ||
ports.add(createServicePort(REST_API_PORT_NAME, REST_API_PORT, REST_API_PORT, "TCP")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collections.singletonList()
} | ||
|
||
protected List<Volume> getVolumes(boolean isOpenShift) { | ||
List<Volume> volumeList = new ArrayList<>(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you insist on doing it like this then get the right number of items:
List<Volume> volumeList = new ArrayList<>(1); | |
List<Volume> volumeList = new ArrayList<>(3); |
But even better would be to use
return Arrays.asList(createSecretVolume(...),
createSecretVolume(...),
createConfigMapVolume(...));
} | ||
|
||
protected List<VolumeMount> getVolumeMounts() { | ||
List<VolumeMount> volumeMountList = new ArrayList<>(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment.
CC_DEFAULT_PROPERTIES_MAP.put("default.goals", DEFAULT_GOALS); | ||
CC_DEFAULT_PROPERTIES_MAP.put("goals", DEFAULT_GOALS); | ||
|
||
FORBIDDEN_OPTIONS = asList(CruiseControlSpec.FORBIDDEN_PREFIXES.split(", ")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ", *"
(split takes a regex) would protect you a little from people being careless with whitespace when making changes in the future.
Map<String, String> annotations = new HashMap<>(); | ||
annotations.put(CruiseControl.ANNO_STRIMZI_IO_LOGGING, logAndMetricsConfigMap.getData().get(ANCILLARY_CM_KEY_LOG_CONFIG)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collections.singletonMap()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc looks good so far.
cf09d68
to
0f5eb11
Compare
@@ -172,7 +174,7 @@ public void setImage(String image) { | |||
} | |||
|
|||
@JsonInclude(JsonInclude.Include.NON_NULL) | |||
@Description("CPU and memory resources to reserve.") | |||
@Description("CPU and memory resources to reserve for the Cruise Control container.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are in the KafkaClusterSpec
, why this is about Cruise Control container resources?
protected static final String LOG_AND_METRICS_CONFIG_VOLUME_NAME = "cruise-control-logging"; | ||
protected static final String LOG_AND_METRICS_CONFIG_VOLUME_MOUNT = "/opt/cruise-control/custom-config/"; | ||
|
||
public static final String ANNO_STRIMZI_IO_LOGGING = Annotations.STRIMZI_DOMAIN + "/logging"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you rebased against the latest master
? Just asking because lately the Annotations.STRIMZI_DOMAIN
was changed from strimzi.io
to strimzi.io/
. In this case, we are going to have strimzi.io//logging
so you should remove the /
from logging. I had the same problem with one of my latest PR :-)
@kyguy - I think you should comment out the two modules that are "TBC". |
d52a0be
to
e9ac04b
Compare
b57762a
to
91ac7f2
Compare
Can someone kick the Azure Pipelines again? The build failure appears to be related to the pipeline |
@strimzi-ci run tests |
Yes. Opened #2876 to fix it |
❗ Systemtests Failed (no tests results are present) ❗ |
@strimzi-ci run tests |
✔️ Test Summary ✔️TEST_PROFILE: acceptance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good. Just some minor comments.
) | ||
@JsonInclude(JsonInclude.Include.NON_NULL) | ||
@JsonPropertyOrder({ | ||
"replicas", "image", "config", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could remove the replicas
from here.
.withReadinessProbe(ModelUtils.createHttpProbe(readinessPath, REST_API_PORT_NAME, readinessProbeOptions)) | ||
.withResources(getResources()) | ||
.withVolumeMounts(getVolumeMounts()) | ||
.withImagePullPolicy(determineImagePullPolicy(imagePullPolicy, getImage())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add here the securityontext form the container templates.
.withCommand("/opt/stunnel/cruise_control_stunnel_pre_stop.sh", | ||
String.valueOf(templateTerminationGracePeriodSeconds)) | ||
.endExec().endPreStop().build()) | ||
.withImagePullPolicy(determineImagePullPolicy(imagePullPolicy, tlsSidecarImage)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add here the security context from the tlsSidecar container template.
import static io.strimzi.operator.cluster.model.VolumeUtils.createSecretVolume; | ||
import static io.strimzi.operator.cluster.model.VolumeUtils.createVolumeMount; | ||
|
||
public class CruiseControl extends AbstractModel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to set some network policies on the CruiseControl ports?
name: my-cluster | ||
spec: | ||
kafka: | ||
version: 2.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be 2.4.1 now.
entityOperator: | ||
topicOperator: {} | ||
userOperator: {} | ||
cruiseControl: {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide a default CC configuration in the example file. We could provide the full list of goals and default goals so the user can remove any they don't need and/or fine-tune the constraints.
Otherwise, we're making the user work harder. They'll need to refer to the documentation to figure out how to set goals and capacity limits.
I don't mind if you merge this now and we continue the discussion in later PRs.
* Added security context and tests Signed-off-by: Paolo Patierno <ppatierno@live.com> * Added network policy for the cruise control REST API port Signed-off-by: Paolo Patierno <ppatierno@live.com>
@strimzi-ci run tests |
❗ Systemtests Failed (no tests results are present) ❗ |
Signed-off-by: Paolo Patierno <ppatierno@live.com>
@strimzi-ci run profile=regression testcase=CruiseControlST |
@strimzi-ci run tests |
@strimzi-ci run tests profile=regression testcase=CruiseControlST |
❗ Systemtests Failed (no tests results are present) ❗ |
1 similar comment
❗ Systemtests Failed (no tests results are present) ❗ |
Signed-off-by: Paolo Patierno <ppatierno@live.com>
@strimzi-ci run tests |
@strimzi-ci run tests profile=regression testcase=CruiseControlST |
✔️ Test Summary ✔️TEST_PROFILE: regression |
@strimzi-ci run tests |
✔️ Test Summary ✔️TEST_PROFILE: acceptance |
* Added Cruise Control System Tests - Added deployment with CC to Kafka ST resources - Added pod name and deployment resources - Added CC deployment system test Signed-off-by: Thomas Cooper <tcooper@redhat.com> * Cruise Control Deployment Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Add broker capacity estimation and configuration Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Add k8s memory parsing to return different byte multiples Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Refactor Capacity constructor Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Regenerate helm charts Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Addressing some comments Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Addressing more comments Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Addressing more comments (refactoring capacity API, checkOwnerReferences) Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Addressing comments ( capacity properties as strings/validate units in schema) Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Tightening capacity regex; fixing docs Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Another doc fix Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Add missing goal; bump CC version to 2.0.100 Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Fix logging and typos Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Addressing comments ( Update allowed disk capacity notation + refactoring) Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Fixing more typos Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Fixing another doc issue Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Securing communication between Cruise Control and Kafka (strimzi#37) * Adding TLS communication between CC and Kafka Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Adding TLS communication between metric reporter and Kafka Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Fixed metrics report to use TLS hostname verification (strimzi#38) Moved metrics topic to Cruise Control configuration Hiding truststore and keystore passwords in the log Signed-off-by: Paolo Patierno <ppatierno@live.com> Co-authored-by: Paolo Patierno <ppatierno@live.com> Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Removing CC replicas, refactor Dockerfile + CC dependencies Signed-off-by: Kyle Liberti <kliberti@redhat.com> * Added security context, network policy and tests (strimzi#39) * Added security context and tests Signed-off-by: Paolo Patierno <ppatierno@live.com> * Added network policy for the cruise control REST API port Signed-off-by: Paolo Patierno <ppatierno@live.com> * Added CruiseControl ST to regression tests Signed-off-by: Paolo Patierno <ppatierno@live.com> * Fixed NPE on Cruise Control network policy Signed-off-by: Paolo Patierno <ppatierno@live.com> Co-authored-by: Thomas Cooper <tcooper@redhat.com> Co-authored-by: Paolo Patierno <ppatierno@live.com>
Type of change
Description
Gives the ability to bring up a Cruise Control instance to monitor a Strimzi Kafka cluster. Cruise Control and its configurations can be declared in a Kafka resource like this:
After declaring a
cruiseControl
in the resource's spec, all Kafka brokers will be rerolled with a Cruise Control metric reporter which will gather metrics about the cluster and store them in a Kafka topic. The Cruise Control application will then process these metrics and use them to learn about the cluster.Note: This is PR exposes the full Cruise Control API, but later PRs will lock this API down, making only a subset of the operations available through another resource.
[1] Cruise Control uses these capacity limits to determine if goals related to broker resources are being violated. (e.g. DiskCapacityGoal: violated when the disk capacity set for a broker is exceeded by the disk being used by that broker). The settings specified here will populate the capacity configuration file fed to Cruise Control (https://github.com/linkedin/cruise-control/blob/2.0.97/config/capacity.json)
[2] Represents the Cruise Control properties file used for configuring the Cruise Control application (https://github.com/linkedin/cruise-control/wiki/Configurations) All configurations are allowed except for the FORBIDDEN CONFIG OPTIONS = bootstrap.servers, client.id, zookeeper., network., security., failed.brokers.zk.path, webserver.http., webserver.api.urlprefix, webserver.session.path, webserver.accesslog., two.step., request.reason.required, metric.reporter.topic, partition.metric.sample.store.topic, broker.metric.sample.store.topic, capacity.config.file, self.healing.
Checklist
Please go through this checklist and make sure all applicable tasks have been done
./design