Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cruise Control deployment #2773

Merged
merged 22 commits into from
Apr 24, 2020
Merged

Conversation

kyguy
Copy link
Member

@kyguy kyguy commented Apr 1, 2020

Type of change

  • Enhancement / new feature

Description

Gives the ability to bring up a Cruise Control instance to monitor a Strimzi Kafka cluster. Cruise Control and its configurations can be declared in a Kafka resource like this:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
...
spec:
  kafka:
    ...
  cruiseControl:
    image: <image>
    jvmOptions: {}
    logging: {}
    template: {}
    resources: {}
    tlsSideCar: {}
    readinessProbe: {}
    livenessProbe: {}
    brokerCapacity: [1]
      disk: 100G
      cpu: 100     # as a percentage (0-100)
      inboundNetwork: 10000KB/s
      outboundNetwork: 10000KB/s
    config: [2]
      default.goals:<list_of_goals>
      goals: <list_of_goals>
      intra.broker.goals: <list_of_goals>
      hard.goals: <list_of_goals>
      cpu.balance.threshold: 1.1
      metadata.max.age.ms: 300000
      send.buffer.bytes: 131072
      ...

After declaring a cruiseControl in the resource's spec, all Kafka brokers will be rerolled with a Cruise Control metric reporter which will gather metrics about the cluster and store them in a Kafka topic. The Cruise Control application will then process these metrics and use them to learn about the cluster.

Note: This is PR exposes the full Cruise Control API, but later PRs will lock this API down, making only a subset of the operations available through another resource.

[1] Cruise Control uses these capacity limits to determine if goals related to broker resources are being violated. (e.g. DiskCapacityGoal: violated when the disk capacity set for a broker is exceeded by the disk being used by that broker). The settings specified here will populate the capacity configuration file fed to Cruise Control (https://github.com/linkedin/cruise-control/blob/2.0.97/config/capacity.json)

[2] Represents the Cruise Control properties file used for configuring the Cruise Control application (https://github.com/linkedin/cruise-control/wiki/Configurations) All configurations are allowed except for the FORBIDDEN CONFIG OPTIONS = bootstrap.servers, client.id, zookeeper., network., security., failed.brokers.zk.path, webserver.http., webserver.api.urlprefix, webserver.session.path, webserver.accesslog., two.step., request.reason.required, metric.reporter.topic, partition.metric.sample.store.topic, broker.metric.sample.store.topic, capacity.config.file, self.healing.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Update/write design documentation in ./design
  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@kyguy kyguy added this to the 0.18.0 milestone Apr 1, 2020
)
@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonPropertyOrder({
"deployment", "pod", "apiService", "podDisruptionBudget", "cruiseControlcontainer", "tlsSidecarContainer"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Container needs capitalization in cruiseControlContainer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #2779 for resolving this sort of thing once and for all.


// Check Service
Service svc = cc.generateService();
assertThat(svc.getMetadata().getLabels().entrySet().containsAll(svcLabels.entrySet()), is(true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor niggle, but from a failing test point of view might the check

assertThat(svc.getMetadata().getLabels(), is(svcLabels));

give a more informative error message on failure? This applys to all of the assertions in this file with containsAll calls

try {
assertThat(cc.generateService(), is(nullValue()));
} catch (Throwable expected) {
assertEquals(NullPointerException.class, expected.getClass());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be a little more consistent with the rest of the tests in Strimzi would you opposed to using

assertThrows(NullPointerException.class, () -> cc.generateService());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And let's be clear that the code above it wrong anyway. If you're expecting cc.generateService() to throw there's no point in having the assertThat and the cc.generateService() statement should be followed by a fail(), otherwise in the case of no NPE the test will erroneously pass.

CruiseControl cc = CruiseControl.fromCrd(resource, VERSIONS);

List<EnvVar> kafkaEnvVars = cc.getEnvVars();
assertThat(kafkaEnvVars.stream().filter(var -> testEnvOneKey.equals(var.getName())).map(EnvVar::getValue).findFirst().get(), is(testEnvOneValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat(kafkaEnvVars, hasEntry(testEnvOneKey, testEnvOneValue) suggested improvement
And maybe have same for envar2?

Copy link
Member Author

@kyguy kyguy Apr 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since kafkaEnvVars is a list instead of a map I used hasItem() instead. How does this look:

List<EnvVar> envVarList = cc.getEnvVars();

assertThat(envVarList, hasItems(new EnvVar(testEnvOneKey, testEnvOneValue, null)));
assertThat(envVarList, hasItems(new EnvVar(testEnvTwoKey, testEnvTwoValue, null)));

Let me know what you think!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, though since hasItems takes ...items you could achieve the same with

assertThat(envVarList, hasItems(new EnvVar(testEnvOneKey, testEnvOneValue, null),
    new EnvVar(testEnvTwoKey, testEnvTwoValue, null)));

But this is only a suggestion , not a required change at all! :P

CruiseControl cc = CruiseControl.fromCrd(resource, VERSIONS);

List<EnvVar> kafkaEnvVars = cc.getEnvVars();
assertThat(kafkaEnvVars.stream().filter(var -> testEnvOneKey.equals(var.getName())).map(EnvVar::getValue).findFirst().get(), is(testEnvOneValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat(kafkaEnvVars, hasEntry(testEnvOneKey, testEnvOneValue) suggested improvement
And maybe have same for envar2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above


private final CruiseControl cc = CruiseControl.fromCrd(resource, VERSIONS);

public void checkOwnerReference(OwnerReference ownerRef, HasMetadata resource) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as an aside, this might be a good function to contribute to test utils as I can see all of the tests needing to call this method!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please! Good spot @samuel-hawker!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll save this for another PR if that is alright with you two, the PR already changes a lot of files!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyguy for sure updating ALL the other files should be done in a different PR but here you could start to move the method in the test utils and using it. Then, again, reusing the method for ALL the other tests has to be done in a different PR. Wdyt?

public class CruiseControlSpec implements UnknownPropertyPreserving, Serializable {
private static final long serialVersionUID = 1L;

public static final String FORBIDDEN_PREFIXES = "bootstrap.servers, zookeeper., ssl., security., failed.brokers.zk.path,"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyguy is there a link to the documentation about all these configs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added this link[1] above the field
[1] https://github.com/linkedin/cruise-control/wiki/Configurations

this.tlsSidecar = tlsSidecar;
}

@Description("The Cruise Control broker capacity config.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Description("The Cruise Control broker capacity config.")
@Description("The Cruise Control broker capacity configuration.")


@Description("The Cruise Control broker capacity config.")
@JsonInclude(JsonInclude.Include.NON_NULL)
public CruiseControlBrokerCapacity getCapacity() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class is called ...BokerCapacity, so should the property be called that too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored

this.capacity = capacity;
}

@Description("The Cruise Control config. Properties with the following prefixes cannot be set: " + FORBIDDEN_PREFIXES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include a link here to the place where the user can learn about what the valid configs are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this link [1] to the description

[1] https://github.com/linkedin/cruise-control/wiki/Configurations

}

@JsonInclude(JsonInclude.Include.NON_EMPTY)
@Description("JVM Options for pods")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the cruise control pod (singular)? If so, please say so in the Description to the reader of the docs is left in no doubt about what they're configuring.

Comment on lines 72 to 84
<1> Number of Cruise Control application instances; set to `1` to run or `0` to pause the application.
<2> Specifies the Cruise Control capacity configuration. For more information, see xref:capacity_configuration[capacity configuration].
<3> Specifies the Cruise Control configuration. You can provide any xref:ref-cruise-control-configuration-{context}[standard configuration option] apart from those managed directly by {ProductName}.
<4> CPU and memory resources reserved for Cruise Control. For more information, see xref:assembly-resource-limits-and-requests-deployment-configuration-kafka[].
<5> Logging configuration, to log messages with a given severity (debug, info, warn, error, fatal) or above.
<6> xref:assembly-customizing-deployments-str[Customization of deployment templates and pods].
<7> xref:assembly-healthchecks-deployment-configuration-kafka[Healthcheck readiness probes].
<8> xref:assembly-healthchecks-deployment-configuration-kafka[Healthcheck liveness probes].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this belong in a procedure, rather than a concept?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be separated into a reference but have been shuffling these around. Let me get the final word from @laidan6000

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On balance, I think it should stay here because the example configuration is helpful within the context of the task. In other words, it doesn't seem to stand alone as a concept or reference module.

Comment on lines +8 to +12
The `config` property in `Kafka.spec.cruiseControl` contains configuration options as keys with values as one of the following JSON types:

* String
* Number
* Boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies more generally to the rest of the docs too, but here would be a good place for a NOTE to say that if your String happens to look like YAML or JSON it will need to be explicitly quoted.

----
== Capacity configuration

Cruise Control uses capacity limits to determine if goals related to broker resources are being violated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which begs the question: What happens if the limits are violated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some text:

A violation of a goal can cause a optimization to fail, preventing that optimization from being used to generate a partition rebalance proposal.
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal

* `networkIn` - Network inbound throughput in KB/s
* `networkOut` - Newtork outbound throughput in KB/s

Since {ProductName} brokers are homogenous, Cruise Control will apply the same capacity limits to every broker it is monitoring.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Since {ProductName} brokers are homogenous, Cruise Control will apply the same capacity limits to every broker it is monitoring.
Since {ProductName} brokers are homogeneous, Cruise Control will apply the same capacity limits to every broker it is monitoring.

this.capacity = capacity;
}

@Description("The Cruise Control config. Properties with the following prefixes cannot be set: " + FORBIDDEN_PREFIXES)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as Tom suggested above, config should also be configuration here

@kyguy kyguy requested a review from d-laing April 3, 2020 12:32
@kyguy kyguy force-pushed the cruise-control-deployment branch 3 times, most recently from 25ae97a to dfeccf9 Compare April 6, 2020 23:22
@kyguy kyguy requested a review from tombentley April 6, 2020 23:23
public static final String FORBIDDEN_PREFIXES = "bootstrap.servers, zookeeper., ssl., security., failed.brokers.zk.path,"
+ "webserver.http.port, webserver.http.address, webserver.api.urlprefix, metric.reporter.sampler.bootstrap.servers, metric.reporter.topic, metric.reporter.topic.pattern,"
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic"
+ "cruise.control.metrics.reporter.bootstrap.servers, sasl. interceptor.classes";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ "cruise.control.metrics.reporter.bootstrap.servers, sasl. interceptor.classes";
+ "cruise.control.metrics.reporter.bootstrap.servers, sasl.interceptor.classes";

// For the full configuration list refer to https://github.com/linkedin/cruise-control/wiki/Configurations
public static final String FORBIDDEN_PREFIXES = "bootstrap.servers, zookeeper., ssl., security., failed.brokers.zk.path,"
+ "webserver.http.port, webserver.http.address, webserver.api.urlprefix, metric.reporter.sampler.bootstrap.servers, metric.reporter.topic, metric.reporter.topic.pattern,"
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic"
+ "partition.metric.sample.store.topic, broker.metric.sample.store.topic, brokerCapacity.config.file, skip.sample.store.topic.rack.awareness.check, cruise.control.metrics.topic,"

this.tlsSidecar = tlsSidecar;
}

@Description("The Cruise Control brokerCapacity configuration.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Description("The Cruise Control brokerCapacity configuration.")
@Description("The Cruise Control `brokerCapacity` configuration.")

}

@Description("The Cruise Control configuration. For a full list of configuration options refer to" +
" https://github.com/linkedin/cruise-control/wiki/Configurations Note that properties " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
" https://github.com/linkedin/cruise-control/wiki/Configurations Note that properties " +
" https://github.com/linkedin/cruise-control/wiki/Configurations. Note that properties " +

if (cruiseControlSpec != null) {
metricReporterList.add(CRUISE_CONTROL_METRIC_REPORTER);
} else {
metricReporterList.remove(CRUISE_CONTROL_METRIC_REPORTER);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it they added CRUISE_CONTROL_METRIC_REPORTER explicitly themselves we remove it unless they also specified a CruiseControlSpec? I think this is more or less a moot point at the moment, since there's no mechanism for people to add plugin classes to our Kafka images (apart from them building their own images, which perhaps some people do). But I would none the less argue that this is not the right thing to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it they added CRUISE_CONTROL_METRIC_REPORTER explicitly themselves we remove it unless they also specified a CruiseControlSpec?

Yes, but the main intention of this code is to stop the Cruise Control metric reporters after the Cruise Control application has been deleted. Otherwise, the Cruise Control metric reporters will continue to run and cost resources even if they are not being used.

Comment on lines 55 to 60
List<SingleVolumeStorage> volumeList = ((JbodStorage) storage).getVolumes();
long size = 0;
for (SingleVolumeStorage volume : volumeList) {
size += generateDiskCapacity(volume);
}
return size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does this mean the CC doesn't really distinguish the disks as having individual sizes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cruise Control does distinguish the disks as having individual sizes using the adminClient.

However, for balancing partitions between disks of the same broker, Cruise Control needs us to explicitly state the capacity of each disk here, which I have not done yet. This will require some extra logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So will that extra logic be in this PR, or a future one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A future one

Copy link
Member

@d-laing d-laing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peer reviewed. This is looking great!

----

.Additional resources
For information, refer to the xref:type-CruiseControlBrokerCapacity-reference[Capacity schema reference].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - this cross-reference is currently broken.

[id='con-cruise-control-overview-{context}']
= Cruise Control overview

Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions.
Brokers in a Kafka cluster can become unevenly loaded for many reasons. For example, partitions that handle large amounts of message traffic can be unevenly distributed across the available brokers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next, I would explain that partition reassignment is usually a manual task. Perhaps you could add a sentence like the following:

"Monitoring broker load and reassigning busy partitions to brokers with spare capacity is time consuming and adds to the overhead of managing a Kafka cluster"

= Cruise Control overview

Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions.
Cruise Control is a tool for simplifying the monitoring and rebalancing of partitions across a Kafka cluster.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Cruise Control is a tool for simplifying the monitoring and rebalancing of partitions across a Kafka cluster.
Cruise Control is a tool for automating the monitoring and rebalancing of partitions across a Kafka cluster.


Brokers in a Kafka cluster can become unevenly loaded due to things like an uneven distribution of busy partitions.
Cruise Control is a tool for simplifying the monitoring and rebalancing of partitions across a Kafka cluster.
You deploy Cruise Control alongside a Kafka cluster to:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make CC the subject of this sentence and use different terminology. Please rephrase to something like the following:

When deployed alongside a Kafka cluster, Cruise Control can perform the following cluster balancing tasks:

  • Monitoring...
  • Making optimization proposals: balanced partition assignments based on configurable optimization goals.
  • Executing partition reassignments based on optimization proposals.

[id='proc-deploying-cruise-control-{context}']
= Deploying Cruise Control

Deploy Cruise Control to your {ProductName} cluster through the `cruiseControl` property in the `Kafka` resource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Deploy Cruise Control to your {ProductName} cluster through the `cruiseControl` property in the `Kafka` resource.
Cruise Control is configured using the `cruiseControl` property in the `Kafka` resource. Once configured, you can deploy a Cruise Control instance to your {ProductName} cluster by creating or updating the `Kafka` resource.


Cruise Control uses capacity limits to determine if goals related to broker resources are being violated.
A violation of a goal can cause a optimization to fail, preventing that optimization from being used to generate a partition rebalance proposal.
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal.
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used if the `CpuCapacityGoal` is set as a hard goal.

A violation of a goal can cause a optimization to fail, preventing that optimization from being used to generate a partition rebalance proposal.
For example, an optimization that would cause a broker to exceed its CPU capacity would not be used for when the CpuCapacityGoal is set as a hard goal.

Capacity limits for broker resources can be specified in the `brokerCapacity` property in `Kafka.spec.cruiseControl`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Capacity limits for broker resources can be specified in the `brokerCapacity` property in `Kafka.spec.cruiseControl`.
You specify capacity limits for broker resources in the `brokerCapacity` property in `Kafka.spec.cruiseControl` .

* `diskMiB` - Disk storage in MiB
* `cpuUtilization` - CPU utilization as a percent (0-100)
* `inboundNetworkKiBPerSecond` - Network inbound throughput in KiB/s
* `outboundNetworkKiBPerSecond` - Newtork outbound throughput in KiB/s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `outboundNetworkKiBPerSecond` - Newtork outbound throughput in KiB/s
* `outboundNetworkKiBPerSecond` - Network outbound throughput in KiB/s

* `inboundNetworkKiBPerSecond` - Network inbound throughput in KiB/s
* `outboundNetworkKiBPerSecond` - Newtork outbound throughput in KiB/s

Since {ProductName} brokers are homogeneous, Cruise Control will apply the same capacity limits to every broker it is monitoring.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Since {ProductName} brokers are homogeneous, Cruise Control will apply the same capacity limits to every broker it is monitoring.
Because Kafka brokers are homogeneous, Cruise Control applies the same capacity limits to every broker it is monitoring.

Copy link
Member Author

@kyguy kyguy Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to {ProductName} Kafka brokers since Kafka brokers are not necessarily homogeneous but Kafka brokers managed by Strimzi are homogeneous

----

.Additional resources
For information, refer to the xref:type-CruiseControlBrokerCapacity-reference[Capacity schema reference].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For information, refer to the xref:type-CruiseControlBrokerCapacity-reference[Capacity schema reference].
For more information, refer to the xref:type-CruiseControlBrokerCapacity-reference[].

@kyguy kyguy force-pushed the cruise-control-deployment branch from be06b74 to cf09d68 Compare April 15, 2020 15:33
Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, but I'm happy for this to be merged.

private Map<String, Object> additionalProperties = new HashMap<>(0);

@JsonInclude(JsonInclude.Include.NON_DEFAULT)
@Pattern("^[0-9]+([KMGTPE]i?)?$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow stuff like 1.5T and exponential notation like 2e6 (which Kube does for memory, for example).

Suggested change
@Pattern("^[0-9]+([KMGTPE]i?)?$")
@Pattern("^[0-9]+([.][0-9]*)?([KMGTPE]i?|e[0-9]+)?$")

And I guess to be consistent with the network settings we should allow the B suffix in there too. It's difficult because we're trying to be consistent with Kube, but what Kube does (omitting the actual unit) doesn't really work for the network rates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to include . and exponential notation.

I think we should leave the B suffix out of the disk capacity values so that it is consistent with the storage values we use for Strimzi (which follow the K8s memory format). I do still think the B suffix is very important for the network rates though! What do you think?


@JsonInclude(JsonInclude.Include.NON_NULL)
@Pattern("[0-9]+([KMG]i?)?B/s")
@Description("Broker capacity for network inbound throughput, for example, 10000KB/s")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explicitly say that this is in bytes per second, just to avoid people mistakenly thinking it might be in bits per second? Same for the outbound.

private Double brokerInboundNetworkKiBPerSecondCapacity;
private Double brokerOuboundNetworkKiBPerSecondCapacity;


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these double empty lines (here and the following two).

Comment on lines 293 to 294
List<ServicePort> ports = new ArrayList<>(1);
ports.add(createServicePort(REST_API_PORT_NAME, REST_API_PORT, REST_API_PORT, "TCP"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collections.singletonList()

}

protected List<Volume> getVolumes(boolean isOpenShift) {
List<Volume> volumeList = new ArrayList<>(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you insist on doing it like this then get the right number of items:

Suggested change
List<Volume> volumeList = new ArrayList<>(1);
List<Volume> volumeList = new ArrayList<>(3);

But even better would be to use

return Arrays.asList(createSecretVolume(...),
                     createSecretVolume(...),
                     createConfigMapVolume(...));

}

protected List<VolumeMount> getVolumeMounts() {
List<VolumeMount> volumeMountList = new ArrayList<>(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment.

CC_DEFAULT_PROPERTIES_MAP.put("default.goals", DEFAULT_GOALS);
CC_DEFAULT_PROPERTIES_MAP.put("goals", DEFAULT_GOALS);

FORBIDDEN_OPTIONS = asList(CruiseControlSpec.FORBIDDEN_PREFIXES.split(", "));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ", *" (split takes a regex) would protect you a little from people being careless with whitespace when making changes in the future.

Comment on lines 3093 to 3094
Map<String, String> annotations = new HashMap<>();
annotations.put(CruiseControl.ANNO_STRIMZI_IO_LOGGING, logAndMetricsConfigMap.getData().get(ANCILLARY_CM_KEY_LOG_CONFIG));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collections.singletonMap()

Copy link
Member

@d-laing d-laing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc looks good so far.

@kyguy kyguy force-pushed the cruise-control-deployment branch from cf09d68 to 0f5eb11 Compare April 20, 2020 19:57
@@ -172,7 +174,7 @@ public void setImage(String image) {
}

@JsonInclude(JsonInclude.Include.NON_NULL)
@Description("CPU and memory resources to reserve.")
@Description("CPU and memory resources to reserve for the Cruise Control container.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are in the KafkaClusterSpec, why this is about Cruise Control container resources?

protected static final String LOG_AND_METRICS_CONFIG_VOLUME_NAME = "cruise-control-logging";
protected static final String LOG_AND_METRICS_CONFIG_VOLUME_MOUNT = "/opt/cruise-control/custom-config/";

public static final String ANNO_STRIMZI_IO_LOGGING = Annotations.STRIMZI_DOMAIN + "/logging";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you rebased against the latest master? Just asking because lately the Annotations.STRIMZI_DOMAIN was changed from strimzi.io to strimzi.io/. In this case, we are going to have strimzi.io//logging so you should remove the / from logging. I had the same problem with one of my latest PR :-)

@d-laing
Copy link
Member

d-laing commented Apr 21, 2020

@kyguy - I think you should comment out the two modules that are "TBC".

@kyguy kyguy force-pushed the cruise-control-deployment branch 2 times, most recently from d52a0be to e9ac04b Compare April 21, 2020 13:40
@kyguy kyguy force-pushed the cruise-control-deployment branch from b57762a to 91ac7f2 Compare April 22, 2020 23:17
@kyguy
Copy link
Member Author

kyguy commented Apr 23, 2020

Can someone kick the Azure Pipelines again? The build failure appears to be related to the pipeline

@ppatierno
Copy link
Member

@strimzi-ci run tests

@ppatierno
Copy link
Member

@kyguy I think there is a problem with Java 8 on Azure. @Frawless can you confirm this?

@Frawless
Copy link
Member

Yes. Opened #2876 to fix it

@strimzi-ci
Copy link

Systemtests Failed (no tests results are present)

@ppatierno
Copy link
Member

@strimzi-ci run tests

@strimzi-ci
Copy link

✔️ Test Summary ✔️

TEST_PROFILE: acceptance
TEST_CASE: *ST
TOTAL: 30
PASS: 30
FAIL: 0
SKIP: 0
BUILD_NUMBER:

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good. Just some minor comments.

)
@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonPropertyOrder({
"replicas", "image", "config",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could remove the replicas from here.

.withReadinessProbe(ModelUtils.createHttpProbe(readinessPath, REST_API_PORT_NAME, readinessProbeOptions))
.withResources(getResources())
.withVolumeMounts(getVolumeMounts())
.withImagePullPolicy(determineImagePullPolicy(imagePullPolicy, getImage()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add here the securityontext form the container templates.

.withCommand("/opt/stunnel/cruise_control_stunnel_pre_stop.sh",
String.valueOf(templateTerminationGracePeriodSeconds))
.endExec().endPreStop().build())
.withImagePullPolicy(determineImagePullPolicy(imagePullPolicy, tlsSidecarImage))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add here the security context from the tlsSidecar container template.

import static io.strimzi.operator.cluster.model.VolumeUtils.createSecretVolume;
import static io.strimzi.operator.cluster.model.VolumeUtils.createVolumeMount;

public class CruiseControl extends AbstractModel {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to set some network policies on the CruiseControl ports?

name: my-cluster
spec:
kafka:
version: 2.4.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 2.4.1 now.

entityOperator:
topicOperator: {}
userOperator: {}
cruiseControl: {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should provide a default CC configuration in the example file. We could provide the full list of goals and default goals so the user can remove any they don't need and/or fine-tune the constraints.

Otherwise, we're making the user work harder. They'll need to refer to the documentation to figure out how to set goals and capacity limits.

I don't mind if you merge this now and we continue the discussion in later PRs.

* Added security context and tests

Signed-off-by: Paolo Patierno <ppatierno@live.com>

* Added network policy for the cruise control REST API port

Signed-off-by: Paolo Patierno <ppatierno@live.com>
@ppatierno
Copy link
Member

@strimzi-ci run tests

@strimzi-ci
Copy link

Systemtests Failed (no tests results are present)

Signed-off-by: Paolo Patierno <ppatierno@live.com>
@ppatierno
Copy link
Member

@strimzi-ci run profile=regression testcase=CruiseControlST

@ppatierno
Copy link
Member

@strimzi-ci run tests

@ppatierno
Copy link
Member

@strimzi-ci run tests profile=regression testcase=CruiseControlST

@strimzi-ci
Copy link

Systemtests Failed (no tests results are present)

1 similar comment
@strimzi-ci
Copy link

Systemtests Failed (no tests results are present)

Signed-off-by: Paolo Patierno <ppatierno@live.com>
@ppatierno
Copy link
Member

@strimzi-ci run tests

@ppatierno
Copy link
Member

@strimzi-ci run tests profile=regression testcase=CruiseControlST

@strimzi-ci
Copy link

✔️ Test Summary ✔️

TEST_PROFILE: regression
TEST_CASE: CruiseControlST
TOTAL: 1
PASS: 1
FAIL: 0
SKIP: 0
BUILD_NUMBER:

@ppatierno
Copy link
Member

@strimzi-ci run tests

@strimzi-ci
Copy link

✔️ Test Summary ✔️

TEST_PROFILE: acceptance
TEST_CASE: *ST
TOTAL: 30
PASS: 30
FAIL: 0
SKIP: 0
BUILD_NUMBER:

@ppatierno ppatierno merged commit 1a94d2e into strimzi:master Apr 24, 2020
klalafaryan pushed a commit to klalafaryan/strimzi-kafka-operator that referenced this pull request Oct 21, 2020
* Added Cruise Control System Tests

    - Added deployment with CC to Kafka ST resources
    - Added pod name and deployment resources
    - Added CC deployment system test

Signed-off-by: Thomas Cooper <tcooper@redhat.com>

* Cruise Control Deployment

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Add broker capacity estimation and configuration

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Add k8s memory parsing to return different byte multiples

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Refactor Capacity constructor

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Regenerate helm charts

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing some comments

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing more comments

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing more comments (refactoring capacity API, checkOwnerReferences)

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing comments ( capacity properties as strings/validate units in schema)

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Tightening capacity regex; fixing docs

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Another doc fix

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Add missing goal; bump CC version to 2.0.100

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Fix logging and typos

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Addressing comments ( Update allowed disk capacity notation + refactoring)

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Fixing more typos

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Fixing another doc issue

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Securing communication between Cruise Control and Kafka (strimzi#37)

* Adding TLS communication between CC and Kafka

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Adding TLS communication between metric reporter and Kafka

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Fixed metrics report to use TLS hostname verification (strimzi#38)

Moved metrics topic to Cruise Control configuration
Hiding truststore and keystore passwords in the log

Signed-off-by: Paolo Patierno <ppatierno@live.com>

Co-authored-by: Paolo Patierno <ppatierno@live.com>
Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Removing CC replicas, refactor Dockerfile + CC dependencies

Signed-off-by: Kyle Liberti <kliberti@redhat.com>

* Added security context, network policy and tests (strimzi#39)

* Added security context and tests

Signed-off-by: Paolo Patierno <ppatierno@live.com>

* Added network policy for the cruise control REST API port

Signed-off-by: Paolo Patierno <ppatierno@live.com>

* Added CruiseControl ST to regression tests

Signed-off-by: Paolo Patierno <ppatierno@live.com>

* Fixed NPE on Cruise Control network policy

Signed-off-by: Paolo Patierno <ppatierno@live.com>

Co-authored-by: Thomas Cooper <tcooper@redhat.com>
Co-authored-by: Paolo Patierno <ppatierno@live.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants