
Let kafka-headless service resolve even before pods are ready #56 #58

Closed. scholzj wants to merge 1 commit from the issue_56 branch.

Conversation

@scholzj (Member) commented Oct 16, 2017

No description provided.

@@ -2,6 +2,8 @@ apiVersion: v1
 kind: Service
 metadata:
   name: kafka-headless
+  annotations:
+    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
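
For context, the resulting headless Service would look roughly like this. This is a sketch, not the template's exact manifest: the port and `clusterIP: None` are taken from the service listing later in this thread, and the selector label is an assumption.

```yaml
# Sketch of kafka-headless after this change. The annotation tells
# kube-dns to publish DNS records for the StatefulSet pods even before
# they pass readiness checks, so brokers can resolve each other while
# the cluster is still starting up.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None          # headless: DNS returns the pod IPs directly
  ports:
    - port: 9092
      name: kafka
  selector:
    name: kafka            # assumed selector; the template's label may differ
```

Note that later Kubernetes releases supersede this annotation with the `spec.publishNotReadyAddresses: true` field on the Service spec.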
tombentley (Member):

Since which version of k8s is this available?

scholzj (Member, Author):

I don't know ... it works at least from Kubernetes 1.6.

scholzj (Member, Author) commented Oct 17, 2017:

@tombentley It was added in Kubernetes 1.3.

@matzew (Contributor) commented Oct 17, 2017

I used the stateful-set template, like:

oc new-project someproject

oc new-app -f https://raw.githubusercontent.com/scholzj/barnabas/1b893e8ccaa8a2bea868bc6bbf1863898f3a86a3/kafka-statefulsets/resources/openshift-template.yaml -n someproject

It provisions three nodes each (ZooKeeper and Kafka).

Now I did the following locally (with the console scripts from kafka_2.12-0.11.0.0):

➜  bin oc get services 

NAME                 CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
kafka                172.30.210.122   <none>        9092/TCP                     20m
kafka-headless       None             <none>        9092/TCP                     20m
zookeeper            172.30.13.87     <none>        2181/TCP                     20m
zookeeper-headless   None             <none>        2181/TCP,2888/TCP,3888/TCP   20m

Using the zk and kafka IP addresses for some CLI fu, like:

➜  bin ./kafka-topics.sh --create --zookeeper 172.30.13.87:2181 --replication-factor 1 --partitions 1 --topic test

Created topic "test".
➜  bin ./kafka-console-producer.sh --broker-list 172.30.210.122:9092 --topic test
>HELLO
>[2017-10-17 13:46:56,278] ERROR Error when sending message to topic test with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0: 10053 ms has passed since batch creation plus linger time

@scholzj (Member, Author) commented Oct 17, 2017

@matzew What is the timing of the operations? Did you wait after oc new-app for all the components to get up and running, or did you run it command after command? The way the OpenShift template works right now, it creates all the resources and is done, but inside OpenShift/Kubernetes things take more time. It starts creating the ZooKeeper pods one by one, and at the same time it starts creating the Kafka pods. The Kafka pods will basically crash once or twice because ZooKeeper will not yet be ready. So the question is: did you wait long enough for all pods to be up and running? Is this an issue that fixes itself over time?

Another point I'm curious about: where do the CLI tools kafka-topics.sh and kafka-console-producer.sh run? Do you connect from outside of OpenShift or from inside OpenShift?

@matzew (Contributor) commented Oct 17, 2017

@scholzj Yeah, I waited until all were up (three ZooKeeper nodes and three Kafka nodes).

I just tried again:

➜  bin ./kafka-topics.sh --create --zookeeper 172.30.13.87:2181 --replication-factor 1 --partitions 1 --topic testy

Created topic "testy".
➜  bin ./kafka-console-producer.sh --broker-list 172.30.210.122:9092 --topic testy                                 
>HELLO
>[2017-10-17 14:11:24,973] ERROR Error when sending message to topic testy with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for testy-0: 10050 ms has passed since batch creation plus linger time

new topic / "old" problem.

> Do you connect from outside of OpenShift or from inside OpenShift?

From outside, from the shell on my Fedora 26 notebook.

@scholzj (Member, Author) commented Oct 17, 2017

@matzew Hmm, OK. I guess that is related to #50: it now uses DNS names as advertised hostnames, and you will probably not be able to resolve them outside of OpenShift. The error doesn't really say anything, but this is what normally gets printed in this situation. I have to think about the best way to deal with this.
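
To illustrate the failure mode (a hypothetical sketch, not the actual broker config from the template; the listener layout and DNS suffix are assumptions): the client reaches the bootstrap IP and fetches metadata, but the metadata points it at a cluster-internal DNS name it cannot resolve, so the produce request sits in the batch until it expires.

```
# Hypothetical broker settings matching the symptom above. The
# bootstrap connection to 172.30.210.122:9092 succeeds, but the
# metadata response advertises a *.svc.cluster.local name that only
# resolves inside the cluster, so external produce requests time out.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka-0.kafka-headless.myproject.svc.cluster.local:9092
```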

@matzew (Contributor) commented Oct 17, 2017

Back in September, this was all working fine :)

@scholzj (Member, Author) commented Oct 17, 2017

I guess we can just roll back some of the changes from #50 to get you up and running. We should get back to #50 once we have a better solution for external access.

@matzew (Contributor) commented Oct 17, 2017

@scholzj I am happy to test any PR :)

@scholzj (Member, Author) commented Oct 17, 2017

After rolling back #50, this is not needed anymore and can be closed.

@scholzj scholzj closed this Oct 17, 2017
tomncooper pushed a commit to tomncooper/strimzi-kafka-operator that referenced this pull request May 7, 2020
Updating NotReady state as a failure

Signed-off-by: Paolo Patierno <ppatierno@live.com>
@scholzj scholzj deleted the issue_56 branch June 12, 2020 16:14
@charris-ca charris-ca mentioned this pull request Jul 21, 2023
david-simon pushed a commit to david-simon/strimzi-kafka-operator that referenced this pull request Feb 11, 2025
CSMDS-321: Dump all Kafka resources in report.sh (strimzi#24)

CSMDS-329: Add all topic describe to report.sh (strimzi#37)

CSMDS-420: Fix report.sh to not fail when Kafka resource is being deleted during script run (strimzi#39)

CSMDS-317: Add java_thread_dump.sh to dump Java threads of all containers o… (strimzi#23)

CSMDS-445: Make cluster arg optional in report.sh (strimzi#47)

This will allow using report.sh on a namespace which only contains a cluster operator.

CSMDS-433: Fix getting a ready kafka broker pod with kubectl when describing topics (strimzi#48)

The head command returns immediately after the first line, and if kubectl writes anything to stdout after that, there is nobody left to receive it on the right side of the pipe; because of that, the command fails with error code 141.

CSMDS-450: Get events with -o wide flag in report.sh script (strimzi#51)

CSMDS-458: Update report.sh to be cluster-wide (strimzi#54)

To get a proper diagnostic bundle from a cluster, report.sh should be changed to dump all information.
This simplifies the process (should only be called once), and also makes sure that everything needed gets captured for diagnosing issues.

CSMDS-444: Dump license JSON in report.sh (strimzi#56)

CSMDS-444: Use secret.data to capture license content (strimzi#73)

MINOR: Allow report.sh to continue when a resource disappears (strimzi#74)

CSMDS-418: Fix local build issues (strimzi#58)

CSMDS-514: remove --request-timeout flag where it is buggy (strimzi#129)

CSMDS-600: report.sh fails to collect multiple replicasets (strimzi#149)

CSMDS-601: Don't export property files when using report.sh (strimzi#151)

CSMDS-388: Extend report.sh to dump all Kafka Connect CRs and KConnect status (strimzi#150)

CSMDS-598: Tolerating not found entities in report.sh (strimzi#162)

It could happen that between listing by type and the actual retrieval of an entity, the entity is being deleted.

CSMDS-637: Add k8s version to report.sh (strimzi#167)

CSMDS-588: Collect kafka-log-dirs output in report.sh (strimzi#172)

CSMDS-815: Add cluster ID and pod top to report.sh (strimzi#201)

CSMDS-803: Dump additional volumes in report.sh (strimzi#203)