
Let kafka-headless service resolve even before pods are ready #56 #58

Closed. scholzj wants to merge 1 commit from the issue_56 branch.

Conversation

@scholzj (Member) commented Oct 16, 2017

No description provided.

@@ -2,6 +2,8 @@ apiVersion: v1
 kind: Service
 metadata:
   name: kafka-headless
+  annotations:
+    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
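
For context, the resulting headless Service would look roughly like this. This is a sketch, not the template's exact manifest: the port and `clusterIP: None` are taken from the service listing later in this thread, and the selector label is an assumption.

```yaml
# Sketch of kafka-headless after this change. The annotation tells
# kube-dns to publish DNS records for the StatefulSet pods even before
# they pass readiness checks, so brokers can resolve each other while
# the cluster is still starting up.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None          # headless: DNS returns the pod IPs directly
  ports:
    - port: 9092
      name: kafka
  selector:
    name: kafka            # assumed selector; the template's label may differ
```

Note that later Kubernetes releases supersede this annotation with the `spec.publishNotReadyAddresses: true` field on the Service spec.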
tombentley (Member):

Since which version of k8s is this available?

scholzj (Member, Author):

I don't know ... it works at least from Kubernetes 1.6.

scholzj (Member, Author) commented Oct 17, 2017:

@tombentley It was added in Kubernetes 1.3.

@matzew (Contributor) commented Oct 17, 2017

I used the stateful-set template, like:

oc new-project someproject

oc new-app -f https://raw.githubusercontent.com/scholzj/barnabas/1b893e8ccaa8a2bea868bc6bbf1863898f3a86a3/kafka-statefulsets/resources/openshift-template.yaml -n someproject

It provisions three nodes each (ZooKeeper and Kafka).

Now I did the following locally (with the console scripts from kafka_2.12-0.11.0.0):

➜  bin oc get services 

NAME                 CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
kafka                172.30.210.122   <none>        9092/TCP                     20m
kafka-headless       None             <none>        9092/TCP                     20m
zookeeper            172.30.13.87     <none>        2181/TCP                     20m
zookeeper-headless   None             <none>        2181/TCP,2888/TCP,3888/TCP   20m

Using the zk and kafka IP addresses for some CLI fu, like:

➜  bin ./kafka-topics.sh --create --zookeeper 172.30.13.87:2181 --replication-factor 1 --partitions 1 --topic test

Created topic "test".
➜  bin ./kafka-console-producer.sh --broker-list 172.30.210.122:9092 --topic test
>HELLO
>[2017-10-17 13:46:56,278] ERROR Error when sending message to topic test with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0: 10053 ms has passed since batch creation plus linger time

@scholzj (Member, Author) commented Oct 17, 2017

@matzew What is the timing of the operations? Did you wait after oc new-app for all the components to get up and running, or did you run it command after command? The way the OpenShift template works right now, it creates all the resources and is done, but inside OpenShift/Kubernetes things take more time. It starts creating the ZooKeeper pods one by one, and at the same time it starts creating the Kafka pods. The Kafka pods will basically crash once or twice because ZooKeeper will not yet be ready. So the question is: did you wait long enough for all pods to be up and running? Is this an issue that fixes itself over time?

Another point I'm curious about: where do the CLI tools kafka-topics.sh and kafka-console-producer.sh run? Do you connect from outside of OpenShift or from inside OpenShift?

@matzew (Contributor) commented Oct 17, 2017

@scholzj Yeah, I waited until all were up (three ZooKeeper nodes and three Kafka nodes).

I just tried again:

➜  bin ./kafka-topics.sh --create --zookeeper 172.30.13.87:2181 --replication-factor 1 --partitions 1 --topic testy

Created topic "testy".
➜  bin ./kafka-console-producer.sh --broker-list 172.30.210.122:9092 --topic testy                                 
>HELLO
>[2017-10-17 14:11:24,973] ERROR Error when sending message to topic testy with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for testy-0: 10050 ms has passed since batch creation plus linger time

new topic / "old" problem.

> Do you connect from outside of OpenShift or from inside OpenShift?

From outside, from the shell on my Fedora 26 notebook.

@scholzj (Member, Author) commented Oct 17, 2017

@matzew Hmm, OK. I guess that is related to #50: it now uses DNS names as advertised hostnames, and you will probably not be able to resolve them outside of OpenShift. The error doesn't really say anything, but this is what normally gets printed in this situation. I have to think about the best way to deal with this.
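
To illustrate the failure mode (a hypothetical sketch, not the actual broker config from the template; the listener layout and DNS suffix are assumptions): the client reaches the bootstrap IP and fetches metadata, but the metadata points it at a cluster-internal DNS name it cannot resolve, so the produce request sits in the batch until it expires.

```
# Hypothetical broker settings matching the symptom above. The
# bootstrap connection to 172.30.210.122:9092 succeeds, but the
# metadata response advertises a *.svc.cluster.local name that only
# resolves inside the cluster, so external produce requests time out.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka-0.kafka-headless.myproject.svc.cluster.local:9092
```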

@matzew (Contributor) commented Oct 17, 2017

Back in September, this was all working fine :)

@scholzj (Member, Author) commented Oct 17, 2017

I guess we can just roll back some of the changes from #50 to get you up and running. We should get back to #50 once we have a better solution for external access.

@matzew (Contributor) commented Oct 17, 2017

@scholzj I am happy to test any PR :)

@scholzj (Member, Author) commented Oct 17, 2017

After rolling back #50, this is not needed anymore and can be closed.

@scholzj scholzj closed this Oct 17, 2017
tomncooper pushed a commit to tomncooper/strimzi-kafka-operator that referenced this pull request May 7, 2020
Updating NotReady state as a failure

Signed-off-by: Paolo Patierno <ppatierno@live.com>
@scholzj scholzj deleted the issue_56 branch June 12, 2020 16:14
@charris-ca charris-ca mentioned this pull request Jul 21, 2023
david-simon pushed a commit to david-simon/strimzi-kafka-operator that referenced this pull request Feb 11, 2025
CSMDS-321: Dump all Kafka resources in report.sh (strimzi#24)

CSMDS-329: Add all topic describe to report.sh (strimzi#37)

CSMDS-420: Fix report.sh to not fail when Kafka resource is being deleted during script run (strimzi#39)

CSMDS-317: Add java_thread_dump.sh to dump Java threads of all containers o… (strimzi#23)

CSMDS-445: Make cluster arg optional in report.sh (strimzi#47)

This will allow using report.sh on a namespace which only contains a cluster operator.

CSMDS-433: Fix getting a ready kafka broker pod with kubectl when describing topics (strimzi#48)

The head command returns immediately after the first line, and if kubectl writes anything to stdout after that, there is nobody left to receive it on the right side of the pipe; because of that, the command fails with error code 141.

CSMDS-450: Get events with -o wide flag in report.sh script (strimzi#51)

CSMDS-458: Update report.sh to be cluster-wide (strimzi#54)

To get a proper diagnostic bundle from a cluster, report.sh should be changed to dump all information.
This simplifies the process (should only be called once), and also makes sure that everything needed gets captured for diagnosing issues.

CSMDS-444: Dump license JSON in report.sh (strimzi#56)

CSMDS-444: Use secret.data to capture license content (strimzi#73)

MINOR: Allow report.sh to continue when a resource disappears (strimzi#74)

CSMDS-418: Fix local build issues (strimzi#58)

CSMDS-514: remove --request-timeout flag where it is buggy (strimzi#129)

CSMDS-600: report.sh fails to collect multiple replicasets (strimzi#149)

CSMDS-601: Don't export property files when using report.sh (strimzi#151)

CSMDS-388: Extend report.sh to dump all Kafka Connect CRs and KConnect status (strimzi#150)

CSMDS-598: Tolerating not found entities in report.sh (strimzi#162)

It could happen that between listing by type and the actual retrieval of an entity, the entity is being deleted.

CSMDS-637: Add k8s version to report.sh (strimzi#167)

CSMDS-588: Collect kafka-log-dirs output in report.sh (strimzi#172)

CSMDS-815: Add cluster ID and pod top to report.sh (strimzi#201)

CSMDS-803: Dump additional volumes in report.sh (strimzi#203)