Nats-Operator incompatible with istio? #88

Closed
therealmitchconnors opened this issue Oct 31, 2018 · 19 comments · Fixed by #346

@therealmitchconnors

When I follow the instructions in the project README to create a NATS cluster with 3 members on a GKE cluster running Istio, all three members immediately show as unhealthy and quickly go into CrashLoopBackOff. Is there something additional I need to do to get nats-operator to play nicely with a service mesh?

My Nats Cluster:

echo '
apiVersion: "nats.io/v1alpha2"
kind: "NatsCluster"
metadata:
  name: "example-nats-cluster"
spec:
  size: 3
  version: "1.3.0"
' | kubectl apply -f -

Log from one member:

[1] 2018/10/30 20:27:15.907885 [INF] Starting nats-server version 1.3.0
[1] 2018/10/30 20:27:15.907943 [INF] Git commit [eed4fbc]
[1] 2018/10/30 20:27:15.908133 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2018/10/30 20:27:15.908194 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2018/10/30 20:27:15.908208 [INF] Server is ready
[1] 2018/10/30 20:27:15.908541 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2018/10/30 20:27:15.914868 [ERR] Error trying to connect to route: dial tcp 10.12.12.4:6222: connect: connection refused
[1] 2018/10/30 20:27:16.930604 [ERR] Error trying to connect to route: dial tcp 10.12.12.4:6222: connect: connection refused
[1] 2018/10/30 20:27:17.935214 [INF] 10.12.12.4:6222 - rid:1 - Route connection created
[1] 2018/10/30 20:27:17.940613 [INF] 127.0.0.1:41486 - rid:2 - Route connection created
[1] 2018/10/30 20:27:18.962862 [INF] 10.12.12.4:6222 - rid:3 - Route connection created

(and the Route connection messages continue 290 times before the container is shut down as unhealthy)

My Istio deployment is the default Istio app from the GCP Marketplace, with three nodes in it.
K8S version info:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.7", GitCommit:"0c38c362511b20a098d7cd855f1314dad92c2780", GitTreeState:"clean", BuildDate:"2018-08-20T10:09:03Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9+", GitVersion:"v1.9.7-gke.6", GitCommit:"9b635efce81582e1da13b35a7aa539c0ccb32987", GitTreeState:"clean", BuildDate:"2018-08-16T21:33:47Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}

istio-pilot version is 1.3

I'd be happy to add more detail if there are follow up questions. I can also cross-post this issue to Istio if the problem appears to be on their side...

@therealmitchconnors
Author

One more detail: I am running with sidecars enabled, and the NATS pods get properly injected with the istio-proxy container, which is healthy.

pires added the question label Dec 7, 2018
@pires
Collaborator

pires commented Dec 7, 2018

NATS protocol requires direct connectivity to peers when trying to establish routes. With Istio, you are introducing a proxy (actually, two!) in between peers. You shouldn't do that!

Since I'm not versed in Istio, I don't have a concrete answer for you but I'm thinking that maybe this thread will help.

@annismckenzie

One way to support Istio would be to create services with manually managed endpoints. The service's name would be the pod name and the selector wouldn't be set, allowing the operator to create an endpoint manually. Deletion of the service as well as the endpoint would be handled by the automatic garbage collection by setting the owner of these to the pod. I'm currently doing that in one of our internal operators and it's quite painless. What I don't know is a) whether it's worth it (🙈) and b) what else would need updating with regards to discovery and the certificate SANs (grasping at straws here as I didn't have time to read the whole source yet). If Istio's configured to do mTLS then the whole TLS handling could be disabled in the operator because it'd be handled transparently by Istio. In that case you'd gain the metrics generation via Istio while still being secure. Would anything stand in the way of taking a stab at implementing that? /cc @pires
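The selector-less Service idea above could look roughly like the following sketch. Everything concrete in it is an assumption for illustration: the pod name example-nats-cluster-1, the namespace, the port name tcp-cluster, the pod IP (borrowed from the log earlier in this issue), and the placeholder UID, which would have to be the pod's real UID for garbage collection to work. It is not something the operator is known to do today.

```yaml
# Hypothetical per-pod Service with no selector, so Kubernetes does not
# manage its endpoints automatically.
apiVersion: v1
kind: Service
metadata:
  name: example-nats-cluster-1          # assumed: same as the pod name
  namespace: default                    # assumed namespace
  ownerReferences:                      # owned by the pod, so it is garbage-collected with it
  - apiVersion: v1
    kind: Pod
    name: example-nats-cluster-1
    uid: 00000000-0000-0000-0000-000000000000   # placeholder; must be the pod's real UID
spec:
  ports:
  - name: tcp-cluster                   # assumed name for the route port
    port: 6222
    targetPort: 6222
---
# Manually managed Endpoints object; it is tied to the Service above by
# having the same name and namespace.
apiVersion: v1
kind: Endpoints
metadata:
  name: example-nats-cluster-1
  namespace: default
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: example-nats-cluster-1
    uid: 00000000-0000-0000-0000-000000000000   # placeholder; must be the pod's real UID
subsets:
- addresses:
  - ip: 10.12.12.4                      # assumed pod IP
  ports:
  - name: tcp-cluster                   # matches the Service port name
    port: 6222
```

Setting the pod as owner of both objects is what lets the automatic garbage collection mentioned above clean them up when the pod goes away.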

@annismckenzie

@therealmitchconnors take a look at #111 to get it to work for now.

@laugimethods

Making use of Istio to monitor NATS traffic would be great!
Even with simple applications based on NATS, it is hard to tell where communications are broken between services.

Note that I'm currently experimenting with NATS on OpenShift + Istio + Kiali (https://www.kiali.io/).

@laugimethods

That issue was also raised on the Istio project side: istio/old_issues_repo#338

@thedodd

thedodd commented Jan 13, 2019

Hey all, just wanted to put this here for the record. I just spun up a nats cluster using this operator. I created the istio VirtualServices as you would normally do, and everything appears to be working as expected. 3 node cluster is live and seems to be properly clustered. I checked the cluster routes by hitting the /routez endpoint on the management network.

Here are the virtual service definitions:

---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nats
spec:
  hosts:
  - nats-cluster.nats.svc.cluster.local
  tcp:
  - match:
    - port: 4222
    route:
    - destination:
        host: nats-cluster.nats.svc.cluster.local
        port:
          number: 4222
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nats-management
spec:
  hosts:
  - nats-cluster-mgmt.nats.svc.cluster.local
  tcp:
  - match:
    - port: 8222
    route:
    - destination:
        host: nats-cluster-mgmt.nats.svc.cluster.local
        port:
          number: 8222
  - match:
    - port: 6222
    route:
    - destination:
        host: nats-cluster-mgmt.nats.svc.cluster.local
        port:
          number: 6222

pires removed their assignment Mar 8, 2019
@piotrmsc

piotrmsc commented Mar 27, 2019

Hi! @thedodd did you have sidecar injected? I'm struggling currently with the same issue but I have sidecar enabled.
I have managed to connect to nats using telnet from pod with sidecar but from nats client it fails.

@Tharun-Sabbu

@piotrmsc , struggling with the same issue, client requests failing, tried using tcp protocol for virtualservice and serviceentry but wasn't able to succeed. Any solution that you found?

Here are my configurations:

apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: nats.test.com
spec:
  hosts:
  - nats.test.com
  location: MESH_EXTERNAL
  ports:
  - number: 8443
    name: tcp
    protocol: TCP
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: nats-external
spec:
  hosts:
    - nats.test.com
  tcp:
  - match:
    - port: 8443
    timeout: 60s
    route:
      - destination:
          host: nats.test.com
          port:
            number: 8443
        weight: 100

@tariq1890

@thedodd I tried your solution, but it did not work for me. Were the istio sidecars injected to your NATS pods?

@lukeweber

At first glance, I noticed that I wasn't getting a response over telnet (with the Istio sidecar).

Plain telnet, no response:

> telnet nats.nats-system.svc.cluster.local 4222

Telnet with a PING -> instant response

>telnet nats.nats-system.svc.cluster.local 4222
PING
INFO {"server_id":"NBB2A2ML5APZWCXAX6SPEQBINHC4B5J2DPYHYRITHXXLQEDW64KVKWMM","server_name":"NBB2A2ML5APZWCXAX6SPEQBINHC4B5J2DPYHYRITHXXLQEDW64KVKWMM","version":"2.1.7","proto":1,"git_commit":"bf0930e","go":"go1.13.10","host":"0.0.0.0","port":4222,"max_payload":1048576,"client_id":1,"client_ip":"127.0.0.1"}
PONG

Interestingly, Istio treats the port as HTTP (raw_buffer):

> istioctl pc listeners <pod-name> --port 4222
ADDRESS       PORT MATCH                        DESTINATION
10.96.249.161 4222 Trans: raw_buffer; App: HTTP Route: nats.nats-system.svc.cluster.local:4222

Istio Explicit Port Selection helped me.

Here the Service for NATS doesn't declare tcp or tls. Setting appProtocol explicitly (Kubernetes 1.18+) or naming the port with a tcp- prefix, for example tcp-client, would resolve it for Istio.

After renaming the port in service and on the pod spec:

istioctl pc listeners <pod-name> --port 4222
ADDRESS     PORT MATCH DESTINATION
10.96.0.174 4222 ALL   Cluster: outbound|4222||nats.nats-system.svc.cluster.local

This seems to have resolved my connectivity issues, but note that the same would need to be done for the other TCP ports.
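As a rough sketch of the port rename described in this comment: Istio selects the protocol from a Service port name of the form <protocol>[-<suffix>], so a tcp- prefix makes the sidecar treat the port as opaque TCP instead of trying to detect HTTP. That matters for NATS because the server sends its INFO line before the client sends anything. The Service name and namespace below are taken from the telnet example; the selector label and the port names tcp-client and tcp-cluster are assumptions for illustration, not what the operator generates.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nats
  namespace: nats-system
spec:
  selector:
    app: nats              # assumed label; use whatever labels the NATS pods carry
  ports:
  - name: tcp-client       # tcp- prefix: Istio treats the port as plain TCP
    port: 4222
    targetPort: 4222
  - name: tcp-cluster      # route port between servers, also plain TCP
    port: 6222
    targetPort: 6222
```

The monitoring port 8222 serves HTTP, so it would presumably keep an http- prefixed name rather than tcp-.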

@clarkmcc

This issue is also mentioned here istio/istio#28623. Are there any plans to support this in the operator? I'm struggling with finding a way to customize port names in an operator-managed service.

AppProtocol as mentioned by @lukeweber doesn't seem like a viable option yet because it's not scheduled to make it to GA until 1.21 and cloud providers may or may not allow you to customize your feature gates.
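For comparison, the appProtocol alternative would look something like the sketch below. It assumes a cluster where the Service appProtocol field is available and an Istio version that reads it; the Service name and namespace come from the earlier telnet example, while the selector label and port name are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nats
  namespace: nats-system
spec:
  selector:
    app: nats              # assumed label
  ports:
  - name: client           # the name no longer needs a protocol prefix
    port: 4222
    targetPort: 4222
    appProtocol: tcp       # explicit protocol hint for the mesh
```

Where the field or its feature gate is not available, the tcp- port-name prefix remains the fallback.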

@narenarjun

narenarjun commented Mar 29, 2021

For the last two days I have been trying to make NATS Streaming Server work in a namespace with Istio automatic sidecar injection enabled. I tried disabling the mTLS PeerAuthentication and created a VirtualService with an explicit DestinationRule to make it work with other pods, but nothing worked.

Then I exec'd into one of the client pods and added console logs to the Express Node.js app on all the external connections to debug manually from the live cluster, and finally figured out that the nats.connect() promise never resolved.

Eventually, while googling, I stumbled upon this open issue.

Huge thanks to @lukeweber for the detailed comment; I tried it and it worked.

Previously, before finding this issue, the Kiali dashboard showed a KIA0601 error on the NATS Streaming Server deployment. To address it, I had renamed the associated Service's port names to follow the <protocol_prefix>-<any_name_suffix> convention, but I used an http- prefix instead of tcp-, because I wasn't aware that NATS uses plain TCP.

Now all the deployments work properly. The need for an explicit tcp prefix in the port name should be documented on the NATS website, under a topic such as running NATS with Istio.

@wallyqs
Member

wallyqs commented Mar 29, 2021

Thanks @narenarjun, I'll see if I can add a page on this, or feel free to make a PR to the docs, which can be found here: https://github.com/nats-io/nats.docs/tree/master/nats-on-kubernetes

@narenarjun

sure 😃 , will do it before end of this week. @wallyqs

@tariq1890

tariq1890 commented Mar 30, 2021

If anybody ends up coming here after a Google search, here is my TL;DR

Your Kubernetes Service port name should be tcp, or start with the tcp- prefix.

antoniomo added a commit to antoniomo/argo-events that referenced this issue Aug 14, 2021
Istio-related issues affecting this behaviour:

- nats-io/nats-operator#88
- istio/istio#28623

The fix is very simple. I added the related issues to the code comment,
let me know if that's undesirable or there's something else to clarify.

Signed-off-by: Antonio M. Macías Ojeda <antonio.macias.ojeda@gmail.com>
whynowy added a commit to argoproj/argo-events that referenced this issue Aug 30, 2021
Istio-related issues affecting this behaviour:

- nats-io/nats-operator#88
- istio/istio#28623

The fix is very simple. I added the related issues to the code comment,
let me know if that's undesirable or there's something else to clarify.

Signed-off-by: Antonio M. Macías Ojeda <antonio.macias.ojeda@gmail.com>

Co-authored-by: Derek Wang <whynowy@gmail.com>
@rhummelmose

If anybody ends up coming here after a Google search, here is my TL;DR

Your Kubernetes Service port name should be tcp, or start with the tcp- prefix.

Thanks!

@unckleg

unckleg commented Oct 31, 2021

At first glance, I noticed that I wasn't getting a response over telnet (with the Istio sidecar).

Plain telnet, no response:

> telnet nats.nats-system.svc.cluster.local 4222

Telnet with a PING -> instant response

>telnet nats.nats-system.svc.cluster.local 4222
PING
INFO {"server_id":"NBB2A2ML5APZWCXAX6SPEQBINHC4B5J2DPYHYRITHXXLQEDW64KVKWMM","server_name":"NBB2A2ML5APZWCXAX6SPEQBINHC4B5J2DPYHYRITHXXLQEDW64KVKWMM","version":"2.1.7","proto":1,"git_commit":"bf0930e","go":"go1.13.10","host":"0.0.0.0","port":4222,"max_payload":1048576,"client_id":1,"client_ip":"127.0.0.1"}
PONG

Interestingly, Istio treats the port as HTTP (raw_buffer):

> istioctl pc listeners <pod-name> --port 4222
ADDRESS       PORT MATCH                        DESTINATION
10.96.249.161 4222 Trans: raw_buffer; App: HTTP Route: nats.nats-system.svc.cluster.local:4222

Istio Explicit Port Selection helped me.

Here the Service for NATS doesn't declare tcp or tls. Setting appProtocol explicitly (Kubernetes 1.18+) or naming the port with a tcp- prefix, for example tcp-client, would resolve it for Istio.

After renaming the port in service and on the pod spec:

istioctl pc listeners <pod-name> --port 4222
ADDRESS     PORT MATCH DESTINATION
10.96.0.174 4222 ALL   Cluster: outbound|4222||nats.nats-system.svc.cluster.local

This seems to have resolved my connectivity issues, but note that the same would need to be done for the other TCP ports.

Thanks a lot!!

juliev0 pushed a commit to juliev0/argo-events that referenced this issue Mar 29, 2022
…argoproj#1312)

Istio-related issues affecting this behaviour:

- nats-io/nats-operator#88
- istio/istio#28623

The fix is very simple. I added the related issues to the code comment,
let me know if that's undesirable or there's something else to clarify.

Signed-off-by: Antonio M. Macías Ojeda <antonio.macias.ojeda@gmail.com>

Co-authored-by: Derek Wang <whynowy@gmail.com>
@vitalii-buchyn-exa

hi, sorry for postmortem post, but is this fix needed for tcp/6222 server port as well?
thank you
