SNI issue with 0.7.1 #524

Closed
dcherniv opened this issue Feb 1, 2020 · 32 comments
Labels: bug (Something isn't working), work in progress (Work In Progress)

Comments

@dcherniv

dcherniv commented Feb 1, 2020

In reference to #510
The problem still persists with 0.7.1. It takes longer to manifest now, but TLS still breaks from time to time.
When TLS does break, Kong may take quite some time to recover.

hbagdi added the bug label on Feb 3, 2020
@dermyrddin

Hello, @hbagdi is there any progress or ETA on this?

We use a large number of dynamically created and deleted ephemeral namespaces that share the same certificate in our development process. The parser issues mostly break Kong configuration updates, so our environments become inaccessible.

@hbagdi
Member

hbagdi commented Feb 10, 2020

@merlineus Can you elaborate on your use-case so that I understand why this happens?

The fix for this will land either in 0.8, which is in the works, or in 0.8.1.

@dermyrddin

Hello, sure.
We can set up a remote session tomorrow (11th February), 12-20 UTC. How can we communicate?

@hbagdi
Member

hbagdi commented Feb 10, 2020

Let's try to do this asynchronously. Can you elaborate on your use case in this GitHub issue itself?

@dermyrddin

Below is an example configuration. We have two namespaces, each with two ingresses:

namespace test1
- an ingress directed to test1.example.com
- an ingress directed to test1.example.com/api

namespace test2
- an ingress directed to test2.example.com
- an ingress directed to test2.example.com/api

We also have a wildcard certificate for *.example.com (stored in a Kubernetes secret named "wildcard-certificate-for-example-com" in each namespace).

We configure the ingresses in each namespace as follows (the example shows only test1):

  1. Set up the HTTPS redirect:
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: https-only
route:
  protocols:
  - https
  https_redirect_status_code: 302
  2. Set up the ingress directed to test1.example.com:
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: test1
  annotations:
    kubernetes.io/ingress.class: kong
    configuration.konghq.com: https-only
spec:
  tls:
    - hosts:
        - test1.example.com
      secretName: wildcard-certificate-for-example-com
  rules:
    - host: test1.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: test1
              servicePort: 80
  3. Set up the rewrite plugin:
kind: KongPlugin
apiVersion: configuration.konghq.com/v1
metadata:
  name: test1-api-rewrite
config:
  replace:
    uri: '/$(uri_captures[1])'
plugin: request-transformer
  4. Set up the ingress aimed at test1.example.com/api:
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: test1-api
  annotations:
    kubernetes.io/ingress.class: kong
    configuration.konghq.com: https-only
    plugins.konghq.com: 'test1-api-rewrite'
spec:
  rules:
    - host: test1.example.com
      http:
        paths:
          - path: /api/(.*)
            backend:
              serviceName: test1-api
              servicePort: 8080

After that, we followed the same steps in namespace "test2" (with a different DNS name but the same certificate), and the logs show these errors:

	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test1.example.com already associated with existing certificate '15cfbbbf-bd8d-4546-8262-0b7726057875')","name":"schema violation","fields":{"snis":"test1.example.com already associated with existing certificate '15cfbbbf-bd8d-4546-8262-0b7726057875'"},"code":2}
W0206 17:50:16.451447       1 queue.go:112] requeuing cert-manager/cluster-certmanager, err 1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test1.example.com already associated with existing certificate '15cfbbbf-bd8d-4546-8262-0b7726057875')","name":"schema violation","fields":{"snis":"test1.example.com already associated with existing certificate '15cfbbbf-bd8d-4546-8262-0b7726057875'"},"code":2}
E0206 17:50:19.832091       1 controller.go:119] unexpected failure updating Kong configuration: 
1 errors occurred:
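
The 400 response in the log above comes from a uniqueness rule: each SNI can be bound to only one certificate. The following is a minimal Python sketch of that rule, not Kong's code. `SniRegistry`, `create_certificate`, and the ID `second-cert-id` are hypothetical names chosen for illustration, and the scenario is only one plausible reading of the log.

```python
class SchemaViolation(Exception):
    """Raised when an SNI is already bound to a different certificate."""


class SniRegistry:
    """Toy model of the 'one certificate per SNI' rule (hypothetical, not Kong's code)."""

    def __init__(self):
        self._owner = {}  # SNI name -> certificate ID

    def create_certificate(self, cert_id, snis):
        # Check every SNI first so that a rejected request changes nothing.
        for sni in snis:
            owner = self._owner.get(sni)
            if owner is not None and owner != cert_id:
                raise SchemaViolation(
                    f"snis: {sni} already associated with existing certificate '{owner}'"
                )
        for sni in snis:
            self._owner[sni] = cert_id


registry = SniRegistry()
registry.create_certificate("15cfbbbf-bd8d-4546-8262-0b7726057875", ["test1.example.com"])
try:
    # Assumed scenario: the same key pair arrives as a second certificate
    # object (made-up ID) that also claims test1.example.com.
    registry.create_certificate("second-cert-id", ["test1.example.com", "test2.example.com"])
except SchemaViolation as err:
    print(err)
```

In this model the second create is rejected as a whole, so test2.example.com is never bound either, which matches the symptom of the new namespace losing TLS.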

@hbagdi
Member

hbagdi commented Feb 11, 2020

@merlineus

  1. Are you using 0.7.0 or 0.7.1?
  2. Does the error resolve itself if you start from a clean database (not an upgrade) or after some time?

@dermyrddin

  1. 0.7.1
  2. The error appears immediately after enabling the certificate in the second namespace. This was reproduced on an existing database after upgrading from 0.7.0 to 0.7.1, but all certificates had been deleted from the DB.

@elkh510

elkh510 commented Feb 18, 2020

@hbagdi hi
any updates?

@hbagdi
Member

hbagdi commented Feb 18, 2020

Controller 0.8.0 is the priority right now. The fix is a little involved so it will take me time to write it up. Expect a fix sometime in April.

@ypysarev-lohika-tix

@hbagdi The referenced issue #544 is a little different.

The current issue involves reusing the same certificate: the error appears immediately after adding an already-present certificate to another namespace.

#544, by contrast, involves different certificates: there is no error when the certificate is added, but an error appears sporadically after some time, possibly related to attempts to re-apply the existing configuration.

@Ehekatl

Ehekatl commented Mar 3, 2020

@hbagdi We have been struggling with this issue for weeks. We eliminated all possible duplication and changed multiple SNIs to a single *.example.com with a wildcard cert in a single namespace.

And the problem still persists.
I can't work out what the logic flaw is.
It seems to be triggered by a successful configuration sync, which either produces the issue or recovers from it.
When it happens, one in every two or three requests misses the SNI match (falling back to the default localhost cert).

Could you help me understand the possible cause?
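
One way to put a number on the intermittent fallback described above is to repeat the TLS handshake and count how often the served certificate differs from the expected one. The sketch below uses only the Python standard library; `fetch_fingerprint` and `count_mismatches` are hypothetical helper names, and the address and hostname in the usage note are placeholders.

```python
import hashlib
import socket
import ssl


def fetch_fingerprint(address, port, sni, timeout=5.0):
    """Return the SHA-256 fingerprint of the certificate served for `sni`."""
    context = ssl.create_default_context()
    # Verification is disabled on purpose: a fallback certificate would not
    # validate, and the goal is to inspect it rather than fail the handshake.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((address, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=sni) as tls:
            der = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der).hexdigest()


def count_mismatches(fetch, expected, attempts):
    """Call `fetch()` `attempts` times and count results that differ from `expected`."""
    return sum(1 for _ in range(attempts) if fetch() != expected)
```

A call such as `count_mismatches(lambda: fetch_fingerprint("203.0.113.10", 443, "app.example.com"), expected, 30)` would then report how many of 30 handshakes returned a different certificate. Behind a load balancer each handshake may reach a different Kong pod, so the count reflects the whole deployment rather than one pod.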

@landorg

landorg commented Mar 3, 2020

We removed all the tls.hosts entries from all ingresses in our cluster except a single one that has only *.example.com. This "works" as long as no one adds an ingress with another hosts entry.

@Ehekatl

Ehekatl commented Mar 31, 2020

@RolandG We have cert-bot running in our cluster, which updates certificates several times a day.
It's even worse with endpoint updates: if a pod is flapping between healthy and unhealthy,
it constantly triggers UPDATE events and the SNI problem keeps coming and going.

@hbagdi I see release 0.8.0 has come out but didn't target these bugs... any hope?

@elkh510

elkh510 commented Mar 31, 2020

Hi @hbagdi,
we tested kong-ingress-controller:0.8.0 with the configuration below
but still get the SNI error.
When is a fix expected?
Thanks.
Configuration:

Image of "proxy":
kong-docker-kong-enterprise-k8s.bintray.io/kong-enterprise-k8s:1.4.2.0-alpine
Image of "ingress-controller":
kong-docker-kubernetes-ingress-controller.bintray.io/kong-ingress-controller:0.8.0
Image of "wait-for-migrations":
kong-docker-kong-enterprise-k8s.bintray.io/kong-enterprise-k8s:1.4.2.0-alpine

@Spazzy757

Hi, any news on the above? Will this be targeted in 0.8.1?

@hbagdi
Member

hbagdi commented Apr 2, 2020

This seemingly simple bug is a little challenging to fix because of the way the underlying libraries are designed.
I've written a few fixes, but each of them fails on one corner case or another. I'm working on another one.

Meanwhile, there are two workarounds:

  1. Use Kong in DB-less mode. This bug is limited to the DB mode only.
  2. Define all hosts associated with the same certificate in a single Ingress resource or, better, use a wildcard host for wildcard certificates rather than duplicating the same certificate in different namespaces.

Stay tuned for a forthcoming patch to fix this issue.
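
For the second workaround, a useful first step is finding which secrets actually hold the same certificate. Here is a minimal sketch, assuming the certificate bodies have already been read from the cluster into plain tuples (how they are fetched is left out); `group_by_certificate` is a hypothetical helper and the PEM strings are placeholders.

```python
import hashlib
from collections import defaultdict


def group_by_certificate(secrets):
    """Group (namespace, secret_name, cert_pem) tuples by certificate content.

    Returns only groups with more than one secret, i.e. duplicated certificates.
    """
    groups = defaultdict(list)
    for namespace, name, cert_pem in secrets:
        # Strip surrounding whitespace so a trailing newline does not hide a duplicate.
        digest = hashlib.sha256(cert_pem.strip().encode("utf-8")).hexdigest()
        groups[digest].append(f"{namespace}/{name}")
    return {digest: refs for digest, refs in groups.items() if len(refs) > 1}


# Placeholder PEM bodies; real input would be the tls.crt value of each secret.
secrets = [
    ("test1", "wildcard-certificate-for-example-com", "PEM-A\n"),
    ("test2", "wildcard-certificate-for-example-com", "PEM-A"),
    ("other", "unrelated-cert", "PEM-B"),
]
for refs in group_by_certificate(secrets).values():
    print("same certificate:", ", ".join(refs))
```

Each reported group is a candidate for consolidation into one Ingress or one wildcard host.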

@hbagdi
Member

hbagdi commented Apr 2, 2020

Hello everyone,

Can someone in this thread try a patch and report whether it fixes the problem?
The patch is a simple one: d1dbfe2. I've built a Docker image with this patch on top of 0.8.0; it is available here: hbagdi/test:sni-binding-1.

To test the patch in your dev cluster, please do a fresh install of Kong with the above Docker image. An in-place replacement of the Docker image will not fix the problem.

Once you have the controller set up, please create duplicate secrets with the same TLS cert and associate different hostnames with the same cert using Ingress resources.

@Ehekatl

Ehekatl commented Apr 2, 2020

@hbagdi Hi, thanks for the patch.
But we are running in DB-less mode, and we've ensured there is a one-to-one mapping between Ingress, SNI, and certificate.

As I mentioned earlier, the problem can be reproduced by repeatedly updating an Ingress.

@elkh510

elkh510 commented Apr 2, 2020

Hi @hbagdi,
we tested the image "hbagdi/test:sni-binding-1" (fresh install, not an update) with the configuration below, but still get the error.

Image of "proxy":
kong-docker-kong-enterprise-k8s.bintray.io/kong-enterprise-k8s:1.4.2.0-alpine
Image of "ingress-controller":
hbagdi/test:sni-binding-1
Image of "wait-for-migrations":
kong-docker-kong-enterprise-k8s.bintray.io/kong-enterprise-k8s:1.4.2.0-alpine

kong log:

I0402 12:14:54.348046       1 kong.go:66] successfully synced configuration to Kong
E0402 12:15:27.732512       1 controller.go:124] unexpected failure updating Kong configuration: 
1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
W0402 12:15:27.732556       1 queue.go:112] requeuing monitoring/kubecost-cost-analyzer, err 1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
I0402 12:15:31.236208       1 kong.go:66] successfully synced configuration to Kong
I0402 12:15:40.532099       1 status.go:365] updating Ingress monitoring/kubecost-cost-analyzer status to [{52.169.160.213 }]
I0402 12:15:40.965537       1 kong.go:66] successfully synced configuration to Kong
I0402 12:17:40.774244       1 kong.go:66] successfully synced configuration to Kong
I0402 12:17:43.683670       1 kong.go:57] no configuration change, skipping sync to Kong
E0402 12:27:40.647382       1 controller.go:124] unexpected failure updating Kong configuration: 
1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
W0402 12:27:40.647415       1 queue.go:112] requeuing test-wildcard-3/auth, err 1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
I0402 12:27:43.713515       1 kong.go:57] no configuration change, skipping sync to Kong
E0402 12:37:40.617480       1 controller.go:124] unexpected failure updating Kong configuration: 
1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
W0402 12:37:40.617521       1 queue.go:112] requeuing test-wildcard-1/notifications-apiv1-rewrite, err 1 errors occurred:
	while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
I0402 12:37:43.683400       1 kong.go:57] no configuration change, skipping sync to Kong
I0402 12:47:40.352156       1 kong.go:57] no configuration change, skipping sync to Kong
I0402 12:47:43.680721       1 kong.go:57] no configuration change, skipping sync to Kong
E0402 12:47:47.274083       1 controller.go:124] unexpected failure updating Kong configuration: 

@hbagdi
Member

hbagdi commented Apr 2, 2020

> But we are running in DB-less mode, and we've ensured there is a one-to-one mapping between Ingress, SNI, and certificate.

Are you sure you are running into this bug in DB-less mode? This code path is not invoked in DB-less mode, so it is not possible to run into this.

@elkh510 Can you please share reproducible steps for your case?

@elkh510

elkh510 commented Apr 3, 2020

hi @hbagdi

> @elkh510 Can you please share reproducible steps for your case?

We have two namespaces:

  1. namespace test-wildcard-1, with an ingress directed to test-wildcard-1.example.com
  2. namespace test-wildcard-2, with an ingress directed to test-wildcard-2.example.com

We also have a wildcard certificate for *.example.com (stored in a Kubernetes secret named "wildcard-certificate-for-example-com" in each namespace).

We configure the ingresses in the namespaces as follows:

Set up the HTTPS redirect (same for both namespaces):

apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: https-only
route:
  protocols:
  - https
  https_redirect_status_code: 302

Set up the ingress directed to test-wildcard-1.example.com:

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: test-wildcard-1
  annotations:
    kubernetes.io/ingress.class: kong
    configuration.konghq.com: https-only
spec:
  tls:
    - hosts:
        - test-wildcard-1.example.com
      secretName: wildcard-certificate-for-example-com
  rules:
    - host: test-wildcard-1.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: test-wildcard-1
              servicePort: 80

Set up the ingress directed to test-wildcard-2.example.com:

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: test-wildcard-2
  annotations:
    kubernetes.io/ingress.class: kong
    configuration.konghq.com: https-only
spec:
  tls:
    - hosts:
        - test-wildcard-2.example.com
      secretName: wildcard-certificate-for-example-com
  rules:
    - host: test-wildcard-2.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: test-wildcard-2
              servicePort: 80

Afterwards, the Kong log shows:

I0402 12:14:54.348046       1 kong.go:66] successfully synced configuration to Kong
E0402 12:15:27.732512       1 controller.go:124] unexpected failure updating Kong configuration: 
1 errors occurred:
  while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
W0402 12:15:27.732556       1 queue.go:112] requeuing monitoring/kubecost-cost-analyzer, err 1 errors occurred:
  while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
I0402 12:15:31.236208       1 kong.go:66] successfully synced configuration to Kong
I0402 12:15:40.532099       1 status.go:365] updating Ingress monitoring/kubecost-cost-analyzer status to [{52.169.160.213 }]
I0402 12:15:40.965537       1 kong.go:66] successfully synced configuration to Kong
I0402 12:17:40.774244       1 kong.go:66] successfully synced configuration to Kong
I0402 12:17:43.683670       1 kong.go:57] no configuration change, skipping sync to Kong
E0402 12:27:40.647382       1 controller.go:124] unexpected failure updating Kong configuration: 
1 errors occurred:
  while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
W0402 12:27:40.647415       1 queue.go:112] requeuing test-wildcard-3/auth, err 1 errors occurred:
  while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
I0402 12:27:43.713515       1 kong.go:57] no configuration change, skipping sync to Kong
E0402 12:37:40.617480       1 controller.go:124] unexpected failure updating Kong configuration: 
1 errors occurred:
  while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
W0402 12:37:40.617521       1 queue.go:112] requeuing test-wildcard-1/notifications-apiv1-rewrite, err 1 errors occurred:
  while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}
I0402 12:37:43.683400       1 kong.go:57] no configuration change, skipping sync to Kong
I0402 12:47:40.352156       1 kong.go:57] no configuration change, skipping sync to Kong
I0402 12:47:43.680721       1 kong.go:57] no configuration change, skipping sync to Kong
E0402 12:47:47.274083       1 controller.go:124] unexpected failure updating Kong configuration: 

@Ehekatl

Ehekatl commented Apr 3, 2020

@hbagdi we have multiple clusters running in DB-less mode.
It's VERY EASY to reproduce the problem.

When there is no configuration sync, everything works fine.
The problem happens when cert-bot triggers an ingress update, or when endpoint changes trigger updates.
I can also reproduce it manually by repeatedly applying and deleting ANY ingress resource (sometimes 3-5 iterations trigger the problem, sometimes it takes 10-20).

When it happens, a simple patch that forces a configuration resync fixes it, like below:

kubectl patch -n test ing/test-ingress --type=json -p='[{"op": "replace", "path": "/spec/rules/0/host", "value":"test-ingress-RANDOM.xxx"}]'
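
The command above replaces the host of the first rule, which forces the controller to sync again. Below is a small sketch that generates the same JSON Patch document with a random host label. `build_resync_patch` is a hypothetical helper, and the script only prints the `kubectl` arguments instead of running them.

```python
import json
import secrets


def build_resync_patch(domain, prefix="test-ingress"):
    """Return a JSON Patch string that swaps the first rule's host for a random one."""
    host = f"{prefix}-{secrets.token_hex(4)}.{domain}"
    return json.dumps([{"op": "replace", "path": "/spec/rules/0/host", "value": host}])


patch = build_resync_patch("example.com")
# Printed rather than executed; namespace and ingress name match the command above.
print(["kubectl", "patch", "-n", "test", "ing/test-ingress", "--type=json", f"-p={patch}"])
```

Because the patch changes the host that the ingress routes, it is best aimed at a throwaway ingress rather than one that serves traffic.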

@hbagdi
Member

hbagdi commented Apr 3, 2020

> @hbagdi we have multiple clusters running in DB-less mode.
> It's VERY EASY to reproduce the problem.

@Ehekatl, thanks for being patient. The problem I'm having is reproducing this bug in DB-less mode.

> And the problem still persists.
> I can't work out what the logic flaw is.
> It seems to be triggered by a successful configuration sync, which either produces the issue or recovers from it.
> When it happens, one in every two or three requests misses the SNI match (falling back to the default localhost cert).

From this comment of yours, it seems that you are seeing a different bug here.

A question for you, are you seeing an error log line similar to the following in your logs?

while processing event: {Create} failed: 400 Bad Request {"message":"schema violation (snis: test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e')","name":"schema violation","fields":{"snis":"test-wildcard-1.example.com already associated with existing certificate 'e7a501cc-c26f-482b-a99a-81d5312ab59e'"},"code":2}

@hbagdi
Member

hbagdi commented Apr 7, 2020

@elkh510 Do you mind testing another patch: b2d454a?

I've built a Docker image for you as well: hbagdi/test:sni-binding-2.

@elkh510

elkh510 commented Apr 7, 2020

hi @hbagdi

> @elkh510 Do you mind testing another patch: b2d454a?

I have set up two Ingresses with the wildcard certificate, and I do not see errors in the Kong logs.

@Ehekatl

Ehekatl commented Apr 7, 2020

@hbagdi We don't see any errors, even with debug logging.

@hbagdi
Member

hbagdi commented Apr 7, 2020

Good to hear.
I'll make the patch production ready. This will be included in 0.8.1. Expect it to be out in about 2 weeks.

hbagdi added the work in progress label on Apr 7, 2020
hbagdi added a commit that referenced this issue Apr 13, 2020
Previously, SNIs were treated as a property on Certificate resource in
Kong. This breaks the synchronization loop when an SNI to Certificate
association changes.

The version of decK has been bumped up to v1.1.0, which treats SNIs as a
separate entity.

Fix #524
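
To illustrate the commit message: when each SNI is modelled as its own entity keyed by name, moving an SNI from one certificate to another becomes an update of that SNI rather than a create that collides with the old binding. This is a sketch of the idea only, not decK's implementation; `diff_snis` and the certificate IDs are hypothetical.

```python
def diff_snis(current, desired):
    """Diff two {sni_name: certificate_id} maps, treating each SNI as an entity.

    Returns a list of (operation, sni_name, certificate_id) tuples.
    """
    ops = []
    for name in sorted(current.keys() - desired.keys()):
        ops.append(("delete", name, current[name]))
    for name in sorted(desired.keys() & current.keys()):
        if current[name] != desired[name]:
            # Re-pointing an existing SNI is an update, so in this model it
            # does not trip the "already associated" uniqueness check.
            ops.append(("update", name, desired[name]))
    for name in sorted(desired.keys() - current.keys()):
        ops.append(("create", name, desired[name]))
    return ops


current = {"test1.example.com": "cert-a"}
desired = {"test1.example.com": "cert-b", "test2.example.com": "cert-b"}
print(diff_snis(current, desired))
```

With SNIs stored as a property of the certificate, the same change would show up as a new certificate carrying an SNI that is still bound elsewhere, which is the situation the error in this thread describes.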
hbagdi closed this as completed in 9e8e502 on Apr 14, 2020
@Ehekatl

Ehekatl commented Apr 15, 2020

@hbagdi After running 2.0.3 with your fix,
I'm still able to reproduce the problem, though it seems less likely to happen.

@Ehekatl

Ehekatl commented Apr 15, 2020

There is a small chance that TLS breaks after certbot renews a certificate (with 5 Kong pods and 10 updates, the issue may trigger once on a single pod).

The tricky part is that this may break related TLS certificates: we have an SNI cert not managed by certbot that still broke several times a day before the fix.

[Screenshot: Screen Shot 2020-04-15 at 1 20 01 PM]

@hbagdi
Member

hbagdi commented Apr 15, 2020

Please wait for 0.8.1 (coming out later this week) and test with that.
If you still run into issues, please open a new GitHub issue.

@Ehekatl

Ehekatl commented Apr 15, 2020

@hbagdi But I'm running the latest master... will a new fix be added this week?
I really want to sort this out; we've been struggling with it ever since we switched to DB-less mode.

@hbagdi
Member

hbagdi commented Apr 15, 2020

There is no new fix. What you are seeing is a different issue, and this thread is not about it.
Please open a new GitHub issue for it.


8 participants