Self check always fail #863

slavoren · 2018-08-29T15:10:08Z

Describe the bug:
Unable to pass "self check" when Ingress Service is using NodePort and public IP is on HA proxy (tcp mode) outside the Kubernetes cluster. We can simulate the test from cert-manger container (kubectl exec) using curl (fetching /.well-known/...), which is successful. The same applies from outside the cluster.

Logs:

helpers.go:188 Found status change for Certificate "myip-secret" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-29 14:36:25.387757463 +0000 UTC m=+2049.620517469
sync.go:244 Error preparing issuer for certificate pwe/pwe-secret: http-01 self check failed for domain "www.example.com"
controller.go:190 certificates controller: Re-queuing item "default/myip-secret" due to error processing: http-01 self check failed for domain "www.example.com"

We replaced real domain name in this bug report for www.example.com

The cert-manager is working only when public IP is on Kubernetes cluster and Ingress Service is using LoadBalancer method.

Expected behaviour:
self check to pass with NodePort on Ingress Service

Steps to reproduce the bug:

cat <<EOF > /root/nginx-ingress.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
    nodePort: 31080
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
    nodePort: 31443
  selector:
    app: nginx-ingress
EOF


cat <<EOF > /root/letsencrypt-staging.yml
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  # Adjust the name here accordingly
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: name@example.com
    # Name of a secret used to store the ACME account private key from step 3
    privateKeySecretRef:
      name: letsencrypt-staging-private-key
    # Enable the HTTP-01 challenge provider
    http01: {}
EOF

cat <<EOF > /root/myip-ingress.yml
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myip-ingress
  annotations:
    kubernetes.io/tls-acme: "true"
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - www.example.com
    secretName: myip-secret
  rules:
  - host: www.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: myip-svc
          servicePort: 80
EOF

# Nginx ingress
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/ns-and-sa.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/default-server-secret.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/common/nginx-config.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/rbac/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/kubernetes-ingress/master/install/daemon-set/nginx-ingress.yaml
kubectl create -f /root/nginx-ingress.yaml

# CertManager
kubectl create -f https://raw.githubusercontent.com/jetstack/cert-manager/master/contrib/manifests/cert-manager/with-rbac.yaml
kubectl create -f /root/letsencrypt-staging.yml

# MyApp
kubectl run myip --image=cloudnativelabs/whats-my-ip --replicas=1 --port=8080
kubectl expose deployment myip-svc --port=8080 --target-port=8080
kubectl create -f /root/myip-ingress.yml
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout /root/tls.key -out /root/tls.crt -subj "/CN=www.example.com"
kubectl create secret tls myip-secret --key /root/tls.key --cert /root/tls.crt

Anything else we need to know?:
It is not clear to us, what exactly the self check is expecting to find, because the fetch of /well-known key is successful (confirmed via wireshark), but the self check is running again and again and still failing. Some more details about the reason of fail would be great.

Wireshark captured data - request from Cluster Node to HA proxy:

GET /.well-known/acme-challenge/B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k HTTP/1.1
Host: pwe.kube.freebox.cz
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Server: nginx/1.15.2
Date: Wed, 29 Aug 2018 14:42:26 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 87
Connection: keep-alive

B2tNUfzfPgK_VOF7AAQEktKaikWxwBQlD0uL77d0N8k.6RElade5K0jHqS1ysziuv2Gm3_LgD-D9APNRg5k8sak

Environment details::

Kubernetes version v1.11.2
cert-manager version (v0.4.1)
nginx-ingress (v1.15.2)
Install method (primary via kubectl, but we also tried helm [following this guide - https://dzone.com/articles/secure-your-kubernetes-services-using-cert-manager] with the same result):

/kind bug

The text was updated successfully, but these errors were encountered:

AdrianRibao · 2018-08-30T07:25:25Z

It happens the same to me. I'm setting a HA cluster and this is blocking us from moving the apps.

We have used the helm package for installing it. Is there any workaround that could help us to continue deploying our infrastructure?

AdrianRibao · 2018-08-30T12:47:36Z

Fixed in my case. The problem was that the nginx configuration in the load balancer was redirecting connections to port 80 to 443.

julienfig · 2018-09-05T10:13:38Z

the same here.
i got a ha cluster with nginx reverse proxy (pointing dns entry on it) and i redirect http/https port on public ips of the kubernetes nodes

Then i have my kubernetes cluster with ingress-nginx controller configured like this:

apiVersion: v1
kind: Service
metadata:
name: ingress-nginx
namespace: ingress-nginx
spec:
type: NodePort
ports:

name: http
port: 80
targetPort: 80
protocol: TCP
name: https
port: 443
targetPort: 443
protocol: TCP
externalIPs:
- public-IP-node1
- public-IP-node2
- public-IP-node3
  selector:
  app: ingress-nginx

This way when i use cert-manager to ger my cert, i have always a self check error (by the way, all acme challenge are checked if i do it manually inside and outside the cluster.

if i change my dns entry for one of the kubernetes nodes public ip, all is good and the certificate is issuing (but this is a big SPOF if the node where is the dns entry is going down)

retest-bot · 2018-12-05T12:03:54Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

raphaelpereira · 2018-12-30T20:15:53Z

The same happens here, but using DNAT on public IP to internal MetalLB load balance configuration.

raphaelpereira · 2018-12-31T13:07:01Z

I found out that the problem was that the cluster wasn't able to resolve the DNS. I solved that and it worked.

ptjhuang · 2019-01-04T11:57:34Z

Solve this myself too after a long time of messing about. Self-check is kinda tricky on your network confirmation. The certificate-mgr resolver tries to connect to itself to verify LetsEncrypt can access data at .well-known/acme-challenge/. This is often deceptively complicated in many networks. It requires the resolver being able to connect to itself using what would often resolve to a public IP address. Do a wget/curl to the .well-known/acme-challenge to see if it succeeds from the resolver container. In my case, I had to setup hairpin NAT at the router.

Is it a good idea to optionally skip self-check?

munnerz · 2019-01-10T23:00:20Z

I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work 😄

ptjhuang · 2019-01-22T11:48:03Z

Port 80 isn't the issue, that's a given. The IP address is though. All installations behind NAT is likely going to fail without hairpin config. If not allow self-check be disabled, maybe mention it in docs?

intellix · 2019-04-20T05:01:40Z

Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges

I guess this means "Cloudflare Always Use HTTPS" was causing this for me. Perhaps something about requiring port 80 and HTTP access to the domain here would be good: https://docs.cert-manager.io/en/latest/getting-started/troubleshooting.html

ghost · 2019-05-01T19:20:08Z

Same issue here. I would like to disable self-check or provide the ip address of the loadbalancer because of hairpinning

MichaelOrtho · 2019-12-17T01:38:40Z

The problem is in Kubernetes networking if you use LoadBalancer that is provided by the hosting. I use DigitalOcean. Kubernetes is not routing network through LB public interface so there is no adding PROXY protocol header or SSL if you are setting it outside Kubernetes. I use PROXY protocol and the moment when I enable it and update Nginx to handle it everything works but cert-manager fails as it is trying to connect to public domain name and that fails. It works from my computer as I am outside and LB is adding needed headers, but not from within the cluster.

Cert-manager is not guilty for this, but if we can add some switches where we can instruct validator to add PROXY protocol or to disable validation for that domain it would help a lot.

For curl if I do (from inside the cluster):

curl -I https://myhost.domain.com

it fails.

If I do (from inside the cluster):

curl -I https://myhost.domain.com --haproxy-protocol

it works.

MichaelOrtho · 2019-12-18T14:48:46Z

The problem is in Kubernetes networking if you use LoadBalancer that is provided by the hosting. I use DigitalOcean. Kubernetes is not routing network through LB public interface so there is no adding PROXY protocol header or SSL if you are setting it outside Kubernetes. I use PROXY protocol and the moment when I enable it and update Nginx to handle it everything works but cert-manager fails as it is trying to connect to public domain name and that fails. It works from my computer as I am outside and LB is adding needed headers, but not from within the cluster.

Cert-manager is not guilty for this, but if we can add some switches where we can instruct validator to add PROXY protocol or to disable validation for that domain it would help a lot.

For curl if I do (from inside the cluster):
curl -I https://myhost.domain.com
it fails.

If I do (from inside the cluster):
curl -I https://myhost.domain.com --haproxy-protocol
it works.

I was informed by DigitalOcean team that there is a fix for this behavior. They added an additional annotation to nxinx-ingress controller service that forces Kubernetes to use domain name of public IP instead of IP and that tricks Kubernetes to think that it is not "ours" and routes network around through LB.

https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/master/docs/controllers/services/examples/README.md#accessing-pods-over-a-managed-load-balancer-from-inside-the-cluster
This is it: (I just added this one)

kind: Service
apiVersion: v1
metadata: 
  name: nginx-ingress-controller
  annotations: 
    service.beta.kubernetes.io/do-loadbalancer-hostname: "hello.example.com"

vitobotta · 2020-03-29T10:26:38Z

@MichaelOrtho Hi, do you know if a similar workaround exists for Scaleway? I am testing their managed Kubernetes and am having the same problem. Thanks

AlexsJones · 2020-04-16T12:09:48Z

@vitobotta I have found on Scaleway you need to restart coredns and it will usually succeed.

vitobotta · 2020-04-16T15:20:16Z

@AlexsJones Not for me. I had to add the annotation below

"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"

btwiuse · 2020-05-29T08:16:14Z

...
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
...

After changing externalTrafficPolicy: Local to externalTrafficPolicy: Cluster, I was able to perform self check.

Reason being, pod with the certificate-issuer wound up on a different node than the load balancer did, so it couldn’t talk to itself through the ingress.

compumike · 2020-11-08T04:22:02Z

Hi all, I ran into the same issue. I've recently published hairpin-proxy which works around the issue, specifically for cert-manager self-checks. https://github.com/compumike/hairpin-proxy

It uses CoreDNS rewriting to intercept traffic that would be heading toward the external load balancer. It then adds a PROXY line to requests originating from within the cluster. This allows cert-manager's self-check to pass.

shibumi · 2021-01-04T10:49:32Z

@munnerz I think you misunderstood the problem here. You wrote:

I'm going to close this issue out as it seems to be more related to network configuration than anything else. Let's Encrypt needs to be able to access your Ingress controller on port 80 in order to validate challenges, and exposing your ingress controller to the public internet (either via a LoadBalancer service or a NodePort) is outside the scope of cert-manager itself. We just need port 80 to work smile

The problem is not that Let's Encrypt can't reach the LoadBalancer... the problem is that certificate manager self-check can't reach it. The connection from LE to the LoadBalancer is fine, due to Destination NAT. The certificate manager inside the cluster how ever tries to resolve the domain name with the external IP and this will fail in DNAT scenarios.

@munnerz there is already a whole project just for fixing this issue. Is there really no option to just disable self-checks?

shibumi · 2021-01-04T15:13:39Z

Here is another possible solution:

You can use coredns for broadcasting wrong DNS records. Just create host aliases for the domains and link them to the internal cluster IPs. Then propagate these host/IP tuples via:

hosts {
    fallthrough
}

in your coredns config. This way you can use the internal IP addresses inside of your cluster. You just have to maintain another list (or you might just automate this via a custom operator or script).

trantor1 · 2021-01-15T12:05:16Z

In DNAT Scenarios just set externalIP of a ingress Service to your external IP Addresses.

apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-ext
  namespace: nginx-ingress
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  - port: 443
    targetPort: 443
    protocol: TCP
    name: https
  selector:
    app: nginx-ingress-ext
  externalIPs:
    - 11.22.33.44

kubernetes, configured with iptables, mostly standard setup,
creates iptables rules to redirect cluster internal requests to external ip's to apropriate services.

$ sudo iptables-save  | grep 11.22.33.44
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:http external IP" -m tcp --dport 80 -m addrtype --dst-type LOCAL -j KUBE-SVC-VMPDTJD5TKOUD6KL
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m physdev ! --physdev-is-in -m addrtype ! --src-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK
-A KUBE-SERVICES -d 11.22.33.44/32 -p tcp -m comment --comment "nginx-ingress/nginx-ingress-ext:https external IP" -m tcp --dport 443 -m addrtype --dst-type LOCAL -j KUBE-SVC-SUC36V4R4VKNMIWK

jeffmccune · 2021-05-21T22:49:28Z

Switching from ipvs to iptables mode solved this for me, see also kubernetes/kubernetes#75262

exenin · 2021-09-18T09:43:56Z

I was stuck with this selfcheck issue for the longest time.

Issue was:
My kube is a VM running RKE on openstack. My internet gateway / router forwards all 443 & 80 port traffic to the cluster.
I was able to curl the challenege and get response from my laptop.
My issue seemed to be that i could not curl the challenge from within a pod...

I just came back to say thank you to @ptjhuang for the solution on setting hairpin on the gateway/router. I wanted to just let people know that this worked for me since I felt lost for the longest time. Hope this inspires others to try this solution

stigok · 2022-01-23T22:56:32Z

As @vitobotta points out, but with lack of context, for cert-manager running in a Scaleway Kubernetes cluster

"service.beta.kubernetes.io/scw-loadbalancer-use-hostname": "true"

This annotations should be applied to the LoadBalancer service created by ingress-nginx.

service.beta.kubernetes.io/scw-loadbalancer-use-hostname
This is the annotation that forces the use of the LB hostname instead of the public IP. This is useful when it is needed to not bypass the LoadBalacer for traffic coming from the cluster.

If you're configuring ingress-nginx with Helm, you can set the value controller.service.annotations.\"service\\.beta\\.kubernetes\\.io/scw-loadbalancer-use-hostname\" to "true"

jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 29, 2018

slavoren changed the title ~~Issue certificate with~~ Self check always fail Sep 6, 2018

jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 5, 2018

munnerz closed this as completed Jan 10, 2019

xmassx mentioned this issue Jan 31, 2019

Allowing skipping HTTP01 and DNS01 self-check on a per-solver basis #1292

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Self check always fail #863

Self check always fail #863

slavoren commented Aug 29, 2018

AdrianRibao commented Aug 30, 2018

AdrianRibao commented Aug 30, 2018

julienfig commented Sep 5, 2018

retest-bot commented Dec 5, 2018

raphaelpereira commented Dec 30, 2018

raphaelpereira commented Dec 31, 2018

ptjhuang commented Jan 4, 2019

munnerz commented Jan 10, 2019

ptjhuang commented Jan 22, 2019

intellix commented Apr 20, 2019 •

edited

Loading

ghost commented May 1, 2019

MichaelOrtho commented Dec 17, 2019

MichaelOrtho commented Dec 18, 2019

vitobotta commented Mar 29, 2020

AlexsJones commented Apr 16, 2020

vitobotta commented Apr 16, 2020

btwiuse commented May 29, 2020

compumike commented Nov 8, 2020

shibumi commented Jan 4, 2021 •

edited

Loading

shibumi commented Jan 4, 2021

trantor1 commented Jan 15, 2021

jeffmccune commented May 21, 2021 •

edited

Loading

exenin commented Sep 18, 2021

stigok commented Jan 23, 2022 •

edited

Loading

Self check always fail #863

Self check always fail #863

Comments

slavoren commented Aug 29, 2018

AdrianRibao commented Aug 30, 2018

AdrianRibao commented Aug 30, 2018

julienfig commented Sep 5, 2018

retest-bot commented Dec 5, 2018

raphaelpereira commented Dec 30, 2018

raphaelpereira commented Dec 31, 2018

ptjhuang commented Jan 4, 2019

munnerz commented Jan 10, 2019

ptjhuang commented Jan 22, 2019

intellix commented Apr 20, 2019 • edited Loading

ghost commented May 1, 2019

MichaelOrtho commented Dec 17, 2019

MichaelOrtho commented Dec 18, 2019

vitobotta commented Mar 29, 2020

AlexsJones commented Apr 16, 2020

vitobotta commented Apr 16, 2020

btwiuse commented May 29, 2020

compumike commented Nov 8, 2020

shibumi commented Jan 4, 2021 • edited Loading

shibumi commented Jan 4, 2021

trantor1 commented Jan 15, 2021

jeffmccune commented May 21, 2021 • edited Loading

exenin commented Sep 18, 2021

stigok commented Jan 23, 2022 • edited Loading

intellix commented Apr 20, 2019 •

edited

Loading

shibumi commented Jan 4, 2021 •

edited

Loading

jeffmccune commented May 21, 2021 •

edited

Loading

stigok commented Jan 23, 2022 •

edited

Loading