Random download failures - 403 errors [hetzner] #138

Closed
marblerun opened this issue Jan 12, 2023 · 69 comments

@marblerun

Hi,

Attempting to build a 3-node Kubernetes cluster using kubespray (latest) on Hetzner Cloud instances running Debian 11.

The first attempt failed due to a download failure for kubeadm on 1 of the 3 instances. Confirmed using a local download: 1 failure, 2 successes.
I swapped in a replacement instance and moved past this point; I assumed possible IP blacklisting, though this was not confirmed.

All 3 instances then downloaded the 4 Calico networking containers and came to the pause:3.7 download, which uses a command like this.

root@kube-3:~# /usr/local/bin/nerdctl -n k8s.io pull --quiet registry.k8s.io/pause:3.7
root@kube-3:~# nerdctl images
REPOSITORY              TAG    IMAGE ID       CREATED         PLATFORM       SIZE        BLOB SIZE
registry.k8s.io/pause   3.7    bb6ed397957e   4 seconds ago   linux/amd64    700.0 KiB   304.0 KiB

On the failing instance, we see the following error when the command is run by hand; via kubespray it retries 4 times and then fails the whole install at that point.

root@kube-2:~# /usr/local/bin/nerdctl -n k8s.io pull --quiet registry.k8s.io/pause:3.7
FATA[0000] failed to resolve reference "registry.k8s.io/pause:3.7": unexpected status from HEAD request to https://registry.k8s.io/v2/pause/manifests/3.7: 403 Forbidden

Do you have any idea why the download from this registry might be failing, and is there any alternative source I could try?

The IP address starts and ends as shown below; the command was run a couple of minutes ago.

Thu 12 Jan 2023 02:52:21 PM UTC

65.x.x.244

Many thanks

Mike

@BenTheElder
Member

That endpoint works fine from here

$ curl -IL https://registry.k8s.io/v2/pause/manifests/3.7
HTTP/2 307 
content-type: text/html; charset=utf-8
location: https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.7
x-cloud-trace-context: 9e8f3405a102bf4332d81593461d200a
date: Thu, 12 Jan 2023 20:13:55 GMT
server: Google Frontend
via: 1.1 google
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000

HTTP/2 200 
content-length: 2761
content-type: application/vnd.docker.distribution.manifest.list.v2+json
docker-content-digest: sha256:bb6ed397957e9ca7c65ada0db5c5d1c707c9c8afc80a94acbe69f3ae76988f0c
docker-distribution-api-version: registry/2.0
date: Thu, 12 Jan 2023 20:13:55 GMT
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

Is there a proxy involved?

Can nerdctl produce more verbose results? That path should have served a redirect to some other backend.
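A rough sketch of how to capture that client-side (assuming a nerdctl build that supports the global --debug-full flag; plain --debug should also work):

# re-run the pull with full debug output to log the resolver's HTTP round trips
nerdctl --debug-full -n k8s.io pull registry.k8s.io/pause:3.7

That should show which URL actually answered with the 403, i.e. whether it was registry.k8s.io itself or the backend it redirects to.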

@BenTheElder
Member

We don't even have code to serve 403 in the registry.k8s.io application, so that would be coming from the backing store we redirect to, but from the logs above we can't see that part.

@marblerun
Author

Thanks Ben,

As a temporary fix, I looked at the kubespray logs, downloaded the missing images on a working instance, exported them to a local file, copied that over, and imported them into the instance that is being blocked. I now have a working cluster, but it is concerning that access seems to be blocked in such an arbitrary fashion.
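For anyone needing the same workaround, it was roughly this (a sketch, assuming nerdctl's Docker-compatible save/load subcommands; the hostnames are just my nodes):

# on a working instance: export the blocked image to a tarball
nerdctl -n k8s.io save -o pause-3.7.tar registry.k8s.io/pause:3.7

# copy it to the blocked instance and import it there
scp pause-3.7.tar root@kube-2:/tmp/
ssh root@kube-2 '/usr/local/bin/nerdctl -n k8s.io load -i /tmp/pause-3.7.tar'

Have a good weekend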

Mike

@tcahill

tcahill commented Jan 18, 2023

I'm seeing the same behavior in a similar context. I'm trying to install the kube-prometheus-stack helm chart on a k3s cluster in Hetzner Cloud (hosted in their Oregon location) and getting a 403 when pulling registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0. Interestingly I'm only seeing this behavior on one of the three hosts comprising my cluster, which are all running Ubuntu 22. It's also not consistent on the problematic host - I occasionally get a successful response but primarily see 403s.

We don't even have code to serve 403 in the registry.k8s.io application

For me the 403 is appearing without following the redirect:

curl -v https://registry.k8s.io/v2/pause/manifests/3.7
*   Trying 34.107.244.51:443...
* Connected to registry.k8s.io (34.107.244.51) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
* [CONN-0-0][CF-SSL] TLSv1.0 (OUT), TLS header, Certificate Status (22):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Certificate Status (22):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Finished (20):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=registry.k8s.io
*  start date: Dec 31 01:52:06 2022 GMT
*  expire date: Mar 31 02:44:39 2023 GMT
*  subjectAltName: host "registry.k8s.io" matched cert's "registry.k8s.io"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1D4
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* h2h3 [:method: GET]
* h2h3 [:path: /v2/pause/manifests/3.7]
* h2h3 [:scheme: https]
* h2h3 [:authority: registry.k8s.io]
* h2h3 [user-agent: curl/7.87.0]
* h2h3 [accept: */*]
* Using Stream ID: 1 (easy handle 0x7f356982fa90)
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /v2/pause/manifests/3.7 HTTP/2
> Host: registry.k8s.io
> user-agent: curl/7.87.0
> accept: */*
> 
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 403 
< content-type: text/html; charset=UTF-8
< referrer-policy: no-referrer
< content-length: 317
< alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
< 
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>403 Forbidden</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Forbidden</h1>
<h2>Your client does not have permission to get URL <code>/v2/pause/manifests/3.7</code> from this server.</h2>
<h2></h2>
</body></html>
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Connection #0 to host registry.k8s.io left intact

@BenTheElder
Member

Thanks for the additional logs.

cc @ameukam maybe cloud armor? I forgot about that dimension in the actual deployment.

This definitely looks like it's coming from the infra in front of the app, we also don't serve HTML, only redirects (or simple API errors).

@BenTheElder
Member

@ameukam and I discussed this yesterday.

This appears to be coming from the cloud load balancer security policy (we're using Cloud Armor, configured here:
https://github.com/kubernetes/k8s.io/blob/f858f4680ada6385eaa4c76b2a295e33ec0ed51c/infra/gcp/terraform/k8s-infra-oci-proxy-prod/network.tf#L112).

I don't think we're doing anything special here; my best guess is that Hetzner IPs have been flagged for abuse?

I actually can't seem to find these particular requests in the load balancer logs, otherwise we could see which preconfigured rule this is hitting.

@BenTheElder
Member

I can see other 403s served by the security policy for more obviously problematic incoming requests like https://registry.k8s.io/?../../../../../../../../../../../etc/profile

@mysticaltech

mysticaltech commented Jan 26, 2023

Folks, I can confirm this issue shows up randomly when pulling CSI images. It seems that some IPs are blacklisted or something!

This has been a huge issue this last month for us! It started in late December.

@mysticaltech

mysticaltech commented Jan 26, 2023

@ameukam and I discussed this yesterday.

This appears to be coming from the cloud load balancer security policy (we're using Cloud Armor, configured here: https://github.com/kubernetes/k8s.io/blob/f858f4680ada6385eaa4c76b2a295e33ec0ed51c/infra/gcp/terraform/k8s-infra-oci-proxy-prod/network.tf#L112).

I don't think we're doing anything special here; my best guess is that Hetzner IPs have been flagged for abuse?

I actually can't seem to find these particular requests in the load balancer logs, otherwise we could see which preconfigured rule this is hitting.

That would make absolute sense! Somehow, some Hetzner IPs seem to be blacklisted. For our Kube-Hetzner project, it's been a real pain. Please fix 🙏

kube-hetzner/terraform-hcloud-kube-hetzner#524
kube-hetzner/terraform-hcloud-kube-hetzner#451
kube-hetzner/terraform-hcloud-kube-hetzner#442

@dims
Member

dims commented Jan 26, 2023

@mysticaltech can you please drop a few ip address(es) of boxes that seem to have trouble?

@mysticaltech

@dims Definitely, I can try to get some.

@aleksasiriski Could you fetch some of the 10 IPs that you had reserved as static IPs because they were blocked by registry.k8s.io when used for nodes?

@mysticaltech

@dims I just deployed a test cluster of 10 nodes, and got "lucky" on one of them. The one affected IP is 5.75.240.113.

(screenshot attached)

@aleksasiriski

@dims Definitely, I can try to get some.

@aleksasiriski Could you fetch some of the 10 IPs that you had reserved as static IPs because they were blocked by registry.k8s.io when used for nodes?

I had like 3 IPs that were blacklisted; I'll try to fetch them later today (UTC+1) when I'm home.

@dims
Member

dims commented Jan 26, 2023

I just deployed a test cluster of 10 nodes, and got "lucky" on one of them. The one affected IP is 5.75.240.113.

(attached: downloaded-logs-20230126-065347.json.txt)

I see 4 hits, all with a valid redirect using HTTP status 307s, no 403s at all.

the code it hits is here:
https://cs.k8s.io/?q=StatusTemporaryRedirect&i=nope&files=handlers.go&excludeFiles=&repos=kubernetes/registry.k8s.io

@mysticaltech

@dims Thanks for looking into this. The 403s most probably appear further down the request chain. As stated by @BenTheElder, it could be your LB security policy (Cloud Armor), configured here: https://github.com/kubernetes/k8s.io/blob/f858f4680ada6385eaa4c76b2a295e33ec0ed51c/infra/gcp/terraform/k8s-infra-oci-proxy-prod/network.tf#L112

@mysticaltech

mysticaltech commented Jan 26, 2023

Also @dims, something interesting discovered by one of our users: if they tried to pull the image manually with crictl pull a bunch of times, it would actually work at some point. As if the node were magically whitelisted again, pulling other images works afterward.

Sometimes it works after 100 tries, sometimes it just does not work. So kind of a hit-or-miss situation! All this to say, there's something up with your LB IMHO.
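For reference, the brute-force retry our user ended up with is essentially this (a sketch; crictl pull is the standard CRI client command, the loop is plain shell, and the image is one of the CSI images we pull):

# keep retrying until the registry stops answering 403 (sometimes takes many attempts)
until crictl pull registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0; do
  sleep 5
done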

@mysticaltech

mysticaltech commented Jan 26, 2023

@dims I have created another small test cluster, and the IP above, 5.75.240.113, has been reused and it's doing it again. I will leave it up for 24h so that you can gather more logs.

pulling from host registry.k8s.io failed with status code [manifests v2.7.0]: 403 Forbidden

(screenshot attached)

@mysticaltech

Now if I ssh into the node and run the crictl pull command, I get the same:

(screenshot attached)

@mysticaltech

@dims Also an interesting finding: if I simply issue curl -v https://registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 a few times in a row, it randomly returns either a 404 or a 403.

(screenshots attached showing the 404 and the 403 responses)

@dims
Member

dims commented Jan 26, 2023

@mysticaltech yeah, it looks like there is very little tolerance for your range of IPs from Hetzner.

@mysticaltech

Exactly! Which is a real pain when working with Kubernetes. If it's possible to fix, that would be awesome.

@mysticaltech

@dims Did you do something? Because it started to work.

(screenshot attached)

@dims
Member

dims commented Jan 26, 2023

@mysticaltech nope. The theory is still the same - Cloud Armor!

@mysticaltech

Oh my! Maybe some kind of form to request whitelisting? That would be kind of good. But not great for autoscaling nodes for instance.

@valkenburg-prevue-ch

Hi, I'm getting an HTTP 404 code on this request:

{"access_time":"26/Jan/2023:17:04:49 +0000","upstream_cache_status":"","method":"GET","uri":"/artifacts-downloads/namespaces/k8s-artifacts-prod/repositories/images/downloads/ALMFTafKuNkWwH8ArOFD4KogY3p5kp9zcsZSbyhKLzMCEPih3pGxlf8hdweputz3nxUZBrevwToc16OLF7zMqHYUiYRUHvlEfEVSsuu2L5J4uzlOgj_1BY7ZHOHwmRLsHwyaJ8TQE8XlkrCSQSak71-6ZVgvBT9nv57reoR-AE6o4ei_iszTDpPq2xtnFA4tZpIL0tBJor_u8ZoD83KGOGN-aAHsqelMjVqLR5fPp3uluRC1I8coYtFZgafJjEKsqrkeVUdt9hQTHpQ-dGdlbIBOVPWaZCl1IeoDzlHcwrybwcYTB8hyYzJ--mHnaZWfOWs8i2p-dFzdPy68CBTaXgW-gDRymEFDCJe_3b8GhvFMnOOo0ldCZEk4K2fJsnTt_gMC2-4y1zr5k_TrUmcrV_nt8bo4tw4cvYCvb9EJn7GQ3LbkY41avfNbipQmoBkR-rZ9lPhySAVcmiharpD7gJYrqvSxSafP_IBJ3Oxkt0_aUY4A9n4qeqtZRZeSE-BoWdGhiagQVnPWDewkpAMY2M9XfotDZhOUIR_kb8nYWzSi4cjECfltywKzgriY2IT0TS1GoHBLwuJPpGrRFR0afzF-BOQTR8SUnb0b70zprBC8lSc4HkzzW_4MiPBbxPGpa6OXiIZbvjO6ORb-YXGXwCSsee4nkheizN1xTof6z_GHPVJFqhNRqNJSaN8Jfm2Dd0w0C6MBrkTP34K2hKnORXI=","request_type":"unknown","status":"404","bytes_sent":"19","upstream_response_time":"0.000, 0.048","host":"registry.k8s.io","proxy_host":"registry.k8s.io","upstream":"[2600:1901:0:1013::]:443, 34.107.244.51:443"}

from IP 78.47.222.2 (same project and provider as @mysticaltech). Does posting IPs and info like this help? Or is there something else I can provide?

@BenTheElder
Member

@valkenburg-prevue-ch /artifacts-downloads/.* is not a valid OCI distribution API path. The 404 is because that URL does not exist.


Also @dims, something interesting, kube-hetzner/terraform-hcloud-kube-hetzner#451 (comment),

This actually points to the issue with Hetzner IPs existing with plain GCR as well.

k8s.gcr.io is a special alias domain provided by GCR, but it has the same allow-listing etc. as any other gcr.io registry.

Kubernetes doesn't run that infra, it just populates the images.

Maybe some kind of form to request whitelisting?

I'm not sure how well this would scale given the largely volunteer staffing we have for this sort of free image host ...

It seems registry.k8s.io has no regression here vs k8s.gcr.io, though I can't recall ever having seen a similar issue reported to Kubernetes previously.

@BenTheElder
Member

At present I would recommend mirroring images, which also helps us reduce our massive distribution costs and reallocate resources towards testing etc.

@mysticaltech

@BenTheElder Thanks for clarifying. But Hetzner Cloud is still a major European cloud; not fully supporting it is a shame IMHO, and as a young open-source project, we don't yet have the resources to deploy a full-blown mirror.

However, if we were to do that, how would you recommend we proceed? This is something we have obviously thought about; we have already considered both https://docs.k3s.io/installation/private-registry and https://github.com/goharbor/harbor. Would you recommend anything else that is an easy fix for this particular issue?

@BenTheElder
Member

curl -v https://registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 a few times in a row. It randomly returns either a 404 or a 403.

Again, this is not a valid API path, so the 404s are expected; the request is invalid. The 403s are seemingly due to the security mechanism(s).

I recommend crane pull --verbose registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 /dev/null to see what valid request paths look like, or consult the distribution spec.
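For comparison, a well-formed manifest request follows the /v2/<name>/manifests/<reference> layout from the distribution spec, e.g.:

# valid distribution API path: HEAD the manifest and follow the redirect
curl -IL https://registry.k8s.io/v2/sig-storage/csi-node-driver-registrar/manifests/v2.7.0

# not an API path: the image reference pasted into the URL, which is what returns 404
curl -I https://registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0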


Thanks for clarifying. But Hetzner Cloud is still a major European cloud; not fully supporting it is a shame IMHO, and as a young open-source project, we don't yet have the resources to deploy a full-blown mirror.

I hear that, but even as a large open source project we have constrained resources to host things, and we're not actively choosing to block these IPs; some security layer on our donated hosting infrastructure is blocking them. At the moment, keeping things online and trying to bring our spend back within budget is a bigger priority than resolving an issue that was also present in the previous infrastructure, and even that is a bit of a stretch. Open source staffing is hard :(

Perhaps you could ask your users to mirror for themselves if they encounter issues like this.

Hetzner might also have thoughts about this issue? It seems to be in their best interest to address what looks like an IP reputation problem.

Searching online, I see similar discussions about Hetzner IP bans with respect to Amazon CloudFront and Cloudflare.

However, if we were to do that, how would you recommend we proceed? This is something we obviously thought about, and have considered already both https://docs.k3s.io/installation/private-registry and https://github.com/goharbor/harbor, would you recommend anything else that is an easy fix for that particular issue?

Mirroring guides are something I hope to get folks to contribute. Options will depend on the tools involved client-side (like container runtime).

For consuming a mirror, I usually recommend containerd's mirroring config (as dockershim is deprecated); CRI-O has something similar, I believe.
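As a minimal sketch of that containerd config (assuming the certs.d hosts.toml layout; mirror.example.com is a placeholder for whatever mirror you host):

# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"

# /etc/containerd/certs.d/registry.k8s.io/hosts.toml
server = "https://registry.k8s.io"

[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]

With that in place, pulls of registry.k8s.io images try the mirror first and fall back to the upstream.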

For hosting a mirror, I roughly recommend populating images with crane cp upstream mirror, where mirror is any preferred registry host. However, there are many other options, like Harbor, that I've not personally used.
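Populating such a mirror is then roughly (a sketch; mirror.example.com is the same placeholder registry, and you would repeat this for each image you need):

crane cp registry.k8s.io/pause:3.7 mirror.example.com/pause:3.7
crane cp registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0 mirror.example.com/sig-storage/csi-node-driver-registrar:v2.7.0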

@vitobotta

For anyone still facing this problem I have been able to work around it by deploying peerd in my cluster.

Hi! I installed peerd in k3s (had to build the image with a changed containerd socket path, and it's now running), but how do I use it? Thanks

@rbjorklin

@vitobotta

  • Ensure /etc/containerd/config.toml contains:
[plugins."io.containerd.grpc.v1.cri".registry]
   config_path = "/etc/containerd/certs.d"

After that images will automatically be pulled from other nodes in your cluster if they are present.

@vitobotta

@vitobotta

  • Ensure /etc/containerd/config.toml contains:
[plugins."io.containerd.grpc.v1.cri".registry]
   config_path = "/etc/containerd/certs.d"

After that images will automatically be pulled from other nodes in your cluster if they are present.

Thanks :) In the meantime I ended up using https://github.com/spegel-org/spegel since it doesn't require me to open any ports in the firewall. Peerd does if I am not mistaken, right?

@mysticaltech

@valkenburg-prevue-ch FYI above, the landscape of solutions for this has evolved fast! 🤯

@mysticaltech

Thanks :) In the meantime I ended up using https://github.com/spegel-org/spegel since it doesn't require me to open any ports in the firewall. Peerd does if I am not mistaken, right?

@vitobotta Any tips on the config for k3s (or any other cluster)? Is it straightforward?

@mysticaltech

@phillebaba Your project is resolving a big need, thank you for that 🙏

@valkenburg-prevue-ch

@valkenburg-prevue-ch FYI above, the landscape of solutions for this has evolved fast! 🤯

Yeah, I've been following this discussion closely! Very interested.

@vitobotta

Thanks :) In the meantime I ended up using https://github.com/spegel-org/spegel since it doesn't require me to open any ports in the firewall. Peerd does if I am not mistaken, right?

@vitobotta Any tips on the config for k3s (or any other cluster)? Is it straightforward?

Yes, there are some settings that differ on k3s, so here's how I ended up configuring it after some investigation. Hope it can save you and/or others some time:

    helm upgrade --install \
    --version v0.0.22 \
    --create-namespace \
    --namespace spegel \
    --set spegel.containerdSock=/run/k3s/containerd/containerd.sock \
    --set spegel.containerdContentPath=/var/lib/rancher/k3s/agent/containerd/io.containerd.content.v1.content \
    --set spegel.containerdRegistryConfigPath=/var/lib/rancher/k3s/agent/etc/containerd/certs.d \
    --set spegel.logLevel="DEBUG" \
    spegel oci://ghcr.io/spegel-org/helm-charts/spegel

The only problem I have encountered with Spegel is that, despite it being a DaemonSet, for some reason only exactly 100 pods at most are up and running. With larger clusters (I tried with 400- and 500-node clusters) it always maxes out at 100 pods, and all the pods on the other nodes remain in a non-running state. I opened an issue about it here: spegel-org/spegel#459.

Other than that, it seems to work pretty well. Like I mentioned, I tested with clusters of up to 500 nodes to increase the likelihood of getting some problematic IPs, and in fact every time there were many among that large number of nodes; thanks to Spegel, all pods that require images from problematic registries started without any issue. I also see a nice boost in the time it takes for a node to acquire an image from other nodes, so deployments scale more quickly, which is awesome.
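If you want to check whether you are hitting the same ceiling, comparing the DaemonSet's desired and ready counts is enough (plain kubectl, using the namespace from the helm command above):

# desired should equal the node count; in my case ready capped at exactly 100
kubectl -n spegel get daemonset
kubectl -n spegel get pods --field-selector=status.phase=Running --no-headers | wc -l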

@mysticaltech

Thanks @vitobotta, appreciate it. I guess 100 IPs should be enough to successfully do the job. FYI, @valkenburg-prevue-ch just found out that Spegel is already integrated within k3s and can be enabled with the --embedded-registry flag. https://docs.k3s.io/installation/registry-mirror 🥳

@vitobotta

Thanks @vitobotta, appreciate it. I guess 100 IPs should be enough to successfully do the job. FYI, @valkenburg-prevue-ch just found out that Spegel is already integrated within k3s and can be enabled with the --embedded-registry flag. https://docs.k3s.io/installation/registry-mirror 🥳

I know, and I forgot to mention it. Maybe it's because it's still experimental and perhaps buggy, but I couldn't get the embedded Spegel support to work after many attempts. It also seems that with that version you need to open a port in the firewall. Perhaps I will try again when I have some more time.

@mysticaltech

mysticaltech commented May 1, 2024

@vitobotta Ah ok, good to know! So maybe for now your helm setup will be best to get the latest and greatest 🙏 @valkenburg-prevue-ch FYI

@vitobotta

@vitobotta Ah ok, good to know! So maybe for now your helm setup will be best to get the latest and greatest 🙏 @valkenburg-prevue-ch FYI

I am trying the embedded spegel now again. Let's see if I can figure it out.

@phillebaba

Just to add to this, if you are running k3s I would suggest using the embedded Spegel. It works just as well without having to deal with daemonsets.

@vitobotta

Just to add to this, if you are running k3s I would suggest using the embedded Spegel. It works just as well without having to deal with daemonsets.

Like I mentioned in a previous comment, I couldn't get it to work, so I tried the Helm installation of Spegel, and that worked.

I am trying the embedded one again now and still can't get it to work. I have also tried with port 5001 open for the peer-to-peer exchange, but it's not working. Nothing is even listening on that port on the nodes, as if the embedded registry were not configured at all, but I am indeed using the --embedded-registry flag on the servers. Any suggestions?

@vitobotta

I think I know what the problem might be: when I create a cluster in Hetzner without a private network and then install the Hetzner Cloud Controller Manager, the CCM populates the external IP field of the nodes, leaving the internal IP unset. But the k3s documentation for the embedded registry mirror talks about communication over internal IPs, so perhaps that's why it's not working.

@vitobotta

@phillebaba have you actually gotten the embedded registry mirror working?

@vitobotta

Got it working! My mistake was that I enabled the embedded registry on existing clusters without restarting the agents. When I restart the agents or create a new cluster, everything works. It seems that it requires port 5001 to be open in the firewall though... I will test this more.
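For anyone else setting this up, the whole thing boils down to roughly the following (a sketch based on the k3s registry-mirror docs; adjust the list of registries to whatever you pull from):

# /etc/rancher/k3s/registries.yaml on every node: registries to mirror peer-to-peer
mirrors:
  registry.k8s.io:
  docker.io:

# enable the embedded registry on the servers (--embedded-registry flag or
# embedded-registry: true in the k3s config.yaml), then restart so it takes effect
systemctl restart k3s          # on servers
systemctl restart k3s-agent    # on agents

Plus, as noted, port 5001 open between nodes when they talk over their public IPs.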

@valkenburg-prevue-ch

Awesome news! Thanks for doing all this and reporting here.

In which firewall do you have to open port 5001, though? The Hetzner firewall only applies to the public internet, right? Isn't the private network always fully open? Or am I missing something in our MicroOS setup; is there a firewall there too?

@vitobotta

Awesome news! Thanks for doing all this and reporting here.

In which firewall do you have to open port 5001, though? The Hetzner firewall only applies to the public internet, right? Isn't the private network always fully open? Or am I missing something in our MicroOS setup; is there a firewall there too?

Yep, the public firewall. There are no restrictions within the private network afaik. The reason I am testing without private networks is that they support a max of 100 nodes, so it's impossible to create a large cluster with them. I have tested (using hetzner-k3s, my tool) with clusters of up to 500 nodes using the public network, and I could probably scale into the thousands now that I have added support for Cilium as the CNI and for external datastores like Postgres instead of etcd. I wish I had the money to experiment with more nodes lol.

@valkenburg-prevue-ch

Thanks for clarifying. Do I understand correctly that for up to 100 nodes one does not need to open anything in the firewall, and that your use case with everything over public IPs might be "beyond the scope of the default supported setups"?

@vitobotta

Correct. If you use the private network you don't need to open anything in the firewall provided you configure everything to use the private interface.

@mysticaltech

Thanks @vitobotta and @phillebaba for sharing, really appreciate it.
