OpenStack: long hostnames may prevent mdns from working properly #2243

Closed
mandre opened this issue Aug 20, 2019 · 11 comments

@mandre
Member

mandre commented Aug 20, 2019

When deploying on OpenStack with a long enough cluster name, node hostnames become too long and the mdns-publisher pod fails with the following errors:

time="2019-08-19T06:08:55Z" level=info msg="Publishing with settings" collision_avoidance=hostname ip=192.168.0.19                                                                                                                                                              
time="2019-08-19T06:08:55Z" level=info msg="Binding interface" name=ens3                                                                                                                                                                                                        
time="2019-08-19T06:08:55Z" level=debug msg="Changing service name" new="preserve-wjosp0819fa Etcd-preserve-wjosp0819fa-9z8qw-master-0" original="preserve-wjosp0819fa Etcd"                                                                                                    
time="2019-08-19T06:08:55Z" level=info msg="Publishing service" domain=local. hostname=preserve-wjosp0819fa-9z8qw-etcd-0.local. name="preserve-wjosp0819fa Etcd-preserve-wjosp0819fa-9z8qw-master-0" port=2380 ttl=3200 type=_etcd-server-ssl._tcp                              
time="2019-08-19T06:08:55Z" level=debug msg="Changing service name" new="preserve-wjosp0819fa Workstation-preserve-wjosp0819fa-9z8qw-master-0" original="preserve-wjosp0819fa Workstation"                                                                                      
time="2019-08-19T06:08:55Z" level=info msg="Publishing service" domain=local. hostname=preserve-wjosp0819fa-9z8qw-master-0.local. name="preserve-wjosp0819fa Workstation-preserve-wjosp0819fa-9z8qw-master-0" port=42424 ttl=3200 type=_workstation._tcp                        
time="2019-08-19T06:08:55Z" level=debug msg="Changing service name" new="preserve-wjosp0819fa EtcdWorkstation-preserve-wjosp0819fa-9z8qw-master-0" original="preserve-wjosp0819fa EtcdWorkstation"                                                                              
time="2019-08-19T06:08:55Z" level=info msg="Publishing service" domain=local. hostname=preserve-wjosp0819fa-9z8qw-etcd-0.local. name="preserve-wjosp0819fa EtcdWorkstation-preserve-wjosp0819fa-9z8qw-master-0" port=42424 ttl=300 type=_workstation._tcp                       
time="2019-08-19T06:08:55Z" level=info msg="Zeroconf registering service" name="preserve-wjosp0819fa EtcdWorkstation-preserve-wjosp0819fa-9z8qw-master-0"                                                                                                                       
time="2019-08-19T06:08:55Z" level=info msg="Zeroconf registering service" name="preserve-wjosp0819fa Etcd-preserve-wjosp0819fa-9z8qw-master-0"                                                                                                                                  
time="2019-08-19T06:08:55Z" level=info msg="Zeroconf registering service" name="preserve-wjosp0819fa Workstation-preserve-wjosp0819fa-9z8qw-master-0"                                                                                                                           
time="2019-08-19T06:08:55Z" level=info msg="Zeroconf setting service ttl" name="preserve-wjosp0819fa Etcd-preserve-wjosp0819fa-9z8qw-master-0" ttl=3200                                                                                                                         
time="2019-08-19T06:08:55Z" level=info msg="Zeroconf setting service ttl" name="preserve-wjosp0819fa EtcdWorkstation-preserve-wjosp0819fa-9z8qw-master-0" ttl=300                                                                                                               
2019/08/19 06:08:55 [ERR] zeroconf: failed to send probe: dns: bad rdata                                                                                                                                                                                                        
time="2019-08-19T06:08:55Z" level=info msg="Zeroconf setting service ttl" name="preserve-wjosp0819fa Workstation-preserve-wjosp0819fa-9z8qw-master-0" ttl=3200                                                                                                                  
2019/08/19 06:08:55 [ERR] zeroconf: failed to send probe: dns: bad rdata                                                                                                                                                                                                        
2019/08/19 06:08:55 [ERR] zeroconf: failed to send probe: dns: bad rdata                                                                                                                                                                                                        
2019/08/19 06:08:55 [ERR] zeroconf: failed to send announcement: dns: bad rdata
2019/08/19 06:08:55 [ERR] zeroconf: failed to send probe: dns: bad rdata
2019/08/19 06:08:55 [ERR] zeroconf: failed to send announcement: dns: bad rdata
2019/08/19 06:08:56 [ERR] zeroconf: failed to send announcement: dns: bad rdata
2019/08/19 06:08:56 [ERR] zeroconf: failed to send announcement: dns: bad rdata

There is a hard limit of 63 characters on a DNS label. In this example, after hostname-based collision avoidance is applied, the service name ends up being preserve-wjosp0820mh Workstation-preserve-wjosp0820mh-q82sf-master-0, which is 68 characters.

We should add validation to the installer to ensure the resulting service name is no longer than 63 characters.
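
A minimal sketch of the kind of check proposed above, written in Go. It assumes the service name is built as "<cluster name> <service>-<hostname>", which is only inferred from the log lines in this report. The helper names serviceName and validateServiceName are hypothetical and are not taken from the installer or from mdns-publisher.

```go
package main

import "fmt"

// maxLabelLen is the maximum length of a single DNS label.
const maxLabelLen = 63

// serviceName rebuilds the name seen in the log after hostname-based
// collision avoidance. The "<cluster> <service>-<hostname>" pattern is an
// assumption inferred from the log output, not the publisher's actual code.
func serviceName(cluster, service, hostname string) string {
	return fmt.Sprintf("%s %s-%s", cluster, service, hostname)
}

// validateServiceName (hypothetical) rejects names that do not fit in one label.
func validateServiceName(name string) error {
	if n := len(name); n > maxLabelLen {
		return fmt.Errorf("service name %q is %d characters, limit is %d", name, n, maxLabelLen)
	}
	return nil
}

func main() {
	name := serviceName("preserve-wjosp0820mh", "Workstation",
		"preserve-wjosp0820mh-q82sf-master-0")
	fmt.Println(len(name)) // 68
	if err := validateServiceName(name); err != nil {
		fmt.Println("invalid:", err)
	}
}
```

Because the cluster name appears twice in the generated name, every extra character in the cluster name costs two characters against the limit.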

Baremetal has the same issue; however, it is much less likely to occur there since the cluster name is not prepended to the hostname.

@mandre
Member Author

mandre commented Aug 20, 2019

/label platform/openstack

@celebdor
Contributor

I would strongly recommend not prepending the cluster name to the node name: master-0.clustername.domainname is a much better FQDN for a node than clustername-master-0.clustername.domainname.

@iamemilio

The standard for setting the hostname is that the hostname should match the node name. The standard for the node name scheme is --<master/worker>-#.

@ghost

ghost commented Sep 11, 2019

Should be resolved with #2270

+ ./openshift-install create cluster --log-level debug
level=debug msg="OpenShift Installer v4.2.0"
level=debug msg="Built from commit 8426eda869832e539940631b8a41a80e58d9a1e7"
level=debug msg="Fetching \"Terraform Variables\"..."
level=debug msg="Loading \"Terraform Variables\"..."
level=debug msg="  Loading \"Cluster ID\"..."
level=debug msg="    Loading \"Install Config\"..."
level=debug msg="      Loading \"SSH Key\"..."
level=debug msg="      Loading \"Base Domain\"..."
level=debug msg="        Loading \"Platform\"..."
level=debug msg="      Loading \"Cluster Name\"..."
level=debug msg="        Loading \"Base Domain\"..."
level=debug msg="      Loading \"Pull Secret\"..."
level=debug msg="      Loading \"Platform\"..."
level=fatal msg="failed to fetch Terraform Variables: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: metadata.name: Invalid value: \"morenod-nightly\": metadata name is too long, please restrict it to 14 characters"

@mandre
Member Author

mandre commented Sep 11, 2019

/close

We've implemented validation in #2270 that checks the length of the cluster name. There is still room for improvement: making the generated service name shorter would allow much longer cluster names. This should be treated as a separate issue, though.
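
For illustration only, a check with the behavior described above could look roughly like the Go sketch below. The 14-character limit and the message wording are copied from the fatal log line quoted earlier in this thread. The function name validateClusterName and the constant name are hypothetical and are not the code merged in #2270.

```go
package main

import "fmt"

// maxClusterNameLen comes from the installer error quoted in this thread
// ("please restrict it to 14 characters"); how that number was derived is
// not stated here.
const maxClusterNameLen = 14

// validateClusterName (hypothetical) rejects cluster names over the limit.
func validateClusterName(name string) error {
	if len(name) > maxClusterNameLen {
		return fmt.Errorf("metadata name is too long, please restrict it to %d characters",
			maxClusterNameLen)
	}
	return nil
}

func main() {
	// "morenod-nightly" is the 15-character name rejected in the log above.
	for _, name := range []string{"morenod-nightly", "morenod"} {
		if err := validateClusterName(name); err != nil {
			fmt.Printf("%q: %v\n", name, err)
			continue
		}
		fmt.Printf("%q: ok\n", name)
	}
}
```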

@openshift-ci-robot
Contributor

@mandre: Closing this issue.

@dustinmm80

I ran into this issue when using openshift-installer with GCP. When the cluster name was longer than 11 characters, cluster creation failed while waiting on operators to resolve.

Full details here: https://gitlab.com/gitlab-org/gl-openshift/gitlab-operator/-/merge_requests/47#note_435260548.

I see we added some name length validation in #2270, but only for OpenStack. It seems like we need to add name length validation to GCP, and perhaps other platforms as well. The character limit is affected by the root domain and GCP project name, since they are included in the domain.

@staebler
Contributor

> I ran into this issue when using openshift-installer with GCP. When the cluster name was longer than 11 characters, cluster creation failed while waiting on operators to resolve.

Which version of OpenShift were you installing? BZ 1872885 was fixed recently.

@dustinmm80

I ran into this issue with 4.5.15. I see 4.6.1 is out now, will try with that version. Thanks for linking to the bug.

@dustinmm80

I was able to launch a cluster successfully with a 12 character cluster name using openshift-install 4.6.1. Thanks for the help!

@marksaitis

With the latest openshift-install on AWS or GCP, it doesn't work whatsoever. The "waiting for Kubernetes" message comes up and then, later, the connection is refused. I have my zone hosted on Route 53 and also on Google DNS (the respective zone). I can resolve the api.**** endpoint just fine to the IPs of the load balancers... but the damn thing doesn't work, I tell you. I followed Red Hat's guide with my right eye 1 cm from the screen and didn't miss a thing. Quality control for the installer team: 0/10.
