Install with static IP takes too long when DHCP is not present on port groups. #3436

Open
mlh78750 opened this issue Dec 2, 2016 · 23 comments
Labels
area/appliance, component/isos, component/test, kind/debt, priority/p2

Comments

Contributor

mlh78750 commented Dec 2, 2016

Exists in 0.8.

If you install with a static IP, the appliance image first tries to get a DHCP address and then assigns the static IP, overwriting the DHCP address. But if there is no DHCP server, the appliance waits until the DHCP request fails and only then assigns the static IP.

Recommend increasing the install timeout when using static IPs.

@mlh78750 mlh78750 added area/appliance component/appliance-base kind/debt Problems that increase the cost of other work impact/doc/note Requires creation of or changes to an official release note labels Dec 2, 2016
Contributor Author

mlh78750 commented Dec 2, 2016

@stuclem, I think we should add a generic release note recommending an increased timeout value for static IP installs when using vic-machine.

Contributor

hmahmood commented Dec 2, 2016

@mlh78750 do you have logs (with log level set to debug) for this scenario?

Contributor Author

mlh78750 commented Dec 2, 2016

@hmahmood: ask @mobla. He should be able to hook you up. I found this in his environment. You have to install on a network with no DHCP available.


mobla commented Dec 3, 2016

@hmahmood, let me know the steps to set the log level to debug and I will upload the logs for you.

Thanks,
Murali

@mdubya66 mdubya66 added this to the v0.9.0 milestone Dec 3, 2016
Contributor

mdubya66 commented Dec 3, 2016

usability issue.

Contributor Author

mlh78750 commented Dec 5, 2016

This affects container VMs as well. The timeout for the DHCP failure is about the same as the attach timeout when you docker run -it. So in an environment without DHCP, container boot times will be quite high while Photon waits for DHCP to fail; the tether then writes the static config and things work.

So if you use a non-DHCP environment for the bridge network, expect the container to take a while to boot. The attach in run -it will likely fail with a timeout, but a subsequent docker attach <container> will work.

Contributor Author

mlh78750 commented Dec 5, 2016

cc @sflxn

Contributor

mdubya66 commented Dec 5, 2016

@mikeh mentioned that one option is to disable network bring-up in the bootstrap ISOs and have tether do all the network setup.

Contributor Author

mlh78750 commented Dec 5, 2016

@mdubya66 that's the way it's supposed to be working. Not sure what's getting hung up right now.

Contributor

hmahmood commented Dec 5, 2016

It looks like DHCP is not running in @mobla's setup. I verified that the systemd-networkd service (which runs the DHCP client) is not running. We also didn't see any evidence of DHCP activity in the kernel/systemd logs.

However, we did see the docker personality restarting due to the following errors:

time="2016-12-05T08:00:07Z" level=error msg="Error while pulling image: Head https://sc2-cpbu-vcsa01-6.eng.vmware.com/v2/: x509: certificate signed by unknown authority"
time=2016-12-05T08:00:07.981386890Z level=debug msg=[ END ] [github.com/vmware/vic/lib/apiservers/engine/backends.(*Image).PullImage:235] [13.497085ms] sc2-cpbu-vcsa01-6.eng.vmware.com/project1/busybox:1.0
time="2016-12-05T08:00:07Z" level=error msg="Handler for POST /v1.23/images/create returned error: Head https://sc2-cpbu-vcsa01-6.eng.vmware.com/v2/: x509: certificate signed by unknown authority"
time="2016-12-05T08:04:23Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062"
time="2016-12-05T08:04:52Z" level=error msg="Error while pulling image: Head https://sc2-cpbu-vcsa01-6.eng.vmware.com/v2/: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"Self-signed by VMware, Inc.\")"
time=2016-12-05T08:04:52.138234639Z level=debug msg=[ END ] [github.com/vmware/vic/lib/apiservers/engine/backends.(*Image).PullImage:235] [13.38656ms] sc2-cpbu-vcsa01-6.eng.vmware.com/project1/busybox:1.0
time="2016-12-05T08:04:52Z" level=error msg="Handler for POST /v1.23/images/create returned error: Head https://sc2-cpbu-vcsa01-6.eng.vmware.com/v2/: x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"Self-signed by VMware, Inc.\")"
time="2016-12-05T08:07:24Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062"

Not sure what this is about, but it apparently causes the docker personality to bail.
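
A quick way to see the restart pattern in an excerpt like the one above is to count startup lines and x509 errors. The following is a minimal Python sketch, not part of VIC. It assumes that each "Launching docker personality pprof server" line marks one start of the personality process, and the function name `summarize` is made up for illustration. The sample lines are copied from the excerpt above.

```python
import re

# Matches the time= field, with or without surrounding quotes.
TIME_RE = re.compile(r'time="?([0-9T:.\-]+Z)"?')

# Assumption: this message is logged once per personality start.
STARTUP_MARKER = "Launching docker personality pprof server"


def summarize(lines):
    """Return (startup timestamps, number of x509 error lines)."""
    startups = []
    x509_errors = 0
    for line in lines:
        match = TIME_RE.search(line)
        if match is None:
            continue  # not a log line in the expected format
        if STARTUP_MARKER in line:
            startups.append(match.group(1))
        elif "level=error" in line and "x509:" in line:
            x509_errors += 1
    return startups, x509_errors


# Four lines taken verbatim from the excerpt above.
SAMPLE = [
    'time="2016-12-05T08:00:07Z" level=error msg="Error while pulling image: Head https://sc2-cpbu-vcsa01-6.eng.vmware.com/v2/: x509: certificate signed by unknown authority"',
    'time="2016-12-05T08:00:07Z" level=error msg="Handler for POST /v1.23/images/create returned error: Head https://sc2-cpbu-vcsa01-6.eng.vmware.com/v2/: x509: certificate signed by unknown authority"',
    'time="2016-12-05T08:04:23Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062"',
    'time="2016-12-05T08:07:24Z" level=info msg="Launching docker personality pprof server on 127.0.0.1:6062"',
]

startups, x509_errors = summarize(SAMPLE)
print("personality starts:", startups)
print("x509 error lines:", x509_errors)
```

Two startup lines a few minutes apart, each preceded by x509 errors, are consistent with the restart behavior described here, though the excerpt alone does not prove that the errors cause the restarts.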

Contributor Author

mlh78750 commented Dec 5, 2016

@hmahmood that was for the VCH. Can you confirm that the container VM is also configured not to run the DHCP client? I'm trying to figure out why booting a container in this environment takes a while.

Contributor

hmahmood commented Dec 5, 2016

The container VM shares the same configuration as the VCH, so it should not have DHCP running either. I confirmed that systemd-networkd is not running in a busybox container I brought up.
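
The thread does not say how the service state was checked. One possible way, sketched here in Python, is to ask systemd through `systemctl is-active --quiet`, which exits with status 0 only when the unit is active. The helper name `unit_is_active` and its injectable `run` parameter are assumptions made for this illustration, and the check only works where `systemctl` is available.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """Return True if systemd reports the unit as active.

    `run` defaults to subprocess.run and can be replaced in tests.
    """
    # `systemctl is-active --quiet UNIT` exits 0 only for an active unit.
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0
```

For example, `unit_is_active("systemd-networkd")` returning False on the VCH or in a container VM would match the observation above that the DHCP client is not running.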

Contributor

hmahmood commented Dec 5, 2016

The cert failure above is likely because a private registry is being used. Is that the case here, @mobla? If so, the personality should not exit the way it does here. @sflxn?


mobla commented Dec 6, 2016

@hmahmood: As requested, I attached the container logs for the VCH create that took ~5 minutes.
container-logs.tar.gz

Contributor

hmahmood commented Dec 6, 2016

The vmware.log file for the VCH is missing. Also, our systemd config setup seems to be busted; from dmesg:

Dec 05 23:31:59 Photon systemd[1]: Failed to populate /etc with preset unit settings, ignoring: Too many levels of symbolic links
Dec 05 23:31:59 Photon systemd[1]: [/etc/systemd/system/vic-init.service:15] Unknown lvalue 'Wants' in section 'Install'
Dec 05 23:31:59 Photon systemd[1]: [/etc/systemd/system/vic-init.service:16] Unknown lvalue 'Wants' in section 'Install'

Not sure if this is contributing to the problem.
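
On the "Unknown lvalue 'Wants'" warnings: in systemd unit files, `Wants=` is a `[Unit]` directive, while the `[Install]` section takes keys such as `WantedBy=` and `RequiredBy=`, so systemd ignores a `Wants=` line placed under `[Install]`. The thread does not establish whether this contributes to the slow install. The Python sketch below flags such misplaced keys. The unit text in it is hypothetical, not the real vic-init.service, and the key list reflects the `[Install]` options documented in systemd.unit(5) as far as I know; newer systemd releases may accept more.

```python
# [Install] keys documented in systemd.unit(5); treat this list as an assumption.
INSTALL_KEYS = {
    "Alias", "WantedBy", "RequiredBy", "UpheldBy", "Also", "DefaultInstance",
}


def find_unknown_install_keys(unit_text):
    """Return (line_number, key) pairs for unrecognized keys in [Install]."""
    section = None
    problems = []
    for number, raw in enumerate(unit_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key = line.split("=", 1)[0].strip()
        if section == "Install" and key not in INSTALL_KEYS:
            problems.append((number, key))
    return problems


# Hypothetical unit file for illustration only.
EXAMPLE_UNIT = """\
[Unit]
Description=example service

[Service]
ExecStart=/bin/true

[Install]
Wants=example.target
WantedBy=multi-user.target
"""

for number, key in find_unknown_install_keys(EXAMPLE_UNIT):
    print(f"line {number}: '{key}' is not an [Install] key")
```

In a case like this, the usual fix is to move the `Wants=` lines into `[Unit]`, or to use `WantedBy=` if the intent was to have another unit pull this one in.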

Contributor

hmahmood commented Dec 6, 2016

For the "Too many levels of symbolic links" error:

systemd/systemd#3010

We have systemd 228 installed; the fixes to this issue came after that release. https://github.com/systemd/systemd/blob/master/NEWS


mobla commented Dec 6, 2016

Attached the vmware.log and tether.debug. Please rename the tar.gz file to zip and unzip it.

vmware-and-tether-logs.tar.gz

Contributor

npakrasi commented Dec 8, 2016

Added to 0.8 release notes:

  • Install with static IP takes too long when DHCP is not present on port groups. #3436
    If you install with static IP, the appliance image first tries to get a DHCP address and then assigns the static IP by overwriting the DHCP address with the static IP. But if there is no DHCP server, the appliance will wait until the DHCP request fails and then assigns the static IP.
    Workaround: Increase the timeout for the install if using static IP.

Contributor

hmahmood commented Dec 8, 2016

@npakrasi I don't think that explanation is accurate. We are still investigating, and at last check DHCP does not look like the cause.

Could you just say that some static IP installs may take longer than expected, and that a workaround is to increase the timeout?

Contributor

stuclem commented Dec 8, 2016

Moving this back to To Do because this is the engineering issue, not a doc issue.

@hmahmood hmahmood self-assigned this Dec 13, 2016

mobla commented Dec 15, 2016

@hmahmood: Uploaded the vmware.log from the VCH appliance deployed on vCenter (the deployment took 5 minutes).

vmware-log.tar.gz

Contributor

stuclem commented Dec 15, 2016

@hmahmood apologies - I only just saw this. Updated the release note as follows:

  • Deployment with static IP takes a long time. #3436
    If you deploy a VCH with a static IP, the deployment might take longer than expected, resulting in timeouts.
    Workaround: Increase the timeout for the deployment when using static IP.

Does this cover it?

@stuclem stuclem removed the impact/doc/note Requires creation of or changes to an official release note label Dec 15, 2016
@hmahmood
Contributor

@stuclem that should cover it.

@mhagen-vmware mhagen-vmware removed this from the v0.9.0 milestone Jan 18, 2017
@zjs zjs added component/isos component/test Tests not covered by a more specific component label and removed component/isos/appliance labels Jul 30, 2018