ds-identify: fails to recognize NoCloud datasource on boot cause it does not have /sbin in $PATH and thus does not find blkid #3182
Comments
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-16T10:20:37.418939+00:00 I have now cloned a VM from the Cloud Init image that I "fixed" by running ds-identify --force, but I am running into the same problem again: Cloud Init is disabled again, because ds-identify fails to run the blkid command:
slestemplate:~ # cat /run/cloud-init/ds-identify.log
Which leads to:
However, simply running it again with the --force option fixes the issue:
slestemplate:~ # /usr/lib/cloud-init/ds-identify --force
I can also reproduce the behavior on the template VM just by rebooting it. It appears that the initial check for the NoCloud datasource during boot fails, so the caching may not be the issue here. Retitling.
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-16T10:29:36.729631+00:00 The command reliably works here on the fully booted VM:
slestemplate:~ # blkid -c /dev/null -o export >/dev/null ; echo $?
DEVNAME=/dev/sda1
DEVNAME=/dev/sr0
DEVNAME=/dev/mapper/0QEMU_QEMU_HARDDISK_drive-scsi0
DEVNAME=/dev/mapper/0QEMU_QEMU_HARDDISK_drive-scsi0-part1
DEVNAME=/dev/mapper/sys0-rootfs
I think I am just going to hardcode the datasource now, either via a configuration file if possible or by hacking the shell script.
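The `blkid -c /dev/null -o export` output quoted in this thread is a sequence of `KEY=value` lines in which each device's record starts with its `DEVNAME=` line. As a hedged illustration only (this is not ds-identify's actual parser), the hypothetical function `find_labeled_devs` below reads such output on standard input and prints every device whose `LABEL` equals a given value, for example `cidata` for a NoCloud seed. It assumes plain labels and does not undo any escaping blkid may apply to values containing special characters.

```shell
# Hypothetical helper (not ds-identify's own parser): read `blkid -o export`
# output on stdin and print every DEVNAME whose LABEL equals $1.
# Each device record starts with a DEVNAME= line; blank lines are ignored.
find_labeled_devs() {
    want=$1
    dev="" label=""
    # The "|| [ -n ... ]" part also processes a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            DEVNAME=*)
                # A new record begins: report the previous one if it matched.
                [ -n "$dev" ] && [ "$label" = "$want" ] && printf '%s\n' "$dev"
                dev=${line#DEVNAME=}
                label=""
                ;;
            LABEL=*)
                label=${line#LABEL=}
                ;;
        esac
    done
    # Report the last record, which has no following DEVNAME= line.
    [ -n "$dev" ] && [ "$label" = "$want" ] && printf '%s\n' "$dev"
    return 0
}
```

Used as `blkid -c /dev/null -o export | find_labeled_devs cidata`, this would be expected to print `/dev/sr0` on the VM from the original report, where that device carries `LABEL=cidata`.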
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-16T11:39:48.127955+00:00 I ran the blkid command 10 times in a row and always got return code 0. So I don't get why:
is not working on boot. It looks like correct shell code. The only idea I have: it is run too early for the ISO device to become available. Okay, testing for this with this change:
slestemplate:~ # git diff /usr/lib/cloud-init/ds-identify.orig /usr/lib/cloud-init/ds-identify
Which gets me:
slestemplate:~ # cat /run/cloud-init/ds-identify.log | head -7
Which may just mean that during startup via systemd, the directory containing blkid, as reported by
slestemplate:~ # type blkid
is not in the PATH. And now I have learned from the bash manpage that this is exactly what the bash error code tells me (Manpage: bash(1)):
But it does not seem that the systemd generator is being run on reboot, because I added
slestemplate:~ # diff -u cloud-init-generator.orig /usr/lib/systemd/system-generators/cloud-init-generator
+echo "PATH: $PATH" > /root/path
at the beginning of it, yet got no output in /tmp after reboot, while running it manually I do get the output. So it appears that on reboot something else is calling it, and that caller does not have /sbin in its PATH. I have no clue what else might be calling it:
slestemplate:/etc # grep -ir "ds-identify" .
slestemplate:/usr/lib/systemd # grep -ir "ds-identify"
only reports system-generators/cloud-init-generator. There is also nothing in
slestemplate:/var # LANG=en grep -ir "ds-identify" .
So I am done with it for now and will just hardcode the path in ds-identify to /sbin/blkid. And voilà, this finally works. After a few dozen attempts and reboots, I have at least found the root cause and a work-around.
I think that to be really portable, ds-identify needs to try harder to find blkid, because hard-coding it to the UsrMerge path /usr/sbin/blkid is going to break on Debian and Ubuntu as long as UsrMerge is not done there. Alternatively, one might use /sbin/blkid, as this is hard-linked to /usr/sbin on SLES 12 and RHEL 7, and I bet these hardlinks will be kept around for decades. Gosh, this works. This finally works. Retitling again and adding patch.
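The suggestion that ds-identify "try harder to find blkid" could look roughly like the sketch below. It is not the upstream fix: `find_tool` is a hypothetical helper that first asks the shell via `command -v` and then probes a caller-supplied list of directories, so passing both `/sbin` and `/usr/sbin` covers non-UsrMerge and UsrMerge layouts alike.

```shell
# Hypothetical helper, not the upstream fix: print the path of command $1,
# trying the current PATH first and then each directory given after it.
# Variables are global because POSIX sh has no "local".
find_tool() {
    name=$1
    shift
    # command -v is the POSIX way to ask the shell what it would run.
    if found=$(command -v "$name" 2>/dev/null); then
        printf '%s\n' "$found"
        return 0
    fi
    # Fall back to explicit candidates, e.g. /sbin (non-UsrMerge layouts)
    # and /usr/sbin (UsrMerge layouts).
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Example use inside a script whose PATH may lack the sbin directories:
#   BLKID=$(find_tool blkid /sbin /usr/sbin) || { echo "blkid not found" >&2; exit 1; }
#   "$BLKID" -c /dev/null -o export
```

Trying `command -v` first keeps any blkid the caller deliberately put on PATH; the directory list only matters when PATH is as minimal as it turned out to be during boot here.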
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-16T11:46:47.836829+00:00 I had 18.2-1.1.x86_64, which is also affected.
Launchpad user Martin Steigerwald(martin-steigerwald) wrote on 2018-05-16T11:54:15+00:00 Please see the upstream bug report for all the details: "ds-identify: fails to recognize NoCloud datasource on boot cause it does not have /sbin in $PATH and thus does not find blkid". Minimal patch to fix the issue:
slestemplate:~ # diff -u ds-identify.orig /usr/lib/cloud-init/ds-identify
Of course, with UsrMerge you could also use /usr/sbin/blkid. As stated in the upstream bug report, I have not the slightest idea what is calling ds-identify during boot. I thought it would be the systemd cloud-init generator, but I added debug output to it and it apparently is not called. For all the gory details, see the upstream bug report. A proper fix might be to make sure blkid is in $PATH.
Launchpad user Martin Steigerwald(martin-steigerwald) wrote on 2018-05-16T11:58:12+00:00 Created attachment 770432 |
Launchpad user Martin Steigerwald(martin-steigerwald) wrote on 2018-05-16T12:01:53+00:00 Created attachment 770433 I first thought that blkid might be called too early during boot, but that is not true: 127 is bash's return code for "command not found". Still attaching it as a reference.
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-16T12:15:26.470274+00:00 I have no idea how to attach patches as files here, but I attached them to the downstream bug report:
Launchpad user Robert Schweikert(rjschwei) wrote on 2018-05-16T12:30:29+00:00 We will not take this patch. At present it is not understood why PATH is not always part of the environment when the generator runs. This is being investigated. For example, the generator that spawns ds-identify also runs in the SUSE-published images in AWS, and there is no problem with finding blkid there.
Launchpad user Martin Steigerwald(martin-steigerwald) wrote on 2018-05-16T13:28:21+00:00 (In reply to Robert Schweikert from comment #3)
Fair enough. I will use it nonetheless, because with it, it works for me now.
Interesting. That SLES 12 image already has some service pack migrations behind it. It's a minimal image I installed and adapted myself for training purposes. It has only minimal adaptations in its configuration. These are the files I checked in to a git repo; my changes are limited to these files (I did not adapt os-release and so on, of course): etc/SuSE-release
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-16T13:43:21.469875+00:00 Downstream is investigating why PATH is not always in the environment of the cloud-init systemd generator. I never noted this clearly: the openSUSE Build Service cloud-init 18.2 packages are still experimental.
Launchpad user Robert Schweikert(rjschwei) wrote on 2018-05-16T15:40:11+00:00 After receiving additional input, and based on what was already known, I have decided to drop the generator from our package. The generator runs ds-identify, which is where the problem is created by using blkid; its purpose is to speed up the boot process in cases where cloud-init should not be running in the first place. The chain of events is as follows:
generator runs ds-identify
With cloud-init disabled, the boot is sped up, as no Python code gets executed. A reasonable assumption is that the person installing and enabling cloud-init knows they are running in an environment where cloud-init is needed. Thus, looking for the datasource twice, once in ds-identify and then again in the cloud-init Python code, is not really an advantage. Dropping the generator avoids the problem with blkid, and it avoids looking for the datasource twice, once in shell code and once in Python code. Change is on the way to Factory.
Launchpad user Swamp-a(swamp-a) wrote on 2018-05-16T16:00:39+00:00 This is an autogenerated message for OBS integration: |
Launchpad user Scott Moser(smoser) wrote on 2018-05-17T17:32:24.294552+00:00 For what it's worth, the caching of the ds-identify result is due to cloud-init-generator being called multiple times during a boot, and thus ds-identify being called multiple times. We wanted to avoid 'blkid' calls re-searching disks during a high-IO process such as boot. We do intend to make ds-identify more useful stand-alone. In doing that, it would make sense to have the systemd generator use a "--respect-previous-run" option or something similar, and only then cache the result.
Launchpad user Scott Moser(smoser) wrote on 2018-05-17T20:14:24.185007+00:00 I've put up a merge proposal that will ensure that PATH is set to include common locations. I am really interested in cloud-init doing the right thing in all cases, and thus I would like to have ds-identify enabled in SUSE, and I am willing to carry the change there so that we can assume a sane PATH. That said, I think a sane PATH should be set by the system rather than by every program that expects to execute other programs.
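The upstream change itself is not reproduced in this thread, so the following is only a sketch of the general idea of making sure PATH includes common locations. The helper name `ensure_path_dirs` and the directory list in the example call are assumptions. It appends directories that are missing and leaves existing entries, and their order, untouched.

```shell
# Sketch only: append each directory to PATH unless it is already present.
# The directory list in the example call is an assumption, not the upstream list.
ensure_path_dirs() {
    for dir in "$@"; do
        case ":$PATH:" in
            *":$dir:"*) ;;                    # already present: keep its position
            *) PATH="${PATH:+$PATH:}$dir" ;;  # append; no leading ":" if PATH is empty
        esac
    done
    export PATH
}

ensure_path_dirs /usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin
```

Appending rather than prepending keeps whatever the caller already had first in PATH, so a deliberately chosen binary still wins; prepending would instead force the system copies.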
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-22T10:56:43+00:00 Just removing the generator leads to cloud-init services not being run at startup on my SLES 12 SP 3 VM with
slestemplate:~ # find /etc/systemd | grep cloud
All services are enabled according to systemctl status SERVICE
slestemplate:~ # rpm -qa | grep cloud-init
slestemplate:~ # systemctl status cloud-init.target
The output I get is:
slestemplate:~ # systemctl |grep cloud
When I do:
slestemplate:~ # systemctl start cloud-init.target
So it is still not working out of the box. I enabled all services and the targets with systemctl enable. It is not obvious to me how to enable cloud-init when the systemd generator does not do it. I thought I did, but apparently I did not.
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-22T10:59:43+00:00 Also, please note that according to https://bugs.launchpad.net/cloud-init/+bug/1771382/comments/15 upstream developer Scott Moser would like to see cloud-init doing the sane thing in all cases and has added a merge proposal for setting the PATH to common locations. He would like to see ds-identify enabled in SUSE.
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-22T11:06:13.419848+00:00 Scott, I mentioned your comment about having a sane PATH everywhere and ds-identify enabled in SUSE at the SUSE bugtracker. Their change to just remove the generator did not yield the expected result on my SLES 12 SP 3 VM. Cloud Init is simply not started at all then. Reported there. |
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-22T11:15:33+00:00 (In reply to Martin Steigerwald from comment #8)
With the work-around
slestemplate:/etc/systemd/system # mv cloud-init.target.wants/* multi-user.target.wants/
cloud-init is started on boot. It seems cloud-init.target is never triggered. I am not really experienced with targets in systemd.
Launchpad user Robert Schweikert(rjschwei) wrote on 2018-05-22T11:45:36+00:00
cloud.cfg should contain something along these lines:
datasource_list: [ NoCloud,......, None ]
systemctl disable cloud-init.target
or, if you are building images with kiwi, add the following to config.sh:
suseInsertService cloud-init-local
For the next version, 18.3, I expect a better solution, as upstream has a pending patch for the PATH issue.
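Pinning `datasource_list`, as suggested above, can also be done with a drop-in file instead of editing cloud.cfg itself, since cloud-init reads `*.cfg` files from `/etc/cloud/cloud.cfg.d/` in addition to `/etc/cloud/cloud.cfg`. In this hedged sketch, the helper name `write_datasource_cfg` and the file name `90-datasource.cfg` are made up for illustration, and `[ NoCloud, None ]` is an example for a NoCloud-only setup rather than the full list elided in the comment.

```shell
# Hypothetical helper: write a cloud.cfg.d drop-in that pins datasource_list.
# The file name 90-datasource.cfg is an arbitrary choice.
write_datasource_cfg() {
    dir=${1:-/etc/cloud/cloud.cfg.d}   # the default target requires root
    mkdir -p "$dir" || return 1
    # Example list for a NoCloud-only setup; "None" is the fallback datasource.
    cat > "$dir/90-datasource.cfg" <<'EOF'
datasource_list: [ NoCloud, None ]
EOF
}

# Usage (as root): write_datasource_cfg
```

The list is kept on a single line in the same flow form as in the comment.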
Launchpad user Scott Moser(smoser) wrote on 2018-05-22T14:11:09.305026+00:00 An upstream commit landed for this bug. To view that commit see the following URL: |
Launchpad user Robert Schweikert(rjschwei) wrote on 2018-05-22T19:46:56+00:00 OK, this clearly puts the burden on the user and is not really what we want to do. I've pulled in the upstream patch to address the PATH issue. The new cloud-init is on its way to Factory and available in Cloud:Tools.
Launchpad user Swamp-a(swamp-a) wrote on 2018-05-22T20:20:12+00:00 This is an autogenerated message for OBS integration: |
Launchpad user Swamp-a(swamp-a) wrote on 2018-05-23T22:14:54+00:00 openSUSE-RU-2018:1407-1: An update that has two recommended fixes can now be installed. Category: recommended (moderate) |
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-24T10:22:06+00:00 (In reply to Robert Schweikert from comment #12)
Working out of the box with:
slestemplate:~ # rpm -qa | grep cloud
Thank you, Robert.
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-24T10:49:26.856738+00:00 Fix confirmed to work with:
slestemplate:~ # rpm -qa | grep cloud
Thank you, Scott.
Launchpad user Swamp-a(swamp-a) wrote on 2018-06-05T13:50:37+00:00 This is an autogenerated message for OBS integration: |
Launchpad user Swamp-a(swamp-a) wrote on 2018-06-07T16:19:37+00:00 SUSE-RU-2018:1575-1: An update that has 10 recommended fixes can now be installed. Category: recommended (moderate) |
Launchpad user Swamp-a(swamp-a) wrote on 2018-06-08T19:12:49+00:00 openSUSE-RU-2018:1609-1: An update that has 10 recommended fixes can now be installed. Category: recommended (moderate) |
Launchpad user Swamp-a(swamp-a) wrote on 2018-06-08T19:15:54+00:00 openSUSE-RU-2018:1613-1: An update that has two recommended fixes can now be installed. Category: recommended (moderate) |
Launchpad user Scott Moser(smoser) wrote on 2018-06-20T18:06:01.770440+00:00 This bug is believed to be fixed in cloud-init in version 18.3. If this is still a problem for you, please make a comment and set the state back to New. Thank you.
Launchpad user Dimitri John Ledkov(xnox) wrote on 2018-09-11T18:53:20.401668+00:00 Systemd by default executes things with execv, not execve. Hence the default environment is not available. However, the cloud-init generator is executed by /bin/sh, which does have a built-in default path:
$ lxc launch images:opensuse/15.0 test-sh-built-in-path
$ lxc exec test-sh-built-in-path -- env -u PATH /bin/sh -c 'echo $PATH'
On Ubuntu, it is instead:
$ env -u PATH /bin/dash -c 'echo $PATH'
$ env -u PATH /bin/bash -c 'echo $PATH'
Maybe you want to report a bug against SUSE's default /bin/sh about this. Also, the differences between /bin/dash and /bin/bash are awkward.
Launchpad user Dimitri John Ledkov(xnox) wrote on 2018-09-11T19:03:37.665716+00:00 Systemd by default executes things with execv, not execve. Hence the default environment is not available. However, the cloud-init generator is executed by /bin/sh, which has a built-in default path:
$ lxc launch images:opensuse/15.0 test-sh-built-in-path
$ lxc exec test-sh-built-in-path -- env -u PATH /bin/sh -c 'echo $PATH'
No idea whether it is intentional or not that "sbin" is excluded there.
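The `env -u PATH ... -c 'echo $PATH'` experiment above can be turned into a small check. This sketch assumes an `env` that supports `-u` (GNU coreutils does) and a POSIX shell; `path_has_sbin` and `report_default_path` are hypothetical helper names. It prints the PATH a given shell falls back to when started without one and reports whether `/sbin` or `/usr/sbin` is part of it.

```shell
# Hypothetical helper: succeed if the colon-separated list in $1 contains
# /sbin or /usr/sbin as a complete entry.
path_has_sbin() {
    case ":$1:" in
        *:/sbin:*|*:/usr/sbin:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Start the given shell with PATH removed from its environment, print the
# PATH it ends up with, and classify it.
report_default_path() {
    shell=${1:-/bin/sh}
    default_path=$(env -u PATH "$shell" -c 'echo "$PATH"')
    if path_has_sbin "$default_path"; then
        echo "$shell: default PATH includes sbin: $default_path"
    else
        echo "$shell: default PATH lacks sbin: $default_path"
    fi
}

report_default_path /bin/sh
```

Running `report_default_path /bin/bash` or `report_default_path /bin/dash` compares the other shells mentioned in the comment; the result depends on how each shell was built on the system at hand.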
This bug was originally filed in Launchpad as LP: #1771382
Launchpad details
Launchpad user Martin Steigerwald(ms-proact) wrote on 2018-05-15T15:37:38.152796+00:00
cloud-init 18.2 from http://download.opensuse.org/repositories/Cloud:/Tools/SLE_12_SP3/ on SLES 12 SP 3 with NoCloud data source via Cloud Init drive made by Proxmox.
On SLES 12 SP3, the NoCloud datasource was not working, despite
slestemplate:~ # blkid -c /dev/null -o export
[…]
DEVNAME=/dev/sr0
UUID=2018-05-15-16-34-27-00
LABEL=cidata
TYPE=iso9660
[…]
with the necessary files on it. blkid gives 0 as its return code.
Why?
I only kept parts of the output:
slestemplate:/etc/cloud # cat /run/cloud-init/ds-identify.log
[up 8.63s] ds-identify
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
no datasource_list found, using default: MAAS ConfigDrive NoCloud AltCloud Azure Bigstep CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack OVF SmartOS Scaleway Hetzner IBMCloud
ERROR: failed running [127]: blkid -c /dev/null -o export
[…]
FS_LABELS=unavailable:error
ISO9660_DEVS=unavailable:error
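The log lines above show what happens when blkid cannot be run: exit status 127 ("command not found") is recorded, and the derived values fall back to `unavailable:error`. The sketch below imitates that pattern under stated assumptions: `collect_blkid` and `BLKID_DATA` are hypothetical names, not ds-identify internals, and the extra message for status 127 is an addition that points at PATH, which turned out to be the real cause in this bug.

```shell
# Hypothetical sketch, not ds-identify's code: run the given blkid command and
# store its output in BLKID_DATA, or "unavailable:error" if it fails.
collect_blkid() {
    blkid_out=$("$@" 2>/dev/null)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        BLKID_DATA=$blkid_out
        return 0
    fi
    BLKID_DATA="unavailable:error"
    if [ "$rc" -eq 127 ]; then
        # 127: the shell could not find the command, typically a PATH problem.
        echo "ERROR: command not found [127]: $*; PATH=$PATH" >&2
    else
        echo "ERROR: failed running [$rc]: $*" >&2
    fi
    return "$rc"
}

# Example: collect_blkid blkid -c /dev/null -o export
```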
It might have been that I had not yet added the CloudInit drive in Proxmox.
A subsequent call to
slestemplate:~ # /usr/lib/cloud-init/ds-identify
did not yet yield a different result.
Only by analysing the source did I find that it caches results and that I can use the
--force
option to override this. I did this, and the NoCloud datasource was detected properly. Apparently this result is cached now. The tool only reports the caching as a DEBUG message. However, I set logging to INFO for all parts of Cloud Init, because the FileHandler clutters the log with tons of messages about how many bytes it read from each file. Sure, I could use INFO only for the FileHandler.
Several issues reduce the ease of administration here:
Don't cache errors. Really… just… don't.
Don't cache errors almost silently (just as a debug message).
Decide wisely what is a debug message and what is not.
A search for
ds-identify
in the documentation available at https://cloudinit.readthedocs.io/en/latest/ did not yield any result. And in general: keep it short and simple.
IMHO the first is the most important: don't cache errors. If the resource is there now, recognize it, without further discussion.
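The request not to cache errors, combined with the existing --force override, could be sketched as follows. This is hypothetical: `cached_detect`, `detect_datasource`, and `CACHE_FILE` are placeholder names rather than ds-identify's real functions or cache location, and the placeholder probe only checks for a `/dev/disk/by-label/cidata` link as created by udev. Only a successful result is written to the cache; a failure removes any stale cache file so the next run probes again.

```shell
# Hypothetical sketch: cached_detect, detect_datasource and CACHE_FILE are
# placeholder names, not ds-identify's functions or its real cache location.
CACHE_FILE=${CACHE_FILE:-/run/example-ds-cache}

# Placeholder probe: succeed (and print NoCloud) only if udev has created a
# by-label link for a filesystem labelled "cidata".
detect_datasource() {
    [ -e /dev/disk/by-label/cidata ] && echo NoCloud
}

cached_detect() {
    force=no
    [ "${1:-}" = "--force" ] && force=yes
    # Reuse a cached result only when not forced; only successes are cached.
    if [ "$force" = no ] && [ -s "$CACHE_FILE" ]; then
        cat "$CACHE_FILE"
        return 0
    fi
    if result=$(detect_datasource); then
        printf '%s\n' "$result" > "$CACHE_FILE"
        printf '%s\n' "$result"
        return 0
    fi
    # Never keep a failure: drop any stale cache so the next run probes again.
    rm -f "$CACHE_FILE"
    echo "datasource detection failed; result not cached" >&2
    return 1
}
```

With this design, a seed drive attached after a failed boot-time probe is picked up on the next plain run, and --force is only needed to override a cached success.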