xcat on Centos 7.6.1810 - how to set the latest kernel to run on compute nodes #6014

Closed
pcmc opened this issue Feb 20, 2019 · 8 comments

@pcmc

pcmc commented Feb 20, 2019

Hello all,

Not sure if this is the right place to post this question, please redirect me if not.

I have some success in installing xcat on a small cluster running Centos 7.6.1810.
The compute nodes seem to start up okay.

After updating the kernel on the master, and subsequently on the compute nodes,
I re-ran genimage, packimage, and nodeset, and rebooted the compute nodes.
But they all still run the first kernel.

Below are the steps involved.
Any idea how to find out what I have missed and/or done wrong?

Many thanks.
Peter

Steps taken:
[root@main ~]# # genimage -i eth0 -n dca,ixgbe,igb,e1000e,e1000,forcedeth,tg3 -o centos7.6 -p compute 2>&1 | tee -a /tmp/genimage.log

[root@main ~]# packimage centos7.6-x86_64-netboot-compute 2>&1 | tee -a /tmp/genimage.log
Packing contents of /install/netboot/centos7.6/x86_64/compute/rootimg
archive method:cpio
compress method:gzip

[root@main ~]# nodeset compute osimage=centos7.6-x86_64-netboot-compute 2>&1 | tee -a /tmp/genimage.log
proc01: netboot centos7.6-x86_64-compute
proc02: netboot centos7.6-x86_64-compute
proc03: netboot centos7.6-x86_64-compute
proc04: netboot centos7.6-x86_64-compute
[root@main ~]#

[root@main rootimg]# ls /install/netboot/centos7.6/x86_64/compute/rootimg/boot/
config-3.10.0-957.5.1.el7.x86_64 symvers-3.10.0-957.el7.x86_64.gz
config-3.10.0-957.el7.x86_64 System.map-3.10.0-957.5.1.el7.x86_64
initramfs-3.10.0-957.5.1.el7.x86_64.img System.map-3.10.0-957.el7.x86_64
initramfs-3.10.0-957.el7.x86_64.img vmlinuz-3.10.0-957.5.1.el7.x86_64
symvers-3.10.0-957.5.1.el7.x86_64.gz vmlinuz-3.10.0-957.el7.x86_64
[root@main rootimg]#

[root@main rootimg]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
[root@main rootimg]#
[root@main rootimg]# uname -r
3.10.0-957.5.1.el7.x86_64 <---- the master node is running the latest kernel

[root@proc01 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
[root@proc01 ~]# uname -r
3.10.0-957.el7.x86_64 <---- the compute node is running the old kernel after reboot
[root@proc01 ~]# ls /boot
ls: cannot access /boot: No such file or directory <-- no /boot folder on the compute node.
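
To check every compute node at once rather than logging in to each, the comparison above can be scripted. This is a minimal sketch, not part of the original report: `report_stale_nodes` is a hypothetical helper, and it assumes its input arrives as `node: kernel-version` lines, which is the form `xdsh` output is expected to take.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nodes whose reported kernel differs
# from the expected one.
# Input (stdin): lines of the form "node: kernel-version" (assumed format).
report_stale_nodes() {
    local expected=$1
    # Split each line on ": " and print the node name when the version differs.
    awk -v want="$expected" -F': ' 'NF >= 2 && $2 != want { print $1 }'
}
```

Possible usage on the master: `xdsh compute uname -r | report_stale_nodes 3.10.0-957.5.1.el7.x86_64`. With the output shown above, proc01 would be listed.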

@whowutwut
Member

@pcmc Hi Peter, welcome, this is the correct place..

For example:

Feb 20 02:50:14 proc01 systemd-udevd: ERR failed to execute '/usr/lib/udev/socket:@/org/freedesktop/hal/udev_event' 'socket:@/org/freedesktop/hal/udev_event': No such file or directory

It does improve readability.

@immarvin Do you have any ideas here? It seems strange that the compute nodes didn't pull down the same image as the master.

@immarvin
Contributor

Hi @pcmc, two questions for you:

  1. What is the xCAT version you are using?
  2. Please follow https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/ppc64le/diskless/customize_image/install_new_kernel.html#installing-a-new-kernel-in-the-diskless-image to apply the new kernel. Your steps do not show how you applied the new kernel; you cannot simply copy the new kernel to <rootimg dir>/boot.

For diskless nodes, the boot procedure is totally different from that of diskful nodes: the kernel and initrd under the /boot directory are not loaded during boot.

The /boot directory is in the exlist of the diskless osimage (see the file reported by lsdef -t osimage -o centos7.6-x86_64-netboot-compute -i exlist), i.e. <rootimg dir>/boot is excluded from the compressed rootimg archive during packimage. This explains why you find:

[root@proc01 ~]# ls /boot
ls: cannot access /boot: No such file or directory <-- no /boot folder on the compute node.
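
As a toy illustration of the exclude-list idea (this is not xCAT's packimage code, and the pattern format below is an assumption made for the example), listing an image tree while dropping everything under ./boot could look like this. `list_packed_paths` and the regex-per-line exclude file are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of an exclude list; NOT xCAT's packimage implementation.
# $1: image root directory
# $2: file with one extended regex per line (hypothetical format)
list_packed_paths() {
    local rootimg=$1 exlist=$2
    # Enumerate paths relative to the image root, then drop every path
    # that matches a pattern in the exclude file.
    (cd "$rootimg" && find . -mindepth 1 | LC_ALL=C sort) | grep -v -E -f "$exlist"
}
```

With an exclude file containing the single line `^\./boot(/|$)`, neither ./boot nor anything beneath it appears in the output, which mirrors why the booted node has no /boot directory.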

@immarvin immarvin self-assigned this Feb 22, 2019
@immarvin immarvin added this to the 2.14.6 milestone Feb 22, 2019
@bybai
Contributor

bybai commented Feb 22, 2019

Hi @pcmc ,
Welcome. I think you did not use the xCAT method to add the latest kernel packages to the xCAT osimage; please refer to this doc: https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/ppc64le/diskless/customize_image/install_new_kernel.html?highlight=kernel
If you still have problems, contact us.

@pcmc
Author

pcmc commented Feb 22, 2019

Hello all,

Thanks a lot for all your replies.

It is quite possible that I have missed some important steps.
So if you could bear with me, I shall retrace the steps I took.

  1. download the Centos 7.6 1810 latest iso
    /misc/iso/centos/CentOS-7-x86_64-Everything-1810.iso

  2. copycds -i /misc/iso/centos/CentOS-7-x86_64-Everything-1810.iso

    to work out the distribution being known as centos7.6

  3. copycds --osver=centos7.6 /misc/iso/centos/CentOS-7-x86_64-Everything-1810.iso

    it puts software into /install/centos7.6/x86_64, and also the images into
    ls /install/netboot/centos7.6/x86_64/compute/
    initrd-stateless.gz initrd-statelite.gz kernel rootimg rootimg.cpio.gz

  4. Define local node details

     [root@main ~]# tabdump nodelist   - some fields have been updated by xcatd:
#node,groups,status,statustime,appstatus,appstatustime,primarysn,hidden,updatestatus,updatestatustime,zonename,comments,disable
"main","master",,,,,,,,,,,
"proc01","compute,ipmi","booted","02-22-2019 07:29:55",,,,,,,,,
"proc02","compute,ipmi","booted","02-22-2019 07:30:05",,,,,,,,,
"proc03","compute,ipmi","booted","02-22-2019 07:30:06",,,,,,,,,
"proc04","compute,ipmi","booted","02-22-2019 07:30:22",,,,,,,,,
"node-0025905aec91","all",,,,,,,,,,,
[root@main ~]# tabdump bootparams
#node,kernel,initrd,kcmdline,addkcmdline,dhcpstatements,adddhcpstatements,comments,disable
"compute",,,,"selinux=0",,,,
"proc01","xcat/osimage/centos7.6-x86_64-netboot-compute/kernel","xcat/osimage/centos7.6-x86_64-netboot-compute/initrd-stateless.gz","imgurl=http://130.246.32.140:80//install/netboot/centos7.6/x86_64/compute/rootimg.cpio.gz XCAT=130.246.32.140:3001 NODE=proc01 FC=yes ifname=eth0:00:25:90:5a:eb:8a netdev=eth0 ",,,,,
"proc02","xcat/osimage/centos7.6-x86_64-netboot-compute/kernel","xcat/osimage/centos7.6-x86_64-netboot-compute/initrd-stateless.gz","imgurl=http://130.246.32.140:80//install/netboot/centos7.6/x86_64/compute/rootimg.cpio.gz XCAT=130.246.32.140:3001 NODE=proc02 FC=yes ifname=eth0:00:25:90:5a:eb:f2 netdev=eth0 ",,,,,
….
[root@main ~]# tabdump networks  
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,mtu,comments,disable
"stfc-net","130.246.32.0","255.255.252.0","bond0","130.246.32.254","130.246.32.140","130.246.32.140","130.246.32.140,130.246.8.13,130.246.188.240","130.246.32.140,130.246.8.13,193.62.22.82","130.246.32.140","130.246.32.141-155","130.246.32.141-155",,,"bnsc.rl.ac.uk",,"bnsc.rl.ac.uk",,,
[root@main ~]# 
[root@main ~]# tabdump mac     
#node,interface,mac,comments,disable
"proc01",,"00:25:90:5a:eb:8a",,
"proc02",,"00:25:90:5a:eb:f2",,
"proc03",,"00:25:90:5a:eb:a2",,
"proc04",,"00:25:90:5a:eb:d0",,

[root@main ~]# tabdump hosts
#node,ip,hostnames,otherinterfaces,comments,disable
"main","130.246.32.140",,,,
"proc01","130.246.32.141",,,,
"proc02","130.246.32.142",,,,
"proc03","130.246.32.143",,,,
"proc04","130.246.32.144",,,,
    makedns -n
    makedhcp -n
    systemctl restart dhcpd
 cd /opt/xcat/share/xcat/netboot/centos
   genimage -i eth0 -n dca,ixgbe,igb,e1000e,e1000,tg3 -o centos7.6 -p compute 

   packimage centos7.6-x86_64-netboot-compute

   nodeset compute osimage=centos7.6-x86_64-netboot-compute 

  rsync -av /install/netboot/centos7.6/x86_64/compute/{kernel,initrd-stateless.gz} /tftpboot/xcat/netboot/centos7.6/x86_64/compute/ 
  rsync -av /install/netboot/centos7.6/x86_64/compute/{kernel,initrd-stateless.gz} /tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/  
  chmod 644 /tftpboot/xcat/netboot/centos7.6/x86_64/compute/*
  chmod 644 /tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute//*
  5. Power up processing nodes, and they all seem to have started up okay.

  6. add in more packages to the compute image from the master node using yum:

    yum -y --installroot=/install/netboot/centos7.6/x86_64/compute/rootimg install htop ipmitool parted nmap dstat nc lsof vim-enhanced less
also make some local changes in /install/netboot/centos7.6/x86_64/compute/rootimg/etc/fstab
to include local partitions and NFS mounts.
  7. Update the compute node kernel using yum
    yum -y --installroot=/install/netboot/centos7.6/x86_64/compute/rootimg update
A new kernel (3.10.0-957.5.1) is produced under rootimg folder:
[root@main ~]# ls /install/netboot/centos7.6/x86_64/compute/rootimg/boot/
config-3.10.0-957.5.1.el7.x86_64         symvers-3.10.0-957.el7.x86_64.gz
config-3.10.0-957.el7.x86_64             System.map-3.10.0-957.5.1.el7.x86_64
initramfs-3.10.0-957.5.1.el7.x86_64.img  System.map-3.10.0-957.el7.x86_64
initramfs-3.10.0-957.el7.x86_64.img      vmlinuz-3.10.0-957.5.1.el7.x86_64
symvers-3.10.0-957.5.1.el7.x86_64.gz     vmlinuz-3.10.0-957.el7.x86_64
  8. redo genimage steps to activate the new kernel into compute image:
   cd /opt/xcat/share/xcat/netboot/centos
   genimage -i eth0 -n dca,ixgbe,igb,e1000e,e1000,tg3 -o centos7.6 -p compute
   packimage centos7.6-x86_64-netboot-compute
   nodeset compute osimage=centos7.6-x86_64-netboot-compute
   rsync -av /install/netboot/centos7.6/x86_64/compute/\
{kernel,initrd-stateless.gz} /tftpboot/xcat/netboot/centos7.6/x86_64/compute/
   rsync -av /install/netboot/centos7.6/x86_64/compute/\
{kernel,initrd-stateless.gz}  /tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/
   chmod 644 /tftpboot/xcat/netboot/centos7.6/x86_64/compute/*
   chmod 644 /tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/*
  9. reboot the compute nodes.

The result is that the compute nodes are still running the old kernel (3.10.0-957).

I have looked into the web link you sent me.
Currently there is no /install/kernel folder on the master node.
Naturally I can manually create that.

But the new kernel and the associated packages are sourced from the Centos distribution site.
Does that mean I need to download all the new kernel and the associated packages and put them into
/install/kernel/3.10.0-957.5.1.el7.x86_64?

But given the working folder /install/netboot/centos7.6/x86_64/compute/rootimg
has already been updated, I wonder if there is an easier way to update the running kernel from
the existing working folder?

Many thanks for your advice again.

Regards,
Peter
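
For reference, the bootparams rows in the steps above already name the kernel file each node fetches at boot. A small sketch for pulling that path out of `tabdump bootparams` output follows. `boot_kernel_path` is a hypothetical helper; it assumes the first two CSV columns contain no embedded commas and that the paths are relative to /tftpboot, as the rsync destinations in this thread suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kernel file a node boots from, based on
# "tabdump bootparams" CSV rows read from stdin.
# Assumes columns 1 and 2 have no embedded commas and that the kernel
# path is relative to /tftpboot.
boot_kernel_path() {
    local node=$1
    awk -F',' -v want="\"$node\"" '
        $1 == want {
            gsub(/"/, "", $2)      # strip the CSV quotes from the kernel column
            print "/tftpboot/" $2
        }'
}
```

Possible usage: `tabdump bootparams | boot_kernel_path proc01`. For the rows shown above this points at the `kernel` file under /tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/, so that is the file whose contents decide which kernel the node runs.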

@immarvin
Contributor

Hi @pcmc, yes, /install/kernel is just an example directory used in the description; you need to adapt the example to your actual setup. There is some internal logic behind genimage -k, so I cannot tell you exactly what the equivalent manual steps are.

One issue I can find in your steps: if nothing is specified explicitly, genimage places the first kernel it finds in the install image (which may be the old or the new one) in /install/netboot/centos7.6/x86_64/compute/. The following rsync -av then copies that kernel (maybe the old one) to /tftpboot, hence the provisioned nodes still run the old kernel.

I suggest you follow the steps in the xCAT doc.
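
One way to see which kernel ended up staged is to compare the staged `kernel` file byte for byte against the vmlinuz files in the image's boot directory. This is only a sketch: `identify_staged_kernel` is a hypothetical helper, and it assumes the staged file is an unmodified copy of one of the vmlinuz-* files, which this thread does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which vmlinuz-* file in a boot directory is
# byte-identical to a staged kernel file.
# Assumes the staged kernel is a plain copy of one of them.
identify_staged_kernel() {
    local staged=$1 bootdir=$2 candidate
    for candidate in "$bootdir"/vmlinuz-*; do
        [ -f "$candidate" ] || continue
        if cmp -s "$staged" "$candidate"; then
            basename "$candidate"
            return 0
        fi
    done
    echo "no matching vmlinuz found" >&2
    return 1
}
```

Possible usage: `identify_staged_kernel /install/netboot/centos7.6/x86_64/compute/kernel /install/netboot/centos7.6/x86_64/compute/rootimg/boot`. If it prints the 3.10.0-957.el7 file name, the old kernel is the one being copied to /tftpboot.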

@immarvin immarvin removed the sprint3 label Feb 28, 2019
@pcmc
Author

pcmc commented Mar 6, 2019

Dear all,

Thanks a lot for all the advice.

Just an update that as suggested, by following the instructions given in
https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/ppc64le/diskless/customize_image/install_new_kernel.html?highlight=kernel
I have managed to load the latest kernel for the compute nodes.

Below are essentially the extra steps required.

Peter

mkdir -p /install/kernels/3.10.0-957.5.1
cd /install/kernels/3.10.0-957.5.1

yumdownloader kernel-3.10.0-957.5.1

createrepo /install/kernels/3.10.0-957.5.1

chdef -t osimage centos7.6-x86_64-netboot-compute -p pkgdir=/install/kernels/3.10.0-957.5.1

cd /opt/xcat/share/xcat/netboot/centos
genimage -i eth0 -n dca,ixgbe,igb,e1000e,e1000,forcedeth,tg3 -o centos7.6 -p compute -k 3.10.0-957.5.1
(Note: /install/netboot/centos7.6/x86_64/compute/rootimg/etc/fstab and yum.repos.d will be overwritten by genimage, so any local modifications need to be reinstated.)

packimage centos7.6-x86_64-netboot-compute

nodeset compute osimage=centos7.6-x86_64-netboot-compute

rsync -av /install/netboot/centos7.6/x86_64/compute/{kernel,initrd-stateless.gz} /tftpboot/xcat/netboot/centos7.6/x86_64/compute/

chmod 644 /tftpboot/xcat/netboot/centos7.6/x86_64/compute/initrd-stateless.gz /tftpboot/xcat/netboot/centos7.6/x86_64/compute/kernel
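
Since genimage overwrites etc/fstab and yum.repos.d in the rootimg, as noted in the steps above, the local modifications could be saved before genimage and put back before packimage. A minimal sketch follows; the function names and the backup directory are hypothetical, and it assumes GNU cp.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preserving local rootimg changes across genimage.
# $1: rootimg directory, $2: backup directory (any writable path).
save_local_mods() {
    local rootimg=$1 backup=$2
    mkdir -p "$backup/etc"
    cp -p "$rootimg/etc/fstab" "$backup/etc/fstab"
    cp -a "$rootimg/etc/yum.repos.d" "$backup/etc/"
}

restore_local_mods() {
    local rootimg=$1 backup=$2
    mkdir -p "$rootimg/etc/yum.repos.d"
    cp -p "$backup/etc/fstab" "$rootimg/etc/fstab"
    # Copy the saved repo files back, overwriting regenerated ones of the same name.
    cp -a "$backup/etc/yum.repos.d/." "$rootimg/etc/yum.repos.d/"
}
```

Possible usage: call `save_local_mods /install/netboot/centos7.6/x86_64/compute/rootimg /root/rootimg-backup` before genimage, then `restore_local_mods` with the same arguments after genimage and before packimage, so the packed image contains the restored files.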

@immarvin
Contributor

is it ok to close this? @pcmc

@pcmc
Author

pcmc commented Mar 11, 2019 via email
