xdcp EXECUTE of .post script not working as expected #5987

Closed
wrighrc opened this issue Jan 31, 2019 · 19 comments

@wrighrc

wrighrc commented Jan 31, 2019

From reading the man page for xdcp, it seems that whenever /etc/hosts is updated I should be able to run a .post script. I find that this doesn't work as expected.

[root@mgt ~]# cat /install/custom/synclist 
/etc/hosts -> /etc/hosts
/etc/hosts.post -> /etc/hosts.post
EXECUTE:
/etc/hosts.post

[root@mgt ~]# cat /etc/hosts.post 
#!/bin/bash
date >> /tmp/hosts.date

# If I update the /etc/hosts file the .post script isn't running...
[root@mgt ~]# echo "192.168.1.201 n03" >> /etc/hosts
[root@mgt ~]# xdcp compute -v -T -F /install/custom/synclist
TRACE:Default context is XCAT.
TRACE:Fanout Value is 64.
TRACE:Timeout Value is .
TRACE:Verifying remaining targets with pping command.
 TRACE: Executing Command:/bin/sh -c /tmp/rsync_n01

# If I touch the .post script  it runs...
[root@mgt ~]# touch /etc/hosts.post 
[root@mgt ~]# xdcp compute -v -T -F /install/custom/synclist
TRACE:Default context is XCAT.
TRACE:Fanout Value is 64.
TRACE:Timeout Value is .
TRACE:Verifying remaining targets with pping command.
 TRACE: Executing Command:/bin/sh -c /tmp/rsync_n01
TRACE:Default context is XCAT
TRACE:Node RSH is 
TRACE: Fanout value is 64.
TRACE: Timeout value is  
TRACE: Verify value is  
TRACE: Execute option specified.
TRACE:Execute: Exporting File:/usr/bin/scp -B /etc/hosts.post root@n01:/tmp/bok561Q9OY.dsh
Command name: /usr/bin/ssh -o BatchMode=yes -x root@n01 export NODE=n01; export LANG=en_US.UTF-8 LC_CTYPE="C" LC_NUMERIC="C" LC_TIME="C" LC_COLLATE="C" LC_MONETARY="C" LC_MESSAGES="C" LC_PAPER="C" LC_NAME="C" LC_ADDRESS="C" LC_TELEPHONE="C" LC_MEASUREMENT="C" LC_IDENTIFICATION="C" LC_ALL=C PERL_BADLANG=0 ;  /tmp/bok561Q9OY.dsh ; export DSH_TARGET_RC=$?; echo ":DSH_TARGET_RC=${DSH_TARGET_RC}:";rm /tmp/bok561Q9OY.dsh

# Here's the version info
[root@mgt ~]# rpm -qa | grep -i xcat
conserver-xcat-8.2.1-1.x86_64
xCAT-genesis-base-ppc64-2.14.5-snap201811160710.noarch
xCAT-genesis-base-x86_64-2.14.5-snap201811190037.noarch
elilo-xcat-3.14-4.noarch
xCAT-probe-2.14.5-snap201812062220.noarch
grub2-xcat-2.02-0.16.el7.snap201506090204.noarch
xCAT-buildkit-2.14.5-snap201812062220.noarch
xCAT-genesis-scripts-ppc64-2.14.5-snap201812062220.noarch
xCAT-client-2.14.5-snap201812062220.noarch
xCAT-genesis-scripts-x86_64-2.14.5-snap201812062220.noarch
xCAT-server-2.14.5-snap201812062220.noarch
xCAT-2.14.5-snap201812062220.x86_64
ipmitool-xcat-1.8.18-0.x86_64
perl-xCAT-2.14.5-snap201812062220.noarch
syslinux-xcat-3.86-2.noarch

@cxhong cxhong self-assigned this Feb 1, 2019
@zet809 zet809 changed the title xcdp EXECUTE of .post script not working as expected xdcp EXECUTE of .post script not working as expected Feb 11, 2019
@immarvin
Contributor

hi @wrighrc, regarding "it seems that whenever /etc/hosts is updated I should be able to run a .post script": this is not true. You should use EXECUTEALWAYS here to run /etc/hosts.post whether or not the file /etc/hosts.post is updated.

From https://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/common/deployment/syncfile/syncfile_synclist_file.html?highlight=syncfile :

The EXECUTEALWAYS clause will list all the postscripts you would always like to run after the files are sync’d, whether or not any file is actually updated. 

@immarvin immarvin self-assigned this Feb 11, 2019
@immarvin immarvin added this to the 2.14.6 milestone Feb 11, 2019
@wrighrc
Author

wrighrc commented Feb 11, 2019

I completely disagree with you. I DON'T want to use EXECUTEALWAYS I want to use EXECUTE. My example was a simple one to demonstrate the issue. I'm wanting to reload autofs if and only if its configuration files have changed. I believe this is why there are two options, EXECUTE being one of them. If what I'm trying to do isn't possible with EXECUTE, then please do provide me with some examples of what EXECUTE would be useful for. I'm sure that EXECUTE was developed for some probably forgotten reason. Thanks.

@cxhong
Contributor

cxhong commented Feb 11, 2019

@immarvin and @wrighrc, I was looking at this issue last week and I can recreate it on our test system. From the xCAT documentation, I think this is an issue: for the EXECUTE clause, /etc/hosts.post will be run if the /etc/hosts file changes. I was trying to find out why xCAT introduced the .post file and what the reason was when it was developed. I would like to get rid of the .post file if possible.

from this example (compute.synclist):

#sync list
/tmp/share/file2  -> /tmp/file2
/tmp/myscript1 -> /tmp/myscript1
# Postscripts
EXECUTE:
/tmp/share/file2
/tmp/share/file3
EXECUTEALWAYS:
/tmp/myscript1
/tmp/myscript2
  1. the files under EXECUTE and EXECUTEALWAYS need to be in the sync list (??)
  2. for the EXECUTE clause, /tmp/share/file2 will be run the first time, then run again if the file changes
  3. for the EXECUTEALWAYS clause, /tmp/myscript1 will always be run, whether the file has changed or not
  4. /tmp/share/file3 and /tmp/myscript2 will not run because they are not in the sync list (??)

@gurevichmark
Contributor

Please note that the synclist file section in the docs has been modified for 2.14.6.
Mostly formatting and rewording to make things more clear.

You can see the updated version https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/common/deployment/syncfile/syncfile_synclist_file.html

@immarvin
Contributor

Okay, I found the root cause; it was introduced by this commit:

[root@c910f03c05k21 xcat-core]# git show 5476763673c6d61b1576c210a1749283cb3057fe
commit 5476763673c6d61b1576c210a1749283cb3057fe
Author: Victor Hu <whowutwut@gmail.com>
Date:   Tue Oct 20 17:07:57 2015 -0400

    Removed the code that strips the .post off the end of the files.
    The file list being returned from xdcp contains .post.  For EXECUTE
    files, we never match the postscript and so it never gets executed

diff --git a/perl-xCAT/xCAT/DSHCLI.pm b/perl-xCAT/xCAT/DSHCLI.pm
index ac6eb16..4bbe363 100644
--- a/perl-xCAT/xCAT/DSHCLI.pm
+++ b/perl-xCAT/xCAT/DSHCLI.pm
@@ -5685,10 +5685,6 @@ sub run_rsync_postscripts
         # return from rsync is tmp/file1  not /tmp/file1
         substr($tmppostfile,0,1)="";

-        # now remove .post from the postscript file for the compare
-        # with the returned file name
-        my($tp,$post) = split(/\.post/,$tmppostfile);
-        $tmppostfile = $tp;
         foreach my $line (@rsync_output) {
             my($hostname,$ps) = split(/: /, $line);
             chomp $ps;

I will create a fix for this issue.
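
To illustrate what the removed lines did, here is a rough shell rendering of the name normalization shown in the diff. It is only a sketch, not the DSHCLI.pm code: the function name compare_key is made up, and the suffix strip is a simplification of the Perl split(/\.post/), which cuts at the first ".post" anywhere in the path.

```shell
#!/bin/sh
# Hypothetical illustration, not xCAT code.
# compare_key: turn a postscript path from the EXECUTE clause into the form
# that is compared against the file names returned by rsync.
compare_key() {
    key=${1#/}          # return from rsync is tmp/file1, not /tmp/file1
    key=${key%.post}    # the ".post" strip that the 2015 commit removed
    printf '%s\n' "$key"
}

compare_key /etc/hosts.post    # prints etc/hosts
```

Without the second step the key stays etc/hosts.post, so an rsync report that only lists etc/hosts never matches it. That is consistent with the trace in the original report, where the script ran only after /etc/hosts.post itself was touched.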

@immarvin
Contributor

With the fix #5997, the postscript <file>.post specified in EXECUTE will be invoked in either of the following conditions:
(1) the <file>.post is updated
(2) the <file>.post is updated and <file>.post is in synclist

@cxhong , please review the PR #5997 and merge it if you are ok with it.

cxhong added a commit that referenced this issue Feb 12, 2019
fix issue xdcp EXECUTE of .post script not working as expected #5987
@gurevichmark
Contributor

gurevichmark commented Feb 12, 2019

@immarvin Is the idea here to be able to execute a script when a non-script file is updated?

For example, if you had /etc/hosts in the synclist file and needed to run some script after this file is changed, you would add /etc/hosts.post. This way /etc/hosts.post will get executed every time /etc/hosts is changed.

I think this is different from putting a script into EXECUTEALWAYS section. There the script will get executed even if /etc/hosts is not changed.

Currently our docs say:

[image: screenshot of the documentation text, not preserved]

But I do not understand why we say this .post extension is required for hierarchical clusters. If this is true and hierarchical clusters do require the .post extension, should we say what users need to put into the .post file? Also, calling it postscript support is a little confusing, as it makes it sound related to the postscripts in provisioning. Perhaps here we should call it postsync support?

@immarvin
Contributor

hi @gurevichmark , great questions!

To answer your questions, I looked into the doc and the code and found some errors and conflicts in the doc https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/common/deployment/syncfile/syncfile_synclist_file.html?highlight=sync%20file#advanced-synclist-file-features . I list them here for discussion:

  1. The user scenarios of EXECUTE and EXECUTEALWAYS are totally different:
  • EXECUTE invokes the post-sync script <file-to-sync>.post that corresponds to the synced file <file-to-sync>, if <file-to-sync> is updated
    • <file-to-sync> should always be specified in the sync list file
    • <file-to-sync>.post should be specified in the sync list file in a hierarchy cluster. The reason: the xdsh command that invokes the post-sync script <file-to-sync>.post on the CN is initiated by the SN, so <file-to-sync>.post itself must be synced to the SN first.
    • <file-to-sync>.post is only triggered when <file-to-sync> is updated. An update of <file-to-sync>.post itself (in the sync list file for the hierarchy case) won't trigger the invocation. I created a PR for this: EXECUTE in sync list will not invoke post sync scripts if the script itself is updated #6001
  • EXECUTEALWAYS invokes the scripts <script-to-sync> that are specified to be synced in the sync list file
    • <script-to-sync> should always be specified in the sync list file
    • there is no relationship like the one between the post-sync script <file-to-sync>.post and the synced file <file-to-sync> in the EXECUTE clause.
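
The rules above can be written down as a small decision function. This is only a model of the behavior described in this comment, not xCAT code; the function name and its argument layout are made up for illustration.

```shell
#!/bin/sh
# Hypothetical model of the EXECUTE / EXECUTEALWAYS rules; not xCAT code.
#   $1          clause name: EXECUTE or EXECUTEALWAYS
#   $2          script path listed under that clause
#   remaining   absolute paths of the files the sync actually updated
# Returns 0 if the script should run, 1 if not, 2 for an unknown clause.
post_sync_should_run() {
    clause=$1
    script=$2
    shift 2
    case $clause in
        EXECUTEALWAYS)
            return 0 ;;                 # runs whether or not anything changed
        EXECUTE)
            trigger=${script%.post}     # <file-to-sync> for <file-to-sync>.post
            for updated in "$@"; do
                [ "$updated" = "$trigger" ] && return 0
            done
            return 1 ;;                 # an updated .post alone does not count
        *)
            return 2 ;;
    esac
}
```

For example, post_sync_should_run EXECUTE /etc/hosts.post /etc/hosts succeeds, while the same call with only /etc/hosts.post in the updated list does not.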

The information in the doc https://xcat-docs.readthedocs.io/en/latest/guides/admin-guides/manage_clusters/common/deployment/syncfile/syncfile_synclist_file.html?highlight=sync%20file#advanced-synclist-file-feature is quite obscure; we should make it clearer, maybe with an example and a description.

We should also correct the errors like:

postscript support

Putting the filename.post in the rsyncfile to rsync to the node is required for hierarchical clusters. It is optional for non-hierarchical cluster.

I agree with the term postsync support that you proposed; maybe we can rename postscript to post sync script.

Would you please create a PR to refine the Doc? @gurevichmark thanks

@immarvin
Contributor

hi @cxhong, would you please take a look at my comments above and confirm the information is right?

and please review PR #6001 , thanks

@gurevichmark
Contributor

gurevichmark commented Feb 13, 2019

@immarvin Thank you for the detailed investigation and your explanation.
A few more questions:

  1. I still do not quite understand what we mean when we say Putting the filename.post in the rsyncfile to rsync to the node is required for hierarchical clusters. It is optional for non-hierarchical cluster.

    a. It says rsyncfile, should it be synclist file ?
    b. It says to rsync, should it be to sync ? Because in our descriptions we talk about xdcp and updatenode command, but we do not talk about rsync command.
    c. I read your explanation about why .post is required for a hierarchy cluster, but it is still not clear to me. Let's say you have a hierarchy cluster and want to update /etc/hosts on all computes, and you do not put /etc/hosts.post into the EXECUTE section. Then you run updatenode for all computes. What problem will you see?

  2. What would happen if the user does not specify the /etc/hosts file in the sync file, but specifies /etc/hosts.post in the EXECUTE section ?

@immarvin
Contributor

a. It says rsyncfile, should it be synclist file ?
b. It says to rsync, should it be to sync ? Because in our descriptions we talk about xdcp and updatenode command, but we do not talk about rsync command.

Yes, it is a problem to expose the underlying tool rsync to the user. I think it happened because rsync was at first the only engine for file syncing in xdcp and updatenode; scp was added as the second engine last year. EXECUTE is not supported when scp is the underlying engine. EXECUTEALWAYS is supported with both engines.

c. I read your explanation about why .post is required for a hierarchy cluster, but it is still not clear to me. Let's say you have a hierarchy cluster and want to update /etc/hosts on all computes, and you do not put /etc/hosts.post into the EXECUTE section. Then you run updatenode for all computes. What problem will you see?

/etc/hosts will be synced to the CN and no post-sync script will run, since you do not specify any post sync script for the /etc/hosts file.

  1. What would happen if the user does not specify the /etc/hosts file in the sync file, but specifies /etc/hosts.post in the EXECUTE section ?

/etc/hosts.post will not be invoked since nothing will trigger it.

@gurevichmark
Contributor

a. It says rsyncfile, should it be synclist file ?
b. It says to rsync, should it be to sync ? Because in our descriptions we talk about xdcp and updatenode command, but we do not talk about rsync command.

Yes, it is a problem to expose the underlying tool rsync to the user. I think it happened because rsync was at first the only engine for file syncing in xdcp and updatenode; scp was added as the second engine last year. EXECUTE is not supported when scp is the underlying engine. EXECUTEALWAYS is supported with both engines.

c. I read your explanation about why .post is required for a hierarchy cluster, but it is still not clear to me. Let's say you have a hierarchy cluster and want to update /etc/hosts on all computes, and you do not put /etc/hosts.post into the EXECUTE section. Then you run updatenode for all computes. What problem will you see?

/etc/hosts will be synced to the CN and no post-sync script will run, since you do not specify any post sync script for the /etc/hosts file.

So why does our documentation say Putting the filename.post in the rsyncfile to rsync to the node is required for hierarchical clusters. ? Is that something that needs to be removed from our doc, or am I missing some use case in which syncing will not work in a hierarchical cluster if <filename.post> is not in the synclist ?

  1. What would happen if the user does not specify the /etc/hosts file in the sync file, but specifies /etc/hosts.post in the EXECUTE section ?

/etc/hosts.post will not be invoked since nothing will trigger it.

What will happen if you do have /etc/hosts file in sync file, do not have /etc/hosts.post in sync file, and specify /etc/hosts.post in EXECUTE section ?

@immarvin
Contributor

Let me answer your question thru an example:

In a hierarchy cluster with MN, SN and CN. The sync list is:

# cat /install/custom/synclist
/etc/test -> /tmp/etc/test
/etc/test.post -> /tmp/etc/test.post
EXECUTE:
/etc/test.post

during file syncing with xdcp and updatenode -F on MN:

  1. /etc/test and /etc/test.post are synced to the destination directory under the intermediate directory /var/xcat/syncfiles (specified in site.SNsyncfiledir) on SN
    • /etc/test on MN is synced to /var/xcat/syncfiles/etc/test on SN
    • /etc/test.post on MN is synced to /var/xcat/syncfiles/etc/test.post on SN
  2. /etc/test and /etc/test.post are synced from intermediate directory /var/xcat/syncfiles on SN to destination directory on CN.
    • /var/xcat/syncfiles/etc/test on SN is synced to /tmp/etc/test on CN
    • /var/xcat/syncfiles/etc/test.post on SN is synced to /tmp/etc/test.post on CN
  3. an xdsh command xdsh CN -e /var/xcat/syncfiles/etc/test.post is invoked on SN to run post sync script /etc/test.post specified in EXECUTE

If /etc/test.post is not in the sync list, step #3 will fail because /var/xcat/syncfiles/etc/test.post does not exist on SN.

What will happen if you do have /etc/hosts file in sync file, do not have /etc/hosts.post in sync file, and specify /etc/hosts.post in EXECUTE section ?

  • on a flat cluster, /etc/hosts is synced to CN and /etc/hosts.post is invoked on CN if /etc/hosts is updated
  • on a hierarchy cluster, /etc/hosts is synced to CN successfully, but running /etc/hosts.post fails because /etc/hosts.post does not exist on SN
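
A quick way to catch the hierarchy case before running xdcp is to check the synclist itself. The sketch below is not part of xCAT; both helper names are made up. It assumes the staging directory is /var/xcat/syncfiles as in the example above (site.SNsyncfiledir may differ) and that sync entries have the form "source -> destination" and appear before the first EXECUTE or EXECUTEALWAYS line.

```shell
#!/bin/sh
# Hypothetical helpers, not xCAT code.

# sn_staged_path: where a file synced from the MN is staged on the SN.
#   $1  path on the MN
#   $2  staging directory (assumed default: /var/xcat/syncfiles)
sn_staged_path() {
    syncdir=${2:-/var/xcat/syncfiles}
    printf '%s%s\n' "$syncdir" "$1"
}

# post_in_synclist: succeed if the script in $2 appears as the source of a
# "source -> destination" entry in the synclist file $1.
post_in_synclist() {
    awk -v p="$2" '
        /^EXECUTE(ALWAYS)?:/ { exit }          # stop at the clause section
        $2 == "->" && $1 == p { found = 1 }    # matching sync entry
        END { exit !found }
    ' "$1"
}
```

With the synclist from the example, post_in_synclist /install/custom/synclist /etc/test.post succeeds, and sn_staged_path /etc/test.post prints /var/xcat/syncfiles/etc/test.post, the file that step 3 expects on the SN. If the check fails on a hierarchy cluster, the .post entry is missing from the synclist.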

@gurevichmark
Contributor

gurevichmark commented Feb 15, 2019

@immarvin So our statement Putting the filename.post in the rsyncfile to rsync to the node is required for hierarchical clusters. It is optional for non-hierarchical cluster. is there to cover step 3 ?

If we have synclist file like this:

# cat /install/custom/synclist
/etc/test -> /tmp/etc/test
EXECUTE:
/etc/test.post

It will work on flat cluster but fail in hierarchical cluster ?
And the reason is that when we execute xdsh CN -e /var/xcat/syncfiles/etc/test.post it will work on flat cluster because xdsh will copy the test.post from MN to CN and run it there ? But in hierarchical cluster the xdsh CN -e /var/xcat/syncfiles/etc/test.post does not first copy the test.post to SN, then copy it to CN, and then run it on CN ?

@immarvin
Contributor

@immarvin So our statement Putting the filename.post in the rsyncfile to rsync to the node is required for hierarchical clusters. It is optional for non-hierarchical cluster. is there to cover step 3 ?

If we have synclist file like this:

# cat /install/custom/synclist
/etc/test -> /tmp/etc/test
EXECUTE:
/etc/test.post

It will work on flat cluster but fail in hierarchical cluster ?

yes

And the reason is that when we execute xdsh CN -e /var/xcat/syncfiles/etc/test.post it will work on flat cluster because xdsh will copy the test.post from MN to CN and run it there ? But in hierarchical cluster the xdsh CN -e /var/xcat/syncfiles/etc/test.post does not first copy the test.post to SN, then copy it to CN, and then run it on CN ?

yes,

@gurevichmark
Contributor

gurevichmark commented Feb 18, 2019

And the reason is that when we execute xdsh CN -e /var/xcat/syncfiles/etc/test.post it will work on flat cluster because xdsh will copy the test.post from MN to CN and run it there ? But in hierarchical cluster the xdsh CN -e /var/xcat/syncfiles/etc/test.post does not first copy the test.post to SN, then copy it to CN, and then run it on CN ?

yes,

@immarvin Are you sure about that ? :-)

If I create a script in /root/script_mg2.sh, on MN, I can then run xdsh cn01 -e /root/script_mg2.sh, targeting a compute node. It appears the script gets copied to the service node first:

[root@boston02 ~]# lsdef mid08tor03cn01 -i servicenode -c
mid08tor03cn01: servicenode=sn02
[root@boston02 ~]#

[root@boston02 ~]# cat script_mg2.sh
#!/bin/bash
echo "script_mg2.single running on $(hostname -s)"
[root@boston02 ~]#

[root@boston02 ~]# xdsh sn02 "ls /var/xcat/syncfiles/root/"
[root@boston02 ~]#

[root@boston02 ~]# xdsh mid08tor03cn01 -e script_mg2.sh
mid08tor03cn01: script_mg2.single running on mid08tor03cn01
[root@boston02 ~]#

[root@boston02 ~]# xdsh sn02 "ls /var/xcat/syncfiles/root/"
sn02: script_mg2.sh
[root@boston02 ~]#

@immarvin
Contributor

hi @gurevichmark, yes, xdsh can handle the file relay through the SN. However, for the command xdcp mid08tor03cn01 -v -T -F /tmp/compute.synclist, the xdsh command wrapped in xdcp is initiated on the SN instead of the MN. Please revisit the example I described in #5987 (comment)

This is proved by:

[root@boston02 postscripts]# cat /tmp/compute.synclist
/root/file_mg2 -> /tmp/gurevich/file_mg2
EXECUTE:
/root/file_mg2.post
[root@boston02 postscripts]# touch /root/file_mg2
[root@boston02 postscripts]# xdcp mid08tor03cn01 -v -T -F /tmp/compute.synclist
TRACE:Default context is XCAT.
TRACE:Fanout Value is 64.
TRACE:Timeout Value is .
TRACE:Verifying remaining targets with pping command.
 TRACE: Executing Command:/bin/sh -c /tmp/rsync_mid08tor03cn01
Error: [sn02]: Command failed: xdsh. Error message: TRACE:Default context is XCATTRACE:Node RSH is TRACE: Fanout value is 64.TRACE: Timeout value is  TRACE: Verify value is  File /var/xcat/syncfiles/root/file_mg2.post does not exist.

TRACE:Default context is XCAT
TRACE:Node RSH is
TRACE: Fanout value is 64.
TRACE: Timeout value is
TRACE: Verify value is
File /var/xcat/syncfiles/root/file_mg2.post does not exist
[root@boston02 postscripts]# echo $?
1

@immarvin
Contributor

hi @wrighrc , the code and doc have been merged, can this be closed?

@wrighrc
Author

wrighrc commented Feb 25, 2019

Yes please close. Thanks a lot for the fix!
