Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix sandbox use in job wrapper #7705

Open
belforte opened this issue Jun 30, 2023 · 9 comments · May be fixed by #8704
Open

fix sandbox use in job wrapper #7705

belforte opened this issue Jun 30, 2023 · 9 comments · May be fixed by #8704

Comments

@belforte
Copy link
Member

belforte commented Jun 30, 2023

currently sandbox.tar.gz is expanded in two places (see #7681 (comment) and comment below that)

  1. /srv : so that e.g. scriptExe and additional userFiles are found where needed and as documented in FAQ
  2. /srv/CMSSW_X_Y_Z so that "scram subdirectories" picked frome $CMSSW_BASE on submission host are placed in corresponding directories

Furthermore, PSet.py and PSet.pkl are moved up from /srv/CMSSW_X_Y_Z to /srv because that's the current directory where cmsRun command is executed. This overrides the ones places in /srv in step 1.

We also have a real problem (**bug**) : new code to expand user tar files https://github.com//issues/7661 only does that in `/srv/CMSSW_X_Y_Z`, while user may expect this to be happen in `/srv` **NO**: reviewing the above one year later, user tar files are expanded in /srv, not /srv/CMSSW_X_Y_Z. So I am not sure if anything is needed, aside good documentation so that we avoid going through things again.

We should expand the sandbox only once and place each file in the right place !

@belforte
Copy link
Member Author

This is not particularly urgent at this point. But I'd rather do the cleanup sooner than later, to do it when we have things clear in memory and so that we also fix documentation and will never be confused again

@mapellidario
Copy link
Member

Instead of using multiple tar archives, which would require some changes, we can use a single archive, but change the directory where files are put inside of it. This would still require changes in both the client and the jobwr, but hopefully they will be a bit less invasive.
For example, we could use tarfile.add(..., arcname="srv/" + filename ) for the files that should go in /srv and tarfile.add(..., arcname="cmssbase/" + filename ) for the files that need to go to srv/CMSSW_X_Y_Z. in CMSRunAnalysis.py:prepSandbox we can extract the archive and then move all the directories from the archive to the desired final path.

@mapellidario
Copy link
Member

We also have a real problem (bug) : new code to expand user tar files #7661 only does that in /srv/CMSSW_X_Y_Z, while user may expect this to be happen in /srv

Maybe I am missing something, but why dont you execute the same extract command twice, both in prepSandbox and extractUserSandbox? are you waiting to for the cleanup and do not want to add any "dirty" fixes?

@belforte
Copy link
Member Author

belforte commented Jul 3, 2023

neat idea, thanks. Indeed that way we can also easily make a backward compatible TW and avoid disruption.

As to the second comment, indeed now I expand tar files only in /srv . But we need to change things a bit and only expand user sandbox after CMSSW_X_Y_Z directory has been created so we do all things in one place.

@belforte
Copy link
Member Author

belforte commented Jul 3, 2023

let's map into things to do:

  • modify TW to do sandbox expansion in only one place in the code after CMSSW_X_Y_Z exists and remove prepSandbox
  • expand in a temporary directory
  • if files appear with a flat structure, put them all both in /srv and /srv/CMSSW_X_Y_Z (do not use explitic /srv string, of course, define a name for job's starting directory like e.g. topdir and use it consistently)
  • if files from sandbox are organized as cmssw_base/... + topdir/.... put each set in the proper place
  • expand tar files placed in topdir
  • modify CRAB client to archive files in the two cmssw_baseand topdir directories
  • once old client is gone (also from PRE), cleanup wrapper to remove backward compatibility

Note added on Sept 20: all actions listed above are not needed. Once we "tarfile.add(..., arcname="cmssbase/" + filename ) for the files that need to go to srv/CMSSW_X_Y_Z." as @mapellidario cleverly pointed out in #7705 (comment) , expansion inside CMSSW_X_Y_Z will be taken care of by the current iterative expansion of sandbox.tar.gz

@belforte
Copy link
Member Author

belforte commented Sep 12, 2024

of course I have now forgotten everything... and will need to re-learn :-)

@belforte
Copy link
Member Author

after re-reading the issue and the code and a first attempt a documentation in #7754 (comment)
I am leaning toward

  • fix the documentation and leave code as is

@belforte
Copy link
Member Author

Well CRAB3.zip is not sent to the WN. No reason to reference it in the JobWrapper !

export PYTHONPATH=`pwd`/CRAB3.zip:`pwd`/WMCore.zip:$PYTHONPATH

@belforte
Copy link
Member Author

About the "sandbox expanded in two places".
I will start by removing

def prepSandbox(opts):

and incorporating the functionality in
def extractUserSandbox(archiveJob, cmsswVersion):

so that at least things will be clear.

Also it means that expansion is done after CMSSW_X_Y_Z has been created, opening the way for a "do it once".
BEst option looks to me at some point to modify sandboxing in client so that $CMSSW_BASE is tarred into a CMSSW_X_Y_Z directory and then all files will automatically go in the proper place when the sandbox is iteratively expanded

# if the sandbox contains tar files, expand them
files = subprocess.getoutput(f"tar tf {opts.archiveJob}").split('\n')
for file in files:
if ('.tar.' in file) or file.endswith('.tar') or\
file.endswith('.tgz') or file.endswith('.tbz'):
print(f"expanding {file} in {os.getcwd()}")
print(subprocess.getoutput(f"tar xfm {file}"))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants