# man-dirac

SKA/AENEAS script templates for using DIRAC at Manchester
## Basic idea

1. Edit `man-jobs-submit` to suit the script you want to run across your data files.
2. Make sure the data files are on grid storage, using `dirac-dms-add-file` (a looped version of the upload commands is sketched below).
3. Run `man-jobs-submit`. The jobs are created in a unique DIRAC JobGroup so you can find them more easily; the JobGroup name consists of your DIRAC nickname followed by the current date and time.
4. Use the DIRAC portal to see how the jobs are getting on (you can select by JobGroup).
5. Get the results with a command like:

   ```
   dirac-wms-job-get-output --JobGroup andrew.mcnab.20170811164806
   ```

   A directory is created with the output files of each job, and each of those directories is created inside a directory named after the JobGroup. If you run the command more than once, it uses the existing directories to keep track of which job outputs it has already fetched. You can safely ignore warnings like "No jobs selected with date ...".
6. Write a script to merge the job outputs in all those directories, and run it when the jobs have all finished (a fetch-and-merge sketch is given below). If your output files are also big, you need to use OutputData (see the example in `man-jobs-submit`) and fetch those files using `dirac-wms-job-get-output-data` or `dirac-dms-get-file`, which can take a file containing a list of LFNs to fetch.

`man-jobs-submit` uses the example shell script `tarJob.sh`, which records some things about the environment and then unpacks a tar file in the working directory before doing a word count with `wc` of the file `data.txt` from the tar file. The `wc` log is then written to `JOB_ID.wc.log`, where `JOB_ID` is the unique DIRAC Job ID.

The six input files were created one by one like this:

```
echo "zero" > data.txt ; tar cvf 0.tar data.txt
dirac-dms-add-file /skatelescope.eu/user/a/andrew.mcnab/skatest/0.tar 0.tar UKI-NORTHGRID-MAN-HEP-disk
```

The JDL options in `man-jobs-submit` give the shell script in each job one of the input tar files from grid storage and save `JOB_ID.wc.log` into the output sandbox, which you can retrieve when the job finishes. The logs can be retrieved with something like this (depending on the JobGroup):

```
dirac-wms-job-get-output --JobGroup andrew.mcnab.20170811164806
```

Each job's output is created as a subdirectory of the JobGroup directory (`andrew.mcnab.20170811164806`) that the command also creates. You can run the command multiple times to retrieve the outputs from your jobs as they finish; it avoids downloading the same outputs more than once by checking whether a job's subdirectory already exists. In the example, each of those subdirectories contains a `wc.log` output file, so you can merge the output of all your jobs with:

```
cat andrew.mcnab.20170811164806/*/*.wc.log > merged.wc.log
```

It should be possible to adapt this trivial example to most High Throughput Computing workflows. The sketches below flesh out the individual steps.
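For step 2, the six uploads could be scripted as a loop along these lines. This is only a sketch, not the exact commands used: the per-file contents are assumed (the README shows `"zero"` for file 0), while the LFN path and storage element name are taken from the single-file example above.

```
# Create and upload six small tar files, one per job.
for i in 0 1 2 3 4 5
do
  echo "file $i" > data.txt          # contents assumed; the real files held words like "zero"
  tar cvf $i.tar data.txt
  # dirac-dms-add-file takes: LFN, local path, storage element
  dirac-dms-add-file /skatelescope.eu/user/a/andrew.mcnab/skatest/$i.tar \
      $i.tar UKI-NORTHGRID-MAN-HEP-disk
done
```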
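The `tarJob.sh` shipped in the repository is the authoritative version; the following is only a minimal sketch of the behaviour described above, assuming the DIRAC Job ID is passed to the script as its first argument and that exactly one tar file is present in the job's working directory.

```
#!/bin/sh
# Minimal sketch of a tarJob.sh-style payload script (assumptions:
# the DIRAC Job ID arrives as $1; one N.tar file is in the cwd).

JOB_ID=$1

# Record some things about the environment, for debugging
hostname
date
env | sort

# Unpack whichever input tar file the JDL supplied
tar xvf *.tar

# Word count of the extracted data file, written to the per-job log
wc data.txt > "$JOB_ID.wc.log"
```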
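To illustrate the JDL side, here is a hedged sketch of the kind of single-job JDL that `man-jobs-submit` generates; check the real `man-jobs-submit` for the exact fields it sets. `%j` is DIRAC's placeholder for the Job ID, and the LFN and JobGroup values are the examples from above.

```
# Build a one-job JDL and submit it (sketch only; field values are examples).
cat > job.jdl <<'EOF'
Executable    = "tarJob.sh";
Arguments     = "%j";
InputSandbox  = {"tarJob.sh"};
InputData     = {"LFN:/skatelescope.eu/user/a/andrew.mcnab/skatest/0.tar"};
StdOutput     = "StdOut";
StdError      = "StdErr";
OutputSandbox = {"%j.wc.log", "StdOut", "StdErr"};
JobGroup      = "andrew.mcnab.20170811164806";
EOF
dirac-wms-job-submit job.jdl
```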
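Putting steps 5 and 6 together, the fetch-and-merge script mentioned in step 6 might look like this sketch. The JobGroup name and the count of six jobs come from the example above; the ten-minute polling interval is an arbitrary choice.

```
#!/bin/sh
# Sketch: keep fetching outputs until all six job directories exist,
# then merge the wc logs. Substitute your own JobGroup name.
JOBGROUP=andrew.mcnab.20170811164806

while [ "$(ls -d $JOBGROUP/*/ 2>/dev/null | wc -l)" -lt 6 ]
do
  # Safe to repeat: already-fetched jobs are skipped
  dirac-wms-job-get-output --JobGroup $JOBGROUP
  sleep 600   # wait ten minutes between polls (arbitrary)
done

cat $JOBGROUP/*/*.wc.log > merged.wc.log
```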
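Finally, if you do use OutputData for large files, the LFN-list form of `dirac-dms-get-file` mentioned in step 6 could be used like this; the `lfns.txt` filename and the LFNs inside it are hypothetical examples, not files this repository creates.

```
# Hypothetical LFN list: one grid filename per line
cat > lfns.txt <<'EOF'
/skatelescope.eu/user/a/andrew.mcnab/skatest/output/0.out.tar
/skatelescope.eu/user/a/andrew.mcnab/skatest/output/1.out.tar
EOF

# Fetch every LFN listed in the file into the current directory
dirac-dms-get-file lfns.txt
```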