Swift is a simple scripting language that can run many copies of ordinary application programs (apps) on local or remote resources.
"Resources" can include your local computer (desktop, laptop, login host), distributed computers (grid, cloud), and parallel computers (cluster, HPC). Swift can use the resources you give it to run the copies at the same time (in parallel).
A key part of most Swift scripts is the parallel loop statement foreach, which looks like this:
foreach protein, i in proteinList {
  output[i] = runSimulation(protein);
}
Swift acts like a high-level structured "shell" language. A Swift script just says what needs to be done: what are the apps, what are their inputs and outputs, and in what pattern should they be run. Swift then determines what can run in parallel, what can run when, and what can run where.
Programs run as soon as their inputs are available. They run on the resources you provide. And they run in parallel if possible, based on when the data they depend on is available. This makes Swift scripts very portable. The same script can run on a laptop, a cloud, or a collection of HPC systems, with little or no change.
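As a rough analogy only (this is plain shell, not Swift, and not how Swift is implemented), the pattern that a foreach loop expresses resembles starting one background job per input and then waiting for all of them. The names run_simulation and run_all below are hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical stand-in for a real application: writes one line per input.
run_simulation() {
    echo "result for $1"
}

# Start one background job per input, then block until all have finished.
# Usage: run_all OUTDIR INPUT...
run_all() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    for protein in "$@"; do
        run_simulation "$protein" > "$outdir/$protein.out" &
    done
    wait   # returns once every background job started above has exited
}

# Example: three independent runs that may execute at the same time.
demo_dir=$(mktemp -d)
run_all "$demo_dir" proteinA proteinB proteinC
ls "$demo_dir"
```

Unlike this sketch, Swift also decides where each app runs and moves the input and output files to and from the resources you provide.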
The way in which Swift runs applications on local and remote resources is shown in the figure below.
In this tutorial, you’ll first try a few Swift examples (scripts 1-3) on a local login host (workflow.iu.xsede.org), to get a sense of the language.
Then, in example scripts 4-6 you’ll run similar workflows on XSEDE resources and see how more complex workflows can be expressed with Swift scripts.
Copy the tutorial repository from a global folder:
cp -R /opt/tutorials/swift-tutorial .
cd swift-tutorial
Now, run the tutorial setup script:
source setup.sh # NOTE: You must run this with "source" !
This adds the example applications simulate and stats (explained in the next part) and some other functionality to your local $PATH for you to run the tutorial. It also adds the Swift installation on the workflow.iu.xsede.org machine to your PATH.
Note: You can also obtain the tutorial repository from GitHub, to run on other machines or to get updates if they are needed during the tutorial:
git clone https://github.com/swift-lang/xsede-tutorial.git swift-tutorial
cd swift-tutorial
This section will show you how to run a science application under Swift on your local login host (workflow.iu.xsede.org). We use trivial "mock" simulation and analysis applications to represent typical scientific programs.
The first Swift script, p1.swift, runs one instance of the mock application simulate, which generates a single random number and writes that number as its output, to a file.
sys::[cat -n ../part01/p1.swift]
Line 1: Defines file as a type.
Lines 3-6: Defines an app function called simulation, which has no input arguments and has one output of type file. An app function is a function that is executed on target resources.
Line 5: This line within the app function definition defines the command used to invoke the application on the selected compute resource (here, just the local login host). stdout and stderr are keywords that can be used to redirect these output streams from the application to files defined by the user. filename() gets the correct path that the file variable o maps to on the selected compute resource.
Line 8: A variable f of type file is defined that maps to a file called sim.out on the filesystem. The angle brackets < > are used to define mappings from files and directories to Swift variables. For more on mappers, see the mapper reference.
Line 9: Variable f is assigned the output of the invocation of the app function simulation().
To run this script, run the following command:
$ cd swift-tutorial/part01
$ swift p1.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 16:21:51-0600
Progress: Thu, 22 Jan 2015 16:21:52-0600 Active:1
Final status:Thu, 22 Jan 2015 16:22:11-0600 Finished successfully:1
$ cat sim.out
18
To clean up the directory and remove all outputs (including the log files and directories that Swift generates), run the cleanup script, which is located in the tutorial PATH:
$ cleanup
Note: You will also find a Swift configuration file swift.conf in each partNN directory of this tutorial. This file specifies system-specific details of the target computational resources where Swift will run the application programs invoked by your script. This configuration file will be explained in more detail in parts 4-6. It can be ignored for now.
The p2.swift script introduces the foreach parallel iteration construct to run many concurrent simulations.
sys::[cat -n ../part02/p2.swift]
Lines 1-6: The simulation app is declared as in Example 1.
Lines 8-11: The foreach loop construct iterates over a list of integers from 0 to 9. The statements inside the foreach loop will be executed 10 times, potentially in parallel (based on how many CPUs are available and requested on the selected resource).
Line 9: Here we define a variable f of type file, and use the single_file_mapper to map it to a unique file name created by including the loop index in the filename. The single_file_mapper, as its name suggests, maps a single file, whose name is specified using the file attribute, to a Swift variable.
Line 10: The results from the app simulation are returned to the variable f, which is mapped to a unique file name in each iteration of the loop.
This is an example of how you can name the output files of an ensemble run. In this case, the output files will be output/sim_N.out.
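The naming pattern can be previewed outside Swift. The shell sketch below only prints the names that such a mapping would produce for loop indices 0 through 9; sim_output_name is a hypothetical helper, not part of the tutorial.

```shell
#!/bin/sh
# Hypothetical helper: build the per-iteration output name from the loop index.
sim_output_name() {
    printf 'output/sim_%d.out\n' "$1"
}

# Print the ten names used for indices 0 through 9.
i=0
while [ "$i" -le 9 ]; do
    sim_output_name "$i"
    i=$((i + 1))
done
```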
To run the script and view the output:
$ cd swift-tutorial/part02
$ swift p2.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 16:24:07-0600
Progress: Thu, 22 Jan 2015 16:24:08-0600 Active:10
Final status:Thu, 22 Jan 2015 16:24:27-0600 Finished successfully:10
$ ls output/
sim_0.out sim_1.out sim_2.out sim_3.out sim_4.out sim_5.out sim_6.out sim_7.out sim_8.out sim_9.out
$ cat output/sim_1.out
13
$ cat output/sim_2.out
4
After all the simulations in an ensemble run are done, you will typically want to gather and analyze the simulation results with a post-processing analysis program or script. The example p3.swift shows how to do this.
Here, the files created by all of the runs of simulate are averaged by the trivial "analysis application" stats:
sys::[cat -n ../part03/p3.swift]
Lines 3-6: The Swift app function simulation() has been modified to accept 3 arguments to control the simulation. Line 5 defines the command invocation to be run on the compute resources.
Lines 8-11: A new app function analyze() is defined. This app takes an array of files as input and returns a single file. When variables mapped to files are passed as inputs or outputs to an app, Swift manages the movement ("staging") of these files between the host where the Swift script is executed and the compute resources where the applications run. Line 10 defines the command to be run on the compute resources.
Lines 13-16: The built-in function arg(name,default) extracts user-specified command line arguments that are given when the Swift script is called. The second argument to arg is used as the default if this option is not used on the command line.
Line 18: sims is defined as an array of files.
Lines 20-24: The foreach loop iterates over a list of integers [0:nsim-1]. nsim is set by placing a -nsim option on the swift command invocation. If -nsim is not set on the command line, the nsim variable defaults to 10 (line 13). In each loop iteration, line 21 defines a temporary output file; line 22 runs the simulation() function, which actually calls the simulate app; and line 23 copies the simulation function output to an element of the sims array, indexed by the foreach loop index i.
Line 26: stats is defined as a file variable and mapped to the file output/average.out.
Line 27: The array of files sims[] is passed to the function analyze() (which runs the stats app), whose results are stored in stats.
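The gather step can be pictured with a small shell stand-in. This is a sketch under assumptions: average_files is a hypothetical function, it expects one integer per line in each file, and it truncates the mean to an integer, which may differ from what the tutorial's stats program actually does.

```shell
#!/bin/sh
# Hypothetical stand-in for an averaging app: read integers (one per line)
# from all given files and print their mean, truncated to an integer.
average_files() {
    cat "$@" | awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n }'
}

# Example with three tiny "simulation outputs" in a scratch directory.
tmp=$(mktemp -d)
echo 10 > "$tmp/sim_0.out"
echo 20 > "$tmp/sim_1.out"
echo 30 > "$tmp/sim_2.out"
average_files "$tmp"/sim_*.out   # prints 20
```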
To run:
$ cd swift-tutorial/part03
$ swift p3.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 16:27:23-0600
Progress: Thu, 22 Jan 2015 16:27:24-0600 Active:10
Final status:Thu, 22 Jan 2015 16:27:44-0600 Finished successfully:11
$ ls output/
average.out sim_0.out sim_1.out sim_2.out sim_3.out sim_4.out sim_5.out sim_6.out sim_7.out sim_8.out sim_9.out
$ cat output/average.out
52
Note that in p3.swift we expose more of the capabilities of the simulate.sh application to the simulation() app function:
app (file o) simulation (int sim_steps, int sim_range, int sim_values)
{
  simulate "--timesteps" sim_steps "--range" sim_range "--nvalues" sim_values stdout=filename(o);
}
p3.swift also shows how to fetch application-specific values from the swift command line in a Swift script using the built-in function arg(), which accepts a keyword-style user-specified command line argument name and its default value:
int nsim = toInt(arg("nsim","10"));
int steps = toInt(arg("steps","1"));
int range = toInt(arg("range","100"));
int values = toInt(arg("values","5"));
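The lookup-with-default behavior described for arg() can be mimicked in shell to make it concrete. This is only an illustration of the behavior, not Swift's implementation; get_arg is a hypothetical function that scans arguments of the form -name=value.

```shell
#!/bin/sh
# get_arg NAME DEFAULT [ARGS...]
# Print the value given as -NAME=value in ARGS, or DEFAULT if it is absent.
get_arg() {
    name=$1
    default=$2
    shift 2
    value=$default
    for a in "$@"; do
        case $a in
            -"$name"=*) value=${a#-"$name"=} ;;   # strip the "-NAME=" prefix
        esac
    done
    printf '%s\n' "$value"
}

# With "-nsim=100 -steps=1" on the command line:
get_arg nsim 10 -nsim=100 -steps=1    # prints 100 (given explicitly)
get_arg values 5 -nsim=100 -steps=1   # prints 5 (falls back to the default)
```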
Now let's perform more runs of this Swift script, each with more timesteps, and each producing more than one value, within a specified range of values (between 0 and range), using command-line arguments of the form -parameterName=value specified on the swift command line.
For example, try running the swift command with -nsim=100 and -steps=1 to perform 100 simulations of 1 second each:
$ swift p3.swift -nsim=100 -steps=1
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run002
Progress: Thu, 22 Jan 2015 16:29:45-0600
Progress: Thu, 22 Jan 2015 16:29:46-0600 Selecting site:80 Active:20
Progress: Thu, 22 Jan 2015 16:30:07-0600 Selecting site:60 Active:20 Finished successfully:20
Progress: Thu, 22 Jan 2015 16:30:28-0600 Selecting site:40 Active:20 Finished successfully:40
Progress: Thu, 22 Jan 2015 16:30:49-0600 Selecting site:20 Active:20 Finished successfully:60
Progress: Thu, 22 Jan 2015 16:31:10-0600 Active:20 Finished successfully:80
Final status:Thu, 22 Jan 2015 16:31:31-0600 Finished successfully:101
We can see from Swift’s "progress" status output that the tutorial’s default swift.conf parameters for local execution allow Swift to run up to 20 application invocations concurrently on the login node. We will look at this in more detail in the next sections where we execute applications on the compute nodes of several remote XSEDE sites (i.e., XSEDE "resource providers").
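The progress output for the 100-simulation run is consistent with simple batch arithmetic: 100 simulations run at most 20 at a time complete in 5 batches, matching the steps of the "Finished successfully" counter (20, 40, 60, 80, then 101 once the final analysis task is included). The shell function batches below is a hypothetical helper that performs this ceiling division.

```shell
#!/bin/sh
# batches TASKS LIMIT
# Number of rounds needed to run TASKS tasks when at most LIMIT run at once
# (integer ceiling of TASKS / LIMIT).
batches() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

batches 100 20   # prints 5
```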
This section introduces running applications on remote computational resources and the configuration that makes this possible. The swift.conf file contains definitions of the remote computational resources that Swift can run your tasks on. Swift automatically looks for this file when it runs.
Examples 4-6 are designed to run on remote sites, so they require the configuration to be set in swift.conf. The supplied swift.conf config file defines several sites, and in this tutorial, we use the following sites:
- Stampede at TACC
- Comet at SDSC
To configure the definition for a particular site, open the swift-tutorial/swift.conf file and edit the site entry for that site. For example, if you want to run the tutorial on the Stampede cluster, edit the site.stampede entry in the swift-tutorial/swift.conf file and follow the instructions given for stampede in the config file.
Here is the section of the swift.conf file that describes the XSEDE resource "Stampede":
sys::[cat -n stampede.example.conf]
Note: You tell Swift which resource site(s) it should execute the apps of your workflow script on by using the -sites option of the swift command. For example:
swift -sites stampede,gordan myscript.swift -nmodels=1024
p4.swift shows a simple app that takes a file containing random numbers and sorts them, then returns the sorted output. The part04 folder has a file, unsorted.txt, that contains 100 random integers ranging from 0 to 99. We will run the job on a remote resource. Be sure that you have configured the swift.conf for your target remote site.
sys::[cat -n ../part04/p4.swift]
Lines 3-6: The application function sortdata() takes a file (mapped to unsorted) and returns a file mapped to out. It uses the command-line utility sort to process the file passed to it.
Lines 8-9: File variables sorted and unsorted are defined and mapped to specific files.
Line 11: The new file sorted.txt (mapped to the variable sorted) will be created to hold the output of the app invocation sortdata(unsorted).
When a remote site is selected as the execution target for an application (in this case, sort), Swift will connect to that site (in this case, with ssh) and start a service that submits worker processes which in turn will execute Swift app invocation tasks. Swift moves (or "stages") any needed input and output files (as declared in the app function interface definition) between the target systems and the machine you are running Swift on.
When the swift command completes, you should see a new sorted.txt file in the folder. This contains the sorted results (the output of the sort command).
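To see what the app does to its input, the same kind of sort can be tried locally on a few values. This sketch assumes a numeric sort (sort -n); the exact options used in p4.swift are not shown here, and sortdata_local is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical local stand-in for the sortdata() app:
# numerically sort the lines of file $1 into file $2.
sortdata_local() {
    sort -n "$1" > "$2"
}

# Try it on a handful of integers, one per line.
tmp=$(mktemp -d)
printf '7\n49\n73\n58\n30\n72\n' > "$tmp/unsorted.txt"
sortdata_local "$tmp/unsorted.txt" "$tmp/sorted.txt"
cat "$tmp/sorted.txt"
```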
For example, to run the job remotely on Stampede and to view the output:
$ cd swift-tutorial/part04
$ swift -sites stampede p4.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 17:09:43-0600
Progress: Thu, 22 Jan 2015 17:09:44-0600 Submitting:1
Progress: Thu, 22 Jan 2015 17:09:59-0600 Submitted:1
Progress: Thu, 22 Jan 2015 17:10:06-0600 Stage in:1
Progress: Thu, 22 Jan 2015 17:10:07-0600 Stage out:1
Final status: Thu, 22 Jan 2015 17:10:14-0600 Finished successfully:1
$ more unsorted.txt
7
49
73
58
30
72
...
$ more sorted.txt
1
2
3
4
5
...
Important: Once the Swift status shows the jobs to be "Submitted", the time it will take to complete the jobs can vary greatly based on how congested the queues are on the target resource.
Tip: For this XSEDE tutorial, the swift.conf config provided in the tutorial folders is sufficient. To learn more about configuring Swift for specific sites and resource needs, a Remote site configuration reference for the XSEDE sites supported in the tutorial is included near the end of this tutorial page. That section also explains how to check the status of your jobs in the queue for systems with PBS, Condor or Slurm schedulers.
The SDSC Comet and Gordon systems put tight memory restrictions on commands that are run on their login hosts. This prevents Swift from running its remote job-launching server (a Java application) on those systems.
For such systems we provide a simple example of a wrapper script, bswift, which runs the swift command on a compute node (in this case using the faster-turnaround shared partition). The swift command then submits pilot worker jobs to the Comet compute partition to run the application tasks of your Swift script.
$ gsissh comet
$ tar zxf /oasis/scratch/comet/xdtr1/temp_project/swift/swift-tutorial.tgz
$ cd swift-tutorial
$ source setup.sh
Swift version is Swift 0.96.2 git-rev: b9611649002eecd640fc6c58bbb88cb35ce03539 heads/release-0.96-swift 6287
$ cd part04
$ bswift -sites comet p4.swift
bswift prints the ID of the batch job that it submits to the shared queue, and passes its arguments to the swift command in that batch job. The stdout/err of this batch job (including the swift command output) is written to bswift.JOBNUMBER.out. Here’s a sample bswift session:
comet$ ls
p4.swift swift.conf unsorted.txt
comet$ bswift -sites comet p4.swift
Submitted batch job 3460317
comet$ squeue -u $USER
JOBID    PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
3460317  shared     bswift    xdtr1  R   0:02  1      comet-03-14
comet$ squeue -u $USER
JOBID    PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
3460320  compute    B0718-45  xdtr1  PD  0:00  1      (None)
3460317  shared     bswift    xdtr1  R   0:10  1      comet-03-14
comet$ ls
p4.swift swift.conf unsorted.txt
comet$ ls
bswift.3460317.out p4.kml p4.swift p4.swiftx run001 swift.conf unsorted.txt
comet$ cat bswift.3460317.out
bswift: /home/xdtr1/swift-tutorial/bin/bswift Submitted at Mon Jul 18 04:45:08 PDT 2016 -d
bswift: Started at Mon Jul 18 04:45:19 PDT 2016
bswift: Running in dir /home/xdtr1/swift-tutorial/part04
bswift: Running on host comet-03-14.sdsc.edu
Swift 0.96.2 git-rev: b9611649002eecd640fc6c58bbb88cb35ce03539 heads/release-0.96-swift 6287
RunID: run001
Progress: Mon, 18 Jul 2016 04:45:23-0700
Progress: Mon, 18 Jul 2016 04:45:24-0700 Submitted:1
comet$ squeue -u $USER
JOBID    PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
3460317  shared     bswift    xdtr1  R   1:01  1      comet-03-14
3460320  compute    B0718-45  xdtr1  R   0:01  1      comet-14-32
comet$ squeue -u $USER
JOBID    PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
3460320  compute    B0718-45  xdtr1  CG  0:01  1      comet-14-32
comet$ cat bswift.3460317.out
bswift: /home/xdtr1/swift-tutorial/bin/bswift Submitted at Mon Jul 18 04:45:08 PDT 2016 -d
bswift: Started at Mon Jul 18 04:45:19 PDT 2016
bswift: Running in dir /home/xdtr1/swift-tutorial/part04
bswift: Running on host comet-03-14.sdsc.edu
Swift 0.96.2 git-rev: b9611649002eecd640fc6c58bbb88cb35ce03539 heads/release-0.96-swift 6287
RunID: run001
Progress: Mon, 18 Jul 2016 04:45:23-0700
Progress: Mon, 18 Jul 2016 04:45:24-0700 Submitted:1
Progress: Mon, 18 Jul 2016 04:45:54-0700 Submitted:1
Final status: Mon, 18 Jul 2016 04:46:19-0700 Finished successfully:1
bswift: swift command completed at: Mon Jul 18 04:46:20 PDT 2016
comet$ head sorted.txt
0
1
2
...
9
comet$
Example p5.swift and its associated swift.conf file will run our mock "simulation" applications on the compute nodes of a remote XSEDE resource. The script is similar to p3.swift, but specifies that each simulation() app invocation should additionally return the log file that the application writes to stderr.
In p3.swift the app functions simulation() and analyze() called the executable programs simulate and stats, which were available on the local machine and were present in the system path. The p5.swift script instead passes the executable programs as additional file arguments on the app invocation, to make them available on the remote compute node.
In this case, these "apps" are in fact trivial shell scripts. In more realistic and hence complex cases, Swift can run apps that are pre-installed on the remote machine, as we did with sort in example 4. Swift can also install a new app on a site or compute node the first time that an app needs to run on a remote location, using its softImage feature (described in the Swift User Guide).
app (file out, file log) simulation (int sim_steps, int sim_range, int sim_values, file sim_script)
{
  bash @sim_script "--timesteps" sim_steps "--range" sim_range "--nvalues" sim_values stdout=@out stderr=@log;
}
sys::[cat -n ../part05/p5.swift]
Lines 3-6: The application simulation() has been modified to take the simulation script as an argument through the file variable sim_script and to return a log file which contains output on the stderr stream from the application. Instead of calling the application simulate directly, the command line string now calls bash, which in turn runs the simulation script. (Note that in our example codes, simulate is just a symbolic link alias for simulate.sh.)
Lines 8-11: The application analyze() has been modified to return a log file which contains output on the stderr stream from the application. You can use this log file to verify where the remote application ran, by using grep to search for "hostname".
To run:
$ cd swift-tutorial/part05
$ swift -sites <SITES> p5.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 17:15:01-0600
Progress: Thu, 22 Jan 2015 17:15:02-0600 Submitting:10
Progress: Thu, 22 Jan 2015 17:15:16-0600 Submitted:10
Progress: Thu, 22 Jan 2015 17:15:24-0600 Submitted:6 Active:4
Progress: Thu, 22 Jan 2015 17:15:45-0600 Stage in:1 Submitted:3 Active:2 Finished successfully:4
Progress: Thu, 22 Jan 2015 17:15:46-0600 Stage in:1 Submitted:2 Active:3 Finished successfully:4
Progress: Thu, 22 Jan 2015 17:15:47-0600 Submitted:2 Active:4 Finished successfully:4
Progress: Thu, 22 Jan 2015 17:16:07-0600 Active:3 Finished successfully:7
Progress: Thu, 22 Jan 2015 17:16:08-0600 Active:2 Stage out:1 Finished successfully:7
Progress: Thu, 22 Jan 2015 17:16:21-0600 Active:2 Finished successfully:8
Progress: Thu, 22 Jan 2015 17:16:28-0600 Stage in:1 Finished successfully:10
Progress: Thu, 22 Jan 2015 17:16:29-0600 Stage out:1 Finished successfully:10
Final status: Thu, 22 Jan 2015 17:16:51-0600 Finished successfully:11

# Open the output/average.log to take a look at the rich set of machine specific
# information collected from the target system.
$ more output/average.log
Start time: Thu Jan 22 17:16:29 CST 2015
Running as user: uid=6040(yadunandb) gid=1000(ci-users) groups=1000(ci-users),1033(vdl2-svn),1082(CI-CCR000013),1094(CI-SES000031),1120(CI-IBN000050)
Running on node: nid00116
...
To run larger tests, two changes are required. The first is a change to the command line arguments. The example below will run 100 simulations (-nsim=100) with each simulation taking 5 seconds (-steps=5). The second change increases the resource limits specified in the swift.conf file (for example, increasing the number of nodes requested, the number of tasks to be run concurrently on each compute node, etc.).
# You can increase maxJobs or tasksPerNode to increase the resources available to Swift
# With the default swift.conf, the following will be processed 4 tasks at a time :
$ swift p5.swift -steps=5 -nsim=100
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 17:35:01-0600
Progress: Thu, 22 Jan 2015 17:35:02-0600 Submitting:100
Progress: Thu, 22 Jan 2015 17:35:16-0600 Submitted:100
Progress: Thu, 22 Jan 2015 17:35:27-0600 Submitted:96 Active:4
Progress: Thu, 22 Jan 2015 17:35:52-0600 Submitted:92 Active:4 Finished successfully:4
Progress: Thu, 22 Jan 2015 17:36:17-0600 Submitted:92 Active:3 Stage out:1 Finished successfully:4
Progress: Thu, 22 Jan 2015 17:36:18-0600 Submitted:88 Active:4 Finished successfully:8
...
Progress: Thu, 22 Jan 2015 17:46:27-0600 Stage out:1 Finished successfully:99
Progress: Thu, 22 Jan 2015 17:46:40-0600 Stage in:1 Finished successfully:100
Progress: Thu, 22 Jan 2015 17:46:53-0600 Active:1 Finished successfully:100
Final status: Thu, 22 Jan 2015 17:46:53-0600 Finished successfully:101

# From the time-stamps it can be seen that run001 took ~12minutes, with only 4 jobs active at
# any given time

# The following run was done with swift.conf modified to use higher tasksPerNode and maxJobs
# maxJobs : 2 # Increased from 1
# tasksPerNode : 15 # Increased from 4
$ swift p5.swift -steps=5 -nsim=100
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run002
Progress: Thu, 22 Jan 2015 17:30:35-0600
Progress: Thu, 22 Jan 2015 17:30:36-0600 Submitting:100
Progress: Thu, 22 Jan 2015 17:30:49-0600 Submitted:100
Progress: Thu, 22 Jan 2015 17:31:04-0600 Submitted:85 Active:15
Progress: Thu, 22 Jan 2015 17:31:05-0600 Stage in:8 Submitted:77 Active:15
Progress: Thu, 22 Jan 2015 17:31:06-0600 Submitted:70 Active:30
Progress: Thu, 22 Jan 2015 17:31:30-0600 Submitted:55 Active:30 Finished successfully:15
Progress: Thu, 22 Jan 2015 17:31:31-0600 Submitted:53 Active:29 Stage out:1 Finished successfully:17
Progress: Thu, 22 Jan 2015 17:31:32-0600 Stage in:1 Submitted:40 Active:29 Finished successfully:30
Progress: Thu, 22 Jan 2015 17:31:33-0600 Submitted:40 Active:30 Finished successfully:30
...
Progress: Thu, 22 Jan 2015 17:32:23-0600 Active:17 Stage out:1 Finished successfully:82
Progress: Thu, 22 Jan 2015 17:32:24-0600 Active:10 Finished successfully:90
Progress: Thu, 22 Jan 2015 17:32:47-0600 Active:6 Stage out:1 Finished successfully:93
Progress: Thu, 22 Jan 2015 17:32:48-0600 Stage out:1 Finished successfully:99
Progress: Thu, 22 Jan 2015 17:32:49-0600 Stage in:1 Finished successfully:100
Progress: Thu, 22 Jan 2015 17:33:02-0600 Active:1 Finished successfully:100
Final status: Thu, 22 Jan 2015 17:33:02-0600 Finished successfully:101
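The two runs above suggest that the number of concurrently active tasks is the product of the two settings: 1 job with 4 tasks per node shows Active:4, and 2 jobs with 15 tasks per node shows Active:30. The sketch below assumes that each job occupies a single node, which the logs are consistent with but do not prove; max_active is a hypothetical helper.

```shell
#!/bin/sh
# max_active MAXJOBS TASKS_PER_NODE
# Upper bound on concurrently running app tasks, assuming one node per job.
max_active() {
    echo $(( $1 * $2 ))
}

max_active 1 4    # prints 4  (default settings in the first run)
max_active 2 15   # prints 30 (modified settings in the second run)
```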
The p6.swift script expands the workflow pattern of p5.swift to add additional stages to the workflow. This example illustrates how to specify the common scientific workflow pattern of running a "preparation" program for each unique simulation.
Here, we generate a dynamic random number "seed" value that will be used by all of the simulations, and for each simulation, we run a pre-processing application to generate a unique "bias file" for that simulation. The bias files contain new random numbers which are added to the random numbers generated in simulate. The new workflow pattern is shown below, followed by the Swift script.
sys::[cat -n ../part06/p6.swift]
Note that the workflow execution pattern is driven by data flow dependencies. Each simulation depends on the seed value, calculated in line 42 (seedfile = genseed(1,simulate_script)) and on the bias file, computed and then consumed in these two dependent statements at lines 50-51:
biasfile = genbias(1000, 20, simulate_script);
(simout,simlog) = simulation(steps, range, biasfile, 1000000, values, simulate_script, seedfile);
To run:
$ cd swift-tutorial/part06
$ swift p6.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Thu, 22 Jan 2015 17:54:47-0600
*** Script parameters: nsim=10 range=100 num values=10
Progress: Thu, 22 Jan 2015 17:54:48-0600 Submitting:11
Progress: Thu, 22 Jan 2015 17:55:01-0600 Submitted:11
Progress: Thu, 22 Jan 2015 17:55:08-0600 Stage in:3 Submitted:8
Progress: Thu, 22 Jan 2015 17:55:09-0600 Submitted:7 Active:4
Progress: Thu, 22 Jan 2015 17:55:29-0600 Submitted:4 Active:4 Finished successfully:3
Progress: Thu, 22 Jan 2015 17:55:32-0600 Submitted:3 Active:4 Finished successfully:4
Progress: Thu, 22 Jan 2015 17:55:49-0600 Stage in:3 Submitted:6 Active:1 Finished successfully:7
Progress: Thu, 22 Jan 2015 17:55:50-0600 Submitted:6 Active:4 Finished successfully:7
Progress: Thu, 22 Jan 2015 17:55:52-0600 Submitted:6 Active:3 Stage out:1 Finished successfully:7
Progress: Thu, 22 Jan 2015 17:56:10-0600 Submitted:6 Active:4 Finished successfully:11
Progress: Thu, 22 Jan 2015 17:56:31-0600 Stage in:2 Submitted:4 Active:2 Finished successfully:13
Progress: Thu, 22 Jan 2015 17:56:32-0600 Submitted:2 Active:4 Finished successfully:15
Progress: Thu, 22 Jan 2015 17:56:53-0600 Active:2 Finished successfully:19
Progress: Thu, 22 Jan 2015 17:57:14-0600 Stage in:1 Finished successfully:21
Final status: Thu, 22 Jan 2015 17:57:16-0600 Finished successfully:22

# which produces the following output:
$ ls output/
average.log bias_1.dat bias_4.dat bias_7.dat seed.dat sim_1.log sim_2.out sim_4.log sim_5.out sim_7.log sim_8.out
average.out bias_2.dat bias_5.dat bias_8.dat sim_0.log sim_1.out sim_3.log sim_4.out sim_6.log sim_7.out sim_9.log
bias_0.dat bias_3.dat bias_6.dat bias_9.dat sim_0.out sim_2.log sim_3.out sim_5.log sim_6.out sim_8.log sim_9.out

# Each sim_N.out file is the sum of its bias file plus newly "simulated" random output scaled by 1,000,000:
$ cat output/bias_0.dat
302 489 81 582 664 290 839 258 506 310 293 508 88 261 453 187 26 198 402 555
$ cat output/sim_0.out
64000302 38000489 32000081 12000582 46000664 36000290 35000839 22000258 49000506 75000310
(For simplicity, we produce a fixed number of values in each bias file. Simulations ignore any unneeded bias numbers, or use the last bias number repeatedly as needed).
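The relationship between the bias file and the simulation output can be checked by hand. In the sketch below, biased_value is a hypothetical helper; the raw values 64 and 38 are inferred from the sample sim_0.out and bias_0.dat contents rather than taken from any log.

```shell
#!/bin/sh
# biased_value RAW SCALE BIAS
# Combine a raw "simulated" value with its bias: RAW * SCALE + BIAS.
biased_value() {
    echo $(( $1 * $2 + $3 ))
}

biased_value 64 1000000 302   # prints 64000302 (first line of sim_0.out)
biased_value 38 1000000 489   # prints 38000489 (second line of sim_0.out)
```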
Note: As an exercise, modify the example scripts and apps to produce the same number of bias values as are needed for each simulation. As a further exercise, modify the script to generate a unique seed value for each simulation, which is a common practice in ensemble computations.
In example part07 we use a simple MPI Mandelbrot application that generates fractal images. We run this application with a range of parameters that determine the level of detail in the Mandelbrot image, and create a sequence of images, which are then stitched together to create a montage and a movie to show the impact of the parameter values on the geometry.
The application takes the resolution of the image, an MPI strategy and the number of iterations computed per point in the problem space. The Swift script itself invokes a wrapper script run_mandelbrot which encapsulates the site-specific differences in how MPI applications need to be invoked for multi-node program invocations. This script in turn executes the MPI application mandelbrot that has been compiled and installed on the Stampede and Blacklight sites.
The workflow invokes the MPI application mandelbrot across a range of values for the parameter iterations, which determines the number of iterations per point in fractal space. The higher the number of iterations, the higher the degree of detail in the generated Mandelbrot fractal image. The foreach loop describes the parameter sweep.
The results generated from the mandelbrot application are assembled by the application assemble. At the end of each invocation of the mandelbrot application, the generated image files are staged back to the local machine. The assemble step stitches these results into a "movie" file output/mandel.gif and a montage image output/montage.jpg. This processing is done on the site localhost, as it does not benefit from running on a 16-core compute node. Hence the assemble application is only defined for the site localhost in the swift.conf, which ensures that the assemble application runs only on the local machine.
Currently, for running MPI applications, each Swift worker manages one resource (site) job at a time and can run one MPI job at a time. Multiple MPI applications can be invoked, one at a time, within the same resource job. If enough resources were available, multiple MPI jobs could be invoked in parallel using multiple jobs on the site.
sys::[cat -n ../part07/p7.swift]
Note: Source the mpi_setup.sh script in the part07 folder before running the swift scripts.
cd swift-tutorial/part07
source mpi_setup.sh
To run:
$ cd swift-tutorial/part07
$ source mpi_setup.sh # Don't forget to do this, once!
$ swift -sites blacklight,localhost p7.swift
Swift 0.96.2 git-rev: 6390483cc61035700e7278ae1a888f27b3bded2b heads/release-0.96-swift 6286
RunID: run001
Progress: Sun, 26 Jul 2015 18:29:04-0400
i = 10
i = 15
i = 5
i = 20
Progress: Sun, 26 Jul 2015 18:29:05-0400 Submitting:4
Progress: Sun, 26 Jul 2015 18:29:18-0400 Submitted:4
Progress: Sun, 26 Jul 2015 18:29:21-0400 Stage in:1 Submitted:3
Progress: Sun, 26 Jul 2015 18:29:22-0400 Submitted:3 Active:1
Progress: Sun, 26 Jul 2015 18:29:45-0400 Submitted:2 Active:1 Finished successfully:1
Progress: Sun, 26 Jul 2015 18:30:12-0400 Submitted:1 Active:1 Finished successfully:2
Progress: Sun, 26 Jul 2015 18:30:35-0400 Stage in:1 Finished successfully:3
Progress: Sun, 26 Jul 2015 18:30:36-0400 Active:1 Finished successfully:3
Progress: Sun, 26 Jul 2015 18:30:58-0400 Stage out:1 Finished successfully:3
Progress: Sun, 26 Jul 2015 18:30:59-0400 Active:1 Finished successfully:4
Final status: Sun, 26 Jul 2015 18:31:02-0400 Finished successfully:5
This produces the following output:
$ ls output/
assemble.err mandel_0005.err mandel_0005.out mandel_0010.jpg mandel_0015.err mandel_0015.out mandel_0020.jpg mandel.gif
assemble.out mandel_0005.jpg mandel_0010.err mandel_0010.out mandel_0015.jpg mandel_0020.err mandel_0020.out montage.jpg
The files mandel_NNNN.out and mandel_NNNN.err are the stdout and stderr from the mandelbrot MPI app. mandel_NNNN.jpg is the fractal image generated by each invocation of the application. The file mandel.gif is the animated GIF movie generated, and montage.jpg is a montage of the generated images.
To see the images, start the webserver application, which is provided in the part07/bin directory and included in your PATH by mpi_setup.sh:
$ webserver
As the webserver starts, it prints the port number that it will listen on. For this tutorial, the port number should be 60000 plus your "train" login number (the last two digits of your username). For example, if you are using train23, your webserver will listen on port 60023.
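The port rule above can be sketched in shell. This is a minimal illustration, not part of the tutorial's scripts: the function name webserver_port is hypothetical, and it assumes usernames end in exactly two digits, as in train23.

```shell
# Hypothetical helper: compute the tutorial webserver port for a username.
# Assumption: the username ends in two digits (for example "train23").
webserver_port() {
    user=$1
    # Keep only the trailing two digits; prints nothing if there are none.
    digits=$(printf '%s\n' "$user" | sed -n 's/^.*\([0-9][0-9]\)$/\1/p')
    if [ -z "$digits" ]; then
        echo "username must end in two digits: $user" >&2
        return 1
    fi
    # Strip one leading zero so "08" is not treated as an octal constant.
    echo $(( 60000 + ${digits#0} ))
}

webserver_port train23   # prints 60023
```

The webserver itself still reports the port it actually binds to, so treat this only as a way to check the number you expect.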
To see the output, go to the following URLs in your browser, being sure to replace the "NN" in 600NN with your training username number. For example:
http://workflow.iu.xsede.org:60023/output/montage.jpg
http://workflow.iu.xsede.org:60023/output/mandel.gif
This concludes the XSEDE tutorial. Please look for further information on Swift at http://swift-lang.org, and join the community via the email lists at http://swift-lang.org/support.
We thank you for your time and interest, and welcome your suggestions for improvements to this tutorial and to Swift!
This tutorial is based on two trivial example programs, simulate.sh and stats.sh (implemented as bash shell scripts), that serve as easy-to-understand proxies for real science applications. These "programs" behave as follows.
The simulate.sh script serves as a trivial proxy for any more complex scientific simulation application. It generates and prints a set of one or more random integers in the range [0-2^62) as controlled by its command line arguments, which are:
$ ./app/simulate.sh --help
./app/simulate.sh: usage:
-b|--bias       offset bias: add this integer to all results [0]
-B|--biasfile   file of integer biases to add to results [none]
-l|--log        generate a log in stderr if not null [y]
-n|--nvalues    print this many values per simulation [1]
-r|--range      range (limit) of generated results [100]
-s|--seed       use this integer [0..32767] as a seed [none]
-S|--seedfile   use this file (containing integer seeds [0..32767]) one per line [none]
-t|--timesteps  number of simulated "timesteps" in seconds (determines runtime) [1]
-x|--scale      scale the results by this integer [1]
-h|-?|?|--help  print this help
$
All of these arguments are optional, with default values indicated above as [n].
With no arguments, simulate.sh prints 1 number in the range of 1-100. Otherwise it generates n numbers of the form (R*scale)+bias where R is a random integer. By default it logs information about its execution environment to stderr. Here are some examples of its usage:
$ simulate.sh 2>log
5
$ head -4 log
Called as: /home/wilde/swift/tut/CIC_2013-08-09/app/simulate.sh:
Start time: Thu Aug 22 12:40:24 CDT 2013
Running on node: login01.osgconnect.net
$ simulate.sh -n 4 -r 1000000 2>log
239454
386702
13849
873526
$ simulate.sh -n 3 -r 1000000 -x 100 2>log
6643700
62182300
5230600
$ simulate.sh -n 2 -r 1000 -x 1000 2>log
565000
636000
$ time simulate.sh -n 2 -r 1000 -x 1000 -t 3 2>log
336000
320000
real 0m3.012s
user 0m0.005s
sys 0m0.006s
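To make the (R*scale)+bias rule concrete, here is a small stand-in written for this explanation. It is not the tutorial's simulate.sh: the function name sketch_simulate and its positional arguments are hypothetical, and it assumes R is drawn from [0, range), which may differ from the real script's range handling.

```shell
# Hypothetical stand-in for the value rule only (no logging, no timesteps).
# Usage: sketch_simulate NVALUES RANGE SCALE BIAS SEED
sketch_simulate() {
    n=$1; range=$2; scale=$3; bias=$4; seed=$5
    awk -v n="$n" -v range="$range" -v scale="$scale" \
        -v bias="$bias" -v seed="$seed" '
        BEGIN {
            srand(seed)                      # fixed seed gives repeatable runs
            for (i = 0; i < n; i++) {
                r = int(rand() * range)      # assumed: R in [0, range)
                print r * scale + bias       # the (R*scale)+bias rule
            }
        }'
}

# Three values, each a multiple of 1000 below 1000000
sketch_simulate 3 1000 1000 0 42
```

With a scale of 1000 every result ends in three zeros, which matches the shape of the -x 1000 outputs shown above.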
The stats.sh script serves as a trivial model of an "analysis" program. It reads N files, each containing M integers, and simply prints the average of all those numbers to stdout. Like simulate.sh, it logs environmental information to stderr.
$ ls f*
f1 f2 f3 f4
$ cat f*
25
60
40
75
$ stats.sh f* 2>log
50
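The averaging step can be sketched as follows. This is an illustration only, not the tutorial's stats.sh: the name sketch_stats is hypothetical, and truncating the average to an integer is an assumption about how fractional results are handled.

```shell
# Hypothetical stand-in: average every integer found in the given files.
sketch_stats() {
    cat "$@" | awk '
        { for (i = 1; i <= NF; i++) { sum += $i; count++ } }
        END { if (count > 0) print int(sum / count) }'   # assumed: truncate
}

# Recreate the four one-number files from the example in a scratch directory
tmp=$(mktemp -d)
printf '25\n' > "$tmp/f1"
printf '60\n' > "$tmp/f2"
printf '40\n' > "$tmp/f3"
printf '75\n' > "$tmp/f4"
sketch_stats "$tmp"/f*    # prints 50
rm -rf "$tmp"
```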
- Swift scripts are text files ending in .swift. The swift command runs on any host, and executes these scripts. swift is a Java application, which you can install almost anywhere. On Linux, just unpack the distribution tar file and add its bin/ directory to your PATH.
- Swift scripts run ordinary applications, just like shell scripts do. Swift makes it easy to run these applications on parallel and remote computers (from laptops to supercomputers). If you can ssh to the system, Swift can likely run applications there.
- The details of where to run applications and how to get files back and forth are described in configuration files that are separate from your script. Swift speaks ssh, PBS, Condor, SLURM, LSF, SGE, Cobalt, and Globus to run applications, and scp, http, ftp, and GridFTP to move data.
- The Swift language has 5 main data types: boolean, int, string, float, and file. Collections of these are dynamic, sparse arrays of arbitrary dimension and structures of scalars and/or arrays defined by the type declaration.
- Swift file variables are "mapped" to external files. Swift sends files to and from remote systems for you automatically.
- Swift variables are "single assignment": once you set them you can not change them (in a given block of code). This makes Swift a natural, "parallel data flow" language. This programming model keeps your workflow scripts simple and easy to write and understand.
- Swift lets you define functions to "wrap" application programs, and to cleanly structure more complex scripts. Swift app functions take files and parameters as inputs and return files as outputs.
- A compact set of built-in functions for string and file manipulation, type conversions, high level IO, etc. is provided. Swift’s equivalent of printf() is tracef(), with limited and slightly different format codes.
- Swift’s parallel foreach {} statement is the workhorse of the language. It can execute all iterations of the loop concurrently. The actual number of parallel tasks executed is based on available resources and settable "throttles".
- Swift conceptually executes all the statements, expressions and function calls in your program in parallel, based on data flow. These are similarly throttled based on available resources and settings.
- Swift has if and switch statements for conditional execution. These are seldom needed in simple workflows but they enable very dynamic workflow patterns to be specified.
We will see many of these points in action in the examples below. Let's get started!
Starting with Part04, the tutorial is designed to run on remote computational resources. The following sections outline the steps required to enable Swift to run tasks remotely.
Setting up ssh keys for password-less access: How-to-passwordless-login
Swift allows you to run your applications on multiple sites that you have access to. Let’s say you would like to run your applications on Stampede and Gordon:
- Ensure you have enabled ssh keys for passwordless access to both Stampede and Gordon.
- Set the site-specific variables for both sites in the swift-tutorial/setup.sh file.
- Set the following line in the swift-tutorial/swift.conf file:
sites: [stampede, gordon]
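If you prefer to make that last change from the command line, the sketch below shows one way. It is not part of the tutorial: the helper name set_sites is hypothetical, and it assumes swift.conf already contains a line that starts with "sites:".

```shell
# Hypothetical helper: rewrite the "sites:" line of a swift.conf file.
# Assumption: the file already has a line starting with "sites:".
set_sites() {
    conf=$1     # path to swift.conf
    sites=$2    # comma-separated site names, e.g. "stampede, gordon"
    # Keep a backup, then write the edited copy over the original.
    cp "$conf" "$conf.bak" &&
    sed "s/^sites:.*/sites: [$sites]/" "$conf.bak" > "$conf"
}

# Demonstration on a scratch file rather than the real swift.conf
demo=$(mktemp)
printf 'sites: [localhost]\n' > "$demo"
set_sites "$demo" "stampede, gordon"
cat "$demo"          # prints: sites: [stampede, gordon]
rm -f "$demo" "$demo.bak"
```

Editing the file by hand works just as well; the helper only saves retyping when you switch between sites often.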
The TACC Stampede system is a 10 PFLOPS (PF) Dell Linux Cluster based on 6400+ Dell PowerEdge server nodes, each outfitted with 2 Intel Xeon E5 (Sandy Bridge) processors and an Intel Xeon Phi Coprocessor (MIC Architecture). Here’s a great reference for Stampede: Stampede User Guide
Here are the steps to run the tutorial on Stampede:
Note: The preferred way to run the tutorial is from the Stampede login nodes rather than from a remote system.
- Ensure you have enabled ssh keys for passwordless access to the Stampede login nodes (only necessary if running from a remote system).
- If you are running on login<ID>.stampede.tacc.utexas.edu, set jobManager: "local:slurm"
- Set workDirectory to /tmp/your_username_on_stampede
- Set the following line in the swift-tutorial/swift.conf file:
sites: [stampede]
Note: Stampede uses a Lustre parallel shared filesystem. The environment variables $HOME, $WORK, and $SCRATCH point at different Lustre filesystems, all of which are accessible from the login and compute nodes.
Note: There’s a limit of one job per user on the development queue (∴ maxJobs=1).
# List queues and status
sinfo -o "%20P %5a %.10l %16F"
# List your jobs and state
showq -u $USER
# Interactive shell for debugging:
srun -p development -t 0:30:00 -n 32 --pty /bin/bash -l
Blacklight is an SGI UV 1000 cc-NUMA shared-memory system comprising 256 blades. Each blade holds 2 Intel Xeon X7560 (Nehalem) eight-core processors, for a total of 4096 cores across the whole machine. Here’s documentation for Blacklight: Blacklight User Guide
Here are the steps to run the tutorial on Blacklight:
The preferred way to run the tutorial is from the Blacklight login nodes rather than from a remote system.
- Ensure you have enabled ssh keys for passwordless access to the Blacklight login nodes (only necessary if running from a remote system).
- If you are running on the login nodes, set jobManager: "local:pbs"
- Set workDirectory to /tmp/your_username_on_blacklight
- Set the following line in the swift-tutorial/swift.conf file:
sites: [blacklight]
Note: Blacklight has $WORK and $HOME mounted on a shared filesystem.
Notes:
# List queues and status
qstat -q
# List your jobs and state
qstat -u $USER
Gordon is an XSEDE cluster at SDSC with 1024 16-core compute nodes and 64 I/O nodes. Detailed documentation can be found in the Gordon User Guide.
Warning: The Swift client cannot run on the Gordon login nodes due to memory limits on the machine. Swift must be run from a remote location.
Here are the steps to run the tutorial on Gordon:
- Ensure you have enabled ssh keys for passwordless access to the Gordon login nodes.
- Set workDirectory to /tmp/your_username_on_gordon
- Set the following line in the swift-tutorial/swift.conf file:
sites: [gordon]
Notes:
# List queues and status
qstat -q
# List your jobs and state
qstat -u $USER
Trestles is a dedicated XSEDE cluster designed by Appro and SDSC consisting of 324 compute nodes. Each compute node contains four sockets, each with an 8-core 2.4 GHz AMD Magny-Cours processor, for a total of 32 cores per node and 10,368 total cores for the system. Here’s documentation for Trestles: Trestles User Guide
Warning: The Swift client cannot run on the Trestles login nodes due to memory limits on the machine. Swift must be run from a remote location.
Here are the steps to run the tutorial on Trestles:
- Ensure you have enabled ssh keys for passwordless access to the Trestles login nodes.
- Set workDirectory to /tmp/your_username_on_trestles
- Set the following line in the swift-tutorial/swift.conf file:
sites: [trestles]
Notes:
# List queues and status
qstat -q
# List your jobs and state
qstat -u $USER