The ParGAP package
The ParGAP (Parallel GAP) package provides a way of writing parallel
programs using the GAP language. Former names of the package were
ParGAP/MPI and GAP/MPI; the word MPI refers to Message Passing Interface,
a well-known standard for parallelism. ParGAP is based on the MPI
standard, and this distribution includes a subset implementation of MPI,
to provide a portable layer with a high level interface to BSD sockets.
Since knowledge of MPI is not required for use of this software, we now
refer to the package as simply ParGAP. For more information visit the
author's ParGAP home page at:
http://www.ccs.neu.edu/home/gene/pargap.html
ParGAP may be obtained as `pargap-XXX.zoo' (for some version number XXX)
from the same places as GAP.
`pargap' is available for download via the GAP www page at
http://www-gap.mcs.st-and.ac.uk/Packages/packages.html
or, alternatively, `pargap-XXX.tar.gz' (which is assured to be the
most recent version) can be obtained from the author's ftp site:
ftp://ftp.ccs.neu.edu/pub/people/gene/pargap/
ParGAP has been tested on Linux (ELF), Solaris 2.6, OSF 1 (alpha) and OS X.
MPI Libraries
The ParGAP package uses an MPI (Message Passing Interface) library to
communicate between processes. One such library, called MPINU, is
included with this package, but you can also run ParGAP using a version
of MPI that is already present on your system. When you install
ParGAP it will try to find and use a system MPI implementation; if it
finds none, it will use its own MPINU. MPINU only works on Unix variants
such as Linux, Solaris and OS X, but it should be possible to port it to
Windows under Cygwin.
Using a system MPI implementation is recommended: such implementations
can perform better, support more systems and be more robust. Two popular
implementations that are known to work with ParGAP are
MPICH2 http://www.mcs.anl.gov/research/projects/mpich2/
Open MPI http://www.open-mpi.org/
They can be downloaded from their websites, or may be available via your
operating system's standard package management mechanism.
This version of ParGAP has an issue on Macs when using either MPINU2
or a system MPI implementation, so we recommend using the original MPINU
library on these systems.
Installing the ParGAP package
To install the ParGAP package, move the file `pargap-XXX.zoo' or
`pargap-XXX.tar.gz' into the `pkg' directory in which you plan to install
ParGAP. Usually, this will be the directory `pkg' in the hierarchy of
your version of GAP 4. Note that currently ParGAP cannot be installed in
a `pkg' directory separate from GAP's own `pkg' directory. We hope to
remedy this in future versions of ParGAP, so that it will also be
possible to keep ParGAP in an additional `pkg' directory in your private
directories; section "ref:Installing GAP Packages" of the GAP 4 reference
manual gives details on how to do this, when it's possible. If you are
not a system administrator, your system administrator won't install
ParGAP for you on the system, and you don't have enough disk space in your
own directory to create a whole new GAP, you can create the illusion of
having a complete version of GAP in your own directory using symbolic
links (sorry, currently that's all we can offer).
Now change into the `pkg' directory in which you plan to install ParGAP.
If you got a `.zoo' file, unpack it with:
unzoo -x pargap-XXX
If you got a `.tar.gz' file and your `tar' command supports the `z'
option, unpack it with:
tar zxf pargap-XXX.tar.gz
or otherwise unpack in two steps with:
gunzip pargap-XXX.tar.gz
tar xvf pargap-XXX.tar
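The choice between the two archive formats can be sketched as a small shell helper. This is only an illustration: `unpack_cmd` is a hypothetical name, and the function prints the command from the steps above rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the unpack command for a ParGAP archive,
# chosen by file extension. It only prints the command; it runs nothing.
unpack_cmd() {
    case "$1" in
        *.zoo)    printf 'unzoo -x %s\n' "${1%.zoo}" ;;
        *.tar.gz) printf 'tar zxf %s\n' "$1" ;;
        *)        printf 'unknown archive type: %s\n' "$1" >&2; return 1 ;;
    esac
}

unpack_cmd pargap-XXX.zoo      # prints: unzoo -x pargap-XXX
unpack_cmd pargap-XXX.tar.gz   # prints: tar zxf pargap-XXX.tar.gz
```

If your `tar' lacks the `z' option, use the two-step form given above instead of the printed `tar zxf' command.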
Whether you got the `.zoo' or `.tar.gz' archive you should now have a new
directory `pargap'. As for a generic GAP package, do:
cd pargap
./configure
make
If you have a system-wide implementation of MPI (such as MPICH2 or Open
MPI) then GAP will also need to be rebuilt, and configure will stop with
a message about this. If you are content for GAP to be rebuilt with no
special configure options then run the ParGAP configure with
./configure --with-basic-gap-configure
otherwise, follow the instructions given after running the ParGAP
./configure.
Your ParGAP should now be ready to use. In the `bin' subdirectory there
will be a script
pargap.sh
which you should use to start ParGAP. Edit the script if necessary, copy
it to a standard path and rename it according to how you intend to call
ParGAP (e.g. rename it: `pargap').
Running ParGAP
To run ParGAP when built with a system MPI library, you need to use your
system's MPI launcher. With both MPICH and Open MPI this is called
mpiexec, and you can type
mpiexec -n 3 pargap
to run three copies of ParGAP, i.e. one master and two slaves (this
assumes that you have renamed the script as `pargap').
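The relation between the `-n' value and the number of slaves can be sketched as follows. `launch_cmd` is a hypothetical helper, and it assumes the script has been renamed `pargap' as described above; it only prints the command.

```shell
#!/bin/sh
# Hypothetical helper: build the launcher command for a given number of slaves.
# mpiexec starts one process per copy of ParGAP: one master plus the slaves.
launch_cmd() {
    slaves=$1
    case "$slaves" in
        ''|*[!0-9]*)
            echo "slave count must be a non-negative integer" >&2
            return 1 ;;
    esac
    echo "mpiexec -n $((slaves + 1)) pargap"
}

launch_cmd 2    # prints: mpiexec -n 3 pargap
```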
If ParGAP was built using MPINU then you should run ParGAP by calling
`pargap' directly. In this case, it looks for a `procgroup' file which
defines the master and slave processes that will be used by ParGAP.
A sample `procgroup' file can be found in the `bin' subdirectory. When
ParGAP is started, this file should be in the current directory, or the
full path to the file should be supplied using the `-p4pg' option. Thus
if you renamed your shell script `pargap', the following are valid ways
of starting ParGAP:
pargap
(if current directory contains the file: `procgroup'), or
pargap -p4pg myprocgroupfile
(where `myprocgroupfile' is the complete path of your procgroup file -
there is no restriction on how you name it).
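The two ways of locating the procgroup file can be sketched as a wrapper that decides which form of the command applies. `pargap_cmd` is a hypothetical helper, not part of ParGAP; it prints the command to run and reports an error if no procgroup file can be found.

```shell
#!/bin/sh
# Hypothetical helper: print how to start an MPINU build of ParGAP.
# With no argument it expects `procgroup' in the current directory;
# with an argument it passes that file via -p4pg.
pargap_cmd() {
    pg=${1:-}
    if [ -n "$pg" ]; then
        [ -f "$pg" ] || { echo "no such procgroup file: $pg" >&2; return 1; }
        echo "pargap -p4pg $pg"
    elif [ -f ./procgroup ]; then
        echo "pargap"
    else
        echo "no procgroup file in $(pwd); pass one explicitly" >&2
        return 1
    fi
}
```

For example, `pargap_cmd /path/to/myprocgroupfile' prints the `-p4pg' form, while a bare `pargap_cmd' succeeds only when `procgroup' exists in the current directory.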
If you had trouble installing ParGAP, please see the next section of this
file. Otherwise, try it out:
gap> # This assumes your procgroup file includes two slave processes.
gap> PingSlave(1); #a `true' response indicates Slave 1 is alive
true
gap> # Print() on slave appears on standard output
gap> # i.e. after the master's prompt.
gap> SendMsg( "Print(3+4)" );
gap> 7
gap> # A <return> was input above to get a fresh prompt.
gap> #
gap> # To get special characters (including newline: `\n')
gap> # into a string, escape them with a `\'.
gap> SendMsg( "Print(3+4,\"\\n\")" );
gap> 7
gap> # Again, a <return> was input above after the 7 and new-line
gap> # were printed to get a fresh prompt.
gap> #
gap> # Each SendMsg() is normally balanced by a RecvMsg().
gap> SendMsg( "3+4", 2);
gap> RecvMsg( 2 );
7
gap> # The following is equivalent to the two previous commands.
gap> SendRecvMsg( "3+4", 2);
7
gap> # The two SendMsg() commands that were sent to Slave 1 earlier have
gap> # responses that are waiting in the message queue from that slave.
gap> # Check that there is a message waiting. With some MPI implementations
gap> # the message is not immediately available, but when ProbeMsg() does
gap> # return true then RecvMsg() is guaranteed to succeed.
gap> ProbeMsgNonBlocking( 1 );
false
gap> ProbeMsgNonBlocking( 1 );
true
gap> # Print() is a `no-value' function, and so the result of a RecvMsg()
gap> # in both these cases is "<no_return_val>".
gap> RecvMsg( 1 );
"<no_return_val>"
gap> RecvMsg( 1 );
"<no_return_val>"
gap> # As with Print() the result of Exec() appears on standard
gap> # output, and the result is "<no_return_val>".
gap> SendRecvMsg( "Exec(\"pwd\")" ); # Your pwd will differ :-)
/home/gene
"<no_return_val>"
gap> # Define a variable on a slave
gap> SendRecvMsg( "a:=45; 3+4", 1 );
7
gap> # Note "a" is defined on slave 1, not slave 2.
gap> SendMsg( "a", 2 ); # Slave prints error, output on master
gap> Variable: 'a' must have a value
gap> # <return> entered to get fresh prompt.
gap> RecvMsg( 2 ); # No value for last SendMsg() command
"<no_return_val>"
gap> RecvMsg( 1 );
45
gap> # Execute analogue of GAP's List() in parallel on slaves.
gap> squares := ParList( [1..100], x->x^2 );
[ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256,
289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841,
900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600,
1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601,
2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844,
3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329,
5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056,
7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025,
9216, 9409, 9604, 9801, 10000 ]
gap> # Send a large, local (non-remote) data structure to a slave
gap> Concatenation("x := ", PrintToString([1..10]*2));
"x := [ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]\n\000"
gap> SendMsg( Concatenation("x := ", PrintToString([1..10]*2)) );
gap> RecvMsg();
[ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 ]
gap> # Send a local (non-remote) function to a slave
gap> myfnc := function() return 42; end;;
gap> # Use PrintToString() to define myfnc on all slave processes
gap> BroadcastMsg( PrintToString( "myfnc := ", myfnc ) );
gap> SendRecvMsg( "myfnc()", 1 );
42
gap> # Ensure problem shared data is read into master and slaves.
gap> # Try one of your GAP program files instead.
gap> ParRead( "/home/gene/myprogram.g");
The ParGAP package was designed and written by:
Gene Cooperman
College of Computer Science
Northeastern University, Boston, MA, U.S.A.
If you use ParGAP to solve a problem then please send a short email to
`gene@ccs.neu.edu' about it, and reference the ParGAP package as follows:
\bibitem[Coo99]{Coo99}
Cooperman, Gene,
{\sl Parallel GAP/MPI (ParGAP/MPI)}, Version 1,
College of Computer Science, Northeastern University, 1999,
\verb|http://www.ccs.neu.edu/home/gene/pargapmpi.html|.
=========================================================================
Troubleshooting
General problems
0. If you are using ParGAP on a Mac with MPINU2 or a system MPI
implementation then ParGAP may consistently crash on startup. If
this is the case then try using MPINU instead by reconfiguring ParGAP
with
./configure --with-mpi=MPINU
This is a known issue which will be fixed in a forthcoming version.
1. Do you have enough swap space to support multiple GAP processes? A
simple way to check this is with the UNIX command, `top'. The Linux
version of `top' sorts by memory usage if you type `M'.
2. `make' tries to automatically create:
pkg/pargap/bin/pargap.sh
and copy the parameters from `<GAP_ROOT>/bin/gap.sh'. <GAP_ROOT> was
specified when you executed `./configure <GAP_ROOT>' to install
ParGAP. This can be error-prone if your site has an unusual setup. If
you execute `<GAP_ROOT>/bin/gap.sh', does gap come up? If so, compare
it with `pargap.sh' and check for correct settings in
`.../pkg/pargap/bin/pargap.sh'.
3. Were the remote slave processes able to start up? If so, could they
connect back to the master? To test connectivity problems, try
manually starting a remote slave by executing a line in the script.
Try a simple `ssh remote_hostname' to see if the issue is with
security.
4. If the previous step failed due to security issues, such as
requesting a password, you have several options. `man ssh' tells you
the security model at your site. Then read "Problems with Passwords
(Getting Around Security)" in the ParGAP manual in the `doc' directory.
5. Is `pargap' listed in `.../pkg/ALLPKG'?
[It's needed to autostart slaves.]
6. Inside ParGAP, has MPI been successfully initialized?
Try:
gap> MPI_Initialized();
7. A remote (slave) ParGAP process starts in your home directory and
tries to cd to a directory of the same name as your local directory.
Check your assumptions about the remote machine. Try:
gap> SendRecvMsg("Exec(\"pwd\")"); SendRecvMsg("UNIX_Hostname()");
gap> SendRecvMsg("UNIX_Getpid()");
8. Every ParGAP slave process displays its GAP banner and startup
messages on the terminal of the master process. If you have many
slaves and do not wish to see these messages, then pass the `-b'
and/or `-q' switches to ParGAP when it starts, to disable the
banner or all messages respectively. See the GAP Reference Manual for
further details.
9. Read the documentation for further possible problems.
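Items 2 and 5 above can be checked mechanically before starting ParGAP. The following is a minimal sketch, assuming ParGAP is installed under `<GAP_ROOT>/pkg/pargap' and that `ALLPKG' lists one package name per line; `check_pargap_install` is a hypothetical helper, not something shipped with the package.

```shell
#!/bin/sh
# Hypothetical pre-flight check for items 2 and 5 above.
# Assumes ParGAP lives in <GAP_ROOT>/pkg/pargap; pass <GAP_ROOT> as $1.
check_pargap_install() {
    root=$1
    status=0
    # Item 2: both start-up scripts should exist and be executable.
    for f in "$root/bin/gap.sh" "$root/pkg/pargap/bin/pargap.sh"; do
        if [ ! -x "$f" ]; then
            echo "missing or not executable: $f"
            status=1
        fi
    done
    # Item 5: `pargap' must be listed in pkg/ALLPKG to autostart slaves.
    if ! grep -qx 'pargap' "$root/pkg/ALLPKG" 2>/dev/null; then
        echo "pargap is not listed in $root/pkg/ALLPKG"
        status=1
    fi
    return "$status"
}
```

The function prints one line per problem found and returns a non-zero status if any check fails, so it can be used in a condition such as `check_pargap_install /path/to/gap && echo ok'.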
Problems when using MPINU
1. Did ParGAP find your `procgroup' file?
[It looks in the current directory for `procgroup', or for:
... -p4pg PATH/procgroup
on the command line.]
2. Is the `procgroup' file in your current directory set correctly?
Test it. If you are calling it on a remote host, manually type:
ssh <HOSTNAME> <ParGAP>
where <HOSTNAME> and <ParGAP> appear exactly as in `procgroup', e.g.
ssh denali.ccs.neu.edu /usr/local/gap4r3/bin/pargap.sh
In some cases, `exec' is used to save process overhead. Also try:
ssh <HOSTNAME> exec <ParGAP>
If you plan to call it on localhost, try just: <ParGAP>
Note that if not all the slave processes succeed in connecting
to the master, then ParGAP writes out a file:
/tmp/pargapmpi-ssh.$$
where $$ is replaced by the process id of the ParGAP process.
3. If the connection dies at random, after some period of time:
You can experiment with SO_KEEPALIVE and variants. (man setsockopt)
This periodically sends *null messages* so the remote machine does
not think that the originating machine is dead. However, if the
remote machine fails to reply, the local process sends a SIGPIPE
signal to notify current processes of a broken socket, even though
there might have been only a temporary lapse in connectivity.
`ssh' specifies `KeepAlive yes' by default, but setting `KeepAlive no'
might get you through some transient lapses in connectivity due to
high congestion.
You may also want to experiment with: `setenv SSH "ssh -n"'
4. If a host is on multiple networks, it will have multiple IP addresses
and usually multiple hostnames. In this case, the master process
cannot always guess correctly which IP address (which internet
address) should be passed to the slave process, so that the slave
process can call back to the master. In such cases, you may need to
tell ParGAP which hostname or IP address to use for the callback.
This is done by setting the UNIX environment variable,
`CALLBACK_HOST', as in the example below.
# [ in sh/bash/... ]
CALLBACK_HOST=denali.ccs.neu.edu; export CALLBACK_HOST
# [ in csh/tcsh/... ]
setenv CALLBACK_HOST denali.ccs.neu.edu
The appropriate line for your shell can be placed in your shell
initialization file. Alternatively, you can set this up for all users
by placing the Bourne shell version (for `sh') somewhere between the
first and last line of `.../pkg/pargap/bin/pargap.sh'.
5. ParGAP is supplied with two different versions of MPINU: the original
MPINU and a later version, MPINU2, and it will also work with other
MPI libraries if they are present on your system. By default, if you
do not have a system MPI implementation then MPINU2 is used. If you
have problems which appear to be MPI-related, try rebuilding ParGAP
with a different MPI library. For example, to use MPINU instead of
MPINU2, run configure using
./configure --with-mpi=MPINU
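For item 4, a site-wide setting placed in `pargap.sh' would override whatever a user has exported. One possible variant, sketched below for sh/bash, keeps an existing `CALLBACK_HOST' and only falls back to a default. `set_callback_host` is a hypothetical helper and the host name is the example value used above.

```shell
#!/bin/sh
# Sketch for sh/bash: keep a CALLBACK_HOST the user already set, otherwise
# fall back to a site default. The host name is the README's example value.
set_callback_host() {
    CALLBACK_HOST=${CALLBACK_HOST:-$1}
    export CALLBACK_HOST
}

set_callback_host denali.ccs.neu.edu
echo "CALLBACK_HOST=$CALLBACK_HOST"
```

The `${VAR:-default}' expansion leaves a non-empty value untouched, so users on multi-homed hosts can still choose their own callback address.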
Problems when using a system MPI library
1. Line editing at the GAP command prompt is unlikely to work when
ParGAP is invoked with an MPI launcher, since launchers tend to do their
own processing of the terminal I/O (stdin/stdout/stderr), which does
not work well with either the readline library used in newer versions of
GAP or the in-built terminal editing in earlier versions of GAP. It
may be useful to run ParGAP through the `rlwrap' utility, if
available. For example, if ParGAP is run using `mpiexec', then try
rlwrap mpiexec -n 3 pargap
This should restore some of the line editing, although tab completion
is limited to commands that `rlwrap' has already seen you use. For
more information, try `man rlwrap'.
2. The command `FlushAllMsgs()' is not available when using a system MPI
implementation, since tests show that `ProbeMsgNonBlocking()',
which it uses, cannot be relied upon to always return `true' the first
time that it is called after a message has been sent. If your system
MPI implementation does exhibit this desired behaviour for
`ProbeMsgNonBlocking()' then you can install your own local copy of
`FlushAllMsgs()' by copying the code for this function from
`lib/slavelist.g', removing the `if' statement and renaming the
function.
3. The command `ParReset()' (see "ParReset") is not available when using
a system MPI implementation. When using an MPINU library, the slaves
are launched by ParGAP itself and so can be contacted and restarted,
but with a system MPI library the slaves are launched by `mpiexec'
(or whichever MPI launcher you use) and so cannot be reset from
within ParGAP. There is no known workaround for this.
4. GAP and, in particular, the IO package install handlers for
the SIGCHLD signal. Many implementations of MPI also install their
own SIGCHLD handler, which may then conflict with ParGAP. Testing
has revealed no issues, but we cannot guarantee that there will be no
interaction between the two. In particular, this may result in
temporary files not being cleaned up properly.
5. The GAP memory manager, GASMAN, can run into problems extending the
GAP workspace if external libraries use `malloc' to allocate their
own memory. MPINU avoids the use of `malloc' as much as possible, but
system MPI implementations may not be as careful. This can be
resolved by starting ParGAP with the `-s' command-line switch, which
asks ParGAP to pre-allocate memory before it starts. You can safely
pre-allocate more memory than you will actually need since physical
memory will only be mapped when it is actually used, so for example
you could allocate 3 GB:
mpiexec -n 3 pargap -s 3g
The `-a' and `-m' switches can also be used to control memory usage.
See the GAP Reference Manual for further information.
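Items 1 and 5 can be combined in one launch line: wrap the launcher in `rlwrap' when it is installed, and pre-allocate memory with `-s'. The sketch below only prints the resulting command. `launch_with_prealloc` and the `WRAPPER` variable are hypothetical names introduced for illustration; they are not ParGAP settings.

```shell
#!/bin/sh
# Hypothetical helper: build a launch command that pre-allocates GAP memory
# with -s and adds a line-editing wrapper only if one is installed.
# WRAPPER is an illustrative variable, not a ParGAP setting.
launch_with_prealloc() {
    procs=$1
    mem=$2
    wrapper=${WRAPPER:-rlwrap}
    cmd="mpiexec -n $procs pargap -s $mem"
    # Prefix the wrapper only when it can be found on the PATH.
    if command -v "$wrapper" >/dev/null 2>&1; then
        cmd="$wrapper $cmd"
    fi
    echo "$cmd"
}
```

Calling `launch_with_prealloc 3 3g' prints the `mpiexec -n 3 pargap -s 3g' line from item 5, prefixed with `rlwrap' on systems where that utility is available.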
News of any other issues or solutions would be gratefully accepted.
=========================================================================
Final Notes
Note that this package modifies the GAP `src' and `bin' files, and
creates a new GAP kernel. This new GAP kernel can be shared by
traditional users of the old, sequential GAP kernel, and by those doing
parallel processing.
The GAP kernel will have identical behavior to the old GAP kernel when
invoked through the gap.sh script or the `bin/@GAParch@/gap' binary. The
new ParGAP variables will appear to the end user _ONLY_ if the GAP binary
was invoked as `pargapmpi': a symbolic link to the actual GAP binary. The
script, `pargap.sh', does this.
So, in a multi-user environment, traditional users can continue to use
`gap.sh' without noticing any difference. Only an invocation as
`pargap.sh' will add the new features.
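The idea of one binary behaving differently depending on the name it was invoked under can be illustrated in shell. This is only an analogy to the mechanism described above, not ParGAP's kernel code; `mode_for_name` is a hypothetical function.

```shell
#!/bin/sh
# Illustration only: choose a mode from the name a program was invoked under.
# This mimics the idea described above; it is not ParGAP's kernel code.
mode_for_name() {
    case "$(basename "$1")" in
        pargapmpi) echo parallel ;;
        *)         echo sequential ;;
    esac
}

mode_for_name /usr/local/bin/pargapmpi   # prints: parallel
mode_for_name bin/gap                    # prints: sequential
```

A real program would apply the same test to its own invocation name (`$0' in a shell script), which is why a symbolic link named `pargapmpi' is enough to select the parallel behaviour.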
Comments and contributions to a ParGAP user library, or any other type of
assistance, are gratefully accepted.
Gene Cooperman
gene@ccs.neu.edu