Modules

Graham Gower edited this page Oct 2, 2018 · 52 revisions

To give users access to a range of software, the ACAD servers provide a number of software modules via lmod. Typically, you will need to load one or more modules before running an analysis. Please read the lmod documentation for complete details on how the module system works. Below is a basic introduction, including a more specific description of how modules are used on the ACAD servers.

Quick start

Avoid modules with foss-2016a in the name; favour modules with foss-2016b instead.

Add the following to your $HOME/.bash_profile, then log out and log in again.

export MODULEPATH=/data/acad/apps/modules/all:$MODULEPATH

Delete the module cache. This may be required the first time you update your MODULEPATH, or if you need software that has been installed very recently.

$ rm -r $HOME/.lmod.d

List all available modules

$ module avail

Search for modules named 'python' (case insensitive)

$ module spider python

Load a module named 'Python' (case sensitive)

$ module load Python

Load a specific version of the Python module

$ module load Python/2.7.13-foss-2016b

Unload the Python module

$ module unload Python

Unload all modules

$ module purge

List currently loaded modules

$ module list

How commands are executed

When you type a command in your terminal, a program known as the shell interprets it. Bash is the default shell on many Linux systems (and on macOS releases before 10.15), and other shells behave similarly. A command typed into the shell may refer to a shell alias, a shell function, a shell builtin (such as cd and echo), or a standalone program (such as vim or gzip). Bash first checks whether the command is an alias, then a function, then a builtin, and if it is still not found, bash looks for a standalone program in the directories listed in your PATH. If the command exists in multiple places, the first match takes precedence (alias, function, builtin, then order in PATH); if no corresponding command is found, an error is reported.
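The effect of lookup precedence can be checked directly. The following is a minimal sketch in which a function is given the same name as the standalone uname program; uname is an arbitrary choice made for illustration, and any program found in your PATH would behave the same way.

```shell
# A function named after a standalone program shadows that program.
uname() { echo "shadowed uname"; }

first=$(uname)            # runs the function, not the program found in PATH
real=$(command uname)     # 'command' bypasses functions and runs the real program

unset -f uname            # remove the function so that PATH lookup applies again
after=$(uname)

echo "function: $first"
echo "program:  $real"
echo "restored: $after"
```

The second and third lines of output should be identical, because once the function is removed the shell falls back to the program in PATH.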

PATH is a special environment variable containing a colon-separated list of directories. You can look at the value of an environment variable using the echo builtin. Note that when referencing an environment variable, the variable name is prefixed with a dollar symbol (the prefix is not used when setting an environment variable).

$ echo $PATH
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin

The above is a standard set of locations for binary files on Unix-like systems. The type builtin can be used to determine whether a command is a builtin, alias, function, or a standalone program.

$ type type
type is a shell builtin
$ type ls
ls is aliased to `ls --color=auto'
$ type vim
vim is /usr/bin/vim
$ type fixworldpoverty
-bash: type: fixworldpoverty: not found

Note that when we asked for the type of command vim, we were told its location. For the ls command, we were told it is an alias and what the alias is. An alias is a simple translation of one command into another command, and in this case ls is an alias to ls --color=auto. In fact, ls is a standalone program and the alias gives it a default parameter without it having to be typed in full each time. We can find the location of ls in the PATH too, using the which command.

$ which ls
alias ls='ls --color=auto'
        /bin/ls

The module system modifies your PATH

The module command can be used to load or unload software. Loading software causes new directories to be prepended to your PATH.

$ echo $PATH
/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
$ module load Python
$ echo $PATH
/apps/software/Python/2.7.13-foss-2016b/bin:/apps/software/SQLite/3.13.0-foss-2016b/bin:/apps/software/Tcl/8.6.5-foss-2016b/bin:/apps/software/libreadline/6.3-foss-2016b/bin:/apps/software/ncurses/6.0-foss-2016b/bin:/apps/software/bzip2/1.0.6-foss-2016b/bin:/apps/software/FFTW/3.3.4-gompi-2016b/bin:/apps/software/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/bin:/apps/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/bin:/apps/software/hwloc/1.11.3-GCC-5.4.0-2.26/sbin:/apps/software/hwloc/1.11.3-GCC-5.4.0-2.26/bin:/apps/software/numactl/2.0.11-GCC-5.4.0-2.26/bin:/apps/software/binutils/2.26-GCCcore-5.4.0/bin:/apps/software/GCCcore/5.4.0/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
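The same effect can be reproduced by hand, which may help to show that nothing more mysterious than a PATH change is involved. This is only a sketch: the temporary directory and the demo-tool script are hypothetical stand-ins for a real software installation, and a real module typically sets other variables as well.

```shell
# Hypothetical install location containing one executable script.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "demo tool"\n' > "$demo_dir/demo-tool"
chmod +x "$demo_dir/demo-tool"

old_path=$PATH
PATH="$demo_dir:$PATH"                  # prepend, as loading a module does

first_entry=$(printf '%s\n' "$PATH" | cut -d: -f1)
found=$(command -v demo-tool)           # the shell now finds the new program
echo "first PATH entry: $first_entry"
echo "demo-tool found at: $found"

PATH=$old_path                          # restore, similar in spirit to 'module unload'
rm -r "$demo_dir"
```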

module is a shell function

$ type module
module is a function
module () 
{ 
    eval $($LMOD_CMD bash "$@");
    [ $? = 0 ] && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
$ echo $LMOD_CMD
/usr/share/lmod/lmod/libexec/lmod
$ file /usr/share/lmod/lmod/libexec/lmod
/usr/share/lmod/lmod/libexec/lmod: a /usr/bin/lua script text executable

The module command is a shell function. The shell is a fully fledged programming language, which provides the ability to write functions, and type shows us the function definition. The function runs the program named by the LMOD_CMD environment variable, passing it the 'bash' parameter followed by any parameters given to the module command ("$@"), and then uses eval to evaluate the shell code which that program prints. file shows the type of file for the lmod executable: Lmod is written in the Lua programming language.

This appears to be a very awkward and opaque design for calling a program. Please do not follow its example when implementing your own programs.
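Part of the reason for the indirection is that a program started by the shell runs as a separate process, and so cannot change the environment variables of the shell that started it. Printing shell code for a function to eval is one common workaround. The sketch below imitates the pattern with hypothetical names (fake_lmod, fake_module and DEMO_TOOL_HOME are invented for illustration); it is not lmod's implementation.

```shell
# Hypothetical stand-in for the program in $LMOD_CMD: it cannot modify the
# caller's environment, so it prints shell code for the caller to evaluate.
# Arguments: $1 = shell name, $2 = subcommand, $3 = version.
fake_lmod() {
    printf 'export DEMO_TOOL_HOME=/opt/demo/%s\n' "$3"
}

# Hypothetical wrapper with the same shape as the module() function:
# run the helper, then eval whatever it printed in the current shell.
fake_module() {
    eval "$(fake_lmod sh "$@")"
}

fake_module load 1.0
echo "$DEMO_TOOL_HOME"
```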

Location of modules

According to the lmod documentation, lmod looks for modules in the directories listed in the MODULEPATH environment variable. When we load a module, lmod looks in the MODULEPATH directories for a script (written in either the Tcl or Lua programming language) with the corresponding name, and then runs that script. The module script for the software is responsible for modifying PATH and other environment variables, and may load additional modules which are dependencies. So the software might be installed anywhere, as long as the module script for that software modifies the PATH accordingly.

$ echo $MODULEPATH
/apps/modules/all:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core
$ ls /apps/modules/all
AdapterRemoval  BLAST      FastQC         HDF5           metafast   QIIME
AdmixTools      BLAST+     FASTX-Toolkit  HTSlib         NASM       R
Anaconda2       Boost      FFTW           hwloc          ncurses    RAxML
Anaconda3       Bowtie     flex           Java           netCDF     SAMtools
angsd           Bowtie2    foss           Kaiju          numactl    ScaLAPACK
ART             BWA        GATK           libcerf        OpenBLAS   seqtk
Autoconf        bwa-meth   GCC            libevent       OpenMPI    SQLite
Automake        bwa-pssm   GCCcore        libgtextutils  paleomix   Szip
Autotools       bzip2      GDAL           libjpeg-turbo  Pango      tabix
BCFtools        cairo      GDB            libpng         parallel   taxator-tk
bcl2fastq       capnproto  gdsl           libreadline    PCRE       Tcl
bedops          CMake      gettext        LibTIFF        Perl       Tk
BEDOPS          cURL       GHC            libtool        picard     tmux
BEDTools        cutadapt   ghostscript    libxml2        PileOMeth  toolshed
binutils        Doxygen    GMAP-GSNAP     libxslt        PLINK      treemix
bioawk          EasyBuild  GMP            M4             preseq     VCFtools
Biopython       EIGENSOFT  gnuplot        MAFFT          PROJ       XZ
Bismark         ExaML      gompi          mapDamage      Pysam      zlib
Bison           expat      GSL            mash           Python
$ ls /apps/modules/all/Python/
2.7.11-foss-2016a
2.7.13-foss-2016b.lua
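The directory walk shown above can be approximated in a few lines of shell. This is a simplified sketch with a hypothetical function name, find_modulefiles; the real lmod lookup also handles default versions, caching and other details that are not reproduced here.

```shell
# find_modulefiles NAME
# Print every file found under NAME/ in each MODULEPATH directory, in order.
# The function body runs in a subshell, so changing IFS does not leak out.
find_modulefiles() (
    name=$1
    IFS=:
    for dir in ${MODULEPATH:-}; do
        for file in "$dir/$name"/*; do
            [ -e "$file" ] && printf '%s\n' "$file"
        done
    done
    true
)

# Prints nothing if MODULEPATH is unset or contains no Python directory.
find_modulefiles Python
```

With the MODULEPATH shown above, this would be expected to list the two files under /apps/modules/all/Python/.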

ACAD modules

By modifying the MODULEPATH, we can tell lmod to search additional locations for module scripts.

$ ls /data/acad/apps/modules/all
AdapterRemoval  Boost      HTSlib  mapDamage        SAMtools
ART             BWA        HUMAnN  paleomix-meta    SCons
BCFtools        freetype   Kaiju   Python-packages  texlive
BEDTools        grg-utils  LMAT    R-packages
$ ls /data/acad/apps/modules/all/LMAT
1.2.6-foss-2016b.lua
$ export MODULEPATH=/data/acad/apps/modules/all:$MODULEPATH
$ module load LMAT/1.2.6-foss-2016b

You can save your changes to the MODULEPATH by adding the export line to the file $HOME/.bash_profile, a script that is run by your shell when you log in.
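If the profile may be sourced more than once, the plain export line adds a duplicate entry each time. The duplicates are harmless, but a guarded variant such as the sketch below avoids them. The function name prepend_modulepath is hypothetical; only the directory comes from this page.

```shell
# prepend_modulepath DIR
# Put DIR at the front of MODULEPATH unless it is already listed.
prepend_modulepath() {
    case ":${MODULEPATH:-}:" in
        *":$1:"*) ;;    # already present: leave MODULEPATH unchanged
        *) MODULEPATH="$1${MODULEPATH:+:$MODULEPATH}" ;;
    esac
    export MODULEPATH
}

prepend_modulepath /data/acad/apps/modules/all
echo "$MODULEPATH"
```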

foss/2016b is a toolchain

Building software from source typically requires a suite of programs, such as a compiler, linker and libc. This suite of programs is often referred to as a toolchain. The compiler turns a human readable source file into a machine readable object file and the linker links one or more object files, together with any libraries of functions, into an executable which can be run (executed) from the command line. The C library (libc) is a standardised collection of useful functions that may be used by programs and can be included directly into an executable by the linker. Most software on Linux is either written in the C programming language or uses components that are written in C (e.g. Python and R are both implemented in C), and thus depend upon libc.

The ACAD servers have a toolchain installed already, but additional toolchains are provided as modules. One reason for this is that different software can depend upon different toolchain versions. The recommended toolchain module on the ACAD servers is foss/2016b, and most other modules have been built using this toolchain for consistency. There is also a foss/2016a toolchain, but it has been found to produce executables that do not run on both ACAD1 and ACAD2, and it should thus be avoided. Most modules have a version string that indicates the toolchain with which the software was built. Because of the problem with the foss/2016a toolchain, some programs with foss-2016a in their version will fail to run on ACAD1 or on ACAD2. A symptom of this problem is the error: Illegal instruction (core dumped).

[a1158147@acad1 ~]$ module purge
[a1158147@acad1 ~]$ module load R/3.3.1-foss-2016a
[a1158147@acad1 ~]$ R
Illegal instruction (core dumped)
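When this error appears, it can help to check whether any loaded module was built with the problematic toolchain. The sketch below assumes that lmod exports the names of the loaded modules in the LOADEDMODULES environment variable as a colon-separated list; the function name list_foss_2016a is hypothetical.

```shell
# Print each loaded module whose name contains foss-2016a.
# Returns non-zero when no such module is loaded.
list_foss_2016a() {
    printf '%s\n' "${LOADEDMODULES:-}" | tr ':' '\n' | grep 'foss-2016a'
}

list_foss_2016a || echo "no foss-2016a modules are loaded"
```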

The dynamic linker

The libc also provides a linker, the dynamic linker, which enables executables to find library functions at run time in dynamic libraries instead of requiring them to be included directly into the executable by the (static) linker. Most libraries are built in both dynamic and static forms, and can therefore be linked to an executable statically (included at build time) or dynamically (finding the library is deferred until the program is executed). Dynamic linking reduces disk usage by allowing many programs that are linked to a library to share the same library file, whereas static linking requires each program to contain its own copy of the library functions. For this reason, dynamic linking is very common. Also note that an executable may be linked statically to one library and dynamically to another.

The dynamic linker, ld.so, is called implicitly when a dynamically linked program is executed, and attempts to resolve the location of dynamically linked libraries and the location of required functions within those libraries. See the ld.so manual page for more details. We can trace which libraries are resolved by the dynamic linker using the ldd command.

$ module load SAMtools
$ which samtools
/data/acad/apps/software/SAMtools/1.4.1-foss-2016b/bin/samtools
$ ldd `which samtools`
        linux-vdso.so.1 =>  (0x00007fff54dff000)
        libz.so.1 => /apps/software/zlib/1.2.8-foss-2016b/lib/libz.so.1 (0x00007f70c78da000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003fb0800000)
        libbz2.so.1.0 => /apps/software/bzip2/1.0.6-foss-2016b/lib/libbz2.so.1.0 (0x00007f70c78b8000)
        liblzma.so.5 => /apps/software/XZ/5.2.2-foss-2016b/lib/liblzma.so.5 (0x00007f70c7892000)
        libcurl.so.4 => /apps/software/cURL/7.49.1-foss-2016b/lib/libcurl.so.4 (0x00007f70c782c000)
        libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x000000312bc00000)
        libncursesw.so.6 => /apps/software/ncurses/6.0-foss-2016b/lib/libncursesw.so.6 (0x00007f70c77c0000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003fb0000000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003fafc00000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003e02200000)
        libssl.so.10 => /usr/lib64/libssl.so.10 (0x0000003e2b600000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003faf800000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003faf400000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x0000003e2ba00000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x0000003e2be00000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x000000312ac00000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x0000003e2c200000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x0000003e2c600000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x0000003fb6400000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003fb1c00000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003e2a200000)

We can see that samtools is dynamically linked to a range of libraries. Many of these are indirect: one library can be linked to other libraries, and ldd shows all dependencies. The dynamic linker, ld.so, has a list of directories that it searches in order to resolve libraries. It first tries directories recorded in the executable when it was built (known as the rpath), then directories specified in the environment variable LD_LIBRARY_PATH (colon-separated entries, like PATH), then directories that are configured via ldconfig (usually from the configuration file /etc/ld.so.conf), and finally built-in paths such as /lib64. Many libraries are provided as modules, and these modules modify the LD_LIBRARY_PATH so that the dynamic linker can resolve the libraries when a dependent program is run.

$ module purge
$ echo $LD_LIBRARY_PATH

$ module load SAMtools
$ echo $LD_LIBRARY_PATH
/data/acad/apps/software/SAMtools/1.4.1-foss-2016b/lib:/apps/software/ncurses/6.0-foss-2016b/lib:/data/acad/apps/software/HTSlib/1.4.1-foss-2016b/lib:/apps/software/cURL/7.49.1-foss-2016b/lib:/apps/software/XZ/5.2.2-foss-2016b/lib:/apps/software/bzip2/1.0.6-foss-2016b/lib:/apps/software/zlib/1.2.8-foss-2016b/lib:/apps/software/ScaLAPACK/2.0.2-gompi-2016b-OpenBLAS-0.2.18-LAPACK-3.6.1/lib:/apps/software/FFTW/3.3.4-gompi-2016b/lib:/apps/software/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/lib:/apps/software/OpenMPI/1.10.3-GCC-5.4.0-2.26/lib:/apps/software/hwloc/1.11.3-GCC-5.4.0-2.26/lib:/apps/software/numactl/2.0.11-GCC-5.4.0-2.26/lib:/apps/software/binutils/2.26-GCCcore-5.4.0/lib:/apps/software/GCCcore/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0:/apps/software/GCCcore/5.4.0/lib64:/apps/software/GCCcore/5.4.0/lib

Comparing the LD_LIBRARY_PATH above with the ldd output for samtools, we can see that some libraries are found outside the default /lib64 directory, presumably having been resolved using entries from the LD_LIBRARY_PATH. Let's modify the LD_LIBRARY_PATH and see what happens.

$ export LD_LIBRARY_PATH=
$ echo $LD_LIBRARY_PATH

$ ldd `which samtools`
        linux-vdso.so.1 =>  (0x00007fffbe7ff000)
        libz.so.1 => /lib64/libz.so.1 (0x0000003f51c00000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003fb0800000)
        libbz2.so.1.0 => not found
        liblzma.so.5 => not found
        libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x0000003e28a00000)
        libcrypto.so.10 => /usr/lib64/libcrypto.so.10 (0x000000312bc00000)
        libncursesw.so.6 => not found
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003fb0000000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003fafc00000)
        libidn.so.11 => /lib64/libidn.so.11 (0x0000003fb7400000)
        libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00000034d3000000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003e02200000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x0000003e2ba00000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x0000003e2be00000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x0000003e2c200000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x000000312ac00000)
        libssl3.so => /usr/lib64/libssl3.so (0x00000034d2400000)
        libsmime3.so => /usr/lib64/libsmime3.so (0x0000003e74400000)
        libnss3.so => /usr/lib64/libnss3.so (0x0000003e73800000)
        libnssutil3.so => /usr/lib64/libnssutil3.so (0x0000003e73c00000)
        libplds4.so => /lib64/libplds4.so (0x0000003e73000000)
        libplc4.so => /lib64/libplc4.so (0x0000003e73400000)
        libnspr4.so => /lib64/libnspr4.so (0x0000003e74000000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003faf800000)
        libssh2.so.1 => /usr/lib64/libssh2.so.1 (0x0000003e28e00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003faf400000)
        liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00000034d2800000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x0000003fb1c00000)
        libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x0000003fb4c00000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x0000003e2c600000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x0000003fb6400000)
        libssl.so.10 => /usr/lib64/libssl.so.10 (0x0000003e2b600000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x0000003fb3400000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x0000003e2a200000)
        libfreebl3.so => /lib64/libfreebl3.so (0x0000003fb2800000)

Some of the libraries cannot be resolved. Note also that libz.so.1 is now found in the built-in /lib64 directory, instead of in /apps/software/zlib/1.2.8-foss-2016b/lib/. If we try to run samtools now, it fails.

$ samtools
samtools: error while loading shared libraries: libbz2.so.1.0: cannot open shared object file: No such file or directory
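When a program fails in this way, the unresolved libraries can be picked out of the ldd output rather than read by eye. The helper names parse_missing and missing_libs below are hypothetical, and the sketch uses sh as a stand-in program; substitute the program you are debugging, such as samtools.

```shell
# Read ldd output on stdin and print the names of unresolved libraries.
parse_missing() {
    awk '/=> not found/ { print $1 }'
}

# missing_libs FILE: print the libraries that ld.so cannot resolve for FILE.
missing_libs() {
    ldd "$1" 2>/dev/null | parse_missing
}

# Prints nothing when every library is resolved.
if command -v ldd >/dev/null 2>&1; then
    missing_libs "$(command -v sh)"
fi
```

An empty result means every library was resolved with the current LD_LIBRARY_PATH. For the samtools example above, the expected result would be the three libraries marked "not found".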

Providing a custom module

In order to provide an additional module, you must write a script and place it in one of the directories specified in the MODULEPATH. The filesystem mounted at /data/acad is accessible from both ACAD1 and ACAD2, and is writeable for all ACAD users. It is recommended that new module scripts be placed in /data/acad/apps/modules/all and that this be prepended to your MODULEPATH.

Here is an example of a custom module, with the software (a git checkout) placed under /data/acad/apps/software/grg-utils.

$ cat /data/acad/apps/modules/all/grg-utils/git.lua
help([[Utilities written by Graham. -grg]])
whatis([[Misc bits and pieces, mostly python and some c.]])

if not isloaded("Python/2.7.13-foss-2016b") then
        load("Python/2.7.13-foss-2016b")
end

if not isloaded("Python-packages/Python-2.7.13-foss-2016b") then
        load("Python-packages/Python-2.7.13-foss-2016b")
end

local root = "/data/acad/apps/software/grg-utils"
prepend_path("PATH", root)

This module has the name grg-utils, from the directory in which the script is located, and the version git, from the filename. The script has a .lua extension, hence it is written in the Lua programming language, and it uses functions such as isloaded() and prepend_path() that are provided by lmod. Lmod also supports module scripts without an extension, which it assumes are written in Tcl. Most of the programs provided by grg-utils are written in Python, and thus this module script loads a specific Python module that is known to work with these programs. Using module commands, we can now inspect and load the grg-utils module.

$ echo $MODULEPATH
/data/acad/apps/modules/all:/apps/modules/all:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core
$ module spider grg-utils

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  grg-utils: grg-utils/git
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    This module can be loaded directly: module load grg-utils/git

    Help:
      Utilities written by Graham. -grg


$ module load grg-utils/git
$ echo $PATH | cut -d: -f1
/data/acad/apps/software/grg-utils

Please see the lmod documentation for more details about writing module scripts, and see other scripts in your MODULEPATH for more examples.
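As a starting point for a new module, a minimal script can be generated from the shell. This is a sketch only: the function name write_modulefile and the tool name mytool are hypothetical, and the generated file uses only the help(), whatis() and prepend_path() calls that appear in the grg-utils example. The demonstration writes to a temporary directory, not to /data/acad/apps/modules/all.

```shell
# write_modulefile NAME VERSION ROOT MODULEDIR
# Create MODULEDIR/NAME/VERSION.lua, which prepends ROOT to PATH when loaded.
write_modulefile() {
    name=$1 version=$2 root=$3 moduledir=$4
    mkdir -p "$moduledir/$name"
    cat > "$moduledir/$name/$version.lua" <<EOF
help([[$name $version]])
whatis([[$name $version, installed under $root]])

local root = "$root"
prepend_path("PATH", root)
EOF
}

demo_modules=$(mktemp -d)
write_modulefile mytool 1.0 /data/acad/apps/software/mytool "$demo_modules"
cat "$demo_modules/mytool/1.0.lua"
```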

Using EasyBuild

Manually building a program and then writing a module file for it is not always necessary. Most of the modules on the ACAD servers have been automatically downloaded and built using EasyBuild, which also generates a Lua module script. See the EasyBuild documentation for details, and see /data/acad/apps/eb for example EasyBuild scripts. The general process is as follows.

$ module purge
$ module load foss/2016b EasyBuild
$ vim foo.eb # write the easybuild script
$ eb --prefix=/data/acad/apps foo.eb
...

Installing R packages

R allows local package installation into a user's home directory. I provide the R-packages module for a shared location of additional R packages by modifying the R_LIBS_USER environment variable. E.g. to install package 'foo':

$ module load R-packages
$ echo $R_LIBS_USER
/data/acad/apps/software/R-packages/R-3.3.1-foss-2016b/lib
$ R
...
> install.packages('foo')

Installing Python packages

There are many ways to install Python packages, including native package managers, pip, conda and virtualenv. I have provided the Python-packages module as a shared place for additional Python packages, which can be installed by anyone. This module exports the PYTHONUSERBASE environment variable, which specifies the location of user Python packages and has the advantage that new packages can be installed with pip. E.g. to install package foo:

$ module load Python-packages
$ echo $PYTHONUSERBASE
/data/acad/apps/software/Python-packages/Python-2.7.13-foss-2016b
$ pip install --user foo

This will only work for packages distributed via PyPI (which can thus be installed with pip). If instead the package provides instructions of the form (1) download, or git clone, (2) python setup.py install, then the package will need to be installed from the source distribution. E.g. for the comb-p Python package:

$ module load Python-packages
$ git clone https://github.com/brentp/combined-pvalues.git
$ cd combined-pvalues
$ python setup.py install --prefix=$PYTHONUSERBASE
...
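Before installing, it can be worth confirming where user-level installs will go. The sketch below asks Python's standard site module for the user base directory. The value /tmp/demo-userbase is a placeholder used so that the example runs anywhere; with the Python-packages module loaded you would leave PYTHONUSERBASE as the module set it.

```shell
# Use whichever interpreter is available; 'py' stays empty if there is none.
py=$(command -v python3 || command -v python || true)

if [ -n "$py" ]; then
    # 'python -m site --user-base' prints the directory used by 'pip install --user'.
    # Its exit status is non-zero when the user site directory is disabled
    # (for example inside a virtualenv), so the status is ignored here.
    user_base=$(PYTHONUSERBASE=/tmp/demo-userbase "$py" -m site --user-base) || true
    echo "user installs go under: $user_base"
else
    echo "no python interpreter found in PATH"
fi
```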

Installing Perl packages

A typical Perl package is distributed via CPAN and can be installed with the cpan command. Perl packages can be installed locally using the local::lib package, with some modifications to environment variables. I have provided a Perl-packages module which directs new packages to be installed in an appropriate location under /data/acad/ by setting the relevant environment variables. These packages may be modified and/or added to by any ACAD user. E.g. to install BioPerl:

$ module load Perl-packages
$ perl -e 'print "@INC"'
/data/acad/apps/software/Perl-packages/Perl-5.24.0-foss-2016b/lib/perl5/5.24.0/x86_64-linux-thread-multi /data/acad/apps/software/Perl-packages/Perl-5.24.0-foss-2016b/lib/perl5/5.24.0 /data/acad/apps/software/Perl-packages/Perl-5.24.0-foss-2016b/lib/perl5/x86_64-linux-thread-multi /data/acad/apps/software/Perl-packages/Perl-5.24.0-foss-2016b/lib/perl5 /apps/software/Perl/5.24.0-foss-2016b/lib/perl5/site_perl/5.24.0/x86_64-linux-thread-multi /apps/software/Perl/5.24.0-foss-2016b/lib/perl5/site_perl/5.24.0 /apps/software/Perl/5.24.0-foss-2016b/lib/perl5/5.24.0/x86_64-linux-thread-multi /apps/software/Perl/5.24.0-foss-2016b/lib/perl5/5.24.0 .
$ cpan Bio::Perl
...
$ perl -e 'use Bio::Seq; print(Bio::Seq->new(-seq=>"aaaatg")->revcom->seq, "\n");'
catttt

Fixing permissions

After you have installed a new module, Python package, Perl package, or R package, the new files will be owned by your user. They will be accessible by other acad_users, but only you will be able to modify or delete these files, which may be problematic in the future. Please run chmod g+w on any new files, or use the following script to fix permissions for any files under /data/acad/apps (errors can be safely ignored).

$ sh /data/acad/apps/fix_perms.sh 2>/dev/null
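For reference, the manual chmod g+w step can be applied to a whole directory tree with find. This is a sketch of one way to do it, not the contents of fix_perms.sh; the function name fix_group_write is hypothetical, and the demonstration runs on a throwaway directory rather than on /data/acad/apps.

```shell
# fix_group_write DIR
# Add group write permission to everything under DIR that is owned by the
# current user and lacks it. Symbolic links are skipped.
fix_group_write() {
    find "$1" -user "$(id -u)" ! -type l ! -perm -020 -exec chmod g+w {} +
}

demo=$(mktemp -d)
: > "$demo/newfile"
chmod 600 "$demo/newfile"      # owner-only, as a freshly installed file might be
fix_group_write "$demo"
ls -l "$demo/newfile"
```

Restricting the search to your own files avoids the permission errors that chmod reports for files owned by other users.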