Merge branch 'cl_optimizations' of https://github.com/StreamHPC/pyPaSWAS.git into streamcomputing

Conflicts: .gitignore
swarris · Feb 6, 2018 · 0b2551b
2 parents fe9cb2e + 548fc25

Showing 14 changed files with 1,028 additions and 965 deletions.
diff --git a/README.md b/README.md
@@ -2,7 +2,12 @@ pyPaSWAS
 ========
 [![DOI](https://zenodo.org/badge/28648467.svg)](https://zenodo.org/badge/latestdoi/28648467)
 
-Extented python version of PaSWAS. Original paper in PLOS ONE: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0122524
+Extended Python version of PaSWAS. Original papers in PLOS ONE:
+
+http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0122524
+
+http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190279
+
 
 For DNA/RNA/protein sequence alignment and trimming. 
 
@@ -23,6 +28,15 @@ Platforms supported:
 
 More information: https://github.com/swarris/pyPaSWAS/wiki
 
+Docker
+------
+
+The pyPaSWAS source contains several Docker install files. Clone the repository:
+
+`git clone https://github.com/swarris/pyPaSWAS.git`
+
+Then use one of the Dockerfiles in the `docker` folder. For more information, see the [README](https://github.com/swarris/pyPaSWAS/tree/master/docker).
+
 Installation
 ------------
 In most cases it is enough to clone the repository. After that, please install:

diff --git a/docker/README.md b/docker/README.md
@@ -0,0 +1,40 @@
+# pyPaSWAS Docker Containers
+
+This folder contains the Dockerfiles for building containers that include the `pyPaSWAS` software. The containers are based on Ubuntu 16.04 and ship with Python 3 and Nvidia CUDA, because they are built on the `nvidia/cuda:8.0-devel-ubuntu16.04` image [supplied by Nvidia](https://hub.docker.com/r/nvidia/cuda/).
+
+## Running existing Docker Containers
+
+The Docker engine is required to run the containers; see its [excellent installation instructions](https://docs.docker.com/engine/installation/) for details.
+These containers also need low-level access to the hardware (i.e. the GPU) and therefore require the `nvidia-docker` utility; installation instructions are available on its [GitHub page](https://github.com/NVIDIA/nvidia-docker/tree/2.0).
+
+`nvidia-docker run --rm -ti mkempenaar/pypaswas:nvidia-opencl_cuda8.0 bash` downloads the image, starts a container and attaches to a bash session running inside it. The software is located at `/root/pyPaSWAS`. Running the performance tests on a clean container is as simple as (note: this takes a while):
+
+```
+cd /root/pyPaSWAS
+sh data/runPerformanceTests.sh
+```
+
+## Container(s) available on [Docker Hub](https://hub.docker.com/r/mkempenaar/pypaswas/)
+
+**`mkempenaar/pypaswas:nvidia-opencl_cuda8.0` [*Dockerfile*](https://raw.githubusercontent.com/swarris/pyPaSWAS/master/docker/nvidia/Dockerfile)**
+
+This container can be used to test all back ends supported by the `pyPaSWAS` sequence aligner, as it contains the Intel and Nvidia OpenCL runtime libraries and Nvidia CUDA support.
+
+
+## Building custom Docker Containers
+
+As most hardware manufacturers ship their own acceleration libraries (multiple versions of OpenCL, Nvidia CUDA, etc.), the available containers might not work on your hardware. Therefore, a few custom Dockerfiles are available for specific hardware and requirements (e.g. only CUDA support or only Intel OpenCL).
+
+### Downloading and Building
+
+Cloning this repository provides the currently available Dockerfiles for building custom images in the `pyPaSWAS/docker` folder. To build an image locally, go to the folder of your choice (each contains a single `Dockerfile`, i.e. a container description) and run:
+
+```
+docker build -t pypaswas:custom .
+```
+
+Currently available:
+
+* [Intel OpenCL + Nvidia CUDA](https://raw.githubusercontent.com/swarris/pyPaSWAS/master/docker/intel/Dockerfile), `pyPaSWAS/docker/intel/Dockerfile`: Suitable for Intel Core and Xeon CPUs and GPUs from the 3rd generation (Ivy Bridge) onwards, combined with Nvidia CUDA from the base container image.
+* [Intel OpenCL + Nvidia CUDA](https://raw.githubusercontent.com/swarris/pyPaSWAS/master/docker/intel/sandybridge/Dockerfile), `pyPaSWAS/docker/intel/sandybridge/Dockerfile`: Only suitable for 2nd generation (Sandy Bridge) Intel Core CPUs, combined with Nvidia CUDA from the base container image.
+* [Intel OpenCL + Nvidia OpenCL + Nvidia CUDA](https://raw.githubusercontent.com/swarris/pyPaSWAS/master/docker/nvidia/Dockerfile), `pyPaSWAS/docker/nvidia/Dockerfile`: Full package for 3rd generation and newer Intel Core and Xeon CPUs and GPUs, combined with Nvidia OpenCL and CUDA support.
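Once inside a container, it can help to confirm which OpenCL platforms and devices are actually visible before running pyPaSWAS with `--framework`/device options. The sketch below is a hypothetical helper, not part of pyPaSWAS; it assumes only that `pyopencl` is installed (as in the Dockerfiles) and returns an empty list if it is missing or if no OpenCL platform is registered. The `clinfo` tool installed in the images reports the same information in more detail. It avoids f-strings because Ubuntu 16.04 ships Python 3.5.

```python
# Hypothetical helper, not part of pyPaSWAS: list the OpenCL platforms and
# devices that are visible inside the running container.


def list_opencl_devices():
    """Return a list of (platform name, device name) tuples.

    An empty list is returned when pyopencl is not installed or when no
    OpenCL platform (ICD) can be loaded.
    """
    try:
        import pyopencl as cl
    except ImportError:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # pyopencl may raise (e.g. PLATFORM_NOT_FOUND_KHR) instead of returning [].
        return []
    found = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            # A registered platform may expose no usable devices.
            continue
        for device in devices:
            found.append((platform.name.strip(), device.name.strip()))
    return found


if __name__ == "__main__":
    for platform_name, device_name in list_opencl_devices():
        print("{}: {}".format(platform_name, device_name))
```

If the list is empty inside the `nvidia-opencl` image, the usual suspects are a container started with plain `docker` instead of `nvidia-docker`, or a missing ICD file under `/etc/OpenCL/vendors`.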
diff --git a/docker/intel/Dockerfile b/docker/intel/Dockerfile
@@ -0,0 +1,32 @@
+FROM nvidia/cuda:8.0-devel-ubuntu16.04
+
+MAINTAINER Marcel Kempenaar (m.kempenaar@pl.hanze.nl)
+
+## OpenCL dependencies, runtime and development packages
+RUN apt-get update && apt-get install -y --no-install-recommends \
+	beignet ocl-icd-opencl-dev libffi-dev clinfo && \
+    rm -rf /var/lib/apt/lists/*
+
+ENV PATH /usr/local/cuda/bin:${PATH}
+ENV LD_LIBRARY_PATH /usr/local/cuda/lib:/usr/local/cuda/lib64
+
+## Python3 and dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        python3 python3-dev python3-pip python3-setuptools git opencl-headers \
+        autoconf libtool pkg-config && \
+    ln -s /usr/bin/python3 /usr/bin/python && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN ln -s /usr/local/cuda/lib64/libOpenCL* /usr/lib/ && \
+    pip3 install --upgrade pip
+
+RUN pip3 install wheel
+
+RUN pip3 install numpy
+
+RUN pip3 install biopython
+
+RUN pip3 install pyopencl
+
+## pyPaSWAS installation
+RUN git clone https://github.com/swarris/pyPaSWAS.git /root/pyPaSWAS
diff --git a/docker/intel/sandybridge/Dockerfile b/docker/intel/sandybridge/Dockerfile
@@ -0,0 +1,58 @@
+FROM nvidia/cuda:8.0-devel-ubuntu16.04
+
+MAINTAINER Marcel Kempenaar (m.kempenaar@pl.hanze.nl)
+
+## OpenCL dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+	rpm alien libnuma1 curl fakeroot libffi-dev clinfo && \
+    rm -rf /var/lib/apt/lists/*
+
+## Intel 2nd Generation OpenCL 1.2 support
+RUN curl http://registrationcenter-download.intel.com/akdlm/irc_nas/9019/opencl_runtime_16.1.1_x64_ubuntu_6.4.0.25.tgz | tar xz
+
+RUN cd opencl_runtime_16.1.1_x64_ubuntu_6.4.0.25/rpm && \
+    fakeroot alien --to-deb opencl-1.2-base-6.4.0.25-1.x86_64.rpm && \
+    fakeroot alien --to-deb opencl-1.2-intel-cpu-6.4.0.25-1.x86_64.rpm
+
+RUN cd opencl_runtime_16.1.1_x64_ubuntu_6.4.0.25/rpm && \
+    dpkg -i opencl-1.2-base_6.4.0.25-2_amd64.deb && \
+    dpkg -i opencl-1.2-intel-cpu_6.4.0.25-2_amd64.deb && \
+    rm -Rf /opencl_runtime_16.1.1_x64_ubuntu_6.4.0.25
+
+RUN echo "/opt/intel/opencl-1.2-6.4.0.25/lib64" > /etc/ld.so.conf.d/intelOpenCL.conf
+
+RUN mkdir -p /etc/OpenCL/vendors && \
+    ln /opt/intel/opencl-1.2-6.4.0.25/etc/intel64.icd /etc/OpenCL/vendors/intel64.icd && \
+    ldconfig
+
+ENV PATH /usr/local/cuda/bin:${PATH}
+ENV LD_LIBRARY_PATH /usr/local/cuda/lib:/usr/local/cuda/lib64
+
+## Python3 and dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 python3-dev python3-pip python3-setuptools git opencl-headers \
+    autoconf libtool pkg-config && \
+    ln -s /usr/bin/python3 /usr/bin/python
+
+RUN ln -s /usr/local/cuda/lib64/libOpenCL* /usr/lib/ && \
+    pip3 install --upgrade pip
+
+RUN pip3 install wheel
+
+RUN pip3 install numpy
+
+RUN pip3 install biopython
+
+RUN export PATH=/usr/local/cuda/bin:$PATH && pip3 install pycuda
+
+## Custom pyOpenCL installation forcing the use of version 1.2
+RUN export PATH=/usr/local/cuda/bin:$PATH && \
+    export LD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/lib64 && \
+    export LDFLAGS=-L/usr/local/cuda/lib64 && \
+    git clone https://github.com/pyopencl/pyopencl.git && \
+    cd pyopencl && python3 configure.py && \
+    echo 'CL_PRETEND_VERSION = "1.2"' >> siteconf.py && \
+    pip3 install .
+
+## pyPaSWAS installation
+RUN git clone https://github.com/swarris/pyPaSWAS.git /root/pyPaSWAS
diff --git a/docker/nvidia/Dockerfile b/docker/nvidia/Dockerfile
@@ -0,0 +1,41 @@
+FROM nvidia/cuda:8.0-devel-ubuntu16.04
+
+MAINTAINER Marcel Kempenaar (m.kempenaar@pl.hanze.nl)
+
+## OpenCL dependencies, runtime and development packages
+RUN apt-get update && apt-get install -y --no-install-recommends \
+	beignet ocl-icd-opencl-dev libffi-dev clinfo && \
+    rm -rf /var/lib/apt/lists/*
+
+## NVIDIA OpenCL support, taken from: https://gitlab.com/nvidia/opencl/blob/ubuntu14.04/runtime/Dockerfile
+RUN mkdir -p /etc/OpenCL/vendors && \
+    echo "libnvidia-opencl.so.1" > /etc/OpenCL/vendors/nvidia.icd
+
+RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
+    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
+
+ENV PATH /usr/local/cuda/bin:${PATH}
+ENV LD_LIBRARY_PATH /usr/local/cuda/lib:/usr/local/cuda/lib64
+
+## Python3 and dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 python3-dev python3-pip python3-setuptools git opencl-headers \
+    autoconf libtool pkg-config && \
+    ln -s /usr/bin/python3 /usr/bin/python && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN ln -s /usr/local/cuda/lib64/libOpenCL* /usr/lib/ && \
+    pip3 install --upgrade pip
+
+RUN pip3 install wheel
+
+RUN pip3 install numpy
+
+RUN pip3 install biopython
+
+RUN export PATH=/usr/local/cuda/bin:$PATH && pip3 install pycuda
+
+RUN pip3 install pyopencl
+
+## pyPaSWAS installation
+RUN git clone https://github.com/swarris/pyPaSWAS.git /root/pyPaSWAS
diff --git a/pyPaSWAS/Core/Formatters.py b/pyPaSWAS/Core/Formatters.py
@@ -33,11 +33,16 @@ def _set_name(self):
         '''Name of the formatter. Used for logging'''
         self.name = 'defaultformatter'
 
+    def _get_hits(self):
+        '''Returns ordered list of hits'''
+        hits = self.hitlist.real_hits.values()
+        return sorted(hits, key=lambda hit: (hit.get_seq_id(), hit.get_target_id(), hit.score))
+
     def print_results(self):
         '''sets, formats and prints the results to a file.'''
         self.logger.debug('printing results...')
         output = open(self.outputfile, 'w')
-        for hit in self.hitlist.real_hits.values():
+        for hit in self._get_hits():
             formatted_hit = self._format_hit(hit)
             output.write(formatted_hit + "\n")
         self.logger.debug('finished printing results')
@@ -81,7 +86,7 @@ def print_results(self):
         '''sets, formats and prints the results to a file.'''
         self.logger.info('formatting results...')
         #format header and hit lines
-        for hit in self.hitlist.real_hits.values():
+        for hit in self._get_hits():
             self._format_hit(hit)
 
         self.logger.debug('printing results...')
@@ -132,7 +137,7 @@ def print_results(self):
         '''sets, formats and prints the results to a file.'''
         self.logger.info('formatting results...')
         #format header and hit lines
-        for hit in self.hitlist.real_hits.values():
+        for hit in self._get_hits():
             self._format_hit(hit)
 
         self.logger.debug('printing results...')
@@ -173,7 +178,7 @@ def print_results(self):
         '''sets, formats and prints the results to a file.'''
         self.logger.info('formatting results...')
         #format header and hit lines
-        for hit in self.hitlist.real_hits.values():
+        for hit in self._get_hits():
             self._format_hit(hit)
 
         self.logger.debug('printing results...')
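Before this change the formatters iterated over `real_hits.values()` directly, so the output order followed dictionary order, which is arbitrary on Python 3.5. A standalone sketch of the new `_get_hits` ordering follows; the `Hit` class is a hypothetical minimal stand-in that only provides the accessors the sort key uses.

```python
# Sketch of the ordering used by Formatter._get_hits. Hit is a hypothetical
# minimal stand-in exposing only what the sort key needs.


class Hit(object):
    def __init__(self, seq_id, target_id, score):
        self._seq_id = seq_id
        self._target_id = target_id
        self.score = score

    def get_seq_id(self):
        return self._seq_id

    def get_target_id(self):
        return self._target_id


def get_hits(real_hits):
    """Return hits sorted by sequence id, then target id, then score (ascending)."""
    hits = real_hits.values()
    return sorted(hits, key=lambda hit: (hit.get_seq_id(), hit.get_target_id(), hit.score))


if __name__ == "__main__":
    real_hits = {
        "h1": Hit("seq2", "target1", 10.0),
        "h2": Hit("seq1", "target2", 5.0),
        "h3": Hit("seq1", "target2", 3.0),
    }
    for hit in get_hits(real_hits):
        print(hit.get_seq_id(), hit.get_target_id(), hit.score)
```

Because `sorted` is ascending by default, the lowest-scoring hit for a given sequence/target pair is written first. The ids must also be mutually comparable: mixing, say, `str` and `int` ids raises `TypeError` in Python 3.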

diff --git a/pyPaSWAS/Core/PaSWAS.py b/pyPaSWAS/Core/PaSWAS.py
@@ -18,14 +18,20 @@ def __init__(self, logger):
         self.score_source = ''
         self.main_source = ''
 
+    def read_source(self, filename):
+        '''Read source code from the specified file and prefix it
+        with line number and file name info for better compilation error messages.
+        '''
+        return '#line 1 "{}"\n'.format(filename) + read_file(filename)
+
     def set_shared_xy_code(self, sharedx=8, sharedy=8):
         '''
         Sets the horizontal and the vertical sizes of the smallest alignment matrices in shared memory
         :param sharedx:
         :param sharedy:
         '''
         #self.logger.debug('Setting sharedx to {0}, sharedy to {1}'.format(sharedx, sharedy))
-        code_t = Template(read_file(self.main_source))
+        code_t = Template(self.read_source(self.main_source))
         self.shared_xy_code = code_t.safe_substitute(SHARED_X=sharedx, SHARED_Y=sharedy)
 
     def set_direction_code(self, no_direction=0, up_left=1, up=2, left=3, stop=4):
@@ -39,7 +45,7 @@ def set_direction_code(self, no_direction=0, up_left=1, up=2, left=3, stop=4):
         '''
         #self.logger.debug('Setting directions:\n\tno = {0}\n\tup_left = {1}\n\tup = {2}\n\tleft = {3}\n\t'
         #                  'stop = {4}'.format(no_direction, up_left, up, left, stop))
-        direction_t = Template(read_file(self.direction_source))
+        direction_t = Template(self.read_source(self.direction_source))
         self.directions = direction_t.safe_substitute(NO_DIRECTION=no_direction,
                                                       UP_LEFT_DIRECTION=up_left,
                                                       UP_DIRECTION=up,
@@ -50,7 +56,7 @@ def set_score_code(self, score):
         '''Formats information contained in a score.
         '''
         #self.logger.debug('Sourcing the scorepart of the cuda code')
-        score_part_t = Template(read_file(self.score_source))
+        score_part_t = Template(self.read_source(self.score_source))
         gap_extension = 0.0
         if score.gap_extension != None:
             gap_extension = score.gap_extension
@@ -69,7 +75,7 @@ def set_variable_code(self, number_sequences, number_targets, x_val, y_val, char
         '''Sets the variable part of the code'''
         #self.logger.debug('Setting the variable part of the cuda code\n\t(using: n_seq: {}, n_targets: {}, '
         #                  'x_val: {}, y_val: {})'.format(number_sequences, number_targets, x_val, y_val))
-        variable_t = Template(read_file(self.variable_source))
+        variable_t = Template(self.read_source(self.variable_source))
         self.variable_part = variable_t.safe_substitute(N_SEQUENCES=number_sequences,
                                                         N_TARGETS=number_targets,
                                                         X=x_val,
@@ -104,7 +110,6 @@ class OCLcode(Code):
     '''
     def __init__(self, logger):
         Code.__init__(self, logger)
-        self.variable_source = resource_filename(__name__, 'ocl/default_variable.cl')
         self.direction_source = resource_filename(__name__, 'ocl/default_direction.cl')
         self.score_source = resource_filename(__name__, 'ocl/default_score.cl')
 
@@ -116,6 +121,7 @@ class GPUcode(OCLcode):
     def __init__(self, logger):
         OCLcode.__init__(self, logger)
         self.main_source = resource_filename(__name__, 'ocl/default_main_gpu.cl')
+        self.variable_source = resource_filename(__name__, 'ocl/default_variable_gpu.cl')
 
 class CPUcode(OCLcode):
     '''
@@ -125,6 +131,7 @@ class CPUcode(OCLcode):
     def __init__(self, logger):
         OCLcode.__init__(self, logger)
         self.main_source = resource_filename(__name__, 'ocl/default_main_cpu.cl')
+        self.variable_source = resource_filename(__name__, 'ocl/default_variable_cpu.cl')
 
     def set_shared_xy_code(self, sharedx=8, sharedy=8, workloadx=4, workloady=4):
         '''
@@ -133,5 +140,5 @@ def set_shared_xy_code(self, sharedx=8, sharedy=8, workloadx=4, workloady=4):
         :param sharedy:
         '''
         #self.logger.debug('Setting sharedx to {0}, sharedy to {1}'.format(sharedx, sharedy))
-        code_t = Template(read_file(self.main_source))
-        self.shared_xy_code = code_t.safe_substitute(SHARED_X=sharedx, SHARED_Y=sharedy, WORKLOAD_X=workloadx, WORKLOAD_Y=workloady)
+        code_t = Template(self.read_source(self.main_source))
+        self.shared_xy_code = code_t.safe_substitute(SHARED_X=sharedx, SHARED_Y=sharedy, WORKLOAD_X=workloadx, WORKLOAD_Y=workloady)
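The new `read_source` prepends a C preprocessor `#line` directive so that kernel compilation errors point at the original `.cl` file and line instead of an anonymous generated string. A minimal standalone sketch of that step together with the `Template.safe_substitute` call that follows it; `render_main` and the temporary kernel file are illustrative assumptions, not pyPaSWAS code.

```python
# Sketch of Code.read_source plus the Template substitution that follows it.
# render_main and the kernel text written below are made up for illustration.
import os
import tempfile
from string import Template


def read_source(filename):
    """Return the file contents prefixed with a '#line 1 "<filename>"' directive.

    The directive makes the OpenCL/CUDA compiler report errors against the
    original file name and line numbers rather than the generated string.
    """
    with open(filename) as handle:
        return '#line 1 "{}"\n'.format(filename) + handle.read()


def render_main(filename, sharedx=8, sharedy=8):
    """Mimic set_shared_xy_code: fill in $SHARED_X/$SHARED_Y placeholders."""
    code_t = Template(read_source(filename))
    return code_t.safe_substitute(SHARED_X=sharedx, SHARED_Y=sharedy)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".cl")
    with os.fdopen(fd, "w") as handle:
        handle.write("#define SHARED_X ${SHARED_X}\n#define SHARED_Y ${SHARED_Y}\n")
    try:
        print(render_main(path))
    finally:
        os.remove(path)
```

`safe_substitute` leaves unknown `$` placeholders untouched instead of raising `KeyError`, and the `#line` prefix contains no `$`, so it passes through substitution unchanged.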
diff --git a/pyPaSWAS/Core/Programs.py b/pyPaSWAS/Core/Programs.py
@@ -35,14 +35,9 @@ def __init__(self, logger, score, settings):
         self.settings = settings
         if (self.settings.framework.upper() == 'OPENCL'):
             if(self.settings.device_type.upper() == 'GPU'):
-                if(self.settings.platform_name.upper() == 'NVIDIA'):
-                    self.logger.debug('Using OpenCL NVIDIA implementation')
-                    from pyPaSWAS.Core.SmithWatermanOcl import SmithWatermanNVIDIA
-                    self.smith_waterman = SmithWatermanNVIDIA(self.logger, self.score, settings)
-                else:
-                    self.logger.debug('Using OpenCL GPU implementation')
-                    from pyPaSWAS.Core.SmithWatermanOcl import SmithWatermanGPU
-                    self.smith_waterman = SmithWatermanGPU(self.logger, self.score, settings)
+                self.logger.debug('Using OpenCL GPU implementation')
+                from pyPaSWAS.Core.SmithWatermanOcl import SmithWatermanGPU
+                self.smith_waterman = SmithWatermanGPU(self.logger, self.score, settings)
             elif(self.settings.device_type.upper() == 'CPU'):
                 self.logger.debug('Using OpenCL CPU implementation')
                 from pyPaSWAS.Core.SmithWatermanOcl import SmithWatermanCPU
@@ -74,11 +69,15 @@ def process(self, records_seqs, targets, pypaswas):
         self.logger.debug('Aligner processing...')
         target_index = 0
 
+        all_targets_length = sum(len(s.seq) for s in targets)
+        all_sequences_length = sum(len(s.seq) for s in records_seqs)
+        self.smith_waterman.set_total_work_size(all_targets_length * all_sequences_length)
+
         while target_index < len(targets):
             self.logger.debug('At target: {0} of {1}'.format(target_index, len(targets)))
 
 
-            last_target_index = self.smith_waterman.set_targets(targets, target_index)
+            last_target_index = self.smith_waterman.set_targets(targets, target_index, records_seqs=records_seqs, use_all_records_seqs=False)
             # results should be a Hitlist()
             results = self.smith_waterman.align_sequences(records_seqs, targets, target_index)
             self.hitlist.extend(results)
@@ -104,7 +103,11 @@ def process(self, records_seqs, targets, pypaswas):
             max_length = len(targets[0])
         else:
             max_length = None
-
+
+        all_targets_length = sum(len(s.seq) for s in targets)
+        all_sequences_length = sum(len(s.seq) for s in records_seqs)
+        self.smith_waterman.set_total_work_size(all_targets_length * all_sequences_length)
+
         while target_index < len(targets):
             self.logger.debug('At target: {0} of {1}'.format(target_index, len(targets)))
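Both `process` methods now compute the total amount of work up front and pass it to `set_total_work_size`. The product of the summed target lengths and the summed query lengths equals the sum of `len(query) * len(target)` over every pair, i.e. the number of Smith-Waterman matrix cells for the whole run (ignoring any padding the kernels may apply). What the aligner does with this value is not shown in this diff; the sketch below only reproduces the arithmetic, with a hypothetical `Record` tuple standing in for Biopython `SeqRecord` objects.

```python
# Sketch of the work-size arithmetic in Programs.process. Record is a
# hypothetical stand-in for Bio.SeqRecord.SeqRecord (only .seq is used).
from collections import namedtuple

Record = namedtuple("Record", ["seq"])


def total_work_size(records_seqs, targets):
    """Total number of alignment-matrix cells over all query/target pairs."""
    all_targets_length = sum(len(s.seq) for s in targets)
    all_sequences_length = sum(len(s.seq) for s in records_seqs)
    # (sum of target lengths) * (sum of query lengths) equals the sum of
    # len(query) * len(target) over every pair, by distributivity.
    return all_targets_length * all_sequences_length


def pairwise_cells(records_seqs, targets):
    """Explicit pairwise sum, used only to cross-check the product form."""
    return sum(len(q.seq) * len(t.seq) for q in records_seqs for t in targets)


if __name__ == "__main__":
    queries = [Record("ACG"), Record("ACGTA")]
    targets = [Record("A" * 10), Record("C" * 20)]
    print(total_work_size(queries, targets), pairwise_cells(queries, targets))
```

The product form needs only two passes over the inputs instead of one multiplication per query/target pair, which matters when both collections are large.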