Using parallelization protocols/standards to accelerate an algorithm's performance
This project uses the OpenMP Application Programming Interface (API) and a subset of the functions of the POSIX threads standard (pthreads) to speed up the Smith-Waterman algorithm for local alignment of sequences. A simplified form of the omega statistic, used to detect positive selection in DNA sequences. The program exports performance statistics and was applied to N random data.
Benchmarked on Intel(R) Core(TM) i7-1065G7 @ 1.30GHz 1.50 GHz with 8GB DDR3 memory.
- Input: .txt files for testing. Each file starts with D_SIZE, the number of pairs of character sequences, followed by the pairs themselves. Each sequence sits on a separate line or extends over several lines for ease of reading. The program reads the input file and allocates the D and Q variables, together with the parameters given on the command line. For more info, see How to run.
dataset.txt
2
Q: abc
D: xxxabxcxxxaabbcc
Q: aaabcd
D: abababcabababcd
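For a pair such as the first one above, the score computation can be sketched as follows. This is a minimal sketch, not the repository's code: it assumes a linear gap penalty, assumes that mismatch and gap are passed as negative numbers, and uses the hypothetical name `sw_score`. It keeps only two rows of the matrix because it returns the best score without a traceback.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: best local-alignment score of q against d with a
 * linear gap penalty. mismatch and gap are assumed to be negative.
 * Returns -1 if memory allocation fails. */
int sw_score(const char *q, const char *d, int match, int mismatch, int gap)
{
    assert(q != NULL && d != NULL);
    size_t n = strlen(q), m = strlen(d);

    /* Two rolling rows are enough when only the score is needed. */
    int *prev = calloc(m + 1, sizeof *prev);
    int *curr = calloc(m + 1, sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1; }

    int best = 0;
    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (q[i - 1] == d[j - 1] ? match : mismatch);
            int up   = prev[j] + gap;      /* gap in d */
            int left = curr[j - 1] + gap;  /* gap in q */
            int h = max2(0, max2(diag, max2(up, left)));
            curr[j] = h;
            if (h > best) best = h;
        }
        int *tmp = prev; prev = curr; curr = tmp;  /* reuse the rows */
    }
    free(prev);
    free(curr);
    return best;
}
```

With match = 3, mismatch = -1 and gap = -1, the first pair scores 8: "ab" matches, one character of D is skipped, then "c" matches.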
- Understanding of the SW algorithm
- Total number of Q-D sequence pairs
- Total number of cells that got a value
- Total number of traceback steps
- Total program execution time
- Total time to calculate cells
- Total time for traceback
- CUPS (Cell Updates Per Second) based on the total runtime
- CUPS based on the cell computation time
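The two CUPS figures differ only in the time they divide by. A minimal sketch, assuming the cell count and the elapsed time in seconds are already measured (the function name `cups` is hypothetical):

```c
#include <assert.h>

/* Hypothetical helper: cell updates per second.
 * cells   = total number of cells that got a value
 * seconds = total runtime, or cell computation time only
 * Returns 0 when no time has elapsed, to avoid dividing by zero. */
double cups(unsigned long long cells, double seconds)
{
    assert(seconds >= 0.0);
    return seconds > 0.0 ? (double)cells / seconds : 0.0;
}
```

Calling it once with the total execution time and once with the cell computation time gives the last two statistics in the list.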
In a Linux environment, create a folder named Datasets for your dataset.txt files (scripting.sh expects input files named D1.txt, D2.txt, etc.):
home
└───user
    └───Desktop
        └───project
            └───Datasets
Alternatively, change the input file paths used in scripting.sh.
- GCC installation
$ gcc --version
$ sudo apt install gcc
- Compile .c file
gcc -o newserial newserial.c
- Run from a Linux terminal with command-line flags and arguments
./newserial -name ID -input PATH -match INT1 -mismatch INT2 -gap INT3
where
- ID => string used to name the .out file
- PATH => path to the .txt input file
- INT1, INT2, INT3 => integer values for match, mismatch, and gap
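The flag/value pairs above could be read with a loop like the one below. This is only a sketch of one way to do it; the struct `sw_options` and the function `parse_options` are hypothetical names, and the actual parsing in newserial.c may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the options listed above. */
struct sw_options {
    const char *name;   /* -name: ID used for the .out file */
    const char *input;  /* -input: path to the .txt dataset */
    int match;          /* -match */
    int mismatch;       /* -mismatch */
    int gap;            /* -gap */
};

/* Returns 0 on success, -1 if a flag is unknown, a flag has no value,
 * or -name / -input are missing. */
int parse_options(int argc, char **argv, struct sw_options *opt)
{
    assert(opt != NULL);
    memset(opt, 0, sizeof *opt);

    for (int i = 1; i < argc; i += 2) {
        if (i + 1 >= argc) return -1;  /* flag without a value */
        const char *flag = argv[i];
        const char *val = argv[i + 1];

        if (strcmp(flag, "-name") == 0)          opt->name = val;
        else if (strcmp(flag, "-input") == 0)    opt->input = val;
        else if (strcmp(flag, "-match") == 0)    opt->match = (int)strtol(val, NULL, 10);
        else if (strcmp(flag, "-mismatch") == 0) opt->mismatch = (int)strtol(val, NULL, 10);
        else if (strcmp(flag, "-gap") == 0)      opt->gap = (int)strtol(val, NULL, 10);
        else return -1;                          /* unknown flag */
    }
    return (opt->name && opt->input) ? 0 : -1;
}
```

Because every flag is followed by exactly one value, a negative score such as `-mismatch -1` is read as a value and not as another flag.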
- OpenMP config
$ echo | cpp -fopenmp -dM | grep -i open
$ sudo apt install libomp-dev
- Setting the number of threads (see Footnotes for the default)
$ export OMP_NUM_THREADS=8
- Compile and run from a Linux terminal with command-line flags and arguments
gcc -fopenmp -o OMPX OMPX.c
./OMPX -name ID -input PATH -match INT1 -mismatch INT2 -gap INT3 -threads INT4
where
- INT4 => number of threads
- X => the chosen OMP implementation. There are three, which differ in task granularity (the computation-to-communication ratio; see Footnotes):
- OMPa : Fine-grained
- OMPb : Fine-grained
- OMPd : Coarse-grained
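To illustrate what fine-grained can mean here, the sketch below parallelizes the cells of one anti-diagonal at a time, since cells with the same i + j do not depend on each other. It is an assumption about the general technique, not a copy of OMPa.c or OMPb.c; the name `sw_score_wavefront` is hypothetical, and mismatch and gap are assumed to be negative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fine-grained variant: each anti-diagonal (k = i + j) is
 * split across threads. Every diagonal opens a new parallel region, so
 * the computation-to-communication ratio is low.
 * Returns the best local score, or -1 if allocation fails. */
int sw_score_wavefront(const char *q, const char *d,
                       int match, int mismatch, int gap)
{
    assert(q != NULL && d != NULL);
    long n = (long)strlen(q), m = (long)strlen(d);
    long w = m + 1;  /* row width of the full (n+1) x (m+1) matrix */

    int *h = calloc((size_t)(n + 1) * (size_t)w, sizeof *h);
    if (!h) return -1;

    int best = 0;
    for (long k = 2; k <= n + m; k++) {
        long lo = (k - m > 1) ? k - m : 1;  /* keeps j = k - i <= m */
        long hi = (k - 1 < n) ? k - 1 : n;  /* keeps j = k - i >= 1 */

        #pragma omp parallel for reduction(max:best)
        for (long i = lo; i <= hi; i++) {
            long j = k - i;
            int v = h[(i - 1) * w + (j - 1)]
                  + (q[i - 1] == d[j - 1] ? match : mismatch);
            int up = h[(i - 1) * w + j] + gap;
            int left = h[i * w + (j - 1)] + gap;
            if (up > v) v = up;
            if (left > v) v = left;
            if (v < 0) v = 0;
            h[i * w + j] = v;
            if (v > best) best = v;
        }
    }
    free(h);
    return best;
}
```

Compiled with `gcc -fopenmp`, the inner loop is shared among the threads set by OMP_NUM_THREADS; without the flag the pragma is ignored and the loop runs serially with the same result. A coarse-grained version would instead give each thread whole Q-D pairs.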
- Compile and run from a Linux terminal with command-line flags and arguments
gcc -pthread POSIXX.c -o POSIXX
./POSIXX -name ID -input PATH -match INT1 -mismatch INT2 -gap INT3 -threads INT4
where
- X => the chosen POSIX implementation. There are two, which differ in task granularity (the computation-to-communication ratio; see Footnotes):
- POSIXa : Fine-grained
- POSIXc : Coarse-grained
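A coarse-grained split with pthreads can be sketched as below: each thread receives a contiguous block of Q-D pairs and aligns them independently, so threads only meet at the final join. This is an illustration under stated assumptions, not the code of POSIXc.c; `struct pair`, `struct worker`, `align_all` and the compact `sw_score` are hypothetical names, and mismatch and gap are assumed to be negative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct pair { const char *q, *d; };

struct worker {
    const struct pair *pairs;
    int *scores;              /* one result slot per pair */
    size_t begin, end;        /* half-open range of pair indices */
    int match, mismatch, gap;
};

/* Compact serial scorer (linear gap), repeated so the block compiles alone. */
static int sw_score(const char *q, const char *d, int ma, int mi, int gap)
{
    size_t n = strlen(q), m = strlen(d);
    int *prev = calloc(m + 1, sizeof *prev), *curr = calloc(m + 1, sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1; }
    int best = 0;
    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            int v = prev[j - 1] + (q[i - 1] == d[j - 1] ? ma : mi);
            if (prev[j] + gap > v) v = prev[j] + gap;
            if (curr[j - 1] + gap > v) v = curr[j - 1] + gap;
            if (v < 0) v = 0;
            curr[j] = v;
            if (v > best) best = v;
        }
        int *tmp = prev; prev = curr; curr = tmp;
    }
    free(prev); free(curr);
    return best;
}

static void *run_worker(void *arg)
{
    struct worker *w = arg;
    for (size_t p = w->begin; p < w->end; p++)
        w->scores[p] = sw_score(w->pairs[p].q, w->pairs[p].d,
                                w->match, w->mismatch, w->gap);
    return NULL;
}

/* Returns 0 if every thread was started and joined, -1 otherwise. */
int align_all(const struct pair *pairs, size_t npairs, int *scores,
              size_t nthreads, int match, int mismatch, int gap)
{
    assert(nthreads > 0);
    pthread_t *tid = malloc(nthreads * sizeof *tid);
    struct worker *w = malloc(nthreads * sizeof *w);
    if (!tid || !w) { free(tid); free(w); return -1; }

    size_t started = 0;
    for (size_t t = 0; t < nthreads; t++) {
        w[t].pairs = pairs;
        w[t].scores = scores;
        w[t].begin = npairs * t / nthreads;        /* even block split */
        w[t].end = npairs * (t + 1) / nthreads;
        w[t].match = match; w[t].mismatch = mismatch; w[t].gap = gap;
        if (pthread_create(&tid[t], NULL, run_worker, &w[t]) != 0) break;
        started++;
    }
    for (size_t t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    free(tid); free(w);
    return started == nthreads ? 0 : -1;
}
```

Each thread writes only to its own slots of `scores`, so no mutex is needed. Compile with `gcc -pthread`, as in the command above.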
See scripting.sh for more (usage notes are under Footnotes).
Script variables initialized as:
- N = 10000000
- Threads = [2 4]
- Processors = [2 4]
- This project was implemented to meet the requirements of the course Architecture of Parallel and Distributed Computers.
Footnotes
- https://en.wikipedia.org/wiki/Smith-Waterman_algorithm#Linear
- http://www.cs.cmu.edu/afs/cs/academic/class/15492-f07/www/pthreads.html
- The default number of threads (when this command is skipped) is determined by the CPU specification.
- Run scripting.sh to compile and run all files. The script must be executable.
- https://en.wikipedia.org/wiki/Granularity_(parallel_computing)