Commit

Merge branch 'dev-v0.6.0' into poa-gfa
Joyjit Daw committed Dec 4, 2020
2 parents 74ba3a3 + c68960e commit 6f4a0fd
Showing 116 changed files with 157,276 additions and 2,481 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -33,3 +33,6 @@
[submodule "3rdparty/kseqpp"]
path = 3rdparty/kseqpp
url = https://github.com/cartoonist/kseqpp.git
[submodule "3rdparty/htslib"]
path = 3rdparty/htslib
url = https://github.com/samtools/htslib.git
1 change: 1 addition & 0 deletions 3rdparty/htslib
Submodule htslib added at a79009
9 changes: 6 additions & 3 deletions CMakeLists.txt
@@ -23,8 +23,8 @@ string(STRIP ${GW_VERSION} GW_VERSION)
project(${GW_PROJECT_NAME})

# Process options.
option(gw_enable_tests "Build GenomeWorks unit tests" OFF)
option(gw_enable_benchmarks "Build GenomeWorks benchmarks" OFF)
option(gw_enable_tests "Build GenomeWorks unit tests" ON)
option(gw_enable_benchmarks "Build GenomeWorks benchmarks" ON)
option(gw_build_shared "Build GenomeWorks libraries as shared objects" OFF)
option(gw_device_synchronize_kernels "Run cudaDeviceSynchronize() in GW_CU_CHECK_ERR calls" OFF)
option(gw_optimize_for_native_cpu "Build with march=native" ON)
@@ -40,7 +40,9 @@ option(gw_enable_cudapoa_nw_print "Enable verbose prints within cudapoa NW kerne
option(gw_profiling "Compile a binary for profiling with NVTX markers." OFF)
option(gw_enable_caching_allocator "Enable caching allocator." ON)
option(gw_generate_docs "Generate Doxygen documentation" ON)
option(gw_cuda_gen_all_arch "ON: Generate optimized CUDA code for all architectures | OFF: for detected architectures only" ON)
option(gw_cuda_gen_all_arch "ON: Generate optimized CUDA code for all architectures | OFF: for detected architectures only" OFF)
# Optionally build htslib for SAM/BAM support. Requires autoconf to be installed
option(gw_build_htslib "Build 3rdparty htslib that allows output in SAM/BAM format" ON)

# Must be included before others for options value validation
include(cmake/Utils.cmake)
@@ -83,6 +85,7 @@ add_subdirectory(common/io)
add_subdirectory(cudapoa)
add_subdirectory(cudamapper)
add_subdirectory(cudaaligner)
add_subdirectory(cudaextender)

# Add documentation generation.
validate_boolean(gw_generate_docs)
83 changes: 12 additions & 71 deletions README.md
@@ -6,81 +6,17 @@ GenomeWorks is a GPU-accelerated library for biological sequence analysis. This
For more detailed API documentation please refer to the [documentation](#enable-doc-generation).

* Modules
* [cudamapper](#cudamapper) - CUDA-accelerated sequence to sequence mapping
* [cudapoa](#cudapoa) - CUDA-accelerated partial order alignment
* [cudaaligner](#cudaaligner) - CUDA-accelerated pairwise sequence alignment
* [cudamapper](cudamapper/README.md) - CUDA-accelerated sequence to sequence mapping
* [cudapoa](cudapoa/README.md) - CUDA-accelerated partial order alignment
* [cudaaligner](cudaaligner/README.md) - CUDA-accelerated pairwise sequence alignment
* [cudaextender](cudaextender/README.md) - CUDA-accelerated seed extension
* Setup GenomeWorks
* [Clone GenomeWorks](#clone-genomeworks)
* [System Requirements](#system-requirements)
* [GenomeWorks Installation](#genomeworks-setup)
* [Python API](#genomeworks-python-api)
* [Development Support](#development-support)

### cudamapper

The `cudamapper` package provides minimizer-based GPU-accelerated approximate mapping.

#### Tool - *cudamapper*

`cudamapper` is an end-to-end command line to for sequence to sequence mapping. `cudamapper` outputs
mappings in the PAF format and is currently optimised for all-vs-all long read (ONT, Pacific Biosciences) sequences.

To run all-vs all overlaps use the following command:

`cudamapper in.fasta in.fasta`

A query fasta can be mapped to a reference as follows:

`cudamapper query.fasta target.fasta`

To access more information about running cudamapper, run `cudamapper --help`.

#### Library - *libcudamapper.so*

* `Indexer` module to generate an index of minimizers from a list of sequences.
* `Matcher` module to find locations of matching pairs of minimizers between sequences using minimizer indices.
* `Overlapper` module to generate overlaps from sequence of minimizer matches generated by matcher.

#### Sample - *sample_cudamapper*

A prototypical binary highlighting the usage of `libcudamapper.so` APIs (indexer, matcher and overlapper) and
techniques to tie them into an application.

### cudapoa

The `cudapoa` package provides a GPU-accelerated implementation of the [Partial Order Alignment](https://simpsonlab.github.io/2015/05/01/understanding-poa/)
algorithm. It is heavily influenced by [SPOA](https://github.com/rvaser/spoa) and in many cases can be considered a GPU-accelerated replacement. Features include:

#### Tool - *cudapoa*

A command line tool for generating consensus and MSA from a list of `fasta`/`fastq` files. The tool
is built on top of `libcudapoa.so` and showcases optimization strategies for writing high performance
applications with `libcudapoa.so`.

#### Library - *libcudapoa.so*

* Generation of consensus sequences
* Generation of multi-sequence alignments (MSAs)
* Custom adaptive band implementation of POA
* Support for long and short read sequences

#### Sample - *sample_cudapoa*

A prototypical binary to showcase the use of `libcudapoa.so` APIs.

### cudaaligner

The `cudaaligner` package provides GPU-accelerated global alignment. Features include:

#### Library - *libcudaaligner.so*

* Short and long read support
* Banded implementation with configurable band width for flexible performance and accuracy trade-off

#### Sample - *sample_cudaaligner*

A prototypical binary to showcase the use of `libcudaaligner.so` APIs.

## Clone GenomeWorks

### Latest released version
@@ -104,9 +40,12 @@ Minimum requirements -

1. Ubuntu 16.04 or Ubuntu 18.04
2. CUDA 9.0+ (official instructions for installing CUDA are available [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html))
3. gcc/g++ 5.4.0+
4. Python 3.6.7+
5. CMake (>= 3.0)
3. GPU generation Pascal and later (compute capability >= 6.0)
4. gcc/g++ 5.4.0+ / 7.x.x
5. Python 3.6.7+
6. CMake (>= 3.10.2)
7. autoconf (required to output SAM/BAM files)
8. automake (required to output SAM/BAM files)

## GenomeWorks Setup

@@ -124,6 +63,8 @@ NOTE : The `gw_cuda_gen_all_arch=OFF` option pre-generates optimized code only f
For building a binary that pre-generates opimized code for all common GPU architectures, please remove the option
or set it to `ON`.

NOTE : (OPTIONAL) To enable outputting overlaps in SAM/BAM format, pass the `gw_build_htslib=ON` option.

### Package generation
Package generation puts the libraries, headers and binaries built by the `make` command above
into a `.deb`/`.rpm` for portability and easy installation. The package generation itself doesn't
1 change: 1 addition & 0 deletions VERSION
@@ -1 +1,2 @@
0.6.0

2 changes: 1 addition & 1 deletion ci/checks/check_copyright.py
@@ -145,7 +145,7 @@ def copyright_present(f):
f - Path to file
"""
with io.open(f, "r", encoding="utf-8") as fh:
return re.search('Copyright 20[0-9]+-20[0-9]+', fh.read())
return re.search('Copyright (20[0-9][0-9]-)?20[0-9][0-9] NVIDIA CORPORATION', fh.read())


def parse_args():
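
Review note on the `check_copyright.py` change: the two regular expressions in the hunk above accept different headers. The sketch below compares them on a few sample strings. The patterns are copied from the diff; the helper name `has_copyright` and the sample headers are hypothetical and only serve as illustration.

```python
import re

# Patterns copied verbatim from the hunk above.
OLD_PATTERN = re.compile(r"Copyright 20[0-9]+-20[0-9]+")
NEW_PATTERN = re.compile(r"Copyright (20[0-9][0-9]-)?20[0-9][0-9] NVIDIA CORPORATION")


def has_copyright(text, pattern):
    """Return True if `pattern` matches anywhere in `text` (hypothetical helper)."""
    return pattern.search(text) is not None


# Hypothetical sample headers, not taken from the repository.
YEAR_RANGE = "# Copyright 2019-2020 NVIDIA CORPORATION."
SINGLE_YEAR = "# Copyright 2020 NVIDIA CORPORATION."
OTHER_OWNER = "# Copyright 2019-2020 Some Other Company."

for header in (YEAR_RANGE, SINGLE_YEAR, OTHER_OWNER):
    print(header, has_copyright(header, OLD_PATTERN), has_copyright(header, NEW_PATTERN))
```

Under these samples, the new pattern also accepts a single-year header and additionally requires the owner string `NVIDIA CORPORATION`, while the old pattern required a year range and did not check the owner.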
2 changes: 2 additions & 0 deletions ci/common/build-test-sdk.sh
@@ -31,6 +31,8 @@ cmake .. "${CMAKE_COMMON_VARIABLES[@]}" \
-Dgw_enable_tests=ON \
-Dgw_enable_benchmarks=ON \
-Dgw_build_shared=ON \
-Dgw_cuda_gen_all_arch=ON \
-Dgw_build_htslib=ON \
-DCMAKE_INSTALL_PREFIX="${LOCAL_BUILD_DIR}/install" \
-GNinja

15 changes: 13 additions & 2 deletions ci/common/prep-init-env.sh
@@ -53,8 +53,19 @@ python --version
logger "Conda install GenomeWorks custom packages - clang-format"
conda install --override-channels -c sarcasm clang-format

logger "Conda install GenomeWorks custom packages - doxygen ninja cmake"
conda install --override-channels -c conda-forge doxygen ninja cmake">=3.10.2"
logger "Conda install GenomeWorks custom packages - doxygen ninja"
conda install --override-channels -c conda-forge doxygen ninja

# Building cudamapper, using CUDA_SELECT_NVCC_ARCH_FLAGS() with the 'common' argument, generates
# the "-gencode;arch=compute_72,code=sm_72" arch flag which is incompatible with CUDA 9.0 and causes nvcc command to fail
# Using cmake with a version older than 3.18 does not create 'compute_72' arch.
if [ "$(echo ${CUDA_VERSION} | cut -d"." -f1-2)" = "9.0" ]; then
logger "Conda install cmake for CUDA 9.0"
conda install --override-channels -c conda-forge cmake"=3.17"
else
logger "Conda install cmake"
conda install --override-channels -c conda-forge cmake">=3.10.2"
fi

logger "Update LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
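
Review note on the CMake pinning in `prep-init-env.sh`: the shell branch above keys on the first two dot-separated fields of `CUDA_VERSION`. The same decision can be sketched in Python to make it easy to check against different version strings. The function name `cmake_conda_spec` and the sample version strings are hypothetical; the two package specs are the ones used in the script.

```python
def cmake_conda_spec(cuda_version):
    """Mirror the shell logic: pin cmake to 3.17 only for CUDA 9.0 (hypothetical helper).

    Equivalent to `echo ${CUDA_VERSION} | cut -d"." -f1-2` followed by a
    string comparison against "9.0".
    """
    major_minor = ".".join(cuda_version.split(".")[:2])
    if major_minor == "9.0":
        # Per the script comment, cmake older than 3.18 does not emit compute_72.
        return "cmake=3.17"
    return "cmake>=3.10.2"


# Hypothetical CUDA version strings for illustration.
for version in ("9.0.176", "9.2.148", "10.1.243"):
    print(version, "->", cmake_conda_spec(version))
```

Note that the comparison is a plain string match, so `9.2` and later take the unpinned branch, exactly as in the shell version.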
2 changes: 1 addition & 1 deletion ci/cpu/prebuild.sh
@@ -22,4 +22,4 @@
# Note we still _BUILD_ for GPU, we just don't (can't) test on it
export BUILD_FOR_GPU=1
export TEST_ON_GPU=0
export CONDA_ENV_NAME="gdf"
export CONDA_ENV_NAME="parabricks"
2 changes: 1 addition & 1 deletion ci/gpu/prebuild.sh
@@ -21,4 +21,4 @@

export BUILD_FOR_GPU=1
export TEST_ON_GPU=1
export CONDA_ENV_NAME="gdf"
export CONDA_ENV_NAME="parabricks"
7 changes: 7 additions & 0 deletions cmake/3rdparty.cmake
@@ -50,3 +50,10 @@ set_property(TARGET cub APPEND PROPERTY INTERFACE_INCLUDE_DIRECTORIES "${CUB_DIR

set(KSEQPP_DIR ${PROJECT_SOURCE_DIR}/3rdparty/kseqpp/src CACHE STRING
"Path to kseqpp repo")

if (gw_build_htslib)
include(cmake/BuildHTSLib.cmake)
build_htslib_source()
else()
message(STATUS "Not building htslib, overlap output to SAM & BAM unavailable. Enable with -Dgw_build_htslib=ON")
endif()
42 changes: 42 additions & 0 deletions cmake/BuildHTSLib.cmake
@@ -0,0 +1,42 @@
#
# Copyright 2019-2020 NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

function(build_htslib_source)
message(STATUS "Building htslib")
set(HTSLIB_DIR ${PROJECT_SOURCE_DIR}/3rdparty/htslib/ CACHE STRING
"Path to htslib repo")
set(MAKE_COMMAND make)
set(HTSLIB_INSTALL ${MAKE_COMMAND} install prefix=${CMAKE_BINARY_DIR}/3rdparty/htslib)
set(htslib_PREFIX ${CMAKE_BINARY_DIR}/3rdparty/htslib)
include(ExternalProject)
ExternalProject_Add(htslib_project
PREFIX ${htslib_PREFIX}
SOURCE_DIR ${PROJECT_SOURCE_DIR}/3rdparty/htslib
BUILD_IN_SOURCE 1
CONFIGURE_COMMAND autoheader && autoconf && ${HTSLIB_DIR}configure --disable-bz2 --disable-lzma --disable-libcurl --disable-s3 --disable-gcs
BUILD_COMMAND "${HTSLIB_INSTALL}"
INSTALL_COMMAND ""
BUILD_BYPRODUCTS ${CMAKE_BINARY_DIR}/3rdparty/htslib/lib/libhts.a
LOG_CONFIGURE 0
LOG_BUILD 0
LOG_TEST 0
LOG_INSTALL 0
)

include_directories(${CMAKE_BINARY_DIR}/3rdparty/htslib/include/htslib)
add_library(htslib STATIC IMPORTED)
set_property(TARGET htslib APPEND PROPERTY IMPORTED_LOCATION ${CMAKE_BINARY_DIR}/3rdparty/htslib/lib/libhts.a)
endfunction()
4 changes: 3 additions & 1 deletion common/base/CMakeLists.txt
@@ -26,7 +26,9 @@ message(STATUS "nvcc flags for ${MODULE_NAME}: ${CUDA_NVCC_FLAGS}")
get_property(gw_library_type GLOBAL PROPERTY gw_library_type)
add_library(${MODULE_NAME} ${gw_library_type}
src/cudautils.cpp
src/logging.cpp)
src/logging.cpp
src/graph.cpp
)
target_link_libraries(${MODULE_NAME} PUBLIC spdlog ${CUDA_LIBRARIES})

if (gw_profiling)
Original file line number Diff line number Diff line change
@@ -95,6 +95,10 @@
// Due to a header file incompatibility with nvcc in CUDA 9.0
// logging through the logger class in GW is disabled for any .cu files.
#pragma message("Logging disabled for CUDA Toolkit < 9.2")
#elif __GNUC__ >= 9
// Due to a ISO C++ standard incompatibility the spdlog fails to pass
// pedantic requirements.
#pragma message("Logging disabled for GCC >= 9")
#else
#include <spdlog/spdlog.h>
#endif
@@ -137,6 +141,8 @@ LoggingStatus SetHeader(bool logTime, bool logLocation);
/// parameters as per https://github.com/gabime/spdlog/blob/v1.x/README.md
#ifdef GW_CUDA_BEFORE_9_2
#define GW_LOG_DEBUG(...)
#elif __GNUC__ >= 9
#define GW_LOG_DEBUG(...)
#else
#define GW_LOG_DEBUG(...) SPDLOG_DEBUG(__VA_ARGS__)
#endif
@@ -148,6 +154,8 @@ LoggingStatus SetHeader(bool logTime, bool logLocation);
/// parameters as per https://github.com/gabime/spdlog/blob/v1.x/README.md
#ifdef GW_CUDA_BEFORE_9_2
#define GW_LOG_INFO(...)
#elif __GNUC__ >= 9
#define GW_LOG_INFO(...)
#else
#define GW_LOG_INFO(...) SPDLOG_INFO(__VA_ARGS__)
#endif
@@ -159,6 +167,8 @@ LoggingStatus SetHeader(bool logTime, bool logLocation);
/// parameters as per https://github.com/gabime/spdlog/blob/v1.x/README.md
#ifdef GW_CUDA_BEFORE_9_2
#define GW_LOG_WARN(...)
#elif __GNUC__ >= 9
#define GW_LOG_WARN(...)
#else
#define GW_LOG_WARN(...) SPDLOG_WARN(__VA_ARGS__)
#endif
@@ -170,6 +180,8 @@ LoggingStatus SetHeader(bool logTime, bool logLocation);
/// parameters as per https://github.com/gabime/spdlog/blob/v1.x/README.md
#ifdef GW_CUDA_BEFORE_9_2
#define GW_LOG_ERROR(...)
#elif __GNUC__ >= 9
#define GW_LOG_ERROR(...)
#else
#define GW_LOG_ERROR(...) SPDLOG_ERROR(__VA_ARGS__)
#endif
@@ -181,6 +193,8 @@ LoggingStatus SetHeader(bool logTime, bool logLocation);
/// parameters as per https://github.com/gabime/spdlog/blob/v1.x/README.md
#ifdef GW_CUDA_BEFORE_9_2
#define GW_LOG_CRITICAL(...)
#elif __GNUC__ >= 9
#define GW_LOG_CRITICAL(...)
#else
#define GW_LOG_CRITICAL(...) SPDLOG_CRITICAL(__VA_ARGS__)
#endif
7 changes: 5 additions & 2 deletions common/base/include/claraparabricks/genomeworks/types.hpp
@@ -21,6 +21,7 @@
#if __cplusplus >= 201703
#include <optional>
#include <string_view>
#include <cstddef>
#else
#include <experimental/optional>
#include <experimental/string_view>
@@ -45,21 +46,23 @@ using position_in_read_t = std::uint32_t;
using number_of_basepairs_t = position_in_read_t;

// TODO: Once minimal supported GCC version is moved to GCC 7.1
// or higher, thegw_optional_t and gw_string_view_t aliases
// can be removed and std::optional and std::string_view can
// or higher, gw_optional_t, gw_string_view_t and gw_byte_t aliases
// can be removed and std::optional, std::string_view and std::byte can
// be used directly instead
#if __cplusplus >= 201703
template <typename T>
using gw_optional_t = std::optional<T>;
using gw_nullopt_t = std::nullopt_t;
constexpr gw_nullopt_t gw_nullopt = std::nullopt;
using gw_string_view_t = std::string_view;
using gw_byte_t = std::byte;
#else
template <typename T>
using gw_optional_t = std::experimental::optional<T>;
using gw_nullopt_t = std::experimental::nullopt_t;
constexpr gw_nullopt_t gw_nullopt = std::experimental::nullopt;
using gw_string_view_t = std::experimental::string_view;
using gw_byte_t = unsigned char;
#endif

} // namespace genomeworks