[cudapoa] improving cudaPOA performance #552

Merged

@r-mafi r-mafi commented Sep 1, 2020

A round of optimizations: bringing register usage back down, hiding global memory access latency where applicable, and reducing the number of iterations of the NW inner while loop, in an effort to improve compute time and SOL metrics.
In the cudapoa binary API, added a new option -s to manage the memory allocated for the adaptive score matrix.
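
For readers unfamiliar with why the size of the inner loop matters in NW (Needleman-Wunsch), here is a minimal host-side C++ sketch of a banded score computation. It is not the cudapoa kernel and does not reproduce the change made in this PR: the function name nw_banded, the scoring constants and the linear gap penalty are all assumptions made for illustration. It only shows that restricting each row to cells within a fixed distance of the diagonal cuts the number of inner-loop iterations.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical scoring values, chosen only for this illustration.
constexpr int kMatch = 1;
constexpr int kMismatch = -1;
constexpr int kGap = -1;

struct NwResult {
    int score;           // global alignment score of a against b
    std::int64_t cells;  // number of inner-loop iterations executed
};

// Global alignment score restricted to cells with |i - j| <= band.
// A band of at least max(|a|, |b|) degenerates to the full score matrix.
// If the length difference exceeds the band, the final cell is unreachable
// and the returned score is a large negative sentinel.
NwResult nw_banded(const std::string& a, const std::string& b, int band) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    // Halved so that adding a gap penalty to the sentinel cannot overflow.
    const int kNegInf = std::numeric_limits<int>::min() / 2;

    std::vector<std::vector<int>> H(m + 1, std::vector<int>(n + 1, kNegInf));
    H[0][0] = 0;
    for (int j = 1; j <= std::min(n, band); ++j) H[0][j] = j * kGap;

    std::int64_t cells = 0;
    for (int i = 1; i <= m; ++i) {
        if (i <= band) H[i][0] = i * kGap;
        // Only columns inside the band are visited on this row.
        const int j_lo = std::max(1, i - band);
        const int j_hi = std::min(n, i + band);
        for (int j = j_lo; j <= j_hi; ++j) {
            const int diag = H[i - 1][j - 1] +
                             (a[i - 1] == b[j - 1] ? kMatch : kMismatch);
            const int up = H[i - 1][j] + kGap;
            const int left = H[i][j - 1] + kGap;
            H[i][j] = std::max({diag, up, left});
            ++cells;
        }
    }
    return {H[m][n], cells};
}
```

Calling nw_banded("ACGTACGT", "ACGTACGT", 1) visits 22 cells, while a band of 8 visits all 64; both return the same score for these inputs because the optimal path stays on the diagonal. A band that is too narrow for the true alignment would trade accuracy for speed.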

r-mafi added 30 commits August 7, 2020 10:40
…ecessor - rev3"

The rev3 change did not optimize much and in some cases even slowed things down slightly, so it was reverted.
This reverts commit 833ac3ad
…s where POA groups have the same number of reads- rev 4b
…ptive, static and full alignments are separate, reg count for static down to 71 from 83; rev 7
… by changing banded_score_matrix_size from int64_t to float, reduced 1 register! :) (from 79 to 78)
…ngle-thread work in updating vertical scores, removed set_and_get_first_column_score().
… loop. This reduced register count from 78 to 75.
…umn == 0 to -1, getting rid of it is a better solution! Also replaced get_score() with get_score_adaptive() in nw_adaptive; a better solution is to unify similar kernels
…onvince the compiler to find a way to bring register usage down to 72. It worked without any register spills. rev 8
@r-mafi r-mafi added the enhancement (New feature or request) and cudapoa (GPU-based partial order alignment) labels Sep 1, 2020
@r-mafi r-mafi self-assigned this Sep 1, 2020
@r-mafi r-mafi linked an issue Sep 1, 2020 that may be closed by this pull request
@r-mafi r-mafi requested a review from tijyojwad September 1, 2020 22:54
cudapoa/src/cudapoa_kernels.cuh (outdated, resolved)
cudapoa/src/cudapoa_nw_banded.cuh (resolved)
@r-mafi r-mafi requested a review from tijyojwad September 11, 2020 19:17
@tijyojwad tijyojwad merged commit 0e9a6f3 into NVIDIA-Genomics-Research:dev-v0.6.0 Sep 14, 2020
Labels
cudapoa (GPU-based partial order alignment), enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

[cudapoa] reduce register count in cudapoa kernels
2 participants