increased version number #74

Merged
merged 1 commit into from
Nov 14, 2018

ntrost57
Contributor

No description provided.

@ntrost57 ntrost57 merged commit 643ae38 into ROCm:develop Nov 14, 2018
ntrost57 pushed a commit that referenced this pull request May 20, 2020
ntrost57 pushed a commit that referenced this pull request May 20, 2020
ntrost57 pushed a commit that referenced this pull request Jun 15, 2020
* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>
ntrost57 pushed a commit that referenced this pull request Jul 6, 2020
* started creating skeleton code for bsrmm

* rebase bsrmm to squash commits

clang formatting

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

bump version

Replace host code in bsr2csr (#48)

* removed host bsr2csr and csr2bsr code and replaced it with device calls

* clang formatting

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

hipclang related fixes (#51)

* hipclang related fixes

* bump version

sanity check for matrix download (#52)

added fallback for unit test matrix downloads (#53)

examples fix (#54)

* header fix for examples

* bump version

got bsrmm working for block dim less than 8

clang formatting

fixing bugs and getting benchmark to work

optimizing and working on kernels for block dimension greater than 8

kernels and code for block dimension greater than 8 and B matrix transposed

expanded loop unrolling up to block dimension 16

clang formatting

Remove gpg check for CI package CentOS install (#57)

updated internal function names (#61)

* renamed internal csrtr to trm

* clang-format

added missing header (#62)

fixes to documentation

remove compile time evaluation of direction to help reduce the number of kernels

clang formatting

small performance improvements to transpose kernel

clang formatting

increase transpose performance

clang formatting

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

examples fix (#54)

* header fix for examples

* bump version

Remove gpg check for CI package CentOS install (#57)

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

adding fortran example code

fixing fortran compile error

adding bsrmm to fortran_module.f90

fixing fortran example array order

fix fortran compile error

fix fortran compile error

adding cpp example code for bsrmm

clang formatting

working on optimizing kernels

working on optimizing kernels

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

reverting back to original kernels

optimizing bsrmm

making test2 kernel active for block dim 8

optimizing bsrmm

significant performance improvement for block dimensions 5 to 32

further performance improvements to transpose and non-transpose case

reduce compile times and replaced general kernel

optimizing for n <= 16

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

bump version

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

* reducing number of tests

* removing bank conflicts

* removing duplicate code from rocsparse-functions header

* fixing line in rocsparse-functions header changed by bad merge

* fix formatting from merge

* fix formatting errors from merge

Co-authored-by: jsandham <james.sandham@amd.com>
ntrost57 pushed a commit that referenced this pull request Jul 21, 2020
* Creating skeleton code for bsric02

* clang formatting

work on implementing bsric02 kernel

bsric02 working for block dim equal 2, 4, 8, 16

implementing binary search kernel

fixing kernel bugs

clang formatting

fixing thread divergence in warp errors

Work on optimizing and testing

clang formatting

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

clang formatting

fixing documentation for bsric02

removing comments

removing comments

adjusting test yaml file

changing test yaml file

removing comments

adding fortran example

clang formatting

adding fortran bsric0 example to CMakelist

fix compiler errors

fixing bug in fortran example

fix fortran compiler errors

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

adding gfx908 work around

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

bump version

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

Change default compiler in install script to hipclang (#81)

hipclang updated readme

hipclang doc update (#82)

* hipclang doc update

* doxygen seems to struggle with tabs and spaces

* doc auto version sync

clang formatting

fixing bug in csric0 and bsric0 where we were not using conj for complex matrices

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

adding float and double versions for std::conj

moving conj into rocsparse_math.hpp

optimizing bsric0

optimizing bsric0

optimize bsric0

optimize bsric0

optimizing bsric0

optimize bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

optimizing bsric0

clang formatting

optimizing bsric0

clang formatting

optimizing bsric0

Bsrmm (#56)

* started creating skeleton code for bsrmm

* rebase bsrmm to squash commits

clang formatting

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

bump version

Replace host code in bsr2csr (#48)

* removed host bsr2csr and csr2bsr code and replaced it with device calls

* clang formatting

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

hipclang related fixes (#51)

* hipclang related fixes

* bump version

sanity check for matrix download (#52)

added fallback for unit test matrix downloads (#53)

examples fix (#54)

* header fix for examples

* bump version

got bsrmm working for block dim less than 8

clang formatting

fixing bugs and getting benchmark to work

optimizing and working on kernels for block dimension greater than 8

kernels and code for block dimension greater than 8 and B matrix transposed

expanded loop unrolling up to block dimension 16

clang formatting

Remove gpg check for CI package CentOS install (#57)

updated internal function names (#61)

* renamed internal csrtr to trm

* clang-format

added missing header (#62)

fixes to documentation

remove compile time evaluation of direction to help reduce the number of kernels

clang formatting

small performance improvements to transpose kernel

clang formatting

increase transpose performance

clang formatting

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

examples fix (#54)

* header fix for examples

* bump version

Remove gpg check for CI package CentOS install (#57)

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

adding fortran example code

fixing fortran compile error

adding bsrmm to fortran_module.f90

fixing fortran example array order

fix fortran compile error

fix fortran compile error

adding cpp example code for bsrmm

clang formatting

working on optimizing kernels

working on optimizing kernels

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

reverting back to original kernels

optimizing bsrmm

making test2 kernel active for block dim 8

optimizing bsrmm

significant performance improvement for block dimensions 5 to 32

further performance improvements to transpose and non-transpose case

reduce compile times and replaced general kernel

optimizing for n <= 16

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

bump version

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

* reducing number of tests

* removing bank conflicts

* removing duplicate code from rocsparse-functions header

* fixing line in rocsparse-functions header changed by bad merge

* fix formatting from merge

* fix formatting errors from merge

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

level3/gemmi feature (#83)

* gemmi benchmark

* gemmi tests

* gemmi samples

* gemmi documentation

* gemmi API

* gemmi fortran binding and example

* internal gemmi structure

* gemmi kernel for transposed B

* minor tweaks

* bump version

Change package dependency to hip-rocclr (#84)

bump version

adding libomp.so to rpath for clients (#85)

* fix for clients: adding libomp.so to rpath

* bump version

gfx908 asic revision check (#86)

* added asic revision to rocsparse handle

* asic revision is available with 3.7+

* bump version

clang formatting

cleaning up comments

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

Change default compiler in install script to hipclang (#81)

hipclang updated readme

hipclang doc update (#82)

* hipclang doc update

* doxygen seems to struggle with tabs and spaces

* doc auto version sync

Bsrmm (#56)

* started creating skeleton code for bsrmm

* rebase bsrmm to squash commits

clang formatting

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

bump version

Replace host code in bsr2csr (#48)

* removed host bsr2csr and csr2bsr code and replaced it with device calls

* clang formatting

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

hipclang related fixes (#51)

* hipclang related fixes

* bump version

sanity check for matrix download (#52)

added fallback for unit test matrix downloads (#53)

examples fix (#54)

* header fix for examples

* bump version

got bsrmm working for block dim less than 8

clang formatting

fixing bugs and getting benchmark to work

optimizing and working on kernels for block dimension greater than 8

kernels and code for block dimension greater than 8 and B matrix transposed

expanded loop unrolling up to block dimension 16

clang formatting

Remove gpg check for CI package CentOS install (#57)

updated internal function names (#61)

* renamed internal csrtr to trm

* clang-format

added missing header (#62)

fixes to documentation

remove compile time evaluation of direction to help reduce the number of kernels

clang formatting

small performance improvements to transpose kernel

clang formatting

increase transpose performance

clang formatting

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

examples fix (#54)

* header fix for examples

* bump version

Remove gpg check for CI package CentOS install (#57)

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

adding fortran example code

fixing fortran compile error

adding bsrmm to fortran_module.f90

fixing fortran example array order

fix fortran compile error

fix fortran compile error

adding cpp example code for bsrmm

clang formatting

working on optimizing kernels

working on optimizing kernels

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

reverting back to original kernels

optimizing bsrmm

making test2 kernel active for block dim 8

optimizing bsrmm

significant performance improvement for block dimensions 5 to 32

further performance improvements to transpose and non-transpose case

reduce compile times and replaced general kernel

optimizing for n <= 16

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

bump version

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

* reducing number of tests

* removing bank conflicts

* removing duplicate code from rocsparse-functions header

* fixing line in rocsparse-functions header changed by bad merge

* fix formatting from merge

* fix formatting errors from merge

Co-authored-by: jsandham <james.sandham@amd.com>

level3/gemmi feature (#83)

* gemmi benchmark

* gemmi tests

* gemmi samples

* gemmi documentation

* gemmi API

* gemmi fortran binding and example

* internal gemmi structure

* gemmi kernel for transposed B

* minor tweaks

* bump version

Change package dependency to hip-rocclr (#84)

adding libomp.so to rpath for clients (#85)

* fix for clients: adding libomp.so to rpath

* bump version

gfx908 asic revision check (#86)

* added asic revision to rocsparse handle

* asic revision is available with 3.7+

* bump version

clang formatting

adding bsric0 to rocsparse_module.f90

making sure bsrsv analysis reuse works

adding asic to bsric0

fix compile error

fixing atomicOr

removing comments

latexpdf (#88)

* latexpdf

* bump version

Launchbounds (#87)

* added launch_bounds to kernel calls

* clang format

* bump version

disabled xnack (#89)

* fixed some compiler warnings

* disabled xnack

Update README.md

moving some of the quick tests to be pre checkin tests

adding underscore

clang formatting

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

Change default compiler in install script to hipclang (#81)

hipclang updated readme

Bsrmm (#56)

* started creating skeleton code for bsrmm

* rebase bsrmm to squash commits

clang formatting

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

bump version

Replace host code in bsr2csr (#48)

* removed host bsr2csr and csr2bsr code and replaced it with device calls

* clang formatting

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

hipclang related fixes (#51)

* hipclang related fixes

* bump version

sanity check for matrix download (#52)

added fallback for unit test matrix downloads (#53)

examples fix (#54)

* header fix for examples

* bump version

got bsrmm working for block dim less than 8

clang formatting

fixing bugs and getting benchmark to work

optimizing and working on kernels for block dimension greater than 8

kernels and code for block dimension greater than 8 and B matrix transposed

expanded loop unrolling up to block dimension 16

clang formatting

Remove gpg check for CI package CentOS install (#57)

updated internal function names (#61)

* renamed internal csrtr to trm

* clang-format

added missing header (#62)

fixes to documentation

remove compile time evaluation of direction to help reduce the number of kernels

clang formatting

small performance improvements to transpose kernel

clang formatting

increase transpose performance

clang formatting

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Allow library dependencies to be installed from CI (#49)

csrgeam (#46)

* csrgeam API added

* csrgeam tests and benchmark added

* flops, bandwidth and host implementation for csrgeam

* csrgeam unit tests

* removed webbase_1M test

* csrgeam (functional) added

* added tests for invalid sizes

* typos and year

* clang-format

* csrgeam performance scripts

added some examples (#50)

* added sparse level 1 examples

* added examples for sparse level 2 and 3

* clang-format

* added sparse extra examples

* bump version

examples fix (#54)

* header fix for examples

* bump version

Remove gpg check for CI package CentOS install (#57)

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

adding fortran example code

fixing fortran compile error

adding bsrmm to fortran_module.f90

fixing fortran example array order

fix fortran compile error

fix fortran compile error

adding cpp example code for bsrmm

clang formatting

working on optimizing kernels

working on optimizing kernels

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

optimizing bsrmm

reverting back to original kernels

optimizing bsrmm

making test2 kernel active for block dim 8

optimizing bsrmm

significant performance improvement for block dimensions 5 to 32

further performance improvements to transpose and non-transpose case

reduce compile times and replaced general kernel

optimizing for n <= 16

Correction to the cmake RUNPATH parameter (#79)

Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com>

bump version

cmake update (#80)

* cmake update

* disabling OpenMP until this is fixed within hipclang

Csr2bsr optimization (#78)

* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

Co-authored-by: jsandham <james.sandham@amd.com>

* reducing number of tests

* removing bank conflicts

* removing duplicate code from rocsparse-functions header

* fixing line in rocsparse-functions header changed by bad merge

* fix formatting from merge

* fix formatting errors from merge

Co-authored-by: jsandham <james.sandham@amd.com>

Change package dependency to hip-rocclr (#84)

adding libomp.so to rpath for clients (#85)

* fix for clients: adding libomp.so to rpath

* bump version

gfx908 asic revision check (#86)

* added asic revision to rocsparse handle

* asic revision is available with 3.7+

* bump version

Launchbounds (#87)

* added launch_bounds to kernel calls

* clang format

* bump version

disabled xnack (#89)

* fixed some compiler warnings

* disabled xnack

Update README.md

clang formatting

removing duplicate target file in CMakeList

remove duplicate template specialization

atomics (#90)

* replaced systemwide atomic with atomicOr and added some threadfences

* adjusted doc

* bump version

* syncthreads is only a blockwide fence

* fixing some wrong fences

* clang format

* fixing duplicate code added from bad merge

* fixing user manual formatting

Co-authored-by: jsandham <james.sandham@amd.com>
ntrost57 pushed a commit that referenced this pull request Nov 11, 2020
* optimized csr2bsr_nnz

* rebase csr2bsr_optimization branch to squash commits

Working on optimizing csr2bsr device code

changed blocksize to 16 as this runs twice as fast

clang formatting

removing comments

performance optimizations

clang formatting

improve performance

clang formatting

csr2bsr optimization

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

bump version

Single thread compile in install script (#63)

pyyaml package name fix for centos8 (#60)

* pyyaml package name fix for centos8

* this should also account for rhel8

* bump version

Update README.md

pivot test fix (#65)

* adding device sync in spin loop tests to not overwrite pivots before checking them

* bump version

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bump version

mtx pattern fix (#73)

Added centos 8 dependency fixes (#74)

bump version

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

added missing header (#62)

re-ordering row pointer and column arrays for csr2csr_compress (#59)

* re-ordering row pointer and column arrays for csr2csr_compress

* fixing broken tests

* fixing incorrect order in log_trace

* moving deletion of temporary array to ensure it is always called

Co-authored-by: jsandham <james.sandham@amd.com>

Single thread compile in install script (#63)

Update README.md

Removing rock-dkms (#66)

Revert "Single thread compile in install script (#63)" (#69)

Fortran interface (#55)

* fortran interface draft with examples added

* example fix to properly work with return values

* force cmake to add .f90 module to package

* added some more missing level1, level3 and conversion routines

* added few more missing functions to wrapper

* csric0 and csrilu0 fortran examples

* csrgemm_buffer_size binding name fixed

* fortran example fix, stop allows only constant expressions

* fix for string passing

* added enums to fortran; example for aux functions; fixes to pointer arguments

* more examples

* updated fortran example output of csrilu0 and csric0

* updated install.sh script and dockerfiles to install gfortran dependencies

* fix for device pointer mode

* few changes to make it consistent with hipfort

* bump version

ddoti fortran fix (#71)

bsrmv smem sync? (#70)

bsrsv (#72)

* general working version of bsrsv for lower and upper non transposed matrices

* fixing bsr_to_bsc order

* added functionality for transposed matrix

* enabling complex numbers

* optimized bsrsv for BSR dimensions from 2x2 to 32x32

* gfx908

* fortran functions and example

* disabling some unit diagonal tests with nos1 and nos2

* bump version

fortran module fixes (#75)

centos 6 (#76)

* centos6 support

* bump version

* deleting commented code

* remove commented code

Co-authored-by: jsandham <james.sandham@amd.com>