changed csrmv analysis to carry csr_val, just in case #52
Merged
ntrost57 added a commit that referenced this pull request on Apr 6, 2020
ntrost57 pushed a commit that referenced this pull request on Jul 6, 2020
* started creating skeleton code for bsrmm * rebase bsrmm to squash commits clang formatting Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts bump version Replace host code in bsr2csr (#48) * removed host bsr2csr and csr2bsr code and replaced it with device calls * clang formatting Co-authored-by: jsandham <james.sandham@amd.com> bump version added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version hipclang related fixes (#51) * hipclang related fixes * bump version sanity check for matrix download (#52) added fallback for unit test matrix downloads (#53) examples fix (#54) * header fix for examples * bump version got bsrmm working for block dim less than 8 clang formatting fixing bugs and getting benchmark to work optimizing and working on kernels for block dimension greater than 8 kernels and code for block dimension greater than 8 and B matrix transposed expanded loop unrolling up to block dimension 16 clang formatting Remove gpg check for CI package CentOS install (#57) updated internal function names (#61) * renamed internal csrtr to trm * clang-format added missing header (#62) fixes to documentation remove compile time evaluation of direction to help reduce the number of kernels clang formatting small performance improvements to transpose kernel clang formatting increase transpose performance clang formatting re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always 
called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version examples fix (#54) * header fix for examples * bump version Remove gpg check for CI package CentOS install (#57) added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran 
example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? (#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version adding fortran example code fixing fortran compile error adding bsrmm to fortran_module.f90 fixing fortran example array order fix fortran compile error fix fortran compile error adding cpp example code for bsrmm clang formatting working on optimizing kernels working on optimizing kernels optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm reverting back to original kernels optimizing bsrmm making test2 kernel active for block dim 8 optimizing bsrmm significant performance improvement for block dimensions 5 to 32 further performance improvements to transpose and non-transpose case reduce compile times and replaced general kernel optimizing for n <= 16 Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi 
Madugundu <mpruthvi@gmail.com> bump version cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix 
(#71) bsrmv smem sync? (#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> * reducing number of tests * removing bank conflicts * removing duplicate code from rocsparse-functions header * fixing line in rocspasrse-functions header changed by bad merge * fix formating from merge * fix formatting errors from merge Co-authored-by: jsandham <james.sandham@amd.com>
ntrost57 pushed a commit that referenced this pull request on Jul 21, 2020
* Creating skeleton code for bsric02 * clang formatting work on implementing bsric02 kernel bsric02 working for block dim equal 2, 4, 8, 16 implementing binary search kernel fixing kernel bugs clang formatting fixing thread divergence in warp errors Work on optimizing and testing clang formatting re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version clang formatting fixing documentation for bsric02 removing comments removing comments adjusting test yaml file changing test yaml file removing comments adding fortran example clang formatting adding fortran bsric0 example to CMakelist fix compiler errors fixing bug in fortran example fix fortran compiler errors optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 adding gfx908 work around Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com> bump version cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should 
also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> Change default compiler in install script to hipclang (#81) hipclang updated readme hipclang doc update (#82) * hipclang doc update * doxygen seem to struggle with tabs and spaces * doc auto version sync clang formatting fixing bug in csric0 and bsric0 where we were not using conj for complex matrices optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 adding float and double versions for std::conj moving conj into rocsparse_math.hpp optimizing bsric0 optimizing bsric0 optimize bsric0 optimize bsric0 optimizing bsric0 optimize bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 optimizing bsric0 clang formatting optimizing bsric0 clang formatting optimizing bsric0 Bsrmm (#56) * started creating skeleton code for bsrmm * rebase bsrmm to squash commits clang formatting Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts bump version Replace host code in bsr2csr (#48) * removed host bsr2csr and csr2bsr code and replaced it with device calls * clang 
formatting Co-authored-by: jsandham <james.sandham@amd.com> bump version added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version hipclang related fixes (#51) * hipclang related fixes * bump version sanity check for matrix download (#52) added fallback for unit test matrix downloads (#53) examples fix (#54) * header fix for examples * bump version got bsrmm working for block dim less than 8 clang formatting fixing bugs and getting benchmark to work optimizing and working on kernels for block dimension greater than 8 kernels and code for block dimension greater than 8 and B matrix transposed expanded loop unrolling up to block dimension 16 clang formatting Remove gpg check for CI package CentOS install (#57) updated internal function names (#61) * renamed internal csrtr to trm * clang-format added missing header (#62) fixes to documentation remove compile time evaluation of direction to help reduce the number of kernels clang formatting small performance improvements to transpose kernel clang formatting increase transpose performance clang formatting re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with 
return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? (#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version examples fix (#54) * header fix for examples * bump version Remove gpg check for CI package CentOS install (#57) added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for 
csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version adding fortran example code fixing fortran compile error adding bsrmm to fortran_module.f90 fixing fortran example array order fix fortran compile error fix fortran compile error adding cpp example code for bsrmm clang formatting working on optimizing kernels working on optimizing kernels optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm reverting back to original kernels optimizing bsrmm making test2 kernel active for block dim 8 optimizing bsrmm significant performance improvement for block dimensions 5 to 32 further performance improvements to transpose and non-transpose case reduce compile times and replaced general kernel optimizing for n <= 16 Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com> bump version cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang 
formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> * reducing number of tests * removing bank conflicts * removing duplicate code from rocsparse-functions header * fixing line in rocspasrse-functions header changed by bad merge * fix formating from merge * fix formatting errors from merge Co-authored-by: jsandham <james.sandham@amd.com> bump version level3/gemmi feature (#83) * gemmi benchmark * gemmi tests * gemmi samples * gemmi documentation * gemmi API * gemmi fortran binding and example * internal gemmi structure * gemmi kernel for transposed B * minor tweaks * bump version Change package dependency to hip-rocclr (#84) bump version adding libomp.so to rpath for clients (#85) * fix for clients: adding libomp.so to rpath * bump version gfx908 asic revision check (#86) * added asic revision to rocsparse handle * asic revision is available with 3.7+ * bump version clang formatting cleaning up comments re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing 
level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? (#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com> cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for 
centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> Change default compiler in install script to hipclang (#81) hipclang updated readme hipclang doc update (#82) * hipclang doc update * doxygen seem to struggle with tabs and spaces * doc auto version sync Bsrmm (#56) * started creating skeleton code for bsrmm * rebase bsrmm to squash commits clang formatting Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts bump version Replace host code in bsr2csr (#48) * removed host bsr2csr and csr2bsr code and replaced it with device calls * clang formatting Co-authored-by: jsandham <james.sandham@amd.com> bump version added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version hipclang related fixes (#51) * hipclang related fixes * bump version sanity check for matrix download (#52) added fallback for unit test matrix downloads (#53) examples fix (#54) * header fix for examples * bump version got bsrmm working for block dim less than 8 clang formatting fixing bugs and getting benchmark to work optimizing and working on kernels for block dimension greater than 8 kernels and code for block dimension greater than 8 and B matrix transposed expanded loop 
unrolling up to block dimension 16 clang formatting Remove gpg check for CI package CentOS install (#57) updated internal function names (#61) * renamed internal csrtr to trm * clang-format added missing header (#62) fixes to documentation remove compile time evaluation of direction to help reduce the number of kernels clang formatting small performance improvements to transpose kernel clang formatting increase transpose performance clang formatting re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version examples fix (#54) * header fix for examples * bump version Remove gpg check for CI package CentOS install (#57) added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran 
example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? (#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version adding fortran example code fixing fortran compile error adding bsrmm to fortran_module.f90 fixing fortran example array order fix fortran compile error fix fortran compile error adding cpp example code for bsrmm clang formatting working on optimizing kernels working on optimizing kernels optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm reverting back to original kernels optimizing bsrmm making test2 kernel active for block dim 8 optimizing bsrmm significant performance improvement for block dimensions 5 to 32 further performance improvements to transpose and non-transpose case reduce compile times and replaced general kernel optimizing for n <= 16 Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi 
Madugundu <mpruthvi@gmail.com> bump version cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix 
(#71) bsrmv smem sync?
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> * reducing number of tests * removing bank conflicts * removing duplicate code from rocsparse-functions header * fixing line in rocspasrse-functions header changed by bad merge * fix formating from merge * fix formatting errors from merge Co-authored-by: jsandham <james.sandham@amd.com> level3/gemmi feature (#83) * gemmi benchmark * gemmi tests * gemmi samples * gemmi documentation * gemmi API * gemmi fortran binding and example * internal gemmi structure * gemmi kernel for transposed B * minor tweaks * bump version Change package dependency to hip-rocclr (#84) adding libomp.so to rpath for clients (#85) * fix for clients: adding libomp.so to rpath * bump version gfx908 asic revision check (#86) * added asic revision to rocsparse handle * asic revision is available with 3.7+ * bump version clang formatting adding bsric0 to rocsparse_module.f90 making sure bsrsv analysis reuse works adding asic to bsric0 fix compile error fixing atomicOr removing comments latexpdf (#88) * latexpdf * bump version Launchbounds (#87) * added launch_bounds to kernel calls * clang format * bump version disabled xnack (#89) * fixed some compiler warnings * disabled xnack Update README.md moving some of the quick tests to be pre checkin tests adding underscore clang formatting Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to 
properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? (#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi Madugundu <mpruthvi@gmail.com> cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread 
compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> Change default compiler in install script to hipclang (#81) hipclang updated readme Bsrmm (#56) * started creating skeleton code for bsrmm * rebase bsrmm to squash commits clang formatting Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts bump version Replace host code in bsr2csr (#48) * removed host bsr2csr and csr2bsr code and replaced it with device calls * clang formatting Co-authored-by: jsandham <james.sandham@amd.com> bump version added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version hipclang related fixes (#51) * hipclang related fixes * bump version sanity check for matrix download (#52) added fallback for unit test matrix downloads (#53) examples fix (#54) * header fix for examples * bump version got bsrmm working for block dim less than 8 clang formatting fixing bugs and getting benchmark to work optimizing and working on kernels for block dimension greater than 8 kernels and code for block dimension greater than 8 and B matrix transposed expanded loop unrolling up to block dimension 16 clang formatting Remove gpg check for CI package CentOS install (#57) updated internal 
function names (#61) * renamed internal csrtr to trm * clang-format added missing header (#62) fixes to documentation remove compile time evaluation of direction to help reduce the number of kernels clang formatting small performance improvements to transpose kernel clang formatting increase transpose performance clang formatting re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Allow library dependencies to be installed from CI (#49) csrgeam (#46) * csrgeam API added * csrgeam tests and benchmark added * flops, bandwidth and host implementation for csrgeam * csrgeam unit tests * removed webbase_1M test * csrgeam (functional) added * added tests for invalid sizes * typos and year * clang-format * csrgeam performance scripts added some examples (#50) * added sparse level 1 examples * added examples for sparse level 2 and 3 * clang-format * added sparse extra examples * bump version examples fix (#54) * header fix for examples * bump version Remove gpg check for CI package CentOS install (#57) added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran 
example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? (#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version adding fortran example code fixing fortran compile error adding bsrmm to fortran_module.f90 fixing fortran example array order fix fortran compile error fix fortran compile error adding cpp example code for bsrmm clang formatting working on optimizing kernels working on optimizing kernels optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm optimizing bsrmm reverting back to original kernels optimizing bsrmm making test2 kernel active for block dim 8 optimizing bsrmm significant performance improvement for block dimensions 5 to 32 further performance improvements to transpose and non-transpose case reduce compile times and replaced general kernel optimizing for n <= 16 Correction to the cmake RUNPATH parameter (#79) Co-authored-by: Pruthvi 
Madugundu <mpruthvi@gmail.com> bump version cmake update (#80) * cmake update * disabling OpenMP until this is fixed within hipclang Csr2bsr optimization (#78) * optimized csr2bsr_nnz * rebase csr2bsr_optimization branch to squash commits Working on optimizing csr2bsr device code changed blocksize to 16 as this runs twice as fast clang formatting removing comments performance optimizations clang formatting improve performance clang formatting csr2bsr optimization added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> bump version Single thread compile in install script (#63) pyyaml package name fix for centos8 (#60) * pyyaml package name fix for centos8 * this should also account for rhel8 * bump version Update README.md pivot test fix (#65) * adding device sync in spin loop tests to not overwrite pivots before checking them * bump version Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix 
(#71) bsrmv smem sync? (#70) bump version mtx pattern fix (#73) Added centos 8 dependency fixes (#74) bump version bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version added missing header (#62) re-ordering row pointer and column arrays for csr2csr_compress (#59) * re-ordering row pointer and column arrays for csr2csr_compress * fixing broken tests * fixing incorrect order in log_trace * moving deletion of temporary arry to ensure it is always called Co-authored-by: jsandham <james.sandham@amd.com> Single thread compile in install script (#63) Update README.md Removing rock-dkms (#66) Revert "Single thread compile in install script (#63)" (#69) Fortran interface (#55) * fortran interface draft with examples added * example fix to properly work with return values * force cmake to add .f90 module to package * added some more missing level1, level3 and conversion routines * added few more missing functions to wrapper * csric0 and csrilu0 fortran examples * csrgemm_buffer_size binding name fixed * fortran example fix, stop allows only constant expressions * fix for string passing * added enums to fortran; example for aux functions; fixes to pointer arguments * more examples * updated fortran example output of csrilu0 and csric0 * updated install.sh script and dockerfiles to install gfortran dependencies * fix for device pointer mode * few changes to make it consistent with hipfort * bump version ddoti fortran fix (#71) bsrmv smem sync? 
(#70) bsrsv (#72) * general working version of bsrsv for lower and upper non transposed matrices * fixing bsr_to_bsc order * added functionality for transposed matrix * enabling complex numbers * optimized bsrsv for BSR dimensions from 2x2 to 32x32 * gfx908 * fortran functions and example * disabling some unit diagonal tests with nos1 and nos2 * bump version fortran module fixes (#75) centos 6 (#76) * centos6 support * bump version Co-authored-by: jsandham <james.sandham@amd.com> * reducing number of tests * removing bank conflicts * removing duplicate code from rocsparse-functions header * fixing line in rocspasrse-functions header changed by bad merge * fix formating from merge * fix formatting errors from merge Co-authored-by: jsandham <james.sandham@amd.com> Change package dependency to hip-rocclr (#84) adding libomp.so to rpath for clients (#85) * fix for clients: adding libomp.so to rpath * bump version gfx908 asic revision check (#86) * added asic revision to rocsparse handle * asic revision is available with 3.7+ * bump version Launchbounds (#87) * added launch_bounds to kernel calls * clang format * bump version disabled xnack (#89) * fixed some compiler warnings * disabled xnack Update README.md clang formatting removing duplicate target file in CMakeList remove duplicate template specialization atomics (#90) * replaced systemwide atomic with atomicOr and added some threadfences * adjusted doc * bump version * syncthreads is only a blockwide fence * fixing some wrong fences * clang format * fixing duplicate code added from bad merge * fixing user manual formatting Co-authored-by: jsandham <james.sandham@amd.com>
No description provided.