Releases: CNugteren/CLBlast

Preview version 0.7.0

08 May 19:28

Version 0.7.0

  • Added exports to be able to create a DLL on Windows (thanks to Marco Hutter)
  • Made the library thread-safe
  • Performance and correctness tests can now also be run against CPU BLAS libraries, in addition to clBLAS
  • Fixed the use of events within the library
  • Changed the enum parameters to match the raw values of the cblas standard
  • Fixed the cache of previously compiled binaries and added a function to fill or clear it
  • Various minor fixes and enhancements
  • Added a preliminary version of the API documentation
  • Added additional sample programs
  • Added tuned parameters for various devices (see README)
  • Added level-1 routines:
    • SNRM2/DNRM2/ScNRM2/DzNRM2
    • SASUM/DASUM/ScASUM/DzASUM
    • SSUM/DSUM/ScSUM/DzSUM (non-absolute version of the above xASUM BLAS routines)
    • iSAMAX/iDAMAX/iCAMAX/iZAMAX
    • iSMAX/iDMAX/iCMAX/iZMAX (non-absolute version of the above ixAMAX BLAS routines)
    • iSMIN/iDMIN/iCMIN/iZMIN (non-absolute minimum version of the above ixAMAX BLAS routines)
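
The difference between the standard absolute-value routines and the new non-absolute variants listed above can be illustrated with a small host-side reference. This is only a sketch of the intended semantics for real single-precision vectors: the helper names (`ref_asum`, `ref_sum`, `ref_iamax`, `ref_imax`, `ref_imin`) are hypothetical and not part of the CLBlast API, the indices are assumed to be zero-based, and ties are assumed to resolve to the first occurrence.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helpers (NOT CLBlast functions). They only show what
// each routine family computes for a non-empty real vector.

// xASUM: sum of absolute values.
float ref_asum(const std::vector<float>& x) {
  float sum = 0.0f;
  for (float v : x) { sum += std::fabs(v); }
  return sum;
}

// xSUM: plain sum, without taking absolute values.
float ref_sum(const std::vector<float>& x) {
  float sum = 0.0f;
  for (float v : x) { sum += v; }
  return sum;
}

// ixAMAX: index of the element with the largest absolute value.
std::size_t ref_iamax(const std::vector<float>& x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (std::fabs(x[i]) > std::fabs(x[best])) { best = i; }
  }
  return best;
}

// ixMAX: index of the largest element (signed comparison).
std::size_t ref_imax(const std::vector<float>& x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] > x[best]) { best = i; }
  }
  return best;
}

// ixMIN: index of the smallest element (signed comparison).
std::size_t ref_imin(const std::vector<float>& x) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i] < x[best]) { best = i; }
  }
  return best;
}
```

For the vector {1, -4, 3, -2}, the absolute and non-absolute variants give different answers: the absolute sum is 10 while the plain sum is -2, and the largest absolute value sits at index 1 while the largest signed value sits at index 2.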

Note:
Binary releases are experimental; build from source if possible.
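
As a rough illustration of the 0.7.0 change that makes enum parameters match the raw values of the cblas standard, the sketch below declares enums carrying the integer values used by the reference `cblas.h` header. The enum and enumerator names here are hypothetical and chosen for illustration only; consult the library's own header for the actual identifiers.

```cpp
#include <cassert>

// Hypothetical enum names; only the integer values are taken from the
// reference cblas.h header (CblasRowMajor = 101, CblasNoTrans = 111, ...).
enum class Layout    { RowMajor = 101, ColMajor = 102 };
enum class Transpose { No = 111, Yes = 112, Conjugate = 113 };
enum class Triangle  { Upper = 121, Lower = 122 };
enum class Diagonal  { NonUnit = 131, Unit = 132 };
enum class Side      { Left = 141, Right = 142 };

// Converts any of the enums above to its raw integer value, e.g. to pass it
// through a plain C interface that expects cblas-style integers.
template <typename E>
constexpr int raw(E e) { return static_cast<int>(e); }

static_assert(raw(Transpose::Conjugate) == 113, "matches CblasConjTrans");
```

Matching the raw values means that a value obtained from a cblas-style caller can be cast directly to the corresponding enum without a translation table.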

Preview version 0.6.0

13 Mar 10:10

Version 0.6.0

  • Added support for MSVC (Visual Studio) 2015
  • Added tuned parameters for various devices (see README)
  • Now automatically generates C++ code from JSON tuning results
  • Added level-2 routines:
    • SGER/DGER
    • CGERU/ZGERU
    • CGERC/ZGERC
    • CHER/ZHER
    • CHPR/ZHPR
    • CHER2/ZHER2
    • CHPR2/ZHPR2
    • CSYR/ZSYR
    • CSPR/ZSPR
    • CSYR2/ZSYR2
    • CSPR2/ZSPR2

Preview version 0.5.0

17 Oct 13:58

Version 0.5.0

  • Improved structure and performance of level-2 routines (xSYMV/xHEMV)
  • Reduced compilation time of level-3 OpenCL kernels
  • Added level-1 routines:
    • SSWAP/DSWAP/CSWAP/ZSWAP
    • SSCAL/DSCAL/CSCAL/ZSCAL
    • SCOPY/DCOPY/CCOPY/ZCOPY
    • SDOT/DDOT
    • CDOTU/ZDOTU
    • CDOTC/ZDOTC
  • Added level-2 routines:
    • SGBMV/DGBMV/CGBMV/ZGBMV
    • CHBMV/ZHBMV
    • CHPMV/ZHPMV
    • SSBMV/DSBMV
    • SSPMV/DSPMV
    • STRMV/DTRMV/CTRMV/ZTRMV
    • STBMV/DTBMV/CTBMV/ZTBMV
    • STPMV/DTPMV/CTPMV/ZTPMV

Preview version 0.4.0

22 Aug 10:48

Version 0.4.0

  • Now using the Claduc C++11 interface to OpenCL
  • Added plain C API for increased compatibility (clblast_c.h)
  • Re-organized tuner infrastructure and added JSON output
  • Removed the bundled clBLAS sources; clBLAS must now be installed separately for testing
  • Added Travis continuous integration
  • Added level-2 routines:
    • CHEMV/ZHEMV
    • SSYMV/DSYMV

Preview version 0.3.0

27 Jul 07:38

Version 0.3.0

  • Re-organized test/client infrastructure to avoid code duplication
  • Added an optional bypass for pre/post-processing kernels in level-3 routines
  • Significantly improved performance of level-3 routines on AMD GPUs
  • Added level-3 routines:
    • CHEMM/ZHEMM
    • SSYRK/DSYRK/CSYRK/ZSYRK
    • CHERK/ZHERK
    • SSYR2K/DSYR2K/CSYR2K/ZSYR2K
    • CHER2K/ZHER2K
    • STRMM/DTRMM/CTRMM/ZTRMM

Preview version 0.2.0

21 Jun 07:28

Version 0.2.0

  • Added support for complex conjugate transpose
  • Several host-code performance improvements
  • Improved testing infrastructure and coverage
  • Added level-2 routines:
    • SGEMV/DGEMV/CGEMV/ZGEMV
  • Added level-3 routines:
    • CGEMM/ZGEMM
    • CSYMM/ZSYMM

Preview version 0.1.0

17 Jun 15:14

Version 0.1.0

  • Initial preview release on GitHub
  • Supported level-1 routines:
    • SAXPY/DAXPY/CAXPY/ZAXPY
  • Supported level-3 routines:
    • SGEMM/DGEMM
    • SSYMM/DSYMM