Releases: HiPerCoRe/KTT
Version 0.6 RC1
- Added support for multiple compute queues and asynchronous operations
- Added support for online autotuning - kernel tuning combined with regular kernel running
- Added support for kernel arguments with user-defined data types
- Users now have greater control over kernel argument handling; as a result, tuner run modes were deprecated
- Validated kernel arguments can now have a user-defined comparator
- Added MCMC searcher
- Added local memory argument modifiers, which work similarly to kernel thread size modifiers
- Added new buffer handling methods to tuning manipulator API
- Added support for floating-point kernel parameters
- Added a method for retrieving kernel source code for a specified kernel configuration
- Implemented caching of compiled kernels when using tuning manipulator
- Fixed several bugs in kernel composition methods
- Fixed several rare bugs that could occur when using the tuning manipulator
- Added tutorials and several new examples
- Fixed paths to kernel files in examples on Linux
- Significantly improved documentation and added an FAQ
- Added macro definitions for the KTT version
Version 0.5 Beta
- Added support for kernel compositions
- Added two tuner modes: tuning mode and low-overhead computation mode
- Added support for storing buffers in host memory, including support for zero-copy buffers when computation mode is used
- Kernel arguments can now be retrieved through the API by using a new method for running kernels
- Added an option to automatically ensure that the global size is a multiple of the local size
- The best kernel configuration can now be retrieved through the API
- Added an option to switch between CUDA and OpenCL global size notation
- Improvements to tuning manipulator API
- Usability improvements to dimension vector
- Tweaks to CUDA backend
- Minor improvements to result printer
- Improved examples and documentation
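The two global size options in this release (rounding the global size to a multiple of the local size, and switching between CUDA and OpenCL notation) can be illustrated with a small sketch. It assumes the usual convention that an OpenCL global size counts all work-items in a dimension, while a CUDA grid size counts thread blocks. The helper names below are hypothetical and are not part of the KTT API; they only show the arithmetic involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not KTT API): round a global size up to the nearest
// multiple of the local size so that every work-group / thread block is full.
std::size_t roundUpToMultiple(std::size_t globalSize, std::size_t localSize)
{
    return ((globalSize + localSize - 1) / localSize) * localSize;
}

// Hypothetical helper: convert one dimension from OpenCL notation
// (total work-items) to CUDA notation (number of thread blocks).
std::size_t openClToCudaGrid(std::size_t openClGlobal, std::size_t localSize)
{
    // Conversion is only exact when the global size is already a multiple.
    assert(openClGlobal % localSize == 0);
    return openClGlobal / localSize;
}

// Hypothetical helper: convert one dimension from CUDA notation
// (number of thread blocks) back to OpenCL notation (total work-items).
std::size_t cudaGridToOpenCl(std::size_t cudaGrid, std::size_t localSize)
{
    return cudaGrid * localSize;
}
```

For example, with a local size of 256, `roundUpToMultiple(1000, 256)` yields 1024, which corresponds to a CUDA grid size of 4 blocks.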
Version 0.4 Beta
- Added support for CUDA API
- Significantly improved tuning manipulator API
- Simplified baseline tuning manipulator and reference class usage
- Improved overall tuner performance
- Added support for uploading arguments into local (shared) memory
- Configurations whose local size exceeds the maximum supported by the current device are now automatically excluded from computation
- Fixed a memory leak in the OpenCL backend
- Fixed several bugs in tuning manipulator API
- Fixed a crash in the annealing searcher
- Added an option to print results from failed kernel runs
- Improved tuner info messages
- Improved CSV printing method
- KTT is now compiled as a dynamic (shared) library
- Added build customization options to the premake script
- Additions and improvements to examples
- Improved documentation
Version 0.3.1 Beta
- Added support for new argument data types (8, 16, 32 and 64 bits wide)
- Added support for specifying the time unit used when printing results
- Added new utility methods to tuning manipulator API
- Improvements to tuning manipulator
- Fixed bugs in tuning manipulator API
- Read-only arguments are now cached in the OpenCL backend
- Improved documentation
Version 0.3 Beta
- Added tuning manipulator interface
- Added support for validating multiple arguments with a reference class
- Added support for short argument data type
- Added a method for printing the contents of kernel arguments to a file
- Added a method for specifying where info messages are printed
- Additions and improvements to documentation
- Improvements to samples
- Fixed a bug in the CSV printing method
- Other minor bug fixes and improvements
Version 0.2 Beta
- Added methods for result printing
- Added methods for kernel output validation
- Additions and improvements to samples
- Added API documentation
- Implemented annealing searcher
- Fixed build under Linux
Version 0.1 Beta
- Basic autotuning functionality is now available