Version 0.4 Beta
Pre-release
- Added support for the CUDA API
- Significantly improved tuning manipulator API
- Simplified baseline tuning manipulator and reference class usage
- Improved overall tuner performance
- Added support for uploading arguments into local (shared) memory
- Configurations whose local size exceeds the maximum supported by the current device are now automatically excluded from computation
- Fixed memory leak in OpenCL backend
- Fixed several bugs in tuning manipulator API
- Fixed crash in annealing searcher
- Added an option to print results from failed kernel runs
- Improved tuner info messages
- Improved CSV printing method
- KTT is now compiled as a dynamic (shared) library
- Added build customization options to premake script
- Additions and improvements to examples
- Improved documentation