Ronan Keryell edited this page Mar 10, 2015 · 3 revisions

clBLAS accuracy test programs

The primary correctness test tools for clBLAS are based on the googletest framework and are located in ./src/tests/correctness. Building the test tools generates three googletest-based executables that are similar in function and differ mainly in how long they take to run.

Test Name          Test Length
test-short         can run from minutes to hours
test-medium        can run from hours up to a day
test-correctness   can run for days

Setting up the tests to run

The CMake-generated build projects define an INSTALL target. Building it compiles the sources and also creates a ./bin/clBLAS/develop/vs10x64/package subdirectory, copying all built executables and libraries into it. This is convenient for performance measurement and testing, because the build tree otherwise leaves the executables and libraries scattered across their own build directories.

If the test executables are built using ACML as the reference library (the default), the test-short, test-medium and test-correctness executables depend on ACML at run time. Since ACML is not built with CMake, the build does not know to copy the libacml_dll.<PlatExt> file into the package directory, so this is a manual step that needs to be completed to run the tests successfully. Depending on the version of ACML used, ACML itself may depend on Fortran runtime files, which should also be copied into the ./bin/clBLAS/develop/vs10x64/package subdirectory alongside libacml_dll.<PlatExt>.
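
The manual copy step can be scripted. The following is a minimal sketch, assuming the ACML library files sit in a single directory. The function name `stage_reference_libs`, the default glob pattern and both directory arguments are illustrative and are not part of clBLAS or ACML.

```python
import shutil
from pathlib import Path


def stage_reference_libs(acml_lib_dir, package_dir, pattern="libacml_dll.*"):
    """Copy files matching `pattern` from acml_lib_dir into package_dir.

    Returns the sorted names of the copied files. Both directories and the
    default pattern are assumptions for illustration; adjust them to the
    local ACML install and to any Fortran runtime files it needs.
    """
    src = Path(acml_lib_dir)
    dst = Path(package_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for lib in sorted(src.glob(pattern)):
        if lib.is_file():
            # copy2 keeps timestamps, which helps when comparing library versions
            shutil.copy2(lib, dst / lib.name)
            copied.append(lib.name)
    return copied
```

Calling the function a second time with a different pattern is one way to stage the Fortran runtime files as well, once their names are known for the ACML version in use.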

Execution examples

Executing the test programs with --help shows the gtest-related flags that control how tests are run:

F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64> .\test-short.exe --help
This program contains tests written using Google Test. You can use the
following command line flags to control its behavior:

Test Selection:
  --gtest_list_tests
      List the names of all tests instead of running them. The name of
      TEST(Foo, Bar) is "Foo.Bar".
  --gtest_filter=POSTIVE_PATTERNS[-NEGATIVE_PATTERNS]
      Run only the tests whose name matches one of the positive patterns but
      none of the negative patterns. '?' matches any single character; '*'
      matches any substring; ':' separates two patterns.
  --gtest_also_run_disabled_tests
      Run all disabled tests too.

Test Execution:
  --gtest_repeat=[COUNT]
      Run the tests repeatedly; use a negative count to repeat forever.
  --gtest_shuffle
      Randomize tests' orders on every iteration.
  --gtest_random_seed=[NUMBER]
      Random number seed to use for shuffling test orders (between 1 and
      99999, or 0 to use a seed based on the current time).

Test Output:
  --gtest_color=(yes|no|auto)
      Enable/disable colored output. The default is auto.
  --gtest_print_time=0
      Don't print the elapsed time of each test.
  --gtest_output=xml[:DIRECTORY_PATH\|:FILE_PATH]
      Generate an XML report in the given directory or with the given file
      name. FILE_PATH defaults to test_details.xml.

Assertion Behavior:
  --gtest_break_on_failure
      Turn assertion failures into debugger break-points.
  --gtest_throw_on_failure
      Turn assertion failures into C++ exceptions.
  --gtest_catch_exceptions=0
      Do not report exceptions as test failures. Instead, allow them
      to crash the program or throw a pop-up (on Windows).

Except for --gtest_list_tests, you can alternatively set the corresponding
environment variable of a flag (all letters in upper-case). For example, to
disable colored text output, you can either specify --gtest_color=no or set
the GTEST_COLOR environment variable to no.

For more information, please read the Google Test documentation at
http://code.google.com/p/googletest/. If you find a bug in Google Test
(not one in your own code or tests), please report it to
<googletestframework@googlegroups.com>.
Initialize OpenCL and clAmdBlas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues

Test environment:

Device name: Tahiti
Device vendor: Advanced Micro Devices, Inc.
Platform (bit): Windows (x64)
clAmdBlas version: 2.1.0
Driver version: 1124.2 (VM)
Device version: OpenCL 1.2 AMD-APP (1124.2)
Global mem size: 2048 MB
---------------------------------------------------------

The exact time the test executables take to finish depends on the hardware under test and varies widely. For day-to-day use or automation in a continuous integration setting, test-short is recommended. Gtest filters can additionally narrow testing to a fraction of the overall tests when recent code edits are known to be limited to a strict subset of functionality. The gtest_filter flag takes a wildcard pattern ('*' matches any substring, '?' matches any single character), not a regular expression, and matches it against the test name; each test name is unique.
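
For automation, the command line can be assembled programmatically. This is a hedged sketch rather than a script shipped with clBLAS: the helper names `gtest_command` and `run_gtest` are hypothetical, and only the --gtest_filter and --gtest_output flags from the help output above are used.

```python
import subprocess


def gtest_command(executable, test_filter=None, xml_report=None):
    """Build an argument list for a googletest-based test executable."""
    args = [executable]
    if test_filter:
        args.append("--gtest_filter=" + test_filter)
    if xml_report:
        # googletest writes an XML report to the given file path
        args.append("--gtest_output=xml:" + xml_report)
    return args


def run_gtest(executable, test_filter=None, xml_report=None):
    """Run the executable and return its exit code.

    googletest's RUN_ALL_TESTS() returns 0 when every test passes; this
    assumes the executable's main() returns that value unchanged.
    """
    cmd = gtest_command(executable, test_filter, xml_report)
    return subprocess.run(cmd).returncode
```

A continuous integration job could call `run_gtest(r".\test-short.exe", "*sgemm*", "sgemm.xml")` from the package directory and fail the build on a non-zero return value; the report file name here is only an example.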

Testing a subset of tests

An example of running only sgemm-related tests:

F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64> .\test-short.exe --gtest_filter=*sgemm*
Initialize OpenCL and clAmdBlas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues

Test environment:

Device name: Tahiti
Device vendor: Advanced Micro Devices, Inc.
Platform (bit): Windows (x64)
clAmdBlas version: 2.1.0
Driver version: 1124.2 (VM)
Device version: OpenCL 1.2 AMD-APP (1124.2)
Global mem size: 2048 MB
---------------------------------------------------------

Note: Google Test filter = *sgemm*
[==========] Running 308 tests from 7 test cases.
[----------] Global test environment set-up.
[----------] 72 tests from ColumnMajor_SmallRange/GEMM
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/0
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 63, N = 63, K = 63
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 63, ldc = 63
seed = 12345
queues = 1
Generating input data... Done
Calling reference xGEMM routine... Done
Calling clAmdBlas xGEMM routine... Done
[       OK ] ColumnMajor_SmallRange/GEMM.sgemm/0 (398 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/1
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 63, N = 63, K = 128
offA = 0, offB = 0, offC = 0
lda = 63, ldb = 128, ldc = 63
seed = 12345
queues = 1

Eliminating tests with negative filters

Negative filters, introduced with the '-' operator, exclude tests. For example, .\test-short.exe --gtest_filter=-*ztrmm* filters out all ztrmm tests; the output is much the same as above, except that no ztrmm tests run in the test pass.

Combination of positive and negative filters

A more complex expression can combine positive and negative filters using standard googletest filter notation: patterns are separated by the ':' character, and patterns following '-' are negative. For example: .\test-short.exe --gtest_filter=*ColumnMajor*:-*ztrmm*
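
To reason about which tests a filter selects without launching the test executable, the matching rule described in the --help output can be approximated in a few lines. This is an illustrative sketch, not googletest's implementation; `filter_matches` and its helpers are hypothetical names, and the ztrmm test name in the usage note is invented for the example.

```python
import re


def _glob_to_regex(pattern):
    """Translate a wildcard pattern ('*' and '?') into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any substring
        elif ch == "?":
            parts.append(".")    # any single character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def _matches_any(name, patterns):
    return any(_glob_to_regex(p).fullmatch(name) for p in patterns)


def filter_matches(name, gtest_filter):
    """Return True if `name` would be selected by `gtest_filter`.

    Everything before the first '-' is a ':'-separated list of positive
    patterns; everything after it is a list of negative patterns. An empty
    positive list is treated as '*'. Empty patterns are dropped, which is
    harmless here because test names are never empty.
    """
    positive, dash, negative = gtest_filter.partition("-")
    pos_patterns = [p for p in positive.split(":") if p] or ["*"]
    neg_patterns = [p for p in negative.split(":") if p] if dash else []
    return _matches_any(name, pos_patterns) and not _matches_any(name, neg_patterns)
```

With the filter `*ColumnMajor*:-*ztrmm*`, the name `ColumnMajor_SmallRange/GEMM.sgemm/0` from the output above is selected, while a hypothetical `ColumnMajor_SmallRange/TRMM.ztrmm/0` is excluded by the negative pattern.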

clBLAS interface test program

The primary API and interface test tool for clBLAS is based on the googletest framework and is located in ./src/tests/functional. After building the INSTALL target, the binary is located alongside the other test-* executables in ./bin/clBLAS/develop/vs10x64/package and is named test-functional. Its scope is to pass permutations of correct and incorrect parameters into the BLAS API and to validate the returned results, including error results. As part of testing the API, a subset of tests in test-functional create multiple OpenCL devices and call into clBLAS with multiple queues. Because the logic that validates input parameters is not likely to change often once written, this test program is usually run only manually by a developer to sanity check their changes.

clBLAS performance test program ( DEPRECATED )

test-performance is based on the googletest framework and is located in ./src/tests/performance. After building the INSTALL target, the binary is located alongside the other test-* executables in ./bin/clBLAS/develop/vs10x64/package and is named test-performance. It compares the performance of a particular BLAS algorithm running on the CPU against the GPU and reports the result as a speedup of GPU over CPU. Each individual test case reports the result for a particular matrix size and function. While this program was simple to create and served its purpose well during early clBLAS development, it is now deprecated: the Python scripts in scripts\perf are more flexible and can create graphs of performance with matrix size swept along the x-axis. The code remains available as an example of writing gtest-based performance tests. Example output of running test-performance:

F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64> .\test-performance.exe
Initialize OpenCL and CLBLAS...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
[==========] Running 48124 tests from 92 test cases.
[----------] Global test environment set-up.
[----------] 2304 tests from Generic/GEMM
[ RUN      ] Generic/GEMM.sgemm/0
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 2048, N = 2048, K = 2048
offA = 0, offB = 0, offC = 0
lda = 2048, ldb = 2048, ldc = 2048
seed = 12345
queues = 1
Acml reference function has worked in 678 milliseconds, clBlas function has worked in 16 milliseconds, time ratio is 42.2137
clBlas GFLOPS : 1069.53


[       OK ] Generic/GEMM.sgemm/0 (5456 ms)
[ RUN      ] Generic/GEMM.sgemm/1
clAmdBlasColumnMajor, clAmdBlasNoTrans, clAmdBlasNoTrans
M = 2048, N = 2048, K = 2800
offA = 0, offB = 0, offC = 0
lda = 2048, ldb = 2800, ldc = 2048
seed = 12345
queues = 1
Acml reference function has worked in 938 milliseconds, clBlas function has worked in 21 milliseconds, time ratio is 43.0902
clBlas GFLOPS : 1078.43
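
The GFLOPS figures in this output are consistent with the conventional operation count of 2*M*N*K for a real GEMM divided by the elapsed time. That formula is an assumption here, not something taken from the test-performance source. The printed millisecond values look truncated to integers, so dividing the reference time by the printed ratio gives a closer estimate of the device time. A small sketch with hypothetical helper names:

```python
def gemm_gflops(m, n, k, elapsed_ms):
    """Estimate GFLOPS for a real GEMM, assuming 2*M*N*K floating-point operations."""
    flops = 2.0 * m * n * k
    return flops / (elapsed_ms * 1e-3) / 1e9


def device_ms_from_ratio(reference_ms, time_ratio):
    """Recover the device time from the reference time and the printed ratio."""
    return reference_ms / time_ratio


# First test case above: M = N = K = 2048, reference 678 ms, ratio 42.2137
estimate = gemm_gflops(2048, 2048, 2048, device_ms_from_ratio(678, 42.2137))
print(f"estimated GFLOPS: {estimate:.1f}")
```

Under these assumptions the estimate lands within about one GFLOPS of the reported 1069.53; using the printed 16 milliseconds directly overestimates slightly.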

Creating a standalone test program with make-ktest

It can be convenient to automatically generate a standalone executable that executes a BLAS call of interest.
For instance, the googletest suite may find a failure in a particular function, and it would be useful to submit a bug report with an example test case without having to distribute the entire test application. Or a performance regression may have been found in a particular function with specific parameters. Whatever the reason, the make-ktest application is a tool that can help automate this process:

F:\code\GitHub\kknox\bin\clBLAS\develop\vs10x64\package\bin64>make-ktest.exe --help

Application Arguments:
  --config arg (=ktest.cfg) Configuration file
  -h [ --help ]             Show this help message

Generator Arguments:
  --cpp arg (=ktest.cpp) Output file name for C++ generated source
  --cl arg               Output file name for OpenCL generated source
  --data arg (=random)   Data generation pattern
                         Format: {random | unit | sawtooth}
  --skip-accuracy        Don't generate code for accuracy check. Applicable if
                         the program is needed only for performance measurement

OpenCL Arguments:
  --platform arg (=AMD Accelerated Parallel Processing)
                                        Platform name
  --device arg (=Tahiti)                Device name
  --build-options arg                   Build options

BLAS Arguments:
  -f [ --function ] arg Function name, mandatory
                        Format: {s | d | c | z}{BLAS function}
  --order arg (=row)    Data ordering
                        Format: {column | row}
  --side arg (=left)    The side matrix A is located relative to matrix B
                        Format: {left | right}
  --uplo arg (=upper)   Upper or lower triangle of matrix is being referenced
                        Format: {upper | lower}
  --transA arg (=n)     Matrix A transposition operation
                        Format: {n | t | c}
  --transB arg (=n)     Matrix B transposition operation
                        Format: {n | t | c}
  --diag arg (=nonunit) Whether the matrix is unit triangular
                        Format: {unit | nonunit}
  -M [ --M ] arg (=256)
  -N [ --N ] arg (=256)
  -K [ --K ] arg (=256)
  --alpha arg (=1)      Alpha multiplier
                        Format: real[,imag]
  --beta arg (=1)       Beta multiplier
                        Format: real[,imag]
  --lda arg             Leading dimension of the matrix A
  --ldb arg             Leading dimension of the matrix B
  --ldc arg             Leading dimension of the matrix C
  --offA arg (=0)       Start offset in buffer of matrix A
  --offBX arg (=0)      Start offset in buffer of matrix B or vector X
  --offCY arg (=0)      Start offset in buffer of matrix C or vector Y
  --incx arg (=1)       Increment in the array X
  --incy arg (=1)       Increment in the array Y

Decomposition Options:
  -d [ --decomposition ] arg SubproblemDim
                             Format: {subdims[0].x},{subdims[0].y},
                                     {subdims[0].bwidth},
                                     {subdims[1].x},{subdims[1].y},
                                     {subdims[1].bwidth}
  --multikernel arg (=0)     Allow division of one BLAS function between
                             several kernels

In the directory from which the tool is launched, it saves a host application source file with the name given by the 'cpp' command line argument, and either one (typical) or several files with kernels. The file <BLAS_SRC_ROOT>/src/library/tools/ktest/naive/naive_blas.cpp should be copied to the same directory as the generated files. It contains a naive BLAS implementation used to check accuracy and is referenced by the host application source. BLAS_SRC_ROOT is the root directory of the library's source code.
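
When reproducing a failing googletest case, the parameters printed in the test log map onto make-ktest options. The sketch below assembles such a command line; `make_ktest_command` is a hypothetical helper, and it assumes the tool accepts options in the `--name value` form, which the help layout suggests but which is not confirmed here.

```python
def make_ktest_command(function, executable="make-ktest.exe", **options):
    """Build an argument list for make-ktest.

    `function` maps to the mandatory --function flag. Keyword options are
    turned into '--name value' pairs; names are passed through unchanged, so
    they must be long option names from the --help output above
    (for example order, transA, M, N, K, cpp). Value-less flags such as
    --skip-accuracy are not handled by this sketch.
    """
    args = [executable, "--function", function]
    for name, value in options.items():
        args.extend(["--" + name, str(value)])
    return args


# Parameters taken from the sgemm/0 correctness test log shown earlier
cmd = make_ktest_command("sgemm", order="column", transA="n", transB="n",
                         M=63, N=63, K=63, cpp="ktest.cpp")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` from the directory where the generated sources should be written.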

Adding support for new functions

To add the ability to generate test cases for other functions, a class derived from the class amd::Step should be implemented.
