Skip to content

Upgrade v0.6.X to v0.7.X

Marcel Koester edited this page Feb 18, 2021 · 1 revision

General Notes

A new OpenCL backend has been added that supports OpenCL C 2.0 (or higher) compatible GPUs. The OpenCL backend does not require an OpenCL SDK to be installed/configured. There is the possibility to query all supported OpenCL accelerators via CLAccelerator.CLAccelerators. Since NVIDIA GPUs typically does not support OpenCL C 2.0 (or higher), they are usually not contained in this list. However, if you still want to access those devices via the OpenCL API you can query CLAccelerators.AllCLAccelerators. Note that the global list of all accelerators Accelerator.Accelerators will contain supported accelerators only. It is highly recommended to use the CudaAccelerator class for NVIDIA GPUs and the CLAccelerator class for Intel and AMD GPUs. Furthermore, it is not necessary to worry about native library dependencies regarding OpenCL (except, of course, for the actual GPU drivers).

The XMath class has been removed as it contained many software implementations for different platforms that are not related to the actual ILGPU compiler. However, there are several math functions that are supported on all platforms which are still exposed via the new IntrinsicMath class. There is also a class IntrinsicMath.CPU which contains implementations for all math functions for the CPUAccelerator. Please note that these functions are not supported on other accelerators except the CPUAccelerator. If you want to use the full range of math functions refer to the XMath class of the ILGPU.Algorithms library.

The new version of the ILGPU.Algorithms library offers support for a set of commonly used algorithms (like Scan or Reduce). Moreover, it offers GroupExtensions and WarpExtensions to support group/warp-wide reductions or prefix sums within kernels.

New Algorithms Library

The new ILGPU.Algorithms library comes in a separate nuget package. In order to use any of the exposed group/warp/math extensions you have to enable the library. This setups all internal ILGPU hooks and custom code-generators to emit code that realizes the extensions in the right places. This is achieved by using the new extension and intrinsic API.

using ILGPU.Algorithms;
class ...
{
    static void ...(...)
    {
        using var context = new Context();

        // Enable all algorithms and extension methods
        context.EnableAlgorithms();

        ...
    }
}

Math Functions

As mentioned here, the XMath class has been removed from the actual GPU compiler framework. Leverage the IntrinsicMath class to use math functions that are available on all supported accelerators. If you want to access all math functions use the newly designed XMath class of the ILGPU.Algorithms library.

class ...
{
    static void ...(...)
    {
        // Old way (obsolete and no longer supported)
        float x = ILGPU.XMath.Sin(...);

        // New way
        // 1) Don't forget to enable algorithm support ;)
        context.EnableAlgorithms();
        
        // 2) Use the new XMath class
        float x = ILGPU.Algorithms.XMath.Sin(...);
    }
}

Warp & Group Intrinsics

Previous versions of ILGPU had several warp-shuffle overloads to expose the native warp and group intrinsics to the programmer. However, these functions are typically available for int and float data types. More complex or larger types required programming of custom IShuffleOperation interfaces that had to be passed to the shuffle functions. This inconvenient way of programming is no longer required. The new warp and group intrinsics support generic data structures. ILGPU will automatically generate the required code for every target platform and use case.

The intrinsics Group.Broadcast and Warp.Broadcast have been added. In contrast to Warp.Shuffle, the Warp.Broadcast intrinsic requires that all participating threads read from the same lane. Warp.Shuffle supports different source lanes in every thread. Group.Broadcast works like Warp.Broadcast, but for all threads in a group.

class ...
{
    static void ...(...)
    {
        ComplexDataType y = ...;
        ComplexDataType x = Warp.Shuffle(y, threadIdx);

        ...

        ComplexDataType y = ...;
        ComplexDataType x = Group.Broadcast(y, groupIdx);
    }
}

Grid & Group Indices

It is no longer required to access grid and group indices via the GroupedIndex(|2|3) index parameter of a kernel. Instead, you can access the static properties Grid.Index(X|Y|Z) and Group.Index(X|Y|Z) from every function in the scope of a kernel. This simplifies programming of helper methods significantly. Furthermore, this also feels natural to experienced Cuda and OpenCL developers.

class ... { static void ...(GroupedIndex index) { // Common ILGPU way (still supported) int gridIdx = index.GridIdx; int groupIdx = index.GroupIdx;

    // New ILGPU way
    int gridIdx = Grid.IndexX;
    int groupIdx = Group.IndexX;
}

}