[compiler] Do not mix kernels with different sub-group sizes. #649

hvdijk · 2025-01-18T01:30:38Z

Overview

[compiler] Do not mix kernels with different sub-group sizes.

Reason for change

Per OpenCL 3.0 API section 3.2.1, "Mapping work-items onto an NDRange", all sub-groups within a work-group will be the same size, apart from the sub-group with the maximum index, which may be smaller if the size of the work-group is not evenly divisible by the size of the sub-groups. We were not meeting this requirement: in cases where we would not or could not generate a predicated vectorized kernel, we executed the scalar kernel in a loop for any remaining work-items, which could result in multiple sub-groups smaller than the maximum sub-group size.
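To make the violation concrete, here is a minimal sketch that models the old behaviour. It is not code from this PR: the function names are hypothetical, and it assumes the vector kernel's sub-group size equals its vector width while the scalar kernel's sub-group size is 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model: sub-group sizes produced when a vector kernel of width
// `vecWidth` (must be non-zero) handles as many work-items as it can and a
// scalar kernel (assumed sub-group size 1) loops over the remainder.
std::vector<std::size_t> mixedSubGroupSizes(std::size_t localSize,
                                            std::size_t vecWidth) {
  std::vector<std::size_t> sizes;
  for (std::size_t i = 0; i < localSize / vecWidth; ++i)
    sizes.push_back(vecWidth);  // one full-width sub-group per vector iteration
  for (std::size_t i = 0; i < localSize % vecWidth; ++i)
    sizes.push_back(1);  // one size-1 sub-group per scalar iteration
  return sizes;
}

// Checks the rule quoted above: every sub-group except the one with the
// maximum index must have the maximum size; the last one may be smaller.
bool uniformExceptLast(const std::vector<std::size_t> &sizes) {
  if (sizes.empty()) return true;
  const std::size_t maxSize = *std::max_element(sizes.begin(), sizes.end());
  for (std::size_t i = 0; i + 1 < sizes.size(); ++i)
    if (sizes[i] != maxSize) return false;
  return true;
}
```

Under these assumptions, a work-group of 13 work-items with a vector width of 8 yields sub-group sizes 8, 1, 1, 1, 1, 1, which fails the check, whereas a conforming split such as 8, 5 passes.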

Description of change

To avoid this situation, we must not mix vector and scalar kernels when those kernels use different sub-group sizes. There are three cases. If we can handle all work-items with vector kernels, possibly with predication, we continue to do so. If the vector and scalar kernels do not depend on the sub-group size, we also continue to handle this as before. If the vector and scalar kernels do depend on the sub-group size and the vector kernel cannot handle all work-items, we switch to the scalar kernel for all work-items.
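The three cases can be sketched as a small decision function. This is only an illustration under stated assumptions, not the implementation in this PR: all type and function names are hypothetical, the work-group size is treated as a plain known value, and "depends on the sub-group size" is collapsed into a single flag.

```cpp
#include <cstddef>

// Hypothetical names throughout; not the project's actual API.
enum class Strategy {
  VectorOnly,        // vector kernel (possibly predicated) covers every work-item
  VectorThenScalar,  // vector main loop plus a scalar loop for the remainder
  ScalarOnly         // scalar kernel for every work-item
};

struct KernelPair {
  std::size_t vectorWidth;     // work-items handled per vector kernel invocation
  bool vectorIsPredicated;     // vector kernel can mask off trailing work-items
  bool dependsOnSubGroupSize;  // kernels observe the sub-group size
};

Strategy chooseStrategy(const KernelPair &k, std::size_t localSize) {
  // Case 1: the vector kernel alone can handle all work-items.
  const bool vectorCoversAll =
      k.vectorIsPredicated ||
      (k.vectorWidth != 0 && localSize % k.vectorWidth == 0);
  if (vectorCoversAll) return Strategy::VectorOnly;
  // Case 2: mixing is harmless because nothing observes the sub-group size.
  if (!k.dependsOnSubGroupSize) return Strategy::VectorThenScalar;
  // Case 3: mixing would produce non-conforming sub-groups, so use the
  // scalar kernel for everything.
  return Strategy::ScalarOnly;
}
```

For example, with an unpredicated vector kernel of width 8 and 13 work-items, this sketch picks the mixed strategy only when the kernels do not depend on the sub-group size, and the scalar-only strategy otherwise.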

Anything else we should know?

This includes a small optimization: if we know the kernel does not use sub-group information, we avoid setting sub-group IDs.

This includes one change to createLoop, which now permits nullptr PHIs. They are skipped over. This is useful because PHIs must be referred to by index in the callback function, so indices can stay constant even when the caller has multiple optional PHIs.
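The idea of keeping indices stable while skipping absent entries can be illustrated with a standalone analogue. This sketch does not use LLVM and does not reproduce createLoop's real signature: `runLoop`, `BodyFn`, and the use of plain `int` values in place of PHI nodes are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Callback receives the loop index and the carried values. An absent value
// appears as nullptr at its usual position, so indices never shift.
using BodyFn = std::function<void(int i, const std::vector<int *> &carried)>;

// Hypothetical helper: runs `body` for i in [begin, end). A nullptr entry in
// `initial` means "this optional value is not in use"; it is skipped when
// setting up loop-carried state but still occupies its slot.
std::vector<int> runLoop(int begin, int end,
                         const std::vector<const int *> &initial,
                         const BodyFn &body) {
  std::vector<int> state(initial.size(), 0);
  std::vector<int *> carried(initial.size(), nullptr);
  for (std::size_t s = 0; s < initial.size(); ++s) {
    if (!initial[s]) continue;  // absent: no state is set up for this slot
    state[s] = *initial[s];
    carried[s] = &state[s];
  }
  for (int i = begin; i < end; ++i) body(i, carried);
  return state;  // final values; slots that were absent remain 0
}
```

A callback written against this helper can always treat, say, slot 0 as a running sum and slot 1 as a counter, checking each for nullptr before use, regardless of which of the two the caller actually supplied.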

This also includes one bugfix to ControlFlowConversionPass for a crash that is now seen, where we used the result of createMasked{Load,Store} before checking whether it succeeded.

This also includes one improvement to CompileKernelToBin.cmake: if the executed command fails, it is now printed in a format that can be copied and pasted.

Checklist

Read and follow the project Code of Conduct.
Make sure the project builds successfully with your changes.
Run relevant testing locally to avoid regressions.
Run clang-format-19 on all modified code.
