Add 3D Texture Bandwidth metric #4336

Summary: Pull Request resolved: pytorch#4336 This diff introduces a profiler that obtains the maximum and minimum bandwidth for reading unique addresses from 3D textures in each of its dimensions, using the following shader, where A is a 3D texture and B is a writeonly buffer. The calculation of the texel position will depend on the dimension that is being benchmarked x : pos = ivec3(offset, 0, 0) y : pos = ivec3(0, offset, 0) z : pos = ivec3(0, 0, offset) void main() { vec4 sum = vec4(0); const uint workgroup_width = local_group_size * niter * ${NUNROLL}; uint offset = (gl_WorkGroupID[0] * workgroup_width + gl_LocalInvocationID[0]) & addr_mask; int i = 0; for (; i < niter; ++i) { sum *= texelFetch(A, pos, 0); offset = (offset + local_group_size) & addr_mask; ... ... sum *= texelFetch(A, pos, 0); offset = (offset + local_group_size) & addr_mask; } vec4 zero = vec4(i>>31); B[gl_LocalInvocationID[0]] = sum + zero; } The address mask allows us to control how many unique addresses we are accessing. If the number of unique vectors we want to read is 3, the offset will jump between three unique addresses throughout the iterations, giving us the bandwidth for that specific size of data. If the size of the unique data read is larger than the work group size, then each run will have its own block of data to read, defined by the initial offset calculation, where the offset is obtained through the workgroup ID and the local invocation ID. Finally, we make sure to use the `sum` and `i ` variables so that the compiler's optimizer does not flatten the loops. For a Samsung S22, the bandwidth behaves like this for each of the dimensions. {F1767497386} Comparing the bandwidth for the X dimension to OpenCL, which was obtained through [ArchProbe](https://github.com/microsoft/ArchProbe), we can observe that, although the behavior is the same, Vulkan has an increased bandwidth for most access sizes. {F1767497972} Comparing to the bandwidth for buffers, we can observe that the bandwidth is similar to regular buffers, but still much smaller than UBOs at small access sizes. {F1767497707} Reviewed By: jorgep31415 Differential Revision: D59980139

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add 3D Texture Bandwidth metric #4336

Add 3D Texture Bandwidth metric #4336

Commits on Jul 30, 2024