Refactor and class split #4432
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4432
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure: as of commit ba7f3ab with merge base b7c8378, the following job has failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D60290882
Force-pushed from 220cccd to 5c211eb (Compare)
Summary:
Pull Request resolved: pytorch#4432

Big classes are scary ☹️

This diff subdivides the tests into categories and places them as functions inside the gpuinfo namespace instead of as part of the App class; the App class is now only for persisting device information and configuration.

Differential Revision: D60290882
Force-pushed from 5c211eb to 61ae401 (Compare)
Force-pushed from 61ae401 to 236412c (Compare)
Force-pushed from 236412c to 755aab9 (Compare)
Summary:
Pull Request resolved: pytorch#4336

This diff introduces a profiler that obtains the maximum and minimum bandwidth for reading unique addresses from 3D textures in each of their dimensions, using the following shader, where A is a 3D texture and B is a write-only buffer. The calculation of the texel position depends on the dimension being benchmarked:

x: pos = ivec3(offset, 0, 0)
y: pos = ivec3(0, offset, 0)
z: pos = ivec3(0, 0, offset)

```glsl
void main() {
  vec4 sum = vec4(0);
  const uint workgroup_width = local_group_size * niter * ${NUNROLL};
  uint offset = (gl_WorkGroupID[0] * workgroup_width + gl_LocalInvocationID[0]) & addr_mask;

  int i = 0;
  for (; i < niter; ++i) {
    sum *= texelFetch(A, pos, 0);
    offset = (offset + local_group_size) & addr_mask;
    ...
    sum *= texelFetch(A, pos, 0);
    offset = (offset + local_group_size) & addr_mask;
  }

  vec4 zero = vec4(i >> 31);
  B[gl_LocalInvocationID[0]] = sum + zero;
}
```

The address mask allows us to control how many unique addresses we are accessing. If the number of unique vectors we want to read is 3, the offset will jump between three unique addresses throughout the iterations, giving us the bandwidth for that specific size of data. If the size of the unique data read is larger than the work group size, then each run will have its own block of data to read, defined by the initial offset calculation, where the offset is obtained from the workgroup ID and the local invocation ID.

Finally, we make sure to use the `sum` and `i` variables so that the compiler's optimizer does not flatten the loops.

For a Samsung S22, the bandwidth behaves like this for each of the dimensions:

{F1767497386}

Comparing the bandwidth for the X dimension to OpenCL, which was obtained through [ArchProbe](https://github.com/microsoft/ArchProbe), we can observe that, although the behavior is the same, Vulkan has increased bandwidth for most access sizes.

{F1767497972}

Comparing to buffers, we can observe that the bandwidth is similar to that of regular buffers, but still much smaller than that of UBOs at small access sizes.

{F1767497707}

Reviewed By: jorgep31415

Differential Revision: D59980139
Summary:
Pull Request resolved: pytorch#4337

Now that the tool is getting larger, a configuration file for defining which tests to run and which to skip, as well as for specifying values like thresholds and ranges, comes in handy. This diff adds support for a JSON config file with specifications for each test.

Reviewed By: jorgep31415

Differential Revision: D60060188
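The PR does not show the config schema; a file along these lines (all keys hypothetical) would cover the per-test enable/skip flags and threshold/range values the summary describes:

```json
{
  "tex_bandwidth": {
    "enabled": true,
    "threshold": 10,
    "range": [1, 4096]
  },
  "buf_bandwidth": {
    "enabled": false
  }
}
```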
Summary:
Pull Request resolved: pytorch#4421

This diff introduces a metric to calculate the maximum number of concurrent cache line accesses for each dimension of a 3D texture. The experiment works by allowing each thread to access a different texel on the texture and slowly increasing the number of threads until the cache line is no longer able to handle all simultaneous accesses. By detecting a jump in latency, we can determine the optimal maximum size that can be accessed concurrently in each dimension.

NOTE: ArchProbe uses this information to [obtain a supposed cache line size for textures](https://fburl.com/98xiou3g). However, it is unclear why they define the cache line size as the ratio of the larger concurrency value over the lower, times the texel size. It is also unclear how to extend their calculations to three dimensions.

TODO: Understand the relationship between concurrency and cache line size, and modify this metric to output the cache line size.

For a Samsung S22, the latency graph looks like this:

{F1780375117}

Reviewed By: copyrightly

Differential Revision: D60246121
Summary:
Pull Request resolved: pytorch#4432

Big classes are scary ☹️

This diff subdivides the tests into categories and places them as functions inside the gpuinfo namespace instead of as part of the App class; the App class is now only for persisting device information and configuration.

Reviewed By: jorgep31415

Differential Revision: D60290882
Force-pushed from 755aab9 to ba7f3ab (Compare)
This pull request has been merged in e03181d.
Summary:
Big classes are scary ☹️
This diff subdivides the tests into categories and places them as functions inside the gpuinfo namespace instead of as part of the App class; the App class is now only for persisting device information and configuration.
Differential Revision: D60290882