
[Texture support][Part 0] Device API and runtime support #7711

Merged
merged 27 commits into from
Jun 5, 2021

@csullivan (Contributor) commented Mar 19, 2021

This PR introduces 2d texture memory support to the OpenCL Device API runtime.

Device runtime

  • The device runtime supports allocating texture memory both as a temporary workspace and as a runtime data space. In the latter case, AllocDataSpace must be invoked with a special memory scope, memory_scope == "texture(:weight)". Special memory scopes were added to the runtime in [Runtime] Special Memory Scope Support #7488.
  • Workspace allocations are handled via a set of idle texture pools, which are grown to match the requested sizes. The strategy is to first pick the pool that requires the least extra space beyond its current size, and then to minimize the wasted space that growing a two-dimensional pool may incur. A similar approach is taken by the ahead-of-time graph runtime memory planner for data space allocations (see: [Texture support][Part 5] Graph runtime and memory planning support for 2d allocations #7690).
  • CopyFromTo support is expanded to handle reading from and writing to image buffers directly from the host.
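The pool-selection heuristic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's texture_pool.cc: pools and requests are hypothetical `Extent2D` values measured in texels, "extra space" is taken to mean the area added when a pool is grown to max(width) x max(height), and "wasted space" is the grown pool's area that the request does not cover.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2d extent of an idle texture pool or of a request, in texels.
// Sketch only; not TVM's actual data structure.
struct Extent2D {
  std::size_t width;
  std::size_t height;
};

// Picks the idle pool for `req` and grows it in place so the request fits.
// Ranking (assumed): (1) least area added to the pool by growing it,
// (2) least wasted area (grown pool area not covered by the request).
// Returns the chosen pool index, or -1 if there are no idle pools.
int PickAndGrowPool(std::vector<Extent2D>& pools, Extent2D req) {
  int best = -1;
  std::size_t best_added = std::numeric_limits<std::size_t>::max();
  std::size_t best_wasted = std::numeric_limits<std::size_t>::max();
  for (std::size_t i = 0; i < pools.size(); ++i) {
    const Extent2D& p = pools[i];
    // A 2d pool can only grow to the per-axis maximum of both extents.
    const std::size_t new_w = std::max(p.width, req.width);
    const std::size_t new_h = std::max(p.height, req.height);
    const std::size_t new_area = new_w * new_h;
    const std::size_t added = new_area - p.width * p.height;
    const std::size_t wasted = new_area - req.width * req.height;
    if (added < best_added || (added == best_added && wasted < best_wasted)) {
      best = static_cast<int>(i);
      best_added = added;
      best_wasted = wasted;
    }
  }
  if (best >= 0) {
    Extent2D& chosen = pools[static_cast<std::size_t>(best)];
    chosen.width = std::max(chosen.width, req.width);
    chosen.height = std::max(chosen.height, req.height);
  }
  return best;
}
```

For example, with idle pools of 64x64, 128x32 and 16x16 texels and a 100x30 request, the 128x32 pool is chosen because it needs no growth at all, even though the 16x16 pool would waste less space once grown.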

See RFC here: https://discuss.tvm.apache.org/t/rfc-texture-memory-support/9467

include/tvm/runtime/device_api.h (outdated)
src/runtime/c_runtime_api.cc (outdated)
src/runtime/texture.h
@tqchen (Member) commented Mar 20, 2021

Thanks @csullivan, some quick comments; I will read more carefully in the coming week.

@csullivan marked this pull request as ready for review May 4, 2021 18:59

@csullivan (Contributor Author)

@tqchen, @ZihengJiang, would you kindly consider reviewing once more?

The main change is to remove the texture-specific device APIs and rely on tir.tvm_call_packed (cf. #7932) for texture workspace allocations, and on AllocDataSpace (with a memory scope) for data space allocations.

I also introduced an OpenCL buffer descriptor that tracks the allocation layout. With the layout and the DLTensor overload of CopyDataFromTo, I've verified that a sub-texture allocation can be correctly copied out of a 2d texture pool of larger extent. This solves an issue I raised in Part 4.
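To make the layout tracking concrete, here is a hedged sketch of a descriptor that records the memory layout and derives a 2d image extent from a tensor shape. The names (`MemoryLayout`, `BufferDescriptorSketch`, `FlattenTo2D`, `Describe`) are hypothetical, and the flattening convention is an assumption: the innermost dimension is the per-texel channel vector (e.g. 4 for RGBA), activations fold all but the last two dimensions into the image height, and weights fold only the first dimension into the height.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical layout tags; loosely mirror the IMAGE_2D_ACTIVATION /
// IMAGE_2D_WEIGHT tags seen in the PR, but this is not TVM's API.
enum class MemoryLayout { kBuffer1D, kImage2DActivation, kImage2DWeight };

struct Image2DExtent {
  int64_t width;
  int64_t height;
  int64_t channels;  // values per texel (assumed to be the innermost dim)
};

struct BufferDescriptorSketch {
  MemoryLayout layout = MemoryLayout::kBuffer1D;
  Image2DExtent extent{0, 0, 0};  // only meaningful for the 2d image layouts
};

// Number of leading dimensions folded into the image height (assumed convention).
std::size_t RowAxis(MemoryLayout layout, std::size_t rank) {
  switch (layout) {
    case MemoryLayout::kImage2DActivation:
      return rank - 2;
    case MemoryLayout::kImage2DWeight:
      return 1;
    default:
      throw std::invalid_argument("1d buffers are not lowered to 2d images");
  }
}

// Dims before the row axis multiply into height; the remaining dims, except
// the trailing channel dim, multiply into width.
Image2DExtent FlattenTo2D(MemoryLayout layout, const std::vector<int64_t>& shape) {
  const std::size_t rank = shape.size();
  if (rank < 2) {
    throw std::invalid_argument("expected at least one dim plus a trailing channel dim");
  }
  const std::size_t axis = RowAxis(layout, rank);
  Image2DExtent ext{1, 1, shape[rank - 1]};
  for (std::size_t i = 0; i + 1 < rank; ++i) {
    if (i < axis) {
      ext.height *= shape[i];
    } else {
      ext.width *= shape[i];
    }
  }
  return ext;
}

// Builds a descriptor that remembers the layout alongside the derived extent,
// so later copies can recover the image shape from the descriptor alone.
BufferDescriptorSketch Describe(MemoryLayout layout, const std::vector<int64_t>& shape) {
  BufferDescriptorSketch desc;
  desc.layout = layout;
  if (layout != MemoryLayout::kBuffer1D) {
    desc.extent = FlattenTo2D(layout, shape);
  }
  return desc;
}
```

Under these assumptions, an activation of shape {1, 8, 32, 32, 4} maps to a 32-wide, 256-high image of 4-channel texels, and a weight of shape {16, 8, 3, 3, 4} maps to a 72-wide, 16-high image.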

I appreciate any additional feedback you have.

@ZihengJiang (Contributor)

LGTM. @tqchen, could you check whether the PR looks good to you?

@tqchen previously requested changes May 11, 2021
src/runtime/opencl/opencl_common.h (outdated)
src/runtime/opencl/opencl_common.h (outdated)
src/runtime/opencl/opencl_device_api.cc (outdated)
src/runtime/opencl/opencl_device_api.cc (outdated)
src/runtime/texture_pool.cc (outdated)
```cpp
case cl::BufferDescriptor::MemoryLayout::IMAGE_2D_ACTIVATION:
case cl::BufferDescriptor::MemoryLayout::IMAGE_2D_WEIGHT:
  auto image_info = GetImageInfo(from_desc, from);
  // TODO(csullivan): Support calculating row_pitch correctly in the case of reuse.
```
Review comment (Member):

It would be great to add a few test cases in Python that demonstrate copying into an image whose size is larger than the normal one. Perhaps the easiest way is to construct an NDArray and then write a PackedFunc that takes a smaller view of it.

Reply (Contributor Author):

Thank you for the good suggestion. I added a test that writes to a subview of a texture and checks that the data from the larger allocation contains the expected two-dimensional strides when copied back to the host.
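The stride check described in this reply can be modeled on the host with plain C++. This is a sketch under assumptions, not the PR's test: the pool and the sub-texture are hypothetical row-major host buffers holding `channels` values per texel, and `WriteSubTexture` places the dense sub-texture at the pool's origin. After the write, row `r` of the sub-texture starts at offset `r * pool_w * channels` in the pool, so data read back from the larger allocation carries the pool's row stride rather than the sub-texture's.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side model of writing a dense sub_w x sub_h sub-texture
// into the top-left corner of a larger pool_w x pool_h texture pool.
// Both buffers are row-major with `channels` values per texel.
template <typename T>
void WriteSubTexture(std::vector<T>& pool, std::size_t pool_w, std::size_t pool_h,
                     const std::vector<T>& sub, std::size_t sub_w, std::size_t sub_h,
                     std::size_t channels) {
  if (sub_w > pool_w || sub_h > pool_h) {
    throw std::invalid_argument("sub-texture exceeds pool extent");
  }
  if (pool.size() != pool_w * pool_h * channels || sub.size() != sub_w * sub_h * channels) {
    throw std::invalid_argument("buffer size does not match extent");
  }
  const std::size_t pool_row = pool_w * channels;  // elements per pool row
  const std::size_t sub_row = sub_w * channels;    // elements per dense sub-texture row
  for (std::size_t r = 0; r < sub_h; ++r) {
    for (std::size_t c = 0; c < sub_row; ++c) {
      // Destination rows advance by the pool's stride, source rows by the dense stride.
      pool[r * pool_row + c] = sub[r * sub_row + c];
    }
  }
}
```

With a 4x3-texel pool, a 2x2-texel sub-texture and 4 channels, the second sub-texture row lands at pool offset 16 rather than 8, and the elements between the two written runs stay untouched.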

@tqchen added the status: need update label (need update based on feedbacks) May 14, 2021
csullivan and others added 20 commits May 18, 2021 16:18
@csullivan (Contributor Author)

@tqchen Thanks for the great feedback; could you take a look again?

@csullivan requested a review from tqchen May 29, 2021 23:36
@tqchen dismissed their stale review June 4, 2021 12:52

feedback addressed

@tqchen (Member) commented Jun 4, 2021

Thanks @csullivan. I will let @ZihengJiang manage the PR.

@ZihengJiang merged commit 010d11b into apache:main Jun 5, 2021

@ZihengJiang (Contributor) commented Jun 5, 2021

Merged now. Thanks @csullivan for the hard work.

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 17, 2021
* Add TVMBackendAllocTexture and support in OpenCL device API.

* Add runtime optimized caching allocator.
This should be replaced with AOT memory planning
when the relay/tir/compile engine refactor lands.

* Few bug fixes for runtime texture allocator.

* Add OpenCL device api support for image2d<float16> textures.

* Update OpenCL DeviceAPI to support Image2D data space
allocations and copying to/from host/image2d directly.
Allocation employs a lowering convention to 2d images
for activations and weights.

* Fix to follow OpenCL spec. for indexing.

* Rename texture_pool.h -> texture.h

* Move Nd to 2d lowering convention code into runtime texture
utilities that can be shared by codegen and the runtime.

* Update texture lowering utilities

* Add TODO comment about pitch support

* Remove FreeTexture

* Fix ICHECK comment

* Partial cherry pick from @ZihengJiang
git@github.com:ZihengJiang/tvm.git:52822c5bd
[RUNTIME] OpenCL texture memory.

* Remove runtime and device texture APIs.

* Add OpenCL packed functions for texture workspace (de)allocations.

* Add OpenCLBuffer structure to track
memory layout through OpenCL Device API.

* Rebase: TVMContext -> Device

* Implement DLTensor* overload of CopyDataFromTo in OpenCL DeviceAPI.

* Implement OpenCL CopyDataFromTo(DLTensor*...)
overload, using tensor shapes to calculate the image extent
when copying data directly to or from the texture cache.

* Update format (cpp-lint)

* Update format (clang)

* Buffer descriptor name change and formatting.

* Add texture pool documentation.

* Update runtime to use new global.texture scope.

* Move texture_pool.cc into opencl impl.

* Add test coverage for copying in and out
of storage allocs of texture scope.

* Documented APIs and structures, renamed buffer descriptor layout tags.

Co-authored-by: ZihengJiang <ziheng@apache.org>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 17, 2021
Labels: status: need update (need update based on feedbacks)

3 participants