
Proposal: Switch to safer data_ptr API #654

Closed
Stonepia opened this issue Jul 26, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@Stonepia
Contributor

Stonepia commented Jul 26, 2024

🐛 Describe the bug

TL;DR

  1. We should use the safer data_ptr accessing APIs. The newer template APIs perform additional checks. If possible, use tensor.mutable_data_ptr<T>() and tensor.const_data_ptr<T>(). Stop using (scalar_t*)tensor.data_ptr().
  2. Stop using char*. Either use int* or use the allocator directly for buffer access.

1. Use template data_ptr API for tensor

1.1 Existing code usage and its problem

Existing code uses raw pointers like this:

// Non-template usage. Avoid this.
kernel_call(
    (scalar_t*)tensor.data_ptr());

The above usage does not check whether the tensor is initialized. When the tensor's storage is not initialized (for example, a FakeTensor), data_ptr() is null, so the kernel may write to uninitialized memory and cause a page fault.
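A minimal sketch of this failure mode (the kernel wrapper and variable names are illustrative, not from the codebase):

#include <ATen/ATen.h>

// Hypothetical kernel wrapper that writes through the pointer it receives.
void launch_fill_kernel(float* out, int64_t n);

void bad_path(const at::Tensor& t) {
  // Unchecked raw cast: nothing verifies that t's storage is materialized.
  // For a tensor whose storage is not initialized (e.g. a FakeTensor),
  // this pointer can be null, and the kernel then faults on the first write.
  float* p = (float*)t.data_ptr();
  launch_fill_kernel(p, t.numel());
}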

1.2 template data_ptr

The template data_ptr() performs additional checks, such as has_storage() and storage_initialized(). Please refer to PyTorch's TensorImpl.h for details.
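
Roughly speaking, the checked path amounts to something like the following paraphrase (not the actual TensorImpl.h source, just the shape of the checks):

#include <ATen/ATen.h>

// Paraphrased sketch of the checks the template accessor performs before
// handing out a typed pointer; see TensorImpl.h for the real implementation.
template <typename T>
T* checked_data_ptr(const at::Tensor& t) {
  TORCH_CHECK(t.has_storage(),
              "Cannot access data pointer of Tensor that doesn't have storage");
  TORCH_CHECK(t.unsafeGetTensorImpl()->storage_initialized(),
              "Cannot access data pointer of Tensor with uninitialized storage");
  return t.data_ptr<T>();  // the template API also validates T against the dtype
}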

In view of that, old call sites should be migrated to the safer template usage:

// Template usage. Safer than the raw cast, but prefer the APIs in 1.3.
input_.data_ptr<scalar_t>();

1.3 New API: mutable_data_ptr() and const_data_ptr()

PyTorch has introduced new APIs to replace tensor.data_ptr(). Their usage is the same as the template data_ptr(), and their meaning is just as the names suggest.

// Use these
input_.const_data_ptr<scalar_t>();
input_.mutable_data_ptr<scalar_t>();
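
For instance, a call site that reads one tensor and writes another would pick the accessor by intent (a sketch; the kernel name is illustrative):

#include <ATen/ATen.h>

// Hypothetical kernel wrapper: reads `in`, writes `out`.
template <typename scalar_t>
void my_kernel(const scalar_t* in, scalar_t* out, int64_t n);

template <typename scalar_t>
void call_site(const at::Tensor& input_, at::Tensor& output_) {
  my_kernel<scalar_t>(
      input_.const_data_ptr<scalar_t>(),    // read-only input
      output_.mutable_data_ptr<scalar_t>(), // written output
      input_.numel());
}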

2. Avoid using char * data ptr for buffer

The existing code has the following pattern. For example, the code snippet in torch-xpu-ops/...Reduce.h:

Tensor buffer;
Tensor semaphores;

buffer = at::empty(
    config.global_memory_size(),
    at::TensorOptions().dtype(kChar).device(kXPU));
semaphores = at::empty(
    config.semaphore_size(),
    at::TensorOptions().dtype(kChar).device(kXPU));

at::detail::Array<char*, 1> data;
data[0] = (char*)semaphores.data_ptr();
buf_ptr = buffer.defined() ? (void*)buffer.data_ptr() : nullptr;

The template tensor.data_ptr<T>() does not support the char type, so we should avoid this pattern. We could align with PyTorch's usage in one of the following ways:

2.1 Use int* instead of char*

For example, PyTorch has the following code in Normalization.cuh:

at::Tensor semaphores;
semaphores = at::zeros({grid.x}, input.options().dtype(at::kInt));

int* semaphores_ptr = grid.y > 1 ? semaphores.mutable_data_ptr<int>() : nullptr;
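
Applied to the buffer code from section 2, that pattern might look roughly like this (a sketch only; it assumes config.semaphore_size() is a byte count divisible by sizeof(int), and reuses the kXPU device option from the original snippet):

// Sketch: allocate the semaphore buffer as kInt instead of kChar so the
// checked template accessor can be used. at::zeros also provides the
// zero-initialization the semaphores need.
at::Tensor semaphores = at::zeros(
    {static_cast<int64_t>(config.semaphore_size() / sizeof(int))},
    at::TensorOptions().dtype(at::kInt).device(kXPU));

int* semaphores_ptr = semaphores.mutable_data_ptr<int>();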

However, the above usage is still not best practice: it constructs a tensor just to use the data inside it. We could use an allocator directly instead; see section 2.2.

2.2 Use the allocator directly.

For example, consider the code snippet in Reduce.cuh:

at::DataPtr buffer;
at::DataPtr semaphores;

if (config.should_global_reduce()) {
  auto& allocator = *c10::cuda::CUDACachingAllocator::get();
  buffer = allocator.allocate(config.global_memory_size());
  semaphores = allocator.allocate(config.semaphore_size());
  // ...
}

auto reduce_kernel = ReduceOp<scalar_t, ops_t, uint32_t, out_scalar_t, vt0>(
    buffer.get(),
    (int*)semaphores.get(),
    /* ... other constructor arguments ... */);

It uses the allocator directly rather than constructing a tensor. I believe this usage could yield some performance gain.
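
For torch-xpu-ops, the analogous change would presumably route through the XPU caching allocator. A sketch, assuming an accessor along the lines of c10::xpu::XPUCachingAllocator::get() (the exact accessor should be verified against the codebase):

// Sketch: take raw device allocations from the caching allocator instead of
// constructing kChar tensors. The allocator accessor name is an assumption.
at::DataPtr buffer;
at::DataPtr semaphores;

if (config.should_global_reduce()) {
  auto& allocator = *c10::xpu::XPUCachingAllocator::get();  // assumed accessor
  buffer = allocator.allocate(config.global_memory_size());
  semaphores = allocator.allocate(config.semaphore_size());
  // Note: the semaphore memory still needs to be zero-initialized before use.
}

auto reduce_kernel = ReduceOp<scalar_t, ops_t, uint32_t, out_scalar_t, vt0>(
    buffer.get(),
    (int*)semaphores.get(),
    /* ... other constructor arguments as in the existing kernel ... */);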

@Stonepia
Contributor Author

For the existing code, I changed the first part in PR #655.
The second part is not solved yet.

@Stonepia Stonepia added the enhancement New feature or request label Jul 26, 2024
@intel intel locked and limited conversation to collaborators Jul 30, 2024
@Stonepia Stonepia converted this issue into discussion #660 Jul 30, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
