
Change TensorShape to typically not allocate heap memory #9542

Merged: 31 commits, Nov 8, 2021

Conversation

RyanUnderhill (Member)

Description: Switch TensorShape from being a std::vector to using a small buffer optimization, so that the common cases do not allocate heap memory. Larger shapes will still allocate memory as before.

The bulk of the changes are fixing up all code that treats a TensorShape as a std::vector. Such code now uses a gsl::span, which handles both cases: the fixed buffer and the dynamically allocated one.

Motivation and Context
The allocations from TensorShape are bad for performance as memory allocations are a huge bottleneck. At least one other team has requested we optimize this to reduce the number of memory allocations.

@pranavsharma (Contributor) left a comment

I was looking for tensor_shape_test.cc but couldn't find one. Do you plan to add one?

Review threads on:
- include/onnxruntime/core/framework/tensor_shape.h
- onnxruntime/core/framework/tensor_shape.cc
@pranavsharma (Contributor)

This change is almost guaranteed to have a positive performance impact. Can we measure it on some well known models to quantify the impact? Thanks!

@eindenbom left a comment

  1. TensorShape::FromExistingBuffer seems dangerous.
  2. There is still massive use of std::vector<> in many cases (maybe better resolved in a follow-up PR).
  3. sizeof(TensorShape) seems excessively large.

Review threads on:
- include/onnxruntime/core/framework/tensor_shape.h
- onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h
```diff
@@ -30,7 +30,7 @@ class ExpandDims final : public OpKernel {
   if (X == nullptr) return Status(common::ONNXRUNTIME, common::FAIL, "input count mismatch");
   const TensorShape& X_shape = X->Shape();

-  std::vector<int64_t> expanded_shape(X_shape.GetDims());
+  std::vector<int64_t> expanded_shape(X_shape.GetDimsAsVector());
```


Allocation and copy into heap buffer, the thing this PR is trying to avoid.

@RyanUnderhill (Member, Author)

Yeah, to fix things further we'll need a small-buffer-optimized vector that allows modification, as a few lines later this code exists:

```cpp
expanded_shape.insert(expanded_shape.begin() + axis, 1);
```

We'd need to replace all instances of code like this with a std::vector replacement that doesn't allocate for small sizes. This would be easy to do incrementally in a later change (by searching for all of the GetDimsAsVector() calls).


```cpp
gsl::span<int64_t> values_;
int64_t small_buffer_[5];
std::unique_ptr<int64_t[]> allocated_buffer_;
```
Member:

I feel that we could use std::pmr::vector with a pool allocator sitting on top of the buffer, with new_delete as the upstream resource. That way we perfectly simulate small value optimization, with the first bytes taken from the inline buffer and bigger buffers from the heap. This would get rid of the unique_ptr and all the logic associated with it.

Member:

Furthermore, we could abstract all the logic of manipulating mutable shapes inside TensorShape based on pmr, and enjoy correctness and the absence of heap allocations.

@RyanUnderhill (Member, Author)

We talked about it outside of GitHub; here's a summary:

std::pmr::vector would be useful in all of the code that currently uses TensorShape::GetDimsAsVector, to take advantage of a small-size-optimized vector. As TensorShape itself isn't a vector (no runtime adding or removing of elements), it's not needed there (and would take up more space).

Sadly, there's no way to remove the unique_ptr member: without it there is no way to know whether the memory is the small buffer, heap-allocated memory, or external memory from TensorShape::FromExistingBuffer.

@pranavsharma (Contributor) left a comment

LGTM 👍


```cpp
TensorShape(const std::vector<int64_t>& dims, size_t start, size_t end);
// Create a TensorShape that points to an existing buffer internally. As no copy is made, 'data' must remain valid for the life of the TensorShape
static const TensorShape FromExistingBuffer(const std::vector<int64_t>& data) { return TensorShape(External{}, gsl::span<int64_t>(const_cast<int64_t*>(data.data()), data.size())); }
```
Contributor:

nit: can use a new line for the function body

@RyanUnderhill (Member, Author)

Changed; hopefully it'll pass the build tests now. I'm hitting random breaks unrelated to my change.

pranavsharma previously approved these changes Nov 5, 2021
RyanUnderhill merged commit 24e35fb into master Nov 8, 2021
RyanUnderhill deleted the ryanunderhill/tensor_shape_fix branch November 8, 2021 18:29
6 participants