
Change TensorShape to typically not allocate heap memory #9542

Merged: 31 commits, Nov 8, 2021

Conversation

RyanUnderhill (Member)

Description: Switch TensorShape from being a std::vector to using a small buffer optimization, so that the common cases do not allocate heap memory. Larger shapes will still allocate memory as before.

The bulk of the changes are fixing up all code that treats a TensorShape as a std::vector. Such code now uses a gsl::span, which handles both cases: the fixed buffer and the dynamically allocated one.

Motivation and Context
The allocations from TensorShape are bad for performance as memory allocations are a huge bottleneck. At least one other team has requested we optimize this to reduce the number of memory allocations.

@pranavsharma (Contributor) left a comment

I was looking for tensor_shape_test.cc but couldn't find one. Do you plan to add one?

Review threads on:
- include/onnxruntime/core/framework/tensor_shape.h
- onnxruntime/core/framework/tensor_shape.cc
@pranavsharma (Contributor)

This change is almost guaranteed to have a positive performance impact. Can we measure it on some well known models to quantify the impact? Thanks!

@eindenbom left a comment

  1. TensorShape::FromExistingBuffer seems dangerous.
  2. There is still massive use of std::vector<> in many cases (maybe better resolved in a follow-up PR).
  3. sizeof(TensorShape) seems excessively large.

Review threads on:
- include/onnxruntime/core/framework/tensor_shape.h
- onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h
```diff
@@ -30,7 +30,7 @@ class ExpandDims final : public OpKernel {
   if (X == nullptr) return Status(common::ONNXRUNTIME, common::FAIL, "input count mismatch");
   const TensorShape& X_shape = X->Shape();

-  std::vector<int64_t> expanded_shape(X_shape.GetDims());
+  std::vector<int64_t> expanded_shape(X_shape.GetDimsAsVector());
```


Allocation and copy into heap buffer, the thing this PR is trying to avoid.

@RyanUnderhill (Member, Author)

Yeah, to fix things further we'll need a small-buffer-optimized vector that allows modification, as a few lines later this code exists:

```cpp
expanded_shape.insert(expanded_shape.begin() + axis, 1);
```

We'd need to replace all instances of code like this with a std::vector replacement that doesn't allocate for small sizes. This would be easy to do incrementally in a later change (by searching for all of the GetDimsAsVector() calls).


```cpp
gsl::span<int64_t> values_;
int64_t small_buffer_[5];
std::unique_ptr<int64_t[]> allocated_buffer_;
```
Member:

I feel that we could use std::pmr::vector with a pool allocator sitting on top of the buffer, with new_delete as the upstream resource. That way we perfectly simulate small value optimization, with the first bytes taken from the inline buffer and bigger buffers from the heap. This would get rid of the unique_ptr and all the logic associated with it.

Member:

Furthermore, we could abstract all the logic of manipulating mutable shapes inside TensorShape based on pmr, and enjoy correctness and the absence of heap allocations.

@RyanUnderhill (Member, Author)

We talked about it outside of GitHub; here's a summary:

std::pmr::vector would be useful in all of the code that currently uses TensorShape::GetDimsAsVector, to take advantage of a small-size-optimized vector. As TensorShape itself isn't a vector (no runtime adding or removing of elements), it's not needed there (and would take up more space).

Sadly, there's no way to remove the unique_ptr member: without it there is no way to know whether the memory is the small buffer, heap-allocated memory, or external memory from TensorShape::FromExistingBuffer.

@pranavsharma (Contributor) left a comment

LGTM 👍


```cpp
TensorShape(const std::vector<int64_t>& dims, size_t start, size_t end);
// Create a TensorShape that points to an existing buffer internally. As no copy is made, 'data' must remain valid for the life of the TensorShape
static const TensorShape FromExistingBuffer(const std::vector<int64_t>& data) { return TensorShape(External{}, gsl::span<int64_t>(const_cast<int64_t*>(data.data()), data.size())); }
```
Contributor:

nit: can use a new line for the function body

@RyanUnderhill (Member, Author)

Changed; hopefully it'll pass the build tests now. I'm hitting random breaks unrelated to my change.

pranavsharma previously approved these changes Nov 5, 2021
RyanUnderhill merged commit 24e35fb into master Nov 8, 2021
RyanUnderhill deleted the ryanunderhill/tensor_shape_fix branch November 8, 2021 18:29
6 participants