Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MATLAB] Subclass arrow::Buffer to keep MATLAB data backing arrow::Arrays alive #36614

Closed
sgilmore10 opened this issue Jul 11, 2023 · 1 comment · Fixed by #36615
Closed

[MATLAB] Subclass arrow::Buffer to keep MATLAB data backing arrow::Arrays alive #36614

sgilmore10 opened this issue Jul 11, 2023 · 1 comment · Fixed by #36615

Comments

@sgilmore10
Copy link
Member

sgilmore10 commented Jul 11, 2023

Describe the enhancement requested

When building arrow.array.<Numeric>Arrays from native MATLAB arrays, we avoid copying by

  1. Wrapping the MATLAB data inside a non-owning arrow::Buffer
  2. Constructing an arrow::ArrayData object from the arrow::Buffer
  3. Constructing the arrow::Array from the arrow::Data object.

Because the array's buffer does not own its backing data, we have been storing the original MATLAB array as a property called MatlabArray on the MATLAB arrow.array.<Numeric>Arrays classes.

This solution is not ideal because the backing MATLAB array is kept separate from the actual std::shared_ptr<arrow::Array>, which is stored within the C++ proxy objects (e.g. arrow::matlab::array::proxy::NumericArray<float64>). A better solution would be to create a new subclass of arrow::Buffer called MatlabBuffer. MatlabBuffer will keep the original MATLAB array in memory.

Component(s)

MATLAB

@sgilmore10
Copy link
Member Author

take

@sgilmore10 sgilmore10 changed the title [MATLAB] Store backing MATLAB array within C++ proxy objects instead of MATLAB classes [MATLAB] Subclass arrow::Buffer to keep MATLAB data backing arrow::Arrays alive Jul 12, 2023
kevingurney pushed a commit that referenced this issue Jul 12, 2023
… arrow::Arrays alive (#36615)

### Rationale for this change

When building `arrow.array.<Numeric>Arrays` from native MATLAB arrays, we avoid copying by

1. Wrapping the MATLAB data inside a non-owning `arrow::Buffer`
2. Constructing an `arrow::ArrayData` object from the `arrow::Buffer`
3. Constructing the `arrow::Array` from the `arrow::Data` object.

Because the `Array`'s underlying `Buffer` does not have ownership of its backing data, we have been storing the original MATLAB array as a property called `MatlabArray` on the MATLAB `arrow.array.NumericArray` class.

This solution is not ideal because the backing MATLAB array is kept separate from the actual `std::shared_ptr<arrow::Array>`, which is stored within the C++ proxy objects (e.g. `arrow::matlab::array::proxy::NumericArray<float64>`).  A better solution would be to create a new subclass of `arrow::Buffer` called `MatlabBuffer`, which will keep the original MATLAB array alive by taking it in as an input parameter and storing it as a member variable. 

### What changes are included in this PR?

1. Removed the `MatlabArray` property from the MATLAB class `matlab.io.NumericArray` 
2. Added a private member variable called `mda_array` to `arrow::matlab::array::proxy::NumericArray<CType>` and `arrow::matlab::array::proxy::TimestampArray`. `mda_array` is a `matlab::data::Array` object. This member variable stores the MATLAB array that owns the memory the `arrow::Array` objects are backed by.

### Are these changes tested?

Existing tests used.

### Are there any user-facing changes?
No.

* Closes: #36614

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@kevingurney kevingurney added this to the 14.0.0 milestone Jul 12, 2023
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this issue Jul 20, 2023
…acking arrow::Arrays alive (apache#36615)

### Rationale for this change

When building `arrow.array.<Numeric>Arrays` from native MATLAB arrays, we avoid copying by

1. Wrapping the MATLAB data inside a non-owning `arrow::Buffer`
2. Constructing an `arrow::ArrayData` object from the `arrow::Buffer`
3. Constructing the `arrow::Array` from the `arrow::Data` object.

Because the `Array`'s underlying `Buffer` does not have ownership of its backing data, we have been storing the original MATLAB array as a property called `MatlabArray` on the MATLAB `arrow.array.NumericArray` class.

This solution is not ideal because the backing MATLAB array is kept separate from the actual `std::shared_ptr<arrow::Array>`, which is stored within the C++ proxy objects (e.g. `arrow::matlab::array::proxy::NumericArray<float64>`).  A better solution would be to create a new subclass of `arrow::Buffer` called `MatlabBuffer`, which will keep the original MATLAB array alive by taking it in as an input parameter and storing it as a member variable. 

### What changes are included in this PR?

1. Removed the `MatlabArray` property from the MATLAB class `matlab.io.NumericArray` 
2. Added a private member variable called `mda_array` to `arrow::matlab::array::proxy::NumericArray<CType>` and `arrow::matlab::array::proxy::TimestampArray`. `mda_array` is a `matlab::data::Array` object. This member variable stores the MATLAB array that owns the memory the `arrow::Array` objects are backed by.

### Are these changes tested?

Existing tests used.

### Are there any user-facing changes?
No.

* Closes: apache#36614

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this issue Aug 20, 2023
…acking arrow::Arrays alive (apache#36615)

### Rationale for this change

When building `arrow.array.<Numeric>Arrays` from native MATLAB arrays, we avoid copying by

1. Wrapping the MATLAB data inside a non-owning `arrow::Buffer`
2. Constructing an `arrow::ArrayData` object from the `arrow::Buffer`
3. Constructing the `arrow::Array` from the `arrow::Data` object.

Because the `Array`'s underlying `Buffer` does not have ownership of its backing data, we have been storing the original MATLAB array as a property called `MatlabArray` on the MATLAB `arrow.array.NumericArray` class.

This solution is not ideal because the backing MATLAB array is kept separate from the actual `std::shared_ptr<arrow::Array>`, which is stored within the C++ proxy objects (e.g. `arrow::matlab::array::proxy::NumericArray<float64>`).  A better solution would be to create a new subclass of `arrow::Buffer` called `MatlabBuffer`, which will keep the original MATLAB array alive by taking it in as an input parameter and storing it as a member variable. 

### What changes are included in this PR?

1. Removed the `MatlabArray` property from the MATLAB class `matlab.io.NumericArray` 
2. Added a private member variable called `mda_array` to `arrow::matlab::array::proxy::NumericArray<CType>` and `arrow::matlab::array::proxy::TimestampArray`. `mda_array` is a `matlab::data::Array` object. This member variable stores the MATLAB array that owns the memory the `arrow::Array` objects are backed by.

### Are these changes tested?

Existing tests used.

### Are there any user-facing changes?
No.

* Closes: apache#36614

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants