GH-33854: [MATLAB] Add basic libmexclass integration code to MATLAB interface #34563
CMakeLists.txt. Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
…libmexclass CMakeLists.txt in the libmexclass/cpp directory. Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com> Co-authored-by: Fiona la <fionala7@gmail.com>
Also, update the register proxy macro used to reflect the libmexclass changes for mathworks/libmexclass#20. Co-authored-by: Fiona la <fionala7@gmail.com>
…_matlab. Also, updates to reflect latest libmexclass.
…to arrow.array.proxy.DoubleArray.
…se arrow.array.proxy.DoubleArray.
I believe we've addressed all the feedback on this pull request at this point. Thank you for all the helpful suggestions! The refactored |
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
…Y_LIBRARY_INCLUDE_DIRS. Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
It looks like the CI failure for |
+1
Thanks!
Benchmark runs are scheduled for baseline = 966a804 and contender = 9009dd7. 9009dd7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…ays (#35479)

### Rationale for this change

This pull request is a follow-up to #34563. To facilitate the implementation of future array types, we would like to first create a C++ template class for numeric arrays. We also want to start adding basic tests for the array functionality in the MATLAB interface.

### What changes are included in this PR?

1. Added a C++ template class called `NumericArray`, templated on `CType`.
2. Re-implemented the `Float64Array` C++ proxy class in terms of the new template class, i.e. `NumericArray<double>`.
3. Added a method called `double()` on the MATLAB `Float64Array` class to convert the `arrow.[Type]Array` to a MATLAB `double` array.
4. Added basic tests for round-tripping float64 arrays.
5. Created a base C++ proxy `Array` class that all proxy array classes will inherit from.
6. Renamed `Print()` to `ToString()` and made it return a string instead of printing to the screen.

### Are these changes tested?

Yes, we added automated test cases to the test class `tFloat64Array.m`. In addition, we manually qualified these changes on macOS.

### Are there any user-facing changes?

Yes, the `Print()` method is no longer public and there is now a method called `double()` on `arrow.array.Float64Array`.

Included below is a simple example of using the `double()` method:

```matlab
>> arrowArray = arrow.array.Float64Array([1, 2, 3])
arrowArray = [ 1, 2, 3 ]
>> matlabArray = double(arrowArray)
matlabArray = 1 2 3
>> class(arrowArray)
ans = 'arrow.array.Float64Array'
>> class(matlabArray)
ans = 'double'
```

### Future Directions

1. Support the rest of the numeric types.
2. Add an abstract MATLAB base class called `arrow.array.Array`.
3. Continue building out the methods (e.g. `length()`).
4. Support `null` values (validity bitmap).
5. Handle converting non-ASCII characters from `UTF-8` to `UTF-16`.
6. Handle errors in the C++ layer.

* Closes: #35411

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
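The class layout described in the follow-up commit (a base `Array` proxy, a `NumericArray` template on `CType`, and `Float64Array` expressed as `NumericArray<double>`) could look roughly like the sketch below. This is only an illustration under stated assumptions: it uses a plain `std::vector` instead of Arrow or libmexclass types, and the `Length()` and `At()` members and the exact `ToString()` formatting are hypothetical, not taken from the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class that every array type derives from.
// (Stand-in only; the real proxy base class wraps Arrow objects.)
class Array {
public:
    virtual ~Array() = default;
    // Returns a string instead of printing to the screen.
    virtual std::string ToString() const = 0;
    virtual std::size_t Length() const = 0;
};

// Hypothetical numeric array templated on the C element type.
template <typename CType>
class NumericArray : public Array {
public:
    explicit NumericArray(std::vector<CType> values)
        : values_(std::move(values)) {}

    std::string ToString() const override {
        std::ostringstream out;
        out << "[";
        for (std::size_t i = 0; i < values_.size(); ++i) {
            out << (i == 0 ? " " : ", ") << values_[i];
        }
        out << " ]";
        return out.str();
    }

    std::size_t Length() const override { return values_.size(); }

    // Bounds-checked element access (assumption: not part of the PR).
    CType At(std::size_t i) const {
        assert(i < values_.size());
        return values_[i];
    }

private:
    std::vector<CType> values_;
};

// Each concrete array type becomes a one-line instantiation.
using Float64Array = NumericArray<double>;
```

With this shape, adding another numeric type would only need one more alias, such as `using Int64Array = NumericArray<long long>;`, which is the kind of reuse the template is meant to enable.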
…TLAB interface (apache#34563)

### Rationale for this change

This pull request is a follow-up to [this mailing list discussion](https://lists.apache.org/thread/2kgxbs54dw4wvcwrthzvb1ljqcvnrv7h) about integrating [`mathworks/libmexclass`](https://github.com/mathworks/libmexclass/) with the MATLAB Interface to Arrow code base.

We've spent the last few months working on building `libmexclass` from scratch in order to ease development of the MATLAB Interface to Arrow.

`libmexclass` essentially provides a way to connect MATLAB classes with corresponding C++ classes using an approach inspired by the [Proxy Design Pattern](https://en.wikipedia.org/wiki/Proxy_pattern).

Our hope is that using `libmexclass` will enable us to more easily build out an object-oriented MATLAB Interface to Arrow memory by wrapping corresponding Arrow C++ classes and "proxying" method calls on these MATLAB objects to the underlying Arrow C++ objects.

### What changes are included in this PR?

1. Modifications were made to the CMake build system for the MATLAB interface to use `libmexclass` under the hood. This includes the addition of a new build flag `-D MATLAB_ARROW_INTERFACE = ON | OFF` which toggles building the new code that uses `libmexclass` under the hood.
2. To illustrate the basic usage of `libmexclass`, we have added one new MATLAB class `arrow.array.Float64Array`. This class allows users to construct an Arrow array with logical type `Float64` from a MATLAB `double` array with zero data copies. Under the hood, a `Proxy` wraps and bounds the lifetime of the underlying Arrow C++ `Float64Array` object. In addition, this `Proxy` is responsible for delegating method calls on an `arrow.array.Float64Array` to the corresponding Arrow C++ `Float64Array`.

### Are these changes tested?

Yes, these changes have been tested on Linux, macOS, and Windows.

1. We've modified the MATLAB CI GitHub Actions workflow (`.github/workflows/matlab.yml`) to build the new `arrow.array.Float64Array` code using `libmexclass`. This includes passing `-D MATLAB_ARROW_INTERFACE=ON` to the `cmake` command call in `ci/scripts/matlab_build.sh`.
2. We've added a new basic MATLAB test `test/arrow/array/tFloat64Array.m` which tests for successful construction of an `arrow.array.Float64Array`. This [test is passing successfully in the MATLAB CI workflow](https://github.com/mathworks/arrow/actions/runs/4419365852/jobs/7747694543#step:6:50).
3. We've confirmed that the [`Dev` CI workflow linting checks are all passing](https://github.com/mathworks/arrow/actions/runs/4419365845) and appropriate Apache license headers have been added.
4. We've manually tested creation, deletion, and assignment of multiple `arrow.array.Float64Array` instances on Linux, macOS, and Windows with a variety of different MATLAB `double` arrays.

### Are there any user-facing changes?

Yes, there is now a public class named `arrow.array.Float64Array` which is added to the MATLAB Path.

Included below is a simple example of creating two different `arrow.array.Float64Array` objects in MATLAB:

```matlab
>> A = arrow.array.Float64Array([1, 2, 3])
A = [ 1, 2, 3 ]
>> random = arrow.array.Float64Array(rand(1, 10, 100))
random = [ 0.6311887342690112, 0.355073651878849, 0.9970032716066477,
  0.22417149898312716, 0.6524510729686149, 0.6049906419082594,
  0.38724543148313495, 0.14218715929050407, 0.025134985710203117,
  0.4211122537652413, ...
  0.6228027906591304, 0.7966246853083961, 0.74587490154065,
  0.12553623135481973, 0.8223940067590204, 0.02515050142850217,
  0.41442888092403163, 0.7314074679729372, 0.7813740002759628,
  0.367285915131369 ]
```

**Note**: This is an early stage PR, so the naming scheme `arrow.array.<Type>Array` might change in the future.

### Future Directions

1. Currently, the "old" `featherread`/`featherwrite` code is still being built by CMake and installed to the specified `CMAKE_INSTALL_PREFIX`. This slows down the build process and complicates the build system logic. In addition, these Feather functions only support reading and writing a subset of Feather V1 files. We should consider disabling the build of this legacy code by default or removing it entirely. In the long term, when we have more Arrow types in MATLAB (e.g. `arrow.Table`, `arrow.Schema`, `arrow.RecordBatch`, etc.) we should consider re-implementing this functionality in terms of the new APIs.
2. We would like to start adding more numeric array classes (e.g. `arrow.array.UInt8Array`, `arrow.array.Int64Array`, etc.).
3. We only added one very basic test for `arrow.array.Float64Array` in this pull request. We should add a lot more tests as the APIs develop to test things like indexing, copying, slicing, etc.
4. We don't have any documentation for `arrow.array.Float64Array` right now. In general, we should start adding detailed documentation for the new APIs as we start to implement them.
5. Lots more! This is just the beginning of building out the MATLAB Interface to Arrow APIs. We plan on creating GitHub issues for tracking work as we go.

### Notes

1. Creating `libmexclass` and integrating it with the Arrow code base was a team effort! Thank you to @sreeharihegden, @lafiona, @sgilmore10, @jhughes-mw, and others at @mathworks for their help with this pull request!
2. Closes: apache#33854

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Fiona La <fionala7@gmail.com>
Co-authored-by: shegden <shegden@mathworks.com>
Co-authored-by: Sreehari Hegden <sreehari.hegden@gmail.com>
Co-authored-by: Fiona la <fionala7@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
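The proxy idea described above, where a `Proxy` bounds the lifetime of a wrapped C++ object and delegates method calls to it, can be sketched in plain C++ as follows. This is a minimal illustration only, not the libmexclass API: `Float64Values`, `Float64ValuesProxy`, `Invoke`, and the string-keyed dispatch table are all hypothetical names chosen for the example, and no MATLAB or Arrow headers are involved.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the wrapped C++ object (not an Arrow class).
class Float64Values {
public:
    explicit Float64Values(std::vector<double> v) : values_(std::move(v)) {}
    double Sum() const {
        return std::accumulate(values_.begin(), values_.end(), 0.0);
    }
    double Length() const { return static_cast<double>(values_.size()); }

private:
    std::vector<double> values_;
};

// Hypothetical proxy: owns the wrapped object, so the object lives exactly
// as long as the proxy, and forwards calls that arrive as method names.
class Float64ValuesProxy {
public:
    explicit Float64ValuesProxy(std::vector<double> v)
        : target_(std::make_shared<Float64Values>(std::move(v))) {
        methods_["Sum"] = [this] { return target_->Sum(); };
        methods_["Length"] = [this] { return target_->Length(); };
    }

    // The registered lambdas capture `this`, so copying is disabled.
    Float64ValuesProxy(const Float64ValuesProxy&) = delete;
    Float64ValuesProxy& operator=(const Float64ValuesProxy&) = delete;

    // Looks up a method by name and delegates to the wrapped object.
    double Invoke(const std::string& name) const {
        assert(target_ != nullptr);
        auto it = methods_.find(name);
        if (it == methods_.end()) {
            throw std::invalid_argument("unknown method: " + name);
        }
        return it->second();
    }

private:
    std::shared_ptr<Float64Values> target_;
    std::unordered_map<std::string, std::function<double()>> methods_;
};
```

A caller on the scripting side would hold only the proxy and call something like `proxy.Invoke("Sum")`; when the proxy is destroyed, the wrapped object is released with it.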