-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-36363: [MATLAB] Create proxy classes for the DataType class hierarchy #36419
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Added myself as a reviewer per #36549. |
kevingurney
approved these changes
Jul 7, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
github-actions
bot
added
awaiting committer review
Awaiting committer review
and removed
awaiting review
Awaiting review
labels
Jul 7, 2023
kou
reviewed
Jul 8, 2023
github-actions
bot
added
awaiting changes
Awaiting changes
and removed
awaiting committer review
Awaiting committer review
labels
Jul 8, 2023
kou
reviewed
Jul 9, 2023
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
github-actions
bot
added
awaiting change review
Awaiting change review
and removed
awaiting changes
Awaiting changes
labels
Jul 10, 2023
kevingurney
pushed a commit
that referenced
this pull request
Jul 17, 2023
…ay` subclasses from existing proxy ids (#36731) ### Rationale for this change Now that the issue #36363 is closed via PR #36419, we can initialize the `Type` property of `arrow.array.Array` subclasses from existing proxy ids. Currently, we create a new proxy `Type` object whose underlying `arrow::DataType` are semantically equal to - but not the same as - the `arrow::DataType` owned by the Array proxy. It would be preferable if the `Type` and `Array` proxy classes refer to the same `arrow::DataType` object (i.e. the same object on the heap). ### What changes are included in this PR? 1. Upgraded `libmexclass` to commit [d04f88d](mathworks/libmexclass@d04f88d). In this commit, we added a static "make-like" function to `Proxy` called `create`. 2. Modified the constructors of all `Type` objects to expect a single `Proxy` object as input. This is a breaking change and means clients are no longer expected to build `Type` objects via their constructors. Instead, we introduced standalone functions that clients can use to construct `Type` objects, i.e. `arrow.type.int8`, `arrow.type.string`, `arrow.type.timestamp`, etc. These functions deal with creating the `Proxy` objects to pass to the `Type` constructors. Below is an example of the new workflow for creating `Type` objects. ```matlab >> timestampType = arrow.type.timestamp(TimeUnit="second", TimeZone="America/New_York") timestampType = TimestampType with properties: ID: Timestamp ``` NOTE: We plan on enhancing the display to show the `TimeUnit` and `TimeZone` properties. 3. Made `Type` a [dependent](https://www.mathworks.com/help/matlab/matlab_oop/access-methods-for-dependent-properties.html) property on `arrow.array.Array`. The `get.Type` method constructs a `Type` object on demand by making a proxy that wraps the same `arrow::DataType` object stored within the `arrow::Array`. ### Are these changes tested? Yes, updated existing tests. ### Are there any user-facing changes? Yes, we added new standalone functions for creating `Type` objects. Below is a table mapping standalone functions to the `Type` object they output: | Standalone Function | Output Type Object | |----------------------|---------------------| |`arrow.type.boolean`| `arrow.type.BooleanType`| |`arrow.type.int8`| `arrow.type.Int8Type`| |`arrow.type.int16`| `arrow.type.Int16Type`| |`arrow.type.int32`| `arrow.type.Int32Type`| |`arrow.type.int64`| `arrow.type.Int64Type`| |`arrow.type.uint8`| `arrow.type.UInt8Type`| |`arrow.type.uint16`| `arrow.type.UInt16Type`| |`arrow.type.uint32`| `arrow.type.UInt32Type`| |`arrow.type.uint64`| `arrow.type.UInt64Type`| |`arrow.type.string`| `arrow.type.StringType`| |`arrow.type.timestamp`| `arrow.type.TimestampType`| ### Notes Thanks @ kevingurney for the advice! * Closes: #36652 Authored-by: Sarah Gilmore <sgilmore@mathworks.com> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
chelseajonesr
pushed a commit
to chelseajonesr/arrow
that referenced
this pull request
Jul 20, 2023
…hierarchy (apache#36419) ### Rationale for this change In the original pull request in which we added the MATLAB `arrow.type.<Type>Type` classes (e.g. `arrow.type.Float32Type`), we did implement these classes as proxies. At the time, we weren't sure if it would be advantageous to implement the type classes as proxies, but now realize it will be for composite data structures, i.e. `Schema`, `StructArray`, `ListArray`. ### What changes are included in this PR? 1. All classes within the `arrow.type.Type` class hierarchy are implemented as proxies. ### Are these changes tested? Yes, we had existing tests for these classes. ### Are there any user-facing changes? No. ### Future Directions 1. In a followup PR request, we plan on integrating the proxy type classes and the array classes so that they share the same underlying C++` arrow::DataType` object. We thought doing so in this change would be too much code churn. ### Notes Thank you @ kevingurney for the help! * Closes: apache#36363 Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com> Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
chelseajonesr
pushed a commit
to chelseajonesr/arrow
that referenced
this pull request
Jul 20, 2023
…ay.Array` subclasses from existing proxy ids (apache#36731) ### Rationale for this change Now that the issue apache#36363 is closed via PR apache#36419, we can initialize the `Type` property of `arrow.array.Array` subclasses from existing proxy ids. Currently, we create a new proxy `Type` object whose underlying `arrow::DataType` are semantically equal to - but not the same as - the `arrow::DataType` owned by the Array proxy. It would be preferable if the `Type` and `Array` proxy classes refer to the same `arrow::DataType` object (i.e. the same object on the heap). ### What changes are included in this PR? 1. Upgraded `libmexclass` to commit [d04f88d](mathworks/libmexclass@d04f88d). In this commit, we added a static "make-like" function to `Proxy` called `create`. 2. Modified the constructors of all `Type` objects to expect a single `Proxy` object as input. This is a breaking change and means clients are no longer expected to build `Type` objects via their constructors. Instead, we introduced standalone functions that clients can use to construct `Type` objects, i.e. `arrow.type.int8`, `arrow.type.string`, `arrow.type.timestamp`, etc. These functions deal with creating the `Proxy` objects to pass to the `Type` constructors. Below is an example of the new workflow for creating `Type` objects. ```matlab >> timestampType = arrow.type.timestamp(TimeUnit="second", TimeZone="America/New_York") timestampType = TimestampType with properties: ID: Timestamp ``` NOTE: We plan on enhancing the display to show the `TimeUnit` and `TimeZone` properties. 3. Made `Type` a [dependent](https://www.mathworks.com/help/matlab/matlab_oop/access-methods-for-dependent-properties.html) property on `arrow.array.Array`. The `get.Type` method constructs a `Type` object on demand by making a proxy that wraps the same `arrow::DataType` object stored within the `arrow::Array`. ### Are these changes tested? Yes, updated existing tests. ### Are there any user-facing changes? Yes, we added new standalone functions for creating `Type` objects. Below is a table mapping standalone functions to the `Type` object they output: | Standalone Function | Output Type Object | |----------------------|---------------------| |`arrow.type.boolean`| `arrow.type.BooleanType`| |`arrow.type.int8`| `arrow.type.Int8Type`| |`arrow.type.int16`| `arrow.type.Int16Type`| |`arrow.type.int32`| `arrow.type.Int32Type`| |`arrow.type.int64`| `arrow.type.Int64Type`| |`arrow.type.uint8`| `arrow.type.UInt8Type`| |`arrow.type.uint16`| `arrow.type.UInt16Type`| |`arrow.type.uint32`| `arrow.type.UInt32Type`| |`arrow.type.uint64`| `arrow.type.UInt64Type`| |`arrow.type.string`| `arrow.type.StringType`| |`arrow.type.timestamp`| `arrow.type.TimestampType`| ### Notes Thanks @ kevingurney for the advice! * Closes: apache#36652 Authored-by: Sarah Gilmore <sgilmore@mathworks.com> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6baf6a7. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
R-JunmingChen
pushed a commit
to R-JunmingChen/arrow
that referenced
this pull request
Aug 20, 2023
…hierarchy (apache#36419) ### Rationale for this change In the original pull request in which we added the MATLAB `arrow.type.<Type>Type` classes (e.g. `arrow.type.Float32Type`), we did implement these classes as proxies. At the time, we weren't sure if it would be advantageous to implement the type classes as proxies, but now realize it will be for composite data structures, i.e. `Schema`, `StructArray`, `ListArray`. ### What changes are included in this PR? 1. All classes within the `arrow.type.Type` class hierarchy are implemented as proxies. ### Are these changes tested? Yes, we had existing tests for these classes. ### Are there any user-facing changes? No. ### Future Directions 1. In a followup PR request, we plan on integrating the proxy type classes and the array classes so that they share the same underlying C++` arrow::DataType` object. We thought doing so in this change would be too much code churn. ### Notes Thank you @ kevingurney for the help! * Closes: apache#36363 Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com> Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
R-JunmingChen
pushed a commit
to R-JunmingChen/arrow
that referenced
this pull request
Aug 20, 2023
…ay.Array` subclasses from existing proxy ids (apache#36731) ### Rationale for this change Now that the issue apache#36363 is closed via PR apache#36419, we can initialize the `Type` property of `arrow.array.Array` subclasses from existing proxy ids. Currently, we create a new proxy `Type` object whose underlying `arrow::DataType` are semantically equal to - but not the same as - the `arrow::DataType` owned by the Array proxy. It would be preferable if the `Type` and `Array` proxy classes refer to the same `arrow::DataType` object (i.e. the same object on the heap). ### What changes are included in this PR? 1. Upgraded `libmexclass` to commit [d04f88d](mathworks/libmexclass@d04f88d). In this commit, we added a static "make-like" function to `Proxy` called `create`. 2. Modified the constructors of all `Type` objects to expect a single `Proxy` object as input. This is a breaking change and means clients are no longer expected to build `Type` objects via their constructors. Instead, we introduced standalone functions that clients can use to construct `Type` objects, i.e. `arrow.type.int8`, `arrow.type.string`, `arrow.type.timestamp`, etc. These functions deal with creating the `Proxy` objects to pass to the `Type` constructors. Below is an example of the new workflow for creating `Type` objects. ```matlab >> timestampType = arrow.type.timestamp(TimeUnit="second", TimeZone="America/New_York") timestampType = TimestampType with properties: ID: Timestamp ``` NOTE: We plan on enhancing the display to show the `TimeUnit` and `TimeZone` properties. 3. Made `Type` a [dependent](https://www.mathworks.com/help/matlab/matlab_oop/access-methods-for-dependent-properties.html) property on `arrow.array.Array`. The `get.Type` method constructs a `Type` object on demand by making a proxy that wraps the same `arrow::DataType` object stored within the `arrow::Array`. ### Are these changes tested? Yes, updated existing tests. ### Are there any user-facing changes? Yes, we added new standalone functions for creating `Type` objects. Below is a table mapping standalone functions to the `Type` object they output: | Standalone Function | Output Type Object | |----------------------|---------------------| |`arrow.type.boolean`| `arrow.type.BooleanType`| |`arrow.type.int8`| `arrow.type.Int8Type`| |`arrow.type.int16`| `arrow.type.Int16Type`| |`arrow.type.int32`| `arrow.type.Int32Type`| |`arrow.type.int64`| `arrow.type.Int64Type`| |`arrow.type.uint8`| `arrow.type.UInt8Type`| |`arrow.type.uint16`| `arrow.type.UInt16Type`| |`arrow.type.uint32`| `arrow.type.UInt32Type`| |`arrow.type.uint64`| `arrow.type.UInt64Type`| |`arrow.type.string`| `arrow.type.StringType`| |`arrow.type.timestamp`| `arrow.type.TimestampType`| ### Notes Thanks @ kevingurney for the advice! * Closes: apache#36652 Authored-by: Sarah Gilmore <sgilmore@mathworks.com> Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Rationale for this change
In the original pull request in which we added the MATLAB
arrow.type.<Type>Type
classes (e.g.arrow.type.Float32Type
), we did implement these classes as proxies. At the time, we weren't sure if it would be advantageous to implement the type classes as proxies, but now realize it will be for composite data structures, i.e.Schema
,StructArray
,ListArray
.What changes are included in this PR?
arrow.type.Type
class hierarchy are implemented as proxies.Are these changes tested?
Yes, we had existing tests for these classes.
Are there any user-facing changes?
No.
Future Directions
arrow::DataType
object. We thought doing so in this change would be too much code churn.Notes
Thank you @kevingurney for the help!
DataType
class hierarchy #36363