Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36363: [MATLAB] Create proxy classes for the DataType class hierarchy #36419

Merged
merged 15 commits into from
Jul 12, 2023

Conversation

sgilmore10
Copy link
Member

@sgilmore10 sgilmore10 commented Jun 30, 2023

Rationale for this change

In the original pull request in which we added the MATLAB arrow.type.<Type>Type classes (e.g. arrow.type.Float32Type), we did implement these classes as proxies. At the time, we weren't sure if it would be advantageous to implement the type classes as proxies, but now realize it will be for composite data structures, i.e. Schema, StructArray, ListArray.

What changes are included in this PR?

  1. All classes within the arrow.type.Type class hierarchy are implemented as proxies.

Are these changes tested?

Yes, we had existing tests for these classes.

Are there any user-facing changes?

No.

Future Directions

  1. In a followup PR request, we plan on integrating the proxy type classes and the array classes so that they share the same underlying C++ arrow::DataType object. We thought doing so in this change would be too much code churn.

Notes

Thank you @kevingurney for the help!

@kevingurney
Copy link
Member

kevingurney commented Jul 7, 2023

Added myself as a reviewer per #36549.

Copy link
Member

@kevingurney kevingurney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

matlab/src/cpp/arrow/matlab/type/proxy/type.h Outdated Show resolved Hide resolved
matlab/src/cpp/arrow/matlab/type/proxy/type.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jul 7, 2023
@assignUser assignUser removed their request for review July 7, 2023 21:49
matlab/src/cpp/arrow/matlab/proxy/factory.cc Outdated Show resolved Hide resolved
matlab/src/cpp/arrow/matlab/type/proxy/type.cc Outdated Show resolved Hide resolved
matlab/src/cpp/arrow/matlab/type/proxy/type.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 8, 2023
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 10, 2023
@kevingurney kevingurney merged commit 6baf6a7 into apache:main Jul 12, 2023
@kevingurney kevingurney removed the awaiting change review Awaiting change review label Jul 12, 2023
kevingurney pushed a commit that referenced this pull request Jul 17, 2023
…ay` subclasses from existing proxy ids (#36731)

### Rationale for this change

Now that the issue #36363 is closed via PR #36419, we can initialize the `Type` property of `arrow.array.Array` subclasses from existing proxy ids. Currently, we create a new proxy `Type` object whose underlying `arrow::DataType` are semantically equal to  - but not the same as - the `arrow::DataType` owned by the Array proxy. It would be preferable if the `Type` and `Array` proxy classes refer to the same `arrow::DataType` object (i.e. the same object on the heap).

### What changes are included in this PR?

1. Upgraded `libmexclass` to commit [d04f88d](mathworks/libmexclass@d04f88d). In this commit, we added a static "make-like" function to `Proxy` called `create`.
2. Modified the constructors of all `Type` objects to expect a single `Proxy` object as input. This is a breaking change and  means clients are no longer expected to build `Type` objects via their constructors. Instead, we introduced standalone functions that clients can use to construct `Type` objects, i.e.   `arrow.type.int8`, `arrow.type.string`, `arrow.type.timestamp`, etc. These functions deal with creating the `Proxy` objects to pass to the `Type` constructors. Below is an example of the new workflow for creating `Type` objects. 

```matlab
>> timestampType = arrow.type.timestamp(TimeUnit="second", TimeZone="America/New_York")

timestampType = 

  TimestampType with properties:

    ID: Timestamp
```
NOTE: We plan on enhancing the display to show the `TimeUnit` and `TimeZone` properties. 

3. Made `Type` a [dependent](https://www.mathworks.com/help/matlab/matlab_oop/access-methods-for-dependent-properties.html) property on `arrow.array.Array`. The `get.Type` method constructs a `Type` object on demand by making a proxy that wraps the same `arrow::DataType` object stored within the `arrow::Array`.

### Are these changes tested?

Yes, updated existing tests.

### Are there any user-facing changes?

Yes, we added new standalone functions for creating `Type` objects. Below is a table mapping standalone  functions to the `Type` object they output: 

| Standalone Function | Output Type Object |
|----------------------|---------------------|
|`arrow.type.boolean`| `arrow.type.BooleanType`|
|`arrow.type.int8`| `arrow.type.Int8Type`|
|`arrow.type.int16`| `arrow.type.Int16Type`|
|`arrow.type.int32`| `arrow.type.Int32Type`|
|`arrow.type.int64`| `arrow.type.Int64Type`|
|`arrow.type.uint8`| `arrow.type.UInt8Type`|
|`arrow.type.uint16`| `arrow.type.UInt16Type`|
|`arrow.type.uint32`| `arrow.type.UInt32Type`|
|`arrow.type.uint64`| `arrow.type.UInt64Type`|
|`arrow.type.string`| `arrow.type.StringType`|
|`arrow.type.timestamp`| `arrow.type.TimestampType`|

### Notes

Thanks @ kevingurney for the advice!
* Closes: #36652

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this pull request Jul 20, 2023
…hierarchy (apache#36419)

### Rationale for this change

In the original pull request in which we added the MATLAB `arrow.type.<Type>Type` classes (e.g. `arrow.type.Float32Type`), we did implement these classes as proxies. At the time, we weren't sure if it would be advantageous to implement the type classes as proxies, but now realize it will be for composite data structures, i.e. `Schema`, `StructArray`, `ListArray`. 

### What changes are included in this PR?

1. All classes within the `arrow.type.Type` class hierarchy are implemented as proxies. 

### Are these changes tested?

Yes, we had existing tests for these classes. 

### Are there any user-facing changes?

No.

### Future Directions

1. In a followup PR request, we plan on integrating the proxy type classes and the array classes so that they share the same underlying C++` arrow::DataType` object. We thought doing so in this change would be too much code churn.

### Notes

Thank you @ kevingurney for the help!

* Closes: apache#36363

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
chelseajonesr pushed a commit to chelseajonesr/arrow that referenced this pull request Jul 20, 2023
…ay.Array` subclasses from existing proxy ids (apache#36731)

### Rationale for this change

Now that the issue apache#36363 is closed via PR apache#36419, we can initialize the `Type` property of `arrow.array.Array` subclasses from existing proxy ids. Currently, we create a new proxy `Type` object whose underlying `arrow::DataType` are semantically equal to  - but not the same as - the `arrow::DataType` owned by the Array proxy. It would be preferable if the `Type` and `Array` proxy classes refer to the same `arrow::DataType` object (i.e. the same object on the heap).

### What changes are included in this PR?

1. Upgraded `libmexclass` to commit [d04f88d](mathworks/libmexclass@d04f88d). In this commit, we added a static "make-like" function to `Proxy` called `create`.
2. Modified the constructors of all `Type` objects to expect a single `Proxy` object as input. This is a breaking change and  means clients are no longer expected to build `Type` objects via their constructors. Instead, we introduced standalone functions that clients can use to construct `Type` objects, i.e.   `arrow.type.int8`, `arrow.type.string`, `arrow.type.timestamp`, etc. These functions deal with creating the `Proxy` objects to pass to the `Type` constructors. Below is an example of the new workflow for creating `Type` objects. 

```matlab
>> timestampType = arrow.type.timestamp(TimeUnit="second", TimeZone="America/New_York")

timestampType = 

  TimestampType with properties:

    ID: Timestamp
```
NOTE: We plan on enhancing the display to show the `TimeUnit` and `TimeZone` properties. 

3. Made `Type` a [dependent](https://www.mathworks.com/help/matlab/matlab_oop/access-methods-for-dependent-properties.html) property on `arrow.array.Array`. The `get.Type` method constructs a `Type` object on demand by making a proxy that wraps the same `arrow::DataType` object stored within the `arrow::Array`.

### Are these changes tested?

Yes, updated existing tests.

### Are there any user-facing changes?

Yes, we added new standalone functions for creating `Type` objects. Below is a table mapping standalone  functions to the `Type` object they output: 

| Standalone Function | Output Type Object |
|----------------------|---------------------|
|`arrow.type.boolean`| `arrow.type.BooleanType`|
|`arrow.type.int8`| `arrow.type.Int8Type`|
|`arrow.type.int16`| `arrow.type.Int16Type`|
|`arrow.type.int32`| `arrow.type.Int32Type`|
|`arrow.type.int64`| `arrow.type.Int64Type`|
|`arrow.type.uint8`| `arrow.type.UInt8Type`|
|`arrow.type.uint16`| `arrow.type.UInt16Type`|
|`arrow.type.uint32`| `arrow.type.UInt32Type`|
|`arrow.type.uint64`| `arrow.type.UInt64Type`|
|`arrow.type.string`| `arrow.type.StringType`|
|`arrow.type.timestamp`| `arrow.type.TimestampType`|

### Notes

Thanks @ kevingurney for the advice!
* Closes: apache#36652

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6baf6a7.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
…hierarchy (apache#36419)

### Rationale for this change

In the original pull request in which we added the MATLAB `arrow.type.<Type>Type` classes (e.g. `arrow.type.Float32Type`), we did implement these classes as proxies. At the time, we weren't sure if it would be advantageous to implement the type classes as proxies, but now realize it will be for composite data structures, i.e. `Schema`, `StructArray`, `ListArray`. 

### What changes are included in this PR?

1. All classes within the `arrow.type.Type` class hierarchy are implemented as proxies. 

### Are these changes tested?

Yes, we had existing tests for these classes. 

### Are there any user-facing changes?

No.

### Future Directions

1. In a followup PR request, we plan on integrating the proxy type classes and the array classes so that they share the same underlying C++` arrow::DataType` object. We thought doing so in this change would be too much code churn.

### Notes

Thank you @ kevingurney for the help!

* Closes: apache#36363

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
R-JunmingChen pushed a commit to R-JunmingChen/arrow that referenced this pull request Aug 20, 2023
…ay.Array` subclasses from existing proxy ids (apache#36731)

### Rationale for this change

Now that the issue apache#36363 is closed via PR apache#36419, we can initialize the `Type` property of `arrow.array.Array` subclasses from existing proxy ids. Currently, we create a new proxy `Type` object whose underlying `arrow::DataType` are semantically equal to  - but not the same as - the `arrow::DataType` owned by the Array proxy. It would be preferable if the `Type` and `Array` proxy classes refer to the same `arrow::DataType` object (i.e. the same object on the heap).

### What changes are included in this PR?

1. Upgraded `libmexclass` to commit [d04f88d](mathworks/libmexclass@d04f88d). In this commit, we added a static "make-like" function to `Proxy` called `create`.
2. Modified the constructors of all `Type` objects to expect a single `Proxy` object as input. This is a breaking change and  means clients are no longer expected to build `Type` objects via their constructors. Instead, we introduced standalone functions that clients can use to construct `Type` objects, i.e.   `arrow.type.int8`, `arrow.type.string`, `arrow.type.timestamp`, etc. These functions deal with creating the `Proxy` objects to pass to the `Type` constructors. Below is an example of the new workflow for creating `Type` objects. 

```matlab
>> timestampType = arrow.type.timestamp(TimeUnit="second", TimeZone="America/New_York")

timestampType = 

  TimestampType with properties:

    ID: Timestamp
```
NOTE: We plan on enhancing the display to show the `TimeUnit` and `TimeZone` properties. 

3. Made `Type` a [dependent](https://www.mathworks.com/help/matlab/matlab_oop/access-methods-for-dependent-properties.html) property on `arrow.array.Array`. The `get.Type` method constructs a `Type` object on demand by making a proxy that wraps the same `arrow::DataType` object stored within the `arrow::Array`.

### Are these changes tested?

Yes, updated existing tests.

### Are there any user-facing changes?

Yes, we added new standalone functions for creating `Type` objects. Below is a table mapping standalone  functions to the `Type` object they output: 

| Standalone Function | Output Type Object |
|----------------------|---------------------|
|`arrow.type.boolean`| `arrow.type.BooleanType`|
|`arrow.type.int8`| `arrow.type.Int8Type`|
|`arrow.type.int16`| `arrow.type.Int16Type`|
|`arrow.type.int32`| `arrow.type.Int32Type`|
|`arrow.type.int64`| `arrow.type.Int64Type`|
|`arrow.type.uint8`| `arrow.type.UInt8Type`|
|`arrow.type.uint16`| `arrow.type.UInt16Type`|
|`arrow.type.uint32`| `arrow.type.UInt32Type`|
|`arrow.type.uint64`| `arrow.type.UInt64Type`|
|`arrow.type.string`| `arrow.type.StringType`|
|`arrow.type.timestamp`| `arrow.type.TimestampType`|

### Notes

Thanks @ kevingurney for the advice!
* Closes: apache#36652

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@sgilmore10 sgilmore10 deleted the GH-36363 branch August 21, 2023 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MATLAB] Create proxy classes for the DataType class hierarchy
3 participants