GH-38354: [MATLAB] Implement fromMATLAB method for `arrow.array.ListArray` (#38561)

### Rationale for this change

We should implement a static `fromMATLAB` method for `arrow.array.ListArray` that takes in a MATLAB `cell` array and returns an instance of `arrow.array.ListArray`. Adding this method enables users to create an `arrow.array.ListArray` by passing a MATLAB `cell` array to the `arrow.array` gateway function:

```matlab
>> C = {[1 2 3], [4 5], 6};
>> array = arrow.array(C)

array = 

  ListArray with 3 elements and 0 null values:

    [
        [
            1,
            2,
            3
        ],
        [
            4,
            5
        ],
        [
            6
        ]
    ]
```
Internally, the `arrow.array` gateway function will call `arrow.array.ListArray.fromMATLAB` to construct a `ListArray` from the given `cell` array.
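To make the construction step more concrete, here is a minimal plain-MATLAB sketch (no Arrow calls) of how the cell array from the example above could map onto the two pieces a list array needs: a flat column of values and a zero-based `int32` offsets vector. The variable names `columns`, `lengths`, `values`, and `offsets` are illustrative, and the sketch ignores missing elements and type validation, which the real method handles.

```matlab
% Illustrative sketch only: flatten a cell array into values and offsets.
C = {[1 2 3], [4 5], 6};

% Reshape every element into a column vector, then stack them.
columns = cellfun(@(x) x(:), C, UniformOutput=false);
values = vertcat(columns{:});

% Zero-based offsets: list k spans values(offsets(k)+1 : offsets(k+1)).
lengths = cellfun(@numel, C);
offsets = int32([0 cumsum(lengths)]);

disp(values');   % 1 2 3 4 5 6
disp(offsets);   % 0 3 5 6
```

The offsets vector always has one more entry than the cell array has elements, so the length of list `k` is `offsets(k+1) - offsets(k)`.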

### What changes are included in this PR?

1. Implemented `fromMATLAB` method on `arrow.array.ListArray`. This method accepts a MATLAB `cell` array and returns an instance of `arrow.array.ListArray`. 
2. Set the `ArrayStaticConstructor` property of `arrow.type.traits.ListTraits` to `@arrow.array.ListArray.fromMATLAB`.
3. Added a switch case for `"cell"` to the `arrow.array` gateway function that invokes `arrow.array.ListArray.fromMATLAB` with the input `cell` array.
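
As a rough picture of the dispatch described in item 3, the following plain-MATLAB sketch switches on the class name of the input and records which static constructor would handle it. It is a simplification under stated assumptions: the variable `target` is hypothetical, only two cases are shown, and the real gateway function calls the constructor (or raises `arrow:array:UnsupportedMATLABType`) rather than storing a name.

```matlab
% Illustrative sketch only: class-based dispatch, without calling Arrow.
data = {[1 2 3], [4 5]};
classname = string(class(data));

switch classname
    case "table"
        target = "arrow.array.StructArray.fromMATLAB";
    case "cell"
        % The case this commit adds.
        target = "arrow.array.ListArray.fromMATLAB";
    otherwise
        % The real function raises arrow:array:UnsupportedMATLABType here.
        target = "";
end

disp(target);
```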

### Are these changes tested?

Yes. I added a new test class to the `test/arrow/array/list` folder named `tFromMATLAB.m`.

### Are there any user-facing changes?

Yes. Users can now create instances of `arrow.array.ListArray` by passing `cell` arrays to `arrow.array`:

```matlab
>> C = {["A" "B"], ["C" "D" "E"], missing, ["F" "G"], string.empty(0, 1)};
>> array = arrow.array(C)

array = 

  ListArray with 5 elements and 1 null value:

    [
        [
            "A",
            "B"
        ],
        [
            "C",
            "D",
            "E"
        ],
        null,
        [
            "F",
            "G"
        ],
        []
    ]

```
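
The null and empty entries in this output can be traced with a short plain-MATLAB sketch that mirrors the loop in `fromMATLAB`: a `missing` element is marked invalid and repeats the previous offset, while an empty array stays valid but also contributes zero length. This is a simplified sketch, assuming `numel` as the element length and skipping the type validation the real method performs.

```matlab
% Illustrative sketch only: validity mask and offsets with null/empty lists.
C = {["A" "B"], ["C" "D" "E"], missing, ["F" "G"], string.empty(0, 1)};

numElements = numel(C);
valid = true(numElements, 1);
offsets = zeros(numElements + 1, 1, "int32");

for ii = 1:numElements
    element = C{ii};
    if isa(element, "missing")
        % Null list: not valid, and the offset does not advance.
        valid(ii) = false;
        offsets(ii + 1) = offsets(ii);
    else
        % Valid list (possibly empty): advance by the element count.
        offsets(ii + 1) = offsets(ii) + numel(element);
    end
end

disp(offsets');  % 0 2 5 5 7 7
disp(valid');    % 1 1 0 1 1
```

Both the null entry and the empty list produce a repeated offset, so only the validity mask distinguishes `null` from `[]`.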

* Closes: #38354

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
sgilmore10 authored Nov 2, 2023
1 parent 1749e00 commit cd6e635
Showing 7 changed files with 291 additions and 4 deletions.
@@ -0,0 +1,24 @@
% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

function idx = findFirstNonMissingElement(C)
    idx = -1;
    for ii=1:numel(C)
        if ~isa(C{ii}, "missing")
            idx = ii;
            return;
        end
    end
end
54 changes: 54 additions & 0 deletions matlab/src/matlab/+arrow/+array/ListArray.m
@@ -109,6 +109,60 @@
             array = arrow.array.ListArray(proxy);
         end

+        function array = fromMATLAB(C)
+            arguments
+                C(:, 1) cell {mustBeNonempty}
+            end
+            import arrow.array.internal.list.findFirstNonMissingElement
+            import arrow.array.internal.list.createValidator
+
+            idx = findFirstNonMissingElement(C);
+
+            if idx == -1
+                id = "arrow:array:list:CellArrayAllMissing";
+                msg = "The input cell array must contain at least one non-missing" + ...
+                    " value to be converted to an Arrow array.";
+                error(id, msg);
+            end
+
+            validator = createValidator(C{idx});
+
+            numElements = numel(C);
+            valid = true([numElements 1]);
+            % All elements before the first non-missing value should be
+            % treated as null values.
+            valid(1:idx-1) = false;
+            offsets = zeros([numElements + 1, 1], "int32");
+
+            for ii = idx:numElements
+                element = C{ii};
+                if isa(element, "missing")
+                    % Treat missing values as null values.
+                    valid(ii) = false;
+                    offsets(ii + 1) = offsets(ii);
+                else
+                    validator.validateElement(element);
+                    length = validator.getElementLength(element);
+                    offsets(ii + 1) = offsets(ii) + length;
+                end
+            end
+
+            offsetArray = arrow.array(offsets);
+
+            validValueCellArray = validator.reshapeCellElements(C(valid));
+            values = vertcat(validValueCellArray{:});
+            valueArray = arrow.array(values);
+
+            args = struct(...
+                OffsetsProxyID=offsetArray.Proxy.ID, ...
+                ValuesProxyID=valueArray.Proxy.ID, ...
+                Valid=valid ...
+            );
+
+            proxyName = "arrow.array.proxy.ListArray";
+            proxy = arrow.internal.proxy.create(proxyName, args);
+            array = arrow.array.ListArray(proxy);
+        end
     end

 end
2 changes: 1 addition & 1 deletion matlab/src/matlab/+arrow/+type/+traits/ListTraits.m
@@ -19,7 +19,7 @@
         ArrayConstructor = @arrow.array.ListArray
         ArrayClassName = "arrow.array.ListArray"
         ArrayProxyClassName = "arrow.array.proxy.ListArray"
-        ArrayStaticConstructor = missing
+        ArrayStaticConstructor = @arrow.array.ListArray.fromMATLAB
         TypeConstructor = @arrow.type.ListType
         TypeClassName = "arrow.type.ListType"
         TypeProxyClassName = "arrow.type.proxy.ListType"
2 changes: 2 additions & 0 deletions matlab/src/matlab/+arrow/array.m
@@ -49,6 +49,8 @@
             arrowArray = arrow.array.Time64Array.fromMATLAB(data, varargin{:});
         case "table"
             arrowArray = arrow.array.StructArray.fromMATLAB(data, varargin{:});
+        case "cell"
+            arrowArray = arrow.array.ListArray.fromMATLAB(data, varargin{:});
         otherwise
             errid = "arrow:array:UnsupportedMATLABType";
             msg = join(["Unable to convert MATLAB type" classname "to arrow array."]);
206 changes: 206 additions & 0 deletions matlab/test/arrow/array/list/tFromMATLAB.m
@@ -0,0 +1,206 @@
%TFROMMATLAB Unit tests for arrow.array.ListArray's fromMATLAB method.

% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef tFromMATLAB < matlab.unittest.TestCase

    methods (Test)
        function EmptyCellArrayError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "MATLAB:validators:mustBeNonempty" if given an empty cell
            % array as input.
            import arrow.array.ListArray

            fcn = @() ListArray.fromMATLAB({});
            testCase.verifyError(fcn, "MATLAB:validators:mustBeNonempty");
        end

        function MustBeCellArrayError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "MATLAB:validation:UnableToConvert" if the input provided is
            % not a cell array.
            import arrow.array.ListArray

            fcn = @() ListArray.fromMATLAB('a');
            testCase.verifyError(fcn, "MATLAB:validation:UnableToConvert");
        end

        function AllMissingCellArrayError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "arrow:array:list:CellArrayAllMissing" if given a cell array
            % containing only missing values.
            import arrow.array.ListArray

            C = {missing missing missing};
            fcn = @() ListArray.fromMATLAB(C);
            testCase.verifyError(fcn, "arrow:array:list:CellArrayAllMissing");
        end

        function ListOfFloat64(testCase)
            % Verify fromMATLAB creates the expected ListArray whose
            % Values property is a Float64Array.
            import arrow.array.ListArray

            C = {[1 2 3], [4 5], missing, [6 7 8], [], [9 10]};
            actual = ListArray.fromMATLAB(C);

            values = arrow.array(1:10);
            offsets = arrow.array(int32([0 3 5 5 8 8 10]));
            expected = ListArray.fromArrays(offsets, values, Valid=[1 2 4 5 6]);

            testCase.verifyEqual(actual, expected);
        end

        function ListOfStruct(testCase)
            % Verify fromMATLAB creates the expected ListArray whose
            % Values property is a StructArray.
            import arrow.array.ListArray

            Number = (1:10)';
            Text = compose("Test%d", (1:10)');
            Date = datetime(2023, 11, 2) + days(0:9)';
            T = table(Number, Text, Date);
            C = {missing, T(1:3, :), T(4, :), T(1:0, :), T(5:10, :), missing};
            actual = ListArray.fromMATLAB(C);

            values = arrow.array(T);
            offsets = arrow.array(int32([0 0 3 4 4 10 10]));
            expected = ListArray.fromArrays(offsets, values, Valid=[2 3 4 5]);

            testCase.verifyEqual(actual, expected);
        end

        function ListOfListOfString(testCase)
            % Verify fromMATLAB creates the expected ListArray whose
            % Values property is a ListArray.
            import arrow.array.ListArray

            rowOne = {["A" "B"], ["C" "D" "E"] missing};
            rowTwo = missing;
            rowThree = {"F" ["G" "H" "I"]};
            C = {rowOne, rowTwo rowThree};
            actual = ListArray.fromMATLAB(C);

            stringValues = arrow.array(["A" "B" "C" "D" "E" "F" "G" "H" "I"]);
            innerOffsets = arrow.array(int32([0 2 5 5 6 9]));
            valuesList = ListArray.fromArrays(innerOffsets, stringValues, Valid=[1 2 4 5]);

            outerOffsets = arrow.array(int32([0 3 3 5]));
            expected = ListArray.fromArrays(outerOffsets, valuesList, Valid=[1 3]);

            testCase.verifyEqual(actual, expected);
        end

        function OnlyEmptyElement(testCase)
            % Create a ListArray containing only empty elements.
            import arrow.array.ListArray

            emptyDuration = duration.empty(0, 0);

            C = {emptyDuration, emptyDuration, emptyDuration, emptyDuration};
            actual = ListArray.fromMATLAB(C);

            values = arrow.array(duration.empty);
            offsets = arrow.array(int32([0 0 0 0 0]));
            expected = ListArray.fromArrays(offsets, values);

            testCase.verifyEqual(actual, expected);
        end

        function CellOfEmptyCell(testCase)
            % Verify fromMATLAB creates a ListArray whose Values property
            % is a StringArray when given a cell array containing just an
            % empty cell array.
            import arrow.array.ListArray

            C = {{}};
            actual = ListArray.fromMATLAB(C);

            values = arrow.array(string.empty);
            offsets = arrow.array(int32([0 0]));
            expected = ListArray.fromArrays(offsets, values);

            testCase.verifyEqual(actual, expected);
        end

        function CellOfMatrices(testCase)
            % Verify fromMATLAB can handle cell arrays that contain
            % matrices instead of just vectors - i.e. the matrices are
            % reshaped as column vectors before they are concatenated
            % together.
            import arrow.array.ListArray

            C = {[1 2 3; 4 5 6], [7 8; 9 10], 11};
            actual = ListArray.fromMATLAB(C);

            values = arrow.array([1 4 2 5 3 6 7 9 8 10 11]);
            offsets = arrow.array(int32([0 6 10 11]));
            expected = ListArray.fromArrays(offsets, values);

            testCase.verifyEqual(actual, expected);
        end

        function ClassTypeMismatchError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "arrow:array:list:ClassTypeMismatch" if given a cell array
            % containing arrays with different class types.
            import arrow.array.ListArray

            C = {1, [2 3 4], "A", 5};
            fcn = @() ListArray.fromMATLAB(C);
            testCase.verifyError(fcn, "arrow:array:list:ClassTypeMismatch");
        end

        function VariableNamesMismatchError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "arrow:array:list:VariableNamesMismatch" if given a cell
            % array containing tables whose variable names don't match.
            import arrow.array.ListArray

            C = {table(1, "A"), table(2, "B", VariableNames=["X", "Y"])};
            fcn = @() ListArray.fromMATLAB(C);
            testCase.verifyError(fcn, "arrow:array:list:VariableNamesMismatch");
        end

        function ExpectedZonedDatetimeError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "arrow:array:list:ExpectedZonedDatetime" if given a cell
            % array containing zoned and unzoned datetimes - in that order.

            import arrow.array.ListArray

            C = {datetime(2023, 11, 1, TimeZone="UTC"), datetime(2023, 11, 2)};
            fcn = @() ListArray.fromMATLAB(C);
            testCase.verifyError(fcn, "arrow:array:list:ExpectedZonedDatetime");
        end

        function ExpectedUnzonedDatetimeError(testCase)
            % Verify fromMATLAB throws an error whose identifier is
            % "arrow:array:list:ExpectedUnzonedDatetime" if given a cell
            % array containing unzoned and zoned datetimes - in that order.

            import arrow.array.ListArray

            C = {datetime(2023, 11, 1), datetime(2023, 11, 2, TimeZone="UTC")};
            fcn = @() ListArray.fromMATLAB(C);
            testCase.verifyError(fcn, "arrow:array:list:ExpectedUnzonedDatetime");
        end

    end

end
5 changes: 3 additions & 2 deletions matlab/test/arrow/array/tArray.m
@@ -33,7 +33,8 @@
             {datetime(2022, 1, 1), "arrow.array.TimestampArray"}, ...
             {seconds([1 2]), "arrow.array.Time64Array"}, ...
             {["A" "B"], "arrow.array.StringArray"}, ...
-            {table(["A" "B"]'), "arrow.array.StructArray"}};
+            {table(["A" "B"]'), "arrow.array.StructArray"}, ...
+            {{[1, 2, 3], [4, 5]}, "arrow.array.ListArray"}};
     end

     methods(Test)
@@ -51,7 +52,7 @@ function UnsupportedMATLABTypeError(testCase)
             % Verify arrow.array throws an error with the identifier
             % "arrow:array:UnsupportedMATLABType" if the input array is not one
             % we support converting into an Arrow array.
-            matlabArray = {table};
+            matlabArray = calmonths(12);
             fcn = @() arrow.array(matlabArray);
             errID = "arrow:array:UnsupportedMATLABType";
             testCase.verifyError(fcn, errID);
2 changes: 1 addition & 1 deletion matlab/test/arrow/type/traits/tListTraits.m
@@ -20,7 +20,7 @@
         ArrayConstructor = @arrow.array.ListArray
         ArrayClassName = "arrow.array.ListArray"
         ArrayProxyClassName = "arrow.array.proxy.ListArray"
-        ArrayStaticConstructor = missing
+        ArrayStaticConstructor = @arrow.array.ListArray.fromMATLAB
         TypeConstructor = @arrow.type.ListType
         TypeClassName = "arrow.type.ListType"
         TypeProxyClassName = "arrow.type.proxy.ListType"
