Skip to content

Commit

Permalink
feat: Add dictionary support in integration test utility (#342)
Browse files Browse the repository at this point in the history
This PR implements dictionary support in the integration test utility
and fixes a few problems identified with integration testing to ensure
that it actually works end-to-end (via
apache/arrow#39302 ). The changes are:

- Batches that contain dictionaries can now be read, written, and
validated using integration testing JSON
- Fixed an issue in the integration test library (anything other than
the first batch previously segfaulted)
- Improved const correctness of nanoarrow.hpp (because dictionaries
required a `std::unordered_map<>` with a `UniqueSchema` and a few const
overloads were missing)
- Fixed the nullability of the top-level batch to match Arrow C++ output
- Fixed the null count of exported arrays (previously they were all
exported as having zero nulls)

It can now be tested with `archery` (after checking out
apache/arrow#39302 ):

```
export ARROW_CPP_EXE_PATH=/Users/deweydunnington/.r-arrow-dev-build/build/debug
export ARROW_NANOARROW_PATH=/path/to/arrow-nanoarrow/build
archery integration --with-cpp=true --with-nanoarrow=true --run-c-data
``` 

The current failures are limited to the remaining unimplemented types
(datetime types and decimal).

And for future me or anybody who has to/wants to launch a debugger with
a segfaulting integration test in VSCode, it can be done with this
launch.json:

```
{
    "type": "lldb",
    "request": "launch",
    "name": "Debug Integration Tests",
    "program": "${workspaceFolder}/.venv/bin/python",
    "args": ["-m", "archery.cli", "integration", "--with-cpp=true",  "--with-nanoarrow=true", "--run-c-data"],
    "cwd": "${workspaceFolder}",
    "env": {
        "ARROW_CPP_EXE_PATH": "/Users/deweydunnington/.r-arrow-dev-build/build/debug",
        "ARROW_NANOARROW_PATH": "${workspaceFolder}/out/build/user-local"
    }
}
```
  • Loading branch information
paleolimbot authored Jan 9, 2024
1 parent c1314f2 commit 6523f51
Show file tree
Hide file tree
Showing 7 changed files with 573 additions and 60 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -1123,6 +1123,9 @@ ArrowErrorCode ArrowIpcDecoderDecodeSchema(struct ArrowIpcDecoder* decoder,
return result;
}

// Top-level batch schema is typically non-nullable
tmp.flags = 0;

result = ArrowIpcDecoderSetChildren(&tmp, fields, error);
if (result != NANOARROW_OK) {
ArrowSchemaRelease(&tmp);
Expand Down
4 changes: 2 additions & 2 deletions src/nanoarrow/array.c
Original file line number Diff line number Diff line change
Expand Up @@ -1164,8 +1164,8 @@ static int ArrowArrayViewValidateFull(struct ArrowArrayView* array_view,

// Dictionary valiation not implemented
if (array_view->dictionary != NULL) {
ArrowErrorSet(error, "Validation for dictionary-encoded arrays is not implemented");
return ENOTSUP;
NANOARROW_RETURN_NOT_OK(ArrowArrayViewValidateFull(array_view->dictionary, error));
// TODO: validate the indices
}

return NANOARROW_OK;
Expand Down
3 changes: 1 addition & 2 deletions src/nanoarrow/array_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2121,9 +2121,8 @@ TEST(ArrayTest, ArrayViewTestDictionary) {
EXPECT_EQ(array_view.buffer_views[1].size_bytes, 2 * sizeof(int32_t));
EXPECT_EQ(array_view.dictionary->buffer_views[2].size_bytes, 6);

// Full validation not yet supported for dictionary
EXPECT_EQ(ArrowArrayViewValidate(&array_view, NANOARROW_VALIDATION_LEVEL_FULL, nullptr),
ENOTSUP);
NANOARROW_OK);

EXPECT_EQ(ArrowArrayViewGetIntUnsafe(&array_view, 0), 0);
EXPECT_EQ(ArrowArrayViewGetIntUnsafe(&array_view, 1), 1);
Expand Down
4 changes: 2 additions & 2 deletions src/nanoarrow/integration/c_data_integration.cc
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ static ArrowErrorCode ExportBatchFromJson(const char* json_path, int num_batch,
MaterializedArrayStream data;
NANOARROW_RETURN_NOT_OK(MaterializeJsonFilePath(json_path, &data, num_batch, error));

ArrowArrayMove(data.arrays[num_batch].get(), out);
ArrowArrayMove(data.arrays[0].get(), out);
return NANOARROW_OK;
}

Expand All @@ -173,7 +173,7 @@ static ArrowErrorCode ImportBatchAndCompareToJson(const char* json_path, int num
nanoarrow::testing::TestingJSONComparison comparison;
NANOARROW_RETURN_NOT_OK(comparison.SetSchema(data.schema.get(), error));
NANOARROW_RETURN_NOT_OK(
comparison.CompareBatch(actual.get(), data.arrays[num_batch].get(), error));
comparison.CompareBatch(actual.get(), data.arrays[0].get(), error));
if (comparison.num_differences() > 0) {
std::stringstream ss;
comparison.WriteDifferences(ss);
Expand Down
10 changes: 8 additions & 2 deletions src/nanoarrow/nanoarrow.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -206,15 +206,21 @@ class Unique {

/// \brief Move and take ownership of data wrapped by rhs
Unique(Unique&& rhs) : Unique(rhs.get()) {}
Unique& operator=(Unique&& rhs) {
reset(rhs.get());
return *this;
}

// These objects are not copyable
Unique(Unique& rhs) = delete;
Unique(const Unique& rhs) = delete;

/// \brief Get a pointer to the data owned by this object
T* get() noexcept { return &data_; }
const T* get() const noexcept { return &data_; }

/// \brief Use the pointer operator to access fields of this object
T* operator->() { return &data_; }
T* operator->() noexcept { return &data_; }
const T* operator->() const noexcept { return &data_; }

/// \brief Call data's release callback if valid
void reset() { release_pointer(&data_); }
Expand Down
Loading

0 comments on commit 6523f51

Please sign in to comment.