feat(extensions/nanoarrow_testing): Add nanoarrow_testing extension with testing JSON writer #317

Merged: 26 commits merged on Nov 21, 2023

@paleolimbot (Member) commented Nov 15, 2023

This PR adds the first few bits of infrastructure needed to implement integration testing:

  • nanoarrow_testing.hpp testing utility header
  • CI to build/run the tests
  • Batch + Column JSON writer for easy types to get things going

The testing helper library is intentionally header-only so that it can be dropped into projects where needed (although I'm happy to change that if there are opinions otherwise).

Some obvious follow-ups not included yet:

  • Implement ArrowSchema -> JSON
  • Support decimal and interval types
  • Schema/Array equality checking
  • JSON -> ArrowSchema
  • JSON -> ArrowArray

@paleolimbot paleolimbot marked this pull request as ready for review November 16, 2023 20:35
Comment on lines +73 to +74
out << R"(, "VALIDITY": )";
WriteBitmap(out, value->buffer_views[0].data.as_uint8, value->length);


Does the spec require that a [1, 1, 1, ...] VALIDITY be provided when buffer[0] is null? Isn't it OK to omit the entry altogether? I imagine omitting it would be necessary if we want a one-to-one mapping between values and their JSON representation.

Member Author

That's a great point. The example files in apache/arrow-testing all seem to have [1, 1, 1, 1], although I agree that it might be nice to know whether the bitmap was allocated. When nanoarrow gets plugged into the integration tests, we'll find out in a hurry whether that assumption is correct!

Comment on lines +186 to +189
out << values[0];
for (int64_t i = 1; i < n_values; i++) {
out << ", " << static_cast<int64_t>(values[i]);
}


Wouldn't this work for both branches of the if, so the branch isn't needed? static_cast<int64_t> of an int64_t is a no-op.

Comment on lines 219 to 226
// Strings
out << R"(")" << ArrowArrayViewGetIntUnsafe(value, 0) << R"(")";
for (int64_t i = 1; i < value->length; i++) {
out << R"(, ")" << ArrowArrayViewGetIntUnsafe(value, i) << R"(")";
}
break;
case NANOARROW_TYPE_UINT64:
// Strings


Misplaced // Strings comments?

Member Author

I'll clarify this. I was trying to highlight the "quotedness" of the values here.

out << ", " << ArrowArrayViewGetIntUnsafe(value, i);
}
break;
case NANOARROW_TYPE_INT64:


This case is the same as above, right?

Member Author

It's subtle, but it generates "1234567" rather than 1234567. I'll make sure the comments make that clearer!


Oh, right! Because JavaScript numbers are doubles, which can only represent integers exactly up to 2^53.

return ArrowSchemaInitFromType(schema, NANOARROW_TYPE_NA);
},
[](ArrowArray* array) { return NANOARROW_OK; }, &TestingJSON::WriteColumn,
R"({"name": null, "count": 0})");


You probably want to look at the Go implementation's examples and paste them here as test data in separate files. That way you don't have to come up with interesting examples yourself, and it makes the tests more data-driven.

Member Author

That's a great idea! I'll probably keep these tests (the bare minimum to get 100% coverage of the code) and add those as well (for full type coverage!).

@paleolimbot (Member Author) left a comment


Thank you for the review!

@codecov-commenter commented Nov 21, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (032cdd9) 87.01% compared to head (649d8b1) 87.17%.
Report is 2 commits behind head on main.

File                                  Patch %   Lines
src/nanoarrow/nanoarrow_testing.hpp   97.75%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #317      +/-   ##
==========================================
+ Coverage   87.01%   87.17%   +0.16%     
==========================================
  Files          70       72       +2     
  Lines       10574    10496      -78     
==========================================
- Hits         9201     9150      -51     
+ Misses       1373     1346      -27     


@paleolimbot paleolimbot merged commit cb2aa71 into apache:main Nov 21, 2023
27 checks passed
@paleolimbot paleolimbot deleted the nanoarrow-testing-ext branch November 22, 2023 14:03
@paleolimbot paleolimbot added this to the nanoarrow 0.4.0 milestone Jan 26, 2024

3 participants