Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protobuf generate V2 and V3 protos and extend tests. #38324

Closed
wants to merge 3 commits into from

Conversation

rangadi
Copy link
Contributor

@rangadi rangadi commented Oct 20, 2022

This is a temporary PR on, made on top of Java-support PR #37738.

Outline:

  • Generates descriptor files and java classes for both V2 and V3
  • Updates the unit tests to check all variations (V2 & V3 x Java, decriptor file).
  • Adds a test with default values and required fields for V2.
  • Maven build is updated. But not clear how do the same in SBT.

@rangadi
Copy link
Contributor Author

rangadi commented Oct 20, 2022

FYI: @mposdev21, @SandishKumarHN

@rangadi rangadi marked this pull request as draft October 20, 2022 18:36
Copy link
Contributor

@mposdev21 mposdev21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me except for a few things I noted.

// TODO: Avoid carrying the file name. Read the contents of descriptor file only once
// at the start. Rest of the runs should reuse the buffer. Otherwise, it could
// cause inconsistencies if the file contents are changed the user after a few days.
// Same for the write side in [[CatalystDataToProtobuf]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the comment completely. When buildDescriptor is called, the contents of the file is read and a descriptor is created. The first time messageDescriptor is evaluated, the file contents are read. Could you clarify ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added this in my Java class pr #38286.
The file contents are read on executors too when the ProtobufDataToCatalyst() class is initialized there. Notice @transient private lazy val messageDescriptor in the class.

*/
// protoc --java_out=connector/protobuf/src/test/resources/protobuf/ connector/protobuf/src/test/resources/protobuf/catalyst_types.proto
// protoc --descriptor_set_out=connector/protobuf/src/test/resources/protobuf/catalyst_types.desc --java_out=connector/protobuf/src/test/resources/protobuf/org/apache/spark/sql/protobuf/ connector/protobuf/src/test/resources/protobuf/catalyst_types.proto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still want to keep these comments ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Also added pyspark_test.proto to this directory.

required int32 id = 2;
optional int32 api_quota = 3 [default = 100]; // Default 100 qps.
optional string location = 4 [default = "Unknown"];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add Enum ? I tested with a default value which seemed to work. But then the following text in the spec:

"For enums, the default value is the first value listed in the enum's type definition."

If I interpret this to mean that if we don't specify a default value and the field is left uninitialized, it should pick the. first default value. That interpretation does not work because hasDefaultValue fails and hence the test would fail if you assert that it has the first value.

@@ -119,7 +116,7 @@ message SimpleMessageEnum {
string key = 1;
string value = 2;
enum NestedEnum {
ESTED_NOTHING = 0;
ESTED_NOTHING = 0; // TODO: Fix the name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fix this in v2 and v3 ? Seems like a small change.

</configuration>
</execution>
</executions>
</plugin>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we still have the desc files under resources/protobuf ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now.

.addRfloatValue(10903.0f)
.addRfloatValue(10902.0f)
.addRnestedEnum(NestedEnum.ESTED_NOTHING)
.addRnestedEnum(NestedEnum.NESTED_FIRST)
.build()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here SimpleMessageRepeated is a v3 class which is used to build the protobuf message. It might make sense to add one test case to use v2.SimpleMessageRepeated (or anything else) to build the message and do the test.

val repeatedMessageDesc = ProtobufUtils.buildDescriptor(testFileDesc, "RepeatedMessage")
val basicMessageDesc = ProtobufUtils.buildDescriptor(testFileDesc, "BasicMessage")
val repeatedMessageDesc = ProtobufUtils.buildDescriptor(descPathV3, "RepeatedMessage")
val basicMessageDesc = ProtobufUtils.buildDescriptor(descPathV3, "BasicMessage")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess. you are picking the descriptors (v2 or v3) for building the message randomly here. Is that correct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@LuciferYang
Copy link
Contributor

A newbie question: why we need support protobuf v2?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@rangadi
Copy link
Contributor Author

rangadi commented Oct 25, 2022

A newbie question: why we need support protobuf v2?

@LuciferYang many users still use v2 protobufs. It is not always easy to migrate the protobufs to v3 or even necessary. Note that v2 protobufs are still genrated with protoc 3.x.x. The v2 protos used in tests are built with 3.2.1. The syntax version (v2 or v3) is different from version used to parse and generate Java classes or descriptor sets.

@rangadi rangadi changed the title [Temp] Protobuf generate V2 and V3 protos and extend tests. Protobuf generate V2 and V3 protos and extend tests. Nov 3, 2022
@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 12, 2023
@github-actions github-actions bot closed this Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants