-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protobuf generate V2 and V3 protos and extend tests. #38324
Conversation
FYI: @mposdev21, @SandishKumarHN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good to me except for a few things I noted.
// TODO: Avoid carrying the file name. Read the contents of descriptor file only once | ||
// at the start. Rest of the runs should reuse the buffer. Otherwise, it could | ||
// cause inconsistencies if the file contents are changed the user after a few days. | ||
// Same for the write side in [[CatalystDataToProtobuf]]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand the comment completely. When buildDescriptor is called, the contents of the file is read and a descriptor is created. The first time messageDescriptor is evaluated, the file contents are read. Could you clarify ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added this in my Java class pr #38286.
The file contents are read on executors too when the ProtobufDataToCatalyst()
class is initialized there. Notice @transient private lazy val messageDescriptor
in the class.
*/ | ||
// protoc --java_out=connector/protobuf/src/test/resources/protobuf/ connector/protobuf/src/test/resources/protobuf/catalyst_types.proto | ||
// protoc --descriptor_set_out=connector/protobuf/src/test/resources/protobuf/catalyst_types.desc --java_out=connector/protobuf/src/test/resources/protobuf/org/apache/spark/sql/protobuf/ connector/protobuf/src/test/resources/protobuf/catalyst_types.proto | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still want to keep these comments ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Also added pyspark_test.proto
to this directory.
required int32 id = 2; | ||
optional int32 api_quota = 3 [default = 100]; // Default 100 qps. | ||
optional string location = 4 [default = "Unknown"]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add Enum ? I tested with a default value which seemed to work. But then the following text in the spec:
"For enums, the default value is the first value listed in the enum's type definition."
If I interpret this to mean that if we don't specify a default value and the field is left uninitialized, it should pick the. first default value. That interpretation does not work because hasDefaultValue fails and hence the test would fail if you assert that it has the first value.
@@ -119,7 +116,7 @@ message SimpleMessageEnum { | |||
string key = 1; | |||
string value = 2; | |||
enum NestedEnum { | |||
ESTED_NOTHING = 0; | |||
ESTED_NOTHING = 0; // TODO: Fix the name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fix this in v2 and v3 ? Seems like a small change.
connector/protobuf/pom.xml
Outdated
</configuration> | ||
</execution> | ||
</executions> | ||
</plugin> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we still have the desc files under resources/protobuf ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now.
.addRfloatValue(10903.0f) | ||
.addRfloatValue(10902.0f) | ||
.addRnestedEnum(NestedEnum.ESTED_NOTHING) | ||
.addRnestedEnum(NestedEnum.NESTED_FIRST) | ||
.build() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here SimpleMessageRepeated is a v3 class which is used to build the protobuf message. It might make sense to add one test case to use v2.SimpleMessageRepeated (or anything else) to build the message and do the test.
val repeatedMessageDesc = ProtobufUtils.buildDescriptor(testFileDesc, "RepeatedMessage") | ||
val basicMessageDesc = ProtobufUtils.buildDescriptor(testFileDesc, "BasicMessage") | ||
val repeatedMessageDesc = ProtobufUtils.buildDescriptor(descPathV3, "RepeatedMessage") | ||
val basicMessageDesc = ProtobufUtils.buildDescriptor(descPathV3, "BasicMessage") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess. you are picking the descriptors (v2 or v3) for building the message randomly here. Is that correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
A newbie question: why we need support protobuf v2? |
Can one of the admins verify this patch? |
@LuciferYang many users still use v2 protobufs. It is not always easy to migrate the protobufs to v3 or even necessary. Note that v2 protobufs are still genrated with |
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
This is a temporary PR on, made on top of Java-support PR #37738.
Outline: