-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40657] Add support for Java classes in Protobuf functions #38286
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few inline comments for reviewers.
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
Show resolved
Hide resolved
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
Show resolved
Hide resolved
...tobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
Show resolved
Hide resolved
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
Show resolved
Hide resolved
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
Show resolved
Hide resolved
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
Show resolved
Hide resolved
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
Show resolved
Hide resolved
Can one of the admins verify this patch? |
...tobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
Show resolved
Hide resolved
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
Outdated
Show resolved
Hide resolved
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
Show resolved
Hide resolved
@SandishKumarHN, @mposdev21 PTAL. All the checks pass and I addressed all the comments. |
...tobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
Show resolved
Hide resolved
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sbt tests are passing now.
@SandishKumarHN pyspark tests are failing could you take a look? I will look at other errors (though they seem unrelated) |
@rangadi since this feature is swapping the from_protobuf and to_protobuf params, descFilePath, messageName -> messageName, descFilePath. the same change has to be made in python/pyspark/sql/protobuf/functions.py |
@SandishKumarHN thanks. That makes sense. I will fix that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@gengliangwang, @SandishKumarHN some tests related timezone and timestamp type are failing in Run / Build modules: catalyst, hive-thriftserver They seem unrelated to this PR. Any idea what might be causing this? |
@rangadi I check the latest run and it fails the python check |
I saw this once yestearday. Tried retrigger the test and it passed. |
@rangadi test failures are not related to PR. why don't we rerun the failed test? and I see the below test failing consistently. tried merging with the master?
|
I think it merges with master before running the tests. |
I'm merging this since it only fails with Spark client python generation and @gengliangwang approved it. |
Thanks @rangadi for your contribution! I merged into master. |
@@ -24,6 +24,7 @@ import org.apache.spark.sql.{RandomDataGenerator, Row} | |||
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, NoopFilters, OrderedFilters, StructFilters} | |||
import org.apache.spark.sql.catalyst.expressions.{ExpressionEvalHelper, GenericInternalRow, Literal} | |||
import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData, MapData} | |||
import org.apache.spark.sql.protobuf.protos.CatalystTypes.BytesMsg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk did you import it as maven or sbt project? Both should work though.
With maven, 'generating sources' should fix that (in maven 'Tool Window'). I am also seeing similar issue with it, IntelliJ does not seem to run the generate-test-sources
phase.
I tried just now and running ../../build/mvn test
in protobuf directory fixed this issue.
### What changes were proposed in this pull request? Adds support for compiled Java classes to Protobuf functions. This is tested with Protobuf v3 classes. V2 vs V3 issues will be handled in a separate PR. The main changes in this PR: - Changes to top level API: - Adds new version that takes just the class name. - Changes the order of arguments for existing API with descriptor files (`messageName` and `descFilePath` are swapped). - Protobuf utils methods to create descriptor from Java class name. - Many unit tests are update to check both versions : (1) with descriptor file and (2) with Java class name. - Maven build updates to generate Java classes to use in tests. - Miscellaneous changes: - Adds `proto` to package name in `proto` files used in tests. - A few TODO comments about improvements ### Why are the changes needed? Java compiled classes is a common method for users to provide Protobuf definitions. ### Does this PR introduce _any_ user-facing change? No. This updates interface, but for a new feature in active development. ### How was this patch tested? - Unit tests Closes apache#38286 from rangadi/protobuf-java. Authored-by: Raghu Angadi <raghu.angadi@databricks.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
### What changes were proposed in this pull request? Adds support for compiled Java classes to Protobuf functions. This is tested with Protobuf v3 classes. V2 vs V3 issues will be handled in a separate PR. The main changes in this PR: - Changes to top level API: - Adds new version that takes just the class name. - Changes the order of arguments for existing API with descriptor files (`messageName` and `descFilePath` are swapped). - Protobuf utils methods to create descriptor from Java class name. - Many unit tests are update to check both versions : (1) with descriptor file and (2) with Java class name. - Maven build updates to generate Java classes to use in tests. - Miscellaneous changes: - Adds `proto` to package name in `proto` files used in tests. - A few TODO comments about improvements ### Why are the changes needed? Java compiled classes is a common method for users to provide Protobuf definitions. ### Does this PR introduce _any_ user-facing change? No. This updates interface, but for a new feature in active development. ### How was this patch tested? - Unit tests Closes apache#38286 from rangadi/protobuf-java. Authored-by: Raghu Angadi <raghu.angadi@databricks.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
What changes were proposed in this pull request?
Adds support for compiled Java classes to Protobuf functions. This is tested with Protobuf v3 classes. V2 vs V3 issues will be handled in a separate PR. The main changes in this PR:
messageName
anddescFilePath
are swapped).proto
to package name inproto
files used in tests.Why are the changes needed?
Java compiled classes is a common method for users to provide Protobuf definitions.
Does this PR introduce any user-facing change?
No.
This updates interface, but for a new feature in active development.
How was this patch tested?