Cosmos Spark End to End Integration Test against Cosmos Emulator runs in CI #17952
Conversation
```scala
override def pruneColumns(requiredSchema: StructType): Unit = {
  // TODO moderakh add projection to the query
  // TODO moderakh: we need to decide whether do a push down or not on the projection
```
Good point - I think it might be useful to see whether we can make that decision based on "avg." document size? Like < 1 KB don't push down pruning - but for larger documents do it?
Not blocking of course...
Thanks for the suggestion. Good idea. I will look into this.
Thanks!
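To make the idea discussed above concrete, here is a minimal, hypothetical sketch of a size-based push-down decision. The `avgDocumentSizeInBytes` helper, the 1 KB threshold, and the class name are assumptions for illustration only and are not part of the actual connector code.

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical sketch only: push column pruning down to the query only when
// documents are large enough that projection saves meaningful bandwidth.
class ItemsScanBuilderSketch {
  private val pruningSizeThresholdBytes = 1024
  private var prunedSchema: Option[StructType] = None

  // Assumed to come from container statistics; not a real connector API.
  private def avgDocumentSizeInBytes: Long = ???

  def pruneColumns(requiredSchema: StructType): Unit = {
    if (avgDocumentSizeInBytes >= pruningSizeThresholdBytes) {
      // Large documents: remember the pruned schema so the query projection
      // only requests the required columns from the backend.
      prunedSchema = Some(requiredSchema)
    } else {
      // Small documents: skip push-down and prune on the Spark side instead.
      prunedSchema = None
    }
  }
}
```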
This PR adds support for the spark3 DataSourceV2 Catalog API.

NOTE: this PR is the same as moderakh#15, retargeted at the Azure repo. The original PR has already been reviewed and signed off by reviewers.

```scala
spark.conf.set(s"spark.sql.catalog.cosmoscatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set(s"spark.sql.catalog.cosmoscatalog.spark.cosmos.accountEndpoint", cosmosEndpoint)
spark.conf.set(s"spark.sql.catalog.cosmoscatalog.spark.cosmos.accountKey", cosmosMasterKey)

spark.sql(s"CREATE DATABASE cosmoscatalog.mydb;")
spark.sql(s"CREATE TABLE cosmoscatalog.mydb.myContainer (word STRING, number INT) using cosmos.items TBLPROPERTIES(partitionKeyPath = '/mypk', manualThroughput = '1100')")
```

Please see `CosmosCatalogSpec` for end to end integration tests. The integration tests will work once the earlier PR #17952 gets merged.

TODO:
- There are some TODOs in the code (e.g., add support for table alter).
- The integration tests' resource management needs to be figured out.
- This PR adds support for catalog metadata operations; we should also validate data operations through the catalog API.
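As a possible follow-up to the snippet above, here is a hedged sketch of how the table created through the catalog might be exercised. The PR itself only validates catalog metadata operations, so the query below is an assumption about how data access could look, not verified behavior of this PR.

```scala
// Assumption only: querying the catalog-managed table via Spark SQL.
// Validating data operations through the catalog API is listed as a TODO above.
val rows = spark.sql("SELECT word, number FROM cosmoscatalog.mydb.myContainer")
rows.show()
```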
Now we have gated CI for end to end testing of spark <-> cosmos db emulator: the Cosmos Spark End to End Test against the Cosmos Emulator runs in CI.

Tests tagged with `RequiresCosmosEndpoint` will get included in the sparkE2E test group and executed by the newly added CI job, Spark_Integration_Tests_Java8, against the cosmos db emulator. See `SparkE2EWriteSpec.scala` for an end to end spark sample test; it writes data from spark to the cosmos db emulator.

Emulator CI

We have two spark test groups running in the CI:

- unit tests; how to run locally on your dev machine:

  ```
  mvn -e -Dgpg.skip -Dmaven.javadoc.skip=true -Dspotbugs.skip=true -Dcheckstyle.skip=true -Drevapi.skip=true -pl ,azure-cosmos-spark_3-0_2-12 -am clean test
  ```

- sparkE2E tests (against the emulator); how to run locally on your dev machine:

  ```
  mvn -e -Dgpg.skip -Dmaven.javadoc.skip=true -Dspotbugs.skip=true -Dcheckstyle.skip=true -Drevapi.skip=true -pl ,azure-cosmos-spark_3-0_2-12 -am -PsparkE2E clean test
  ```
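As an illustration of how a test might opt into the sparkE2E group, here is a minimal, hypothetical ScalaTest sketch. The `RequiresCosmosEndpoint` tag name comes from the description above, but the `Tag` wiring, spec style, and test body are assumptions rather than the repository's actual test scaffolding.

```scala
import org.scalatest.Tag
import org.scalatest.flatspec.AnyFlatSpec

// Assumed wiring: a ScalaTest Tag whose name matches what the sparkE2E group
// selects; the real repository may declare this differently.
object RequiresCosmosEndpoint extends Tag("RequiresCosmosEndpoint")

class SampleEmulatorSpec extends AnyFlatSpec {
  // Runs only when the sparkE2E group is selected (e.g. via -PsparkE2E),
  // because it needs a reachable Cosmos DB emulator endpoint.
  "a tagged test" should "talk to the Cosmos DB emulator" taggedAs RequiresCosmosEndpoint in {
    // ... write a small DataFrame to the emulator and assert on the result ...
    assert(true)
  }
}
```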
TODO:
- proper resource (Database, Container) cleanup
- proper shutdown of the CosmosClient and Spark session
- possibly share the CosmosClient and Spark session between tests
- decide which scala test style should be used
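One possible direction for the shutdown and sharing items in the list above, purely as a sketch: a ScalaTest `BeforeAndAfterAll` mixin that shares one client and one session per suite and closes both afterwards. The trait name, environment variables, and defaults are assumptions, not the repository's actual fixtures.

```scala
import com.azure.cosmos.{CosmosAsyncClient, CosmosClientBuilder}
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, Suite}

// Hypothetical sketch: share one CosmosAsyncClient and one SparkSession across
// the tests in a suite and shut both down afterwards. Endpoint and key are
// assumed to come from the test environment (e.g. the Cosmos DB emulator).
trait SharedCosmosSparkFixture extends BeforeAndAfterAll { this: Suite =>

  protected lazy val cosmosClient: CosmosAsyncClient =
    new CosmosClientBuilder()
      .endpoint(sys.env.getOrElse("COSMOS_ENDPOINT", "https://localhost:8081"))
      .key(sys.env.getOrElse("COSMOS_KEY", "<emulator key>"))
      .buildAsyncClient()

  protected lazy val spark: SparkSession =
    SparkSession.builder()
      .appName("cosmos-spark-e2e")
      .master("local[*]")
      .getOrCreate()

  override def afterAll(): Unit = {
    try {
      // Proper shutdown so tests do not leak connections or local executors.
      cosmosClient.close()
      spark.stop()
    } finally {
      super.afterAll()
    }
  }
}
```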