merging feature/cosmos/spark30 branch to master #20086
Conversation
* Initial skeleton
* Added versioning
* Fixes for scala stylecheck
* Add tags for Spark/Scala dependencies
* Updating version tags
* Fixing versioning tags
* Changing Readme to have expected title pattern
* Trying to change Java version to 11
* Fixing pom.xml to express to DevOps pipeline that only Java 11 is supported
* Updating dependency set-up after discussion with Jonathan Giles
* Adding comment for external dependency
* Fixing cosmos tag in dependencies
* Fixing maven scala plugin version tag
* Testing changing the version tag prefix to cosmos.spark
* Switching back to cosmos_ version tag prefix
* Cosmos Spark v3 Datasource v2 write
* cosmos spark config infra
* Merge master into the feature branch (brings in unrelated commits across the repo: Key Vault, Service Bus, Text Analytics, Event Hubs, Storage, Communication, management SDKs, and engineering system updates, together with their co-author trailers)
support for spark query DataSourceV2 pipeline

This PR adds support for the Spark query DataSourceV2 pipeline.

Notes:
- `TestReadE2EMain` is an end-to-end integration demo file.
- I have a separate PR for supporting Spark filter translation to a Cosmos parameterized query: Azure#17789

TODO (to be discussed after this PR; we need to agree on who does what):
- Translate the Spark query filter to a Cosmos query: for now this is very basic, just enough to make `TestReadE2EMain` work.
- For now only one Spark task is created. We need to use the feed-range API to create one Spark task per feed range.
- We need to cache the cosmos-client; for now there is no caching, hence the lifetime of the cosmos-clients is not managed (memory leak).
- This PR brings in `JsonSupport.scala` and some code in `CosmosRowConverter` from the v2 OLTP Spark connector for translating an `ObjectNode` to a Spark `InternalRow`. This code requires a rewrite and is only brought in to make `TestReadE2EMain` work.
Spark passes the user filter predicates to the DataSourceV2 implementation. The implementation needs to split the filters into two sets:
1. the set which can be pushed down to the database as predicates in a query
2. the set which cannot be pushed down (the database doesn't support it) and hence has to be evaluated by the Spark platform later

This PR:
- adds support for the above feature and translates the supported filters into a Cosmos DB query predicate; see `FilterAnalyzer`, which is the core of this PR (a minimal sketch of the idea follows below).
- adds the unit tests for the feature: `FilterAnalyzerSpec`.
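For illustration only, here is a minimal sketch of the pushdown split described above, written against the standard Spark `org.apache.spark.sql.sources.Filter` types; this is not the actual `FilterAnalyzer` code, and the object and method names are hypothetical:

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan, IsNotNull}

// Hypothetical sketch: partition Spark filters into those we can translate into a
// Cosmos SQL predicate and those Spark has to evaluate after the scan.
object FilterAnalyzerSketch {
  final case class AnalyzedFilters(pushedPredicate: Option[String], postScanFilters: Array[Filter])

  // In the real connector the values would be passed as query parameters;
  // they are inlined here only to keep the sketch short.
  private def toPredicate(filter: Filter): Option[String] = filter match {
    case EqualTo(attr, value)     => Some(s"c.$attr = '$value'")
    case GreaterThan(attr, value) => Some(s"c.$attr > $value")
    case IsNotNull(attr)          => Some(s"IS_DEFINED(c.$attr)")
    case _                        => None // unsupported filter -> evaluated by Spark later
  }

  def analyze(filters: Array[Filter]): AnalyzedFilters = {
    val (pushable, rest) = filters.partition(f => toPredicate(f).isDefined)
    val predicate =
      if (pushable.isEmpty) None
      else Some(pushable.flatMap(toPredicate).mkString(" AND "))
    AnalyzedFilters(predicate, rest)
  }
}
```

For example, `analyze(Array(EqualTo("type", "person"), IsNotNull("age")))` would yield the predicate `c.type = 'person' AND IS_DEFINED(c.age)` and an empty post-scan set.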
- Ensures the Cosmos Spark unit tests run in the CI.
- This only wires up unit tests for now, not integration tests.
- If a Spark unit test fails, the CIs running unit tests will fail.

TODO:
- We need to wire up integration tests for Spark to run against either the emulator or a prod endpoint.
- Suppress JaCoCo complaints when analyzing the Java bytecode generated from Scala.
… in CI (Azure#17952)

We now have a gated CI for end-to-end testing of Spark against the Cosmos DB emulator: the Cosmos Spark end-to-end test against the Cosmos emulator runs in CI.

- Any test tagged with the newly introduced tag `RequiresCosmosEndpoint` is included in the `sparkE2E` test group and executed by the newly added CI job, Spark_Integration_Tests_Java8, against the Cosmos DB emulator (see the tag sketch after this section).
- See `SparkE2EWriteSpec.scala` for an end-to-end Spark sample test; it writes data from Spark to the Cosmos DB emulator.

We have two Spark test groups running in the CI:

- **unit** (only unit tests): this is the default test group. How to run locally on your dev machine:
  `mvn -e -Dgpg.skip -Dmaven.javadoc.skip=true -Dspotbugs.skip=true -Dcheckstyle.skip=true -Drevapi.skip=true -pl ,azure-cosmos-spark_3-0_2-12 -am clean test`
- **sparkE2E**: requires a Cosmos DB endpoint (the integration tests run against the Cosmos emulator). How to run locally on your dev machine:
  `mvn -e -Dgpg.skip -Dmaven.javadoc.skip=true -Dspotbugs.skip=true -Dcheckstyle.skip=true -Drevapi.skip=true -pl ,azure-cosmos-spark_3-0_2-12 -am -PsparkE2E clean test`

TODO:
- We need a CI for Java 11.
- The Cosmos emulator requires Windows, hence the current CI runs on Windows; we should add a CI on Linux (targeting a prod account).
- Some patterns in the integration tests need to be figured out:
  - proper resource (database, container) cleanup
  - proper shutdown of the CosmosClient and Spark session
  - possible sharing of the CosmosClient and Spark session between tests
  - which ScalaTest style should be used?
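As an illustration of the tagging mechanism described above (a sketch under assumptions, not the connector's actual test code; the spec name and test bodies are made up), a ScalaTest `Tag` such as `RequiresCosmosEndpoint` lets the build include or exclude endpoint-dependent tests per test group:

```scala
import org.scalatest.Tag
import org.scalatest.flatspec.AnyFlatSpec

// Hypothetical tag object; the connector defines its own equivalent.
object RequiresCosmosEndpoint extends Tag("RequiresCosmosEndpoint")

class ExampleEndpointSpec extends AnyFlatSpec {
  // Tagged tests land in the sparkE2E group (they need an emulator or prod endpoint).
  "the connector" should "write a row to Cosmos DB" taggedAs RequiresCosmosEndpoint in {
    // ...a spark.write against the emulator would go here...
    succeed
  }

  // Untagged tests stay in the default unit test group.
  it should "analyze filters without any endpoint" in {
    assert(1 + 1 == 2)
  }
}
```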
This PR adds support for the Spark 3 DataSourceV2 Catalog API.

NOTE: this PR is the same as PR #15 targeting the Azure repo. The original PR has already been reviewed and signed off by reviewers.

```scala
spark.conf.set(s"spark.sql.catalog.cosmoscatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set(s"spark.sql.catalog.cosmoscatalog.spark.cosmos.accountEndpoint", cosmosEndpoint)
spark.conf.set(s"spark.sql.catalog.cosmoscatalog.spark.cosmos.accountKey", cosmosMasterKey)

spark.sql(s"CREATE DATABASE cosmoscatalog.mydb;")
spark.sql(s"CREATE TABLE cosmoscatalog.mydb.myContainer (word STRING, number INT) using cosmos.items TBLPROPERTIES(partitionKeyPath = '/mypk', manualThroughput = '1100')")
```

Please see `CosmosCatalogSpec` for end-to-end integration tests. The integration tests will work once the earlier PR Azure#17952 is merged.

TODO:
- There are some TODOs in the code (e.g., add support for table alter).
- The integration tests' resource management needs to be figured out.
- This PR adds support for catalog metadata operations; we should also validate data operations through the catalog API.
Cosmos Spark3 DataSourceV2: basic support for a user-provided schema in the query path

- basic support for a user-provided schema in the query path
- integration e2e test for a query with a user-provided schema; see `SparkE2EQuerySpec` for the e2e test

Sample usage:

```scala
val customSchema = StructType(Array(
  StructField("id", StringType),
  StructField("name", StringType),
  StructField("type", StringType),
  StructField("age", IntegerType),
  StructField("isAlive", BooleanType)
))

val df = spark.read.schema(customSchema).format("cosmos.items").options(cfg).load()
```

For now we only support a user-provided schema in the query path.

TODO:
- schema inference support
…zure#18040)

This PR adds support for merging the Spark application config with DataFrame options (Cosmos Spark3 DataSourceV2).

Example:

```scala
val sparkConfig = new SparkConf()
sparkConfig.set("spark.cosmos.accountEndpoint", cosmosEndpoint)
sparkConfig.set("spark.cosmos.accountKey", cosmosMasterKey)

val spark = SparkSession.builder()
  .appName("spark connector sample")
  .master("local")
  .config(sparkConfig)
  .getOrCreate()

val options = Map(
  "spark.cosmos.database" -> cosmosDatabase,
  "spark.cosmos.container" -> cosmosContainer
)

val df = spark.read.format("cosmos.items").options(options).load()
```

See the `SparkE2EConfigResolutionSpec` e2e test for more examples.

TODO:
- We should investigate how to merge Spark SQL config.
- We should investigate how to integrate this with the catalog API (if possible).
…ver and re-use the caches in the Spark Executor. (Azure#18184)

This PR provides the foundation for broadcasting the Cosmos client cache (for now only the CollectionCache) in a Spark cluster.

**Importance:** in a Spark cluster, the Spark driver JVM and the Spark executor JVMs run on different machines, and each has its own Cosmos client. When a Spark driver orchestrates a Spark job, new executors are assigned, which requires instantiating CosmosClients on the executor JVMs. CosmosClient bootstrapping consumes master-resource budget, hence we would very quickly get throttled on master-resource RUs. This PR provides the foundation for capturing a snapshot of the CosmosClient collection cache on the Spark driver and broadcasting it to all executors, so all executors can re-use the pre-populated collection cache from the driver.

For now we only support broadcasting the collection cache (a sketch of the broadcast pattern follows below).

TODO: later we will add support for broadcasting
- DatabaseAccount info (GlobalEndpointManager)
- PartitionKeyRangeCache
- AddressCache
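Here is a minimal sketch of the broadcast pattern described above, assuming a simplified stand-in type; `CollectionCacheSnapshot` and its field are hypothetical and not the SDK's actual internal types:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical, simplified stand-in for the SDK's internal collection cache snapshot.
final case class CollectionCacheSnapshot(containerResourceIds: Map[String, String])

object BroadcastCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-cache-sketch").master("local[2]").getOrCreate()

    // On the driver: capture a snapshot (in the real connector this would come from
    // the driver's CosmosClient) and broadcast it once.
    val snapshot = CollectionCacheSnapshot(Map("mydb/myContainer" -> "someResourceId"))
    val broadcastSnapshot = spark.sparkContext.broadcast(snapshot)

    // On the executors: resolve container metadata from the broadcast snapshot instead
    // of bootstrapping it again and consuming master-resource RUs.
    val resolved = spark.sparkContext
      .parallelize(Seq("mydb/myContainer"))
      .map(name => broadcastSnapshot.value.containerResourceIds.getOrElse(name, "unknown"))
      .collect()

    resolved.foreach(println)
    spark.stop()
  }
}
```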
This is a minor refactoring improvement. This PR makes `ImplementationBridgeHelpers.CosmosClientBuilderHelper.getCosmosClientBuilderAccessor()` package-private, hence preventing access to the accessor by the end user.
This PR does not make any changes to the Spark source code logic. All changes are scoped to the tests of the Spark connector.

1. Improves the test framework:
   - provides common test utils for cleaning up resources (Cosmos DB database, Spark session, etc.) after tests
   - provides a framework for sharing the CosmosClient, Spark session, and container within the scope of one test class (see the sketch below)
   - improves the test framework to rely on ScalaTest assertions instead of AssertJ as a pattern
2. Adds a Dev-Readme.md file documenting the test execution instructions.
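A hedged sketch of the per-class sharing and cleanup pattern described above, using ScalaTest's `BeforeAndAfterAll`; the class name and test bodies are hypothetical and this is not the connector's actual test utility code:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.BeforeAndAfterAll
import org.scalatest.flatspec.AnyFlatSpec

// Hypothetical test class: one Spark session (and, in the real connector, one CosmosClient
// and one test container) shared by all tests in the class and cleaned up afterwards.
class ExampleSharedResourcesSpec extends AnyFlatSpec with BeforeAndAfterAll {
  private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    spark = SparkSession.builder().appName("shared-test-session").master("local[2]").getOrCreate()
  }

  override def afterAll(): Unit = {
    try spark.stop() // release the shared resources once per test class
    finally super.afterAll()
  }

  "the shared session" should "be usable by every test in the class" in {
    assert(spark.range(10).count() == 10)
  }

  it should "not be recreated per test" in {
    assert(spark.sparkContext.appName == "shared-test-session")
  }
}
```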
This reverts commit 9474753.
* Adding toobjectnode tests
* Styling
* Styling
* more tests
* Removing unused trait
* Starting with new methods
* Completing fromObjectNodeToRow
* Finishing row to objectnode
* Porting tests
* Remove old code
* Removing unused file
* Rewire to use the new methods
* removing assertThat
* Forgot about UnsafeMap
* Missing a toString
* Adding specific null test
* Adding string format parsing
* Styles
* String to number conversion
…into users/fabianm/mergemastertosparkfeaturebranch
* Starting with inference
* More mappings
* Adding config
* Wiring config through
* Using query
* Filtering system props
* Fixes
* Adding tests
* Adding config to disable inference
* Adding table capability
* Adding raw json support for when not inferred
* Adding E2E
* Another test
* Wrong type
* Printing schema
* new line missing
* new line missing
* Adding comment
* Default to false
* Updating test
* Supporting raw write scenario and tests
* As warning
* Defaulting to String
* More tests
* No magic numbers in test
* Missing config
* Updating comment
* Rename and new tests
* Adding query config
* Wiring query
* Adding test
* Initial skeleton for the ChangeFeed data source. Open issues (will be addressed in subsequent PRs):
  - Increased E2E test coverage
  - Partitioning
  - StartFrom configuration
  - Spark streaming (micro-batch) support
  - Continuous processing support
  - ContinuationState persistence
* Reacting to CR feedback
* Changed parsing logic to pattern matching
cosmos spark: This PR adds the following **new functionality for the `Append` save mode:**
1. Extracting the partitionKeyPath once for a given container and using it for future write operations (should help perf by bypassing the collection cache lookup in the core SDK).
2. Configuring the write strategy, config: `spark.cosmos.write.strategy`.
3. Retrying on write failure, config: `spark.cosmos.write.maxRetryCount` (a usage sketch follows at the end of this description).
4. Tests for all of the above.

**Bugfix for CosmosConfig default value support.**

**Tests capturing the current behaviour of the save modes (no functional change, tests only) for `Append`, `Overwrite`, `Ignore`, `ErrorIfExists`**: see `SparkE2EWriteSpec` from line 110 to the end.

I parked the code I have for integrating patch (no new functionality for now, it needs more work; once patch is ready in the backend it will be enabled).
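A hedged usage sketch of the write configuration described above; the option keys come from this PR's description, while the values (for example `ItemOverwrite` and `3`) and the database/container names are illustrative assumptions:

```scala
// Illustrative write configuration; `df`, `cosmosEndpoint` and `cosmosMasterKey` are
// assumed to exist, as in the other snippets in this PR.
val writeCfg = Map(
  "spark.cosmos.accountEndpoint" -> cosmosEndpoint,
  "spark.cosmos.accountKey" -> cosmosMasterKey,
  "spark.cosmos.database" -> "mydb",
  "spark.cosmos.container" -> "myContainer",
  "spark.cosmos.write.strategy" -> "ItemOverwrite", // assumed example value
  "spark.cosmos.write.maxRetryCount" -> "3"
)

df.write
  .format("cosmos.items")
  .mode("append")
  .options(writeCfg)
  .save()
```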
* Adding config for consistency
* Adding config and tests
* Correctly handling invalid case
* With logging
* Initial cache and tests
* Adding broadcast tests
* Replacing unique clients
* Accessors
* Refactor config
* Removing extra clients
* Removing imports
* Spaces
* Spaces
* Readding missing code from merge
* Whitespace
* Addressing comments
* adding lock
* No match
LGTM
Merging master into the feature branch brought a Reactor upgrade, and we were constantly getting this error on bulk:

```
reactor.core.Exceptions$OverflowException: Backpressure overflow during Sinks.Many#emitNext
```

The `EmitterProcessor` that we used in Spark for integrating against bulk is deprecated in the newer version of Reactor, and it seems its behaviour has changed. Replacing `EmitterProcessor` with `Sinks.many()` appears to be the recommended path forward and solved the problem.

There are also many deprecated reactor-core APIs used in the bulk implementation in the core SDK. I suspect we will have to fix those later as well.
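For reference, a minimal sketch of the `Sinks.many()` pattern that replaces `EmitterProcessor` (illustrative only, not the connector's actual bulk integration code):

```scala
import reactor.core.publisher.Sinks
import reactor.core.publisher.Sinks.EmitFailureHandler

object SinksSketch {
  def main(args: Array[String]): Unit = {
    // A buffering sink; consumers subscribe to its Flux view.
    val sink = Sinks.many().unicast().onBackpressureBuffer[String]()

    sink.asFlux().subscribe((item: String) => println(s"processed $item"))

    // emitNext with an EmitFailureHandler replaces EmitterProcessor#onNext;
    // FAIL_FAST surfaces emission failures instead of hiding them.
    sink.emitNext("doc-1", EmitFailureHandler.FAIL_FAST)
    sink.emitNext("doc-2", EmitFailureHandler.FAIL_FAST)
    sink.tryEmitComplete()
  }
}
```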
The template pipeline failures are known and are being fixed in #20076, so they can be ignored for this PR.
LGTM from an engineering perspective.
This is a Spark connector to be used with Spark, not necessarily a Java client; it can be used from PySpark etc. Fixing the readme.
...osmos/azure-cosmos/src/main/java/com/azure/cosmos/models/CosmosChangeFeedRequestOptions.java
*
* @return The current size of a container in kilobytes.
*/
public long getDocumentUsage() {
Please read the description before reviewing this PR.

This PR intends to merge the feature/cosmos/spark30 branch to master.

NOTE: do NOT merge this PR. We don't want a squash merge because we need to maintain the history.

Please sync with @moderakh.
com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12
As a refresher of the prior discussion: per industry standards, the Scala version needs to be part of the artifactId. As agreed, we are using the `com.azure.cosmos.spark` groupId to avoid polluting the `com.azure` groupId.
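For illustration, a hedged sketch of how such a coordinate would be consumed from an sbt build; the version string is a placeholder, not a value from this PR:

```scala
// build.sbt fragment: the Spark version (3-1) and Scala binary version (2-12) are
// encoded in the artifactId itself, per the naming discussed above.
libraryDependencies += "com.azure.cosmos.spark" % "azure-cosmos-spark_3-1_2-12" % "<version>"
```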