Merge branch 'users/moderakh/spark3-merging-to-master'

Showing 152 changed files with 13,982 additions and 158 deletions.
```
@@ -90,6 +90,9 @@ venv
nbproject
nb-configuration.xml

# Scala Stylecheck
scalastyle-output.xml

# Emacs #

#changebundle.txt#
```
```
@@ -0,0 +1,4 @@
*.log

metastore_db/*
spark-warehouse/*
```
```
@@ -0,0 +1,4 @@
*.log

metastore_db/*
spark-warehouse/*
```
## Release History

## 4.0.0-beta.2 (Unreleased)

## 4.0.0-beta.1 (2021-03-22)
* Cosmos DB Spark 3.1.1 Connector Preview `4.0.0-beta.1` Release.
### Features
* Supports Spark 3.1.1 and Scala 2.12.
* Integrated against the Spark 3 DataSourceV2 API.
* Developed from the ground up using the Cosmos DB Java V4 SDK.
* Added support for Spark Query, Write, and Streaming.
* Added support for Spark 3 Catalog metadata APIs.
* Added support for Java V4 Throughput Control.
* Added support for different partitioning strategies.
* Integrated against the Cosmos DB TCP protocol.
* Added support for the Databricks automated Maven Resolver.
* Added support for broadcasting CosmosClient caches to reduce bootstrapping RU throttling.
* Added support for a unified Jackson ObjectNode to Spark Row converter.
* Added support for the Raw JSON format.
* Added support for Config Validation.
* Added support for Spark application configuration consolidation.
* Integrated against the Cosmos DB FeedRange API to support partition split proofing.
* Automated CI testing on Databricks and a Cosmos DB live endpoint.
* Automated CI testing on the Cosmos DB Emulator.

### Known limitations
* Spark structured streaming (micro batches) for consuming the change feed has been implemented but not yet fully tested end-to-end, so it is considered experimental at this point.
* No support for continuous processing (change feed) yet.
* No perf tests or optimizations have been done yet; we will iterate on perf in the next preview releases, so usage of this preview should be limited to non-production environments.

## 4.0.0-alpha.1 (2021-03-17)
* Cosmos DB Spark 3.1.1 Connector Test Release.
# Contributing
This document provides guidelines for building the project and contributing code.

## Prerequisites
- JDK 11 and above
- [Maven](https://maven.apache.org/) 3.0 and above

## Build from source
To build the project, run the Maven commands below.

```bash
git clone https://github.com/Azure/azure-sdk-for-java.git
cd azure-sdk-for-java/sdk/cosmos/azure-cosmos-spark_3_2.12
mvnw clean install
```

## Test
There are integration tests that run against Azure Cosmos DB and against the
[Azure Cosmos DB Emulator](https://docs.microsoft.com/azure/cosmos-db/local-emulator); follow the
link to set up the emulator before running the emulator tests.

- Run unit tests
```bash
mvn clean install -Dgpg.skip
```

- Run integration tests
  - on Azure
    >**NOTE** Please note that the integration tests against Azure require the Azure Cosmos DB Document
    >API and will automatically create a Cosmos database in your Azure subscription, so they will
    >incur **Azure usage fees.**

    Integration tests require an Azure subscription. If you don't already have an Azure
    subscription, you can activate your
    [MSDN subscriber benefits](https://azure.microsoft.com/pricing/member-offers/msdn-benefits-details/)
    or sign up for a [free Azure account](https://azure.microsoft.com/free/).

    1. Create an Azure Cosmos DB account on Azure.
       - Go to the [Azure portal](https://portal.azure.com/) and click +New.
       - Click Databases, and then click Azure Cosmos DB to create your database.
       - Navigate to the database you have created, click Access keys, and copy the
         URI and access keys for your database.

    2. Set the environment variables ACCOUNT_HOST, ACCOUNT_KEY and SECONDARY_ACCOUNT_KEY to the
       Cosmos account URI, primary key and secondary key. Also set the second group of environment
       variables NEW_ACCOUNT_HOST, NEW_ACCOUNT_KEY and NEW_SECONDARY_ACCOUNT_KEY; the two groups
       of environment variables can point to the same account.

    3. Run the Maven command with the `integration-test-azure` profile.

    ```bash
    set ACCOUNT_HOST=your-cosmos-account-uri
    set ACCOUNT_KEY=your-cosmos-account-primary-key
    set SECONDARY_ACCOUNT_KEY=your-cosmos-account-secondary-key

    set NEW_ACCOUNT_HOST=your-cosmos-account-uri
    set NEW_ACCOUNT_KEY=your-cosmos-account-primary-key
    set NEW_SECONDARY_ACCOUNT_KEY=your-cosmos-account-secondary-key
    mvnw -P integration-test-azure clean install
    ```

  - on Emulator

    Set up the Azure Cosmos DB Emulator by following
    [these instructions](https://docs.microsoft.com/azure/cosmos-db/local-emulator), and set the
    associated environment variables. Then run the tests with:
    ```bash
    mvnw -P integration-test-emulator install
    ```

- Skip test execution
```bash
mvn clean install -Dgpg.skip -DskipTests
```

## Version management
The development version naming convention is like `0.1.2-beta.1`. The release version naming convention is like `0.1.2`.

## Contribute to code
Contributions are welcome. Please follow
[these instructions](https://github.com/Azure/azure-sdk-for-java/blob/master/CONTRIBUTING.md) to contribute code.
# Azure Cosmos DB OLTP Spark 3 connector

**Azure Cosmos DB OLTP Spark connector preview** provides Apache Spark support for Azure Cosmos DB using
the [SQL API][sql_api_query].
[Azure Cosmos DB][cosmos_introduction] is a globally distributed database service which allows
developers to work with data using a variety of standard APIs, such as SQL, MongoDB, Cassandra, Graph, and Table.

**NOTE: this is a preview build.
This build has not been load or performance tested yet and, at this point, is not recommended
for use in production scenarios.**

If you have any feedback or ideas on how to improve your experience, please let us know here:
https://github.com/Azure/azure-sdk-for-java/issues/new

## Documentation

- [Getting started](https://github.com/Azure/azure-sdk-for-java/blob/feature/cosmos/spark30/sdk/cosmos/azure-cosmos-spark_3-1_2-12/docs/quick-start.md)
- [Catalog API](https://github.com/Azure/azure-sdk-for-java/blob/feature/cosmos/spark30/sdk/cosmos/azure-cosmos-spark_3-1_2-12/docs/catalog-api.md) (illustrated briefly below)
- [Configuration Parameter Reference](https://github.com/Azure/azure-sdk-for-java/blob/feature/cosmos/spark30/sdk/cosmos/azure-cosmos-spark_3-1_2-12/docs/configuration-reference.md)
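As a quick taste of the Catalog API covered in the docs above, the snippet below is a minimal sketch, assuming `spark` is the active `SparkSession` (for example, in a Databricks notebook) and using placeholder endpoint/key values and illustrative database/container names. The catalog alias `cosmosCatalog` is arbitrary; see the Catalog API and Configuration Parameter Reference docs for the authoritative property names and table options.

```scala
// Minimal sketch: registering a Cosmos catalog and creating metadata via Spark SQL.
// Placeholder endpoint/key values and the "SampleDatabase"/"SampleContainer" names are assumptions.
val cosmosEndpoint = "https://<your-account>.documents.azure.com:443/"
val cosmosMasterKey = "<your-account-key>"

spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", cosmosEndpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", cosmosMasterKey)

// Create a Cosmos database and container through the catalog.
spark.sql("CREATE DATABASE IF NOT EXISTS cosmosCatalog.SampleDatabase;")
spark.sql(
  "CREATE TABLE IF NOT EXISTS cosmosCatalog.SampleDatabase.SampleContainer " +
  "USING cosmos.oltp TBLPROPERTIES(partitionKeyPath = '/id', manualThroughput = '400')")
```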
[//]: # (//TODO: moderakh add more sections)
[//]: # (//TODO: moderakh Enable Client Logging)
[//]: # (//TODO: moderakh Examples)
[//]: # (//TODO: moderakh Next steps)
[//]: # (//TODO: moderakh Key concepts)
[//]: # (//TODO: moderakh Azure Cosmos DB Partition)
[//]: # (//TODO: moderakh Troubleshooting)

## Version Compatibility

| Connector    | Spark | Minimum Java Version | Supported Scala Versions |
| ------------ | ----- | -------------------- | ------------------------ |
| 4.0.0-beta.1 | 3.1.1 | 8                    | 2.12                     |

## Download

You can use the Maven coordinate of the jar to auto-install the Spark Connector into your Databricks Runtime 8 from Maven:
`com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.0.0-beta.1`

You can also integrate against the Cosmos DB Spark Connector in your SBT project:
```scala
libraryDependencies += "com.azure.cosmos.spark" % "azure-cosmos-spark_3-1_2-12" % "4.0.0-beta.1"
```

The Cosmos DB Spark Connector is available on [Maven Central](https://search.maven.org/artifact/com.azure.cosmos.spark/azure-cosmos-spark_3-1_2-12/4.0.0-beta.1/jar).
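For orientation, here is a minimal sketch of reading from and writing to a container once the connector is installed. The endpoint, key, database and container names below are placeholders for illustration, and the sketch assumes the database and container already exist; refer to the Configuration Parameter Reference above for the full set of options.

```scala
// Minimal sketch; endpoint, key, database and container names are placeholders (assumptions).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cosmos-spark-sample").getOrCreate()
import spark.implicits._

val cosmosCfg = Map(
  "spark.cosmos.accountEndpoint" -> "https://<your-account>.documents.azure.com:443/",
  "spark.cosmos.accountKey"      -> "<your-account-key>",
  "spark.cosmos.database"        -> "SampleDatabase",
  "spark.cosmos.container"       -> "SampleContainer"
)

// Write a couple of documents; Cosmos DB items require an "id" property, supplied here as the id column.
val df = Seq(("1", "Alice"), ("2", "Bob")).toDF("id", "name")
df.write.format("cosmos.oltp").options(cosmosCfg).mode("append").save()

// Read the container back as a DataFrame and show it.
val readDf = spark.read.format("cosmos.oltp").options(cosmosCfg).load()
readDf.show()
```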
### General

If you encounter any bug, please file an issue [here](https://github.com/Azure/azure-sdk-for-java/issues/new).

To suggest a new feature or changes that could be made, file an issue the same way you would for a bug.

## License
This project is under the MIT license, and uses and repackages other third-party libraries as an uber jar.
See [NOTICE.txt](https://github.com/Azure/azure-sdk-for-java/blob/feature/cosmos/spark30/NOTICE.txt).

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
[Contributor License Agreement (CLA)][cla] declaring that you have the right to, and actually do, grant us the rights
to use your contribution.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate
the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to
do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct][coc]. For more information, see the [Code of Conduct FAQ][coc_faq]
or contact [opencode@microsoft.com][coc_contact] with any additional questions or comments.

<!-- LINKS -->
[source_code]: src
[cosmos_introduction]: https://docs.microsoft.com/azure/cosmos-db/
[cosmos_docs]: https://docs.microsoft.com/azure/cosmos-db/introduction
[jdk]: https://docs.microsoft.com/java/azure/jdk/?view=azure-java-stable
[maven]: https://maven.apache.org/
[cla]: https://cla.microsoft.com
[coc]: https://opensource.microsoft.com/codeofconduct/
[coc_faq]: https://opensource.microsoft.com/codeofconduct/faq/
[coc_contact]: mailto:opencode@microsoft.com
[azure_subscription]: https://azure.microsoft.com/free/
[samples]: https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/cosmos/azure-spring-data-cosmos/src/samples/java/com/azure/spring/data/cosmos
[sql_api_query]: https://docs.microsoft.com/azure/cosmos-db/sql-api-sql-query
[local_emulator]: https://docs.microsoft.com/azure/cosmos-db/local-emulator
[local_emulator_export_ssl_certificates]: https://docs.microsoft.com/azure/cosmos-db/local-emulator-export-ssl-certificates
[azure_cosmos_db_partition]: https://docs.microsoft.com/azure/cosmos-db/partition-data
[sql_queries_in_cosmos]: https://docs.microsoft.com/azure/cosmos-db/tutorial-query-sql-api
[sql_queries_getting_started]: https://docs.microsoft.com/azure/cosmos-db/sql-query-getting-started