Updates to delta-storage/-s3-dynamodb artifacts and java/scaladocs
## Description

- `DynamoDBLogStore` renamed to `S3DynamoDBLogStore`
- `delta-storage-dynamodb` artifact renamed to `delta-storage-s3-dynamodb`
- `delta-storage` artifact name now has no scala version, and pom has no scala dependency
- `delta-storage-s3-dynamodb` artifact name now has no scala version, and pom has no scala dependency
- `io.delta.storage` scaladocs now contain only the `io.delta.storage` Java APIs
- NO CHANGE: `io.delta.storage` java APIs docs include only `LogStore.java` and `CloseableIterator.java`
- updated integration tests for new folder and new artifact name
- java artifacts are only generated when using scala 2.12; we do NOT double-publish them or double-generate the jars. e.g. `build/sbt '++2.13.5 publishM2'` does not generate these jars
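
The publish gating in the last point can be sketched in Python. This only illustrates the check, it is not the sbt code itself: the names `major_minor_patch` and `should_publish_java_artifact` are hypothetical, and the default Scala version is copied from `build.sbt`.

```python
import re

DEFAULT_SCALA_VERSION = "2.12.14"  # value of scala212 in build.sbt


def major_minor_patch(version_str):
    """Return (major, minor, patch) as ints, e.g. "1.2.3" -> (1, 2, 3)."""
    parts = version_str.split(".")
    if len(parts) < 3:
        raise ValueError(f"Could not parse version for {version_str}")
    numbers = []
    for part in parts[:3]:
        # take the first run of digits, so "0-SNAPSHOT" yields 0
        match = re.search(r"\d+", part)
        if match is None:
            raise ValueError(
                f"Could not extract version number from {part} in {version_str}")
        numbers.append(int(match.group()))
    return tuple(numbers)


def should_publish_java_artifact(scala_binary_version):
    """Publish the Java-only jars only for the default Scala binary version."""
    major, minor, _ = major_minor_patch(DEFAULT_SCALA_VERSION)
    return f"{major}.{minor}" == scala_binary_version
```

Under these assumptions `should_publish_java_artifact("2.12")` is true and `should_publish_java_artifact("2.13")` is false, which matches the `++2.13.5 publishM2` behaviour described above.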

### Artifact Names and POMs
Ran `build/sbt publishM2` locally.

#### delta-storage published correctly
```
[info] 	published delta-storage to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage/1.2.0-SNAPSHOT/delta-storage-1.2.0-SNAPSHOT.pom
[info] 	published delta-storage to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage/1.2.0-SNAPSHOT/delta-storage-1.2.0-SNAPSHOT.jar
[info] 	published delta-storage to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage/1.2.0-SNAPSHOT/delta-storage-1.2.0-SNAPSHOT-sources.jar
[info] 	published delta-storage to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage/1.2.0-SNAPSHOT/delta-storage-1.2.0-SNAPSHOT-javadoc.jar

// pom.xml
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>io.delta</groupId>
    <artifactId>delta-storage</artifactId>
    <packaging>jar</packaging>
    <description>delta-storage</description>
    <version>1.2.0-SNAPSHOT</version>
    <licenses>
        <license>
            <name>Apache-2.0</name>
            <url>http://www.apache.org/licenses/LICENSE-2.0</url>
            <distribution>repo</distribution>
        </license>
    </licenses>
    <name>delta-storage</name>
    <organization>
        <name>io.delta</name>
    </organization>
    <url>https://delta.io/</url>
    <scm>
        <url>git@github.com:delta-io/delta.git</url>
        <connection>scm:git:git@github.com:delta-io/delta.git</connection>
    </scm>
    <developers>
        ...
    </developers>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
```

#### delta-storage-s3-dynamodb published correctly
```
[info] 	published delta-storage-s3-dynamodb to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage-s3-dynamodb/1.2.0-SNAPSHOT/delta-storage-s3-dynamodb-1.2.0-SNAPSHOT.pom
[info] 	published delta-storage-s3-dynamodb to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage-s3-dynamodb/1.2.0-SNAPSHOT/delta-storage-s3-dynamodb-1.2.0-SNAPSHOT.jar
[info] 	published delta-storage-s3-dynamodb to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage-s3-dynamodb/1.2.0-SNAPSHOT/delta-storage-s3-dynamodb-1.2.0-SNAPSHOT-sources.jar
[info] 	published delta-storage-s3-dynamodb to file:/Users/scott.sandre/.m2/repository/io/delta/delta-storage-s3-dynamodb/1.2.0-SNAPSHOT/delta-storage-s3-dynamodb-1.2.0-SNAPSHOT-javadoc.jar

// pom.xml
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>io.delta</groupId>
    <artifactId>delta-storage-s3-dynamodb</artifactId>
    <packaging>jar</packaging>
    <description>delta-storage-s3-dynamodb</description>
    <version>1.2.0-SNAPSHOT</version>
    <licenses>
        <license>
            <name>Apache-2.0</name>
            <url>http://www.apache.org/licenses/LICENSE-2.0</url>
            <distribution>repo</distribution>
        </license>
    </licenses>
    <name>delta-storage-s3-dynamodb</name>
    <organization>
        <name>io.delta</name>
    </organization>
    <url>https://delta.io/</url>
    <scm>
        <url>git@github.com:delta-io/delta.git</url>
        <connection>scm:git:git@github.com:delta-io/delta.git</connection>
    </scm>
    <developers>
        ...
    </developers>
    <dependencies>
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-storage</artifactId>
            <version>1.2.0-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>1.2.0-SNAPSHOT</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk</artifactId>
            <version>1.7.4</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>
```

#### other artifacts still generate correctly
```
ls /Users/scott.sandre/.m2/repository/io/delta
delta-contribs_2.12       delta-core_2.12           delta-storage             delta-storage-s3-dynamodb
```

### Javadocs and Scaladocs
Ran `build/sbt unidoc` locally.

#### Javadocs BEFORE
![image](https://user-images.githubusercontent.com/59617782/161815044-c0b7b650-bcc4-4bb1-9bb4-c90a3c943ff4.png)

#### Javadocs AFTER
- *Note*: the `:: Developer API ::` tag isn't working here ... but it's not working on branch-1.1 master either for me locally ... so I don't think this is an issue
![image](https://user-images.githubusercontent.com/59617782/161815083-4869cb2f-b259-4b51-8ed4-04c7e24add2c.png)

#### Scaladocs BEFORE
![image](https://user-images.githubusercontent.com/59617782/161814589-37ecb600-6f9a-47ab-be21-2ce69d9a47dc.png)

#### Scaladocs AFTER
![image](https://user-images.githubusercontent.com/59617782/161869888-08678416-4081-488c-b919-becb63d86874.png)
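
The rule behind these docs (only `LogStore.java` and `CloseableIterator.java` are public `io.delta.storage` APIs) can be sketched as a path predicate. This is a hypothetical Python illustration, not the unidoc configuration itself: `is_documented` is an invented name, the sample paths are illustrative, and the other exclusions in `build.sbt` are left out.

```python
from pathlib import PurePosixPath

# the only io.delta.storage sources that should appear in the generated docs
PUBLIC_STORAGE_FILES = {"LogStore.java", "CloseableIterator.java"}


def is_documented(path):
    """Keep a source file unless it is a non-public io.delta.storage file."""
    in_storage = "io/delta/storage" in path
    name = PurePosixPath(path).name
    return (not in_storage) or name in PUBLIC_STORAGE_FILES


# illustrative paths only
print(is_documented("io/delta/storage/LogStore.java"))
print(is_documented("io/delta/storage/internal/LogStoreErrors.java"))
```

Files outside `io/delta/storage` pass through this predicate unchanged, so it narrows only the storage package.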

### Integration Tests
- also re-ran integration tests (to test the new artifact name, with no scala version)
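
For reference, the package list the test runner passes now mixes a Scala-suffixed coordinate with an unsuffixed one. A minimal sketch, where `build_packages` is a hypothetical helper and not a function in `run-integration-tests.py`:

```python
def build_packages(version, extra_packages=None):
    """Compose the comma-separated Maven coordinates for the integration tests."""
    # delta-core is still cross-built, so it keeps the _2.12 suffix
    packages = "io.delta:delta-core_2.12:" + version
    # Java-only artifact: no Scala version in the artifact ID
    packages += "," + "io.delta:delta-storage-s3-dynamodb:" + version
    if extra_packages:
        packages += "," + extra_packages
    return packages


print(build_packages("1.2.0-SNAPSHOT"))
```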

Closes #1054

Signed-off-by: Scott Sandre <scott.sandre@databricks.com>
GitOrigin-RevId: 80763cb099c95c342bc102b5c8de11048b56060e
scottsand-db authored and vkorukanti committed Apr 8, 2022
1 parent c55acf4 commit 952f25b
Showing 10 changed files with 84 additions and 37 deletions.
81 changes: 66 additions & 15 deletions build.sbt
@@ -19,16 +19,18 @@ import java.nio.file.Files
val sparkVersion = "3.2.0"
val scala212 = "2.12.14"
val scala213 = "2.13.5"
val default_scala_version = scala212
val all_scala_versions = Seq(scala212, scala213)

scalaVersion := scala212
scalaVersion := default_scala_version

// crossScalaVersions must be set to Nil on the root project
crossScalaVersions := Nil

lazy val commonSettings = Seq(
organization := "io.delta",
scalaVersion := scala212,
crossScalaVersions := Seq(scala212, scala213),
scalaVersion := default_scala_version,
crossScalaVersions := all_scala_versions,
fork := true
)

@@ -166,7 +168,7 @@ lazy val storage = (project in file("storage"))
.settings (
name := "delta-storage",
commonSettings,
releaseSettings, // TODO: proper artifact name
javaOnlyReleaseSettings,
libraryDependencies ++= Seq(
// User can provide any 2.x or 3.x version. We don't use any new fancy APIs. Watch out for
// versions with known vulnerabilities.
@@ -177,14 +179,18 @@ lazy val storage = (project in file("storage"))
)
)

lazy val storageDynamodb = (project in file("storage-dynamodb"))
lazy val storageS3DynamoDB = (project in file("storage-s3-dynamodb"))
.dependsOn(storage % "compile->compile;test->test;provided->provided")
.dependsOn(core % "test->test")
.settings (
name := "delta-storage-dynamodb",
name := "delta-storage-s3-dynamodb",
commonSettings,
releaseSettings, // TODO: proper artifact name with no scala version
// Test / publishArtifact := true, // uncomment only when testing FailingDynamoDBLogStore
javaOnlyReleaseSettings,

// uncomment only when testing FailingS3DynamoDBLogStore. this will include test sources in
// a separate test jar.
// Test / publishArtifact := true,

libraryDependencies ++= Seq(
"com.amazonaws" % "aws-java-sdk" % "1.7.4" % "provided"
)
@@ -235,20 +241,27 @@ lazy val scalaStyleSettings = Seq(
* MIMA settings *
********************
*/
def getPrevVersion(currentVersion: String): String = {

/**
* @return tuple of (major, minor, patch) versions extracted from a version string.
* e.g. "1.2.3" would return (1, 2, 3)
*/
def getMajorMinorPatch(versionStr: String): (Int, Int, Int) = {
implicit def extractInt(str: String): Int = {
"""\d+""".r.findFirstIn(str).map(java.lang.Integer.parseInt).getOrElse {
throw new Exception(s"Could not extract version number from $str in $version")
}
}

val (major, minor, patch): (Int, Int, Int) = {
currentVersion.split("\\.").toList match {
case majorStr :: minorStr :: patchStr :: _ =>
(majorStr, minorStr, patchStr)
case _ => throw new Exception(s"Could not find previous version for $version.")
}
versionStr.split("\\.").toList match {
case majorStr :: minorStr :: patchStr :: _ =>
(majorStr, minorStr, patchStr)
case _ => throw new Exception(s"Could not parse version for $version.")
}
}

def getPrevVersion(currentVersion: String): String = {
val (major, minor, patch) = getMajorMinorPatch(currentVersion)

val majorToLastMinorVersions = Map(
0 -> 8
@@ -283,6 +296,13 @@ def ignoreUndocumentedPackages(packages: Seq[Seq[java.io.File]]): Seq[Seq[java.i
.map(_.filterNot(_.getName.contains("$")))
.map(_.filterNot(_.getCanonicalPath.contains("io/delta/sql")))
.map(_.filterNot(_.getCanonicalPath.contains("io/delta/tables/execution")))
.map { _.filterNot { f =>
// LogStore.java and CloseableIterator.java are the only public io.delta.storage APIs
f.getCanonicalPath.contains("io/delta/storage") &&
f.getName != "LogStore.java" &&
f.getName != "CloseableIterator.java"
}
}
.map(_.filterNot(_.getCanonicalPath.contains("spark")))
}

@@ -294,6 +314,19 @@ lazy val unidocSettings = Seq(
"-doc-title", "Delta Lake " + version.value.replaceAll("-SNAPSHOT", "") + " ScalaDoc"
),

ScalaUnidoc / unidoc / unidocAllSources := {
(ScalaUnidoc / unidoc / unidocAllSources).value
// ignore Scala (non-public) io.delta.storage classes
.map(_.filterNot(_.getCanonicalPath.contains("io/delta/storage"))) ++
// include public io.delta.storage classes
(JavaUnidoc / unidoc / unidocAllSources).value
.map { _.filter { f =>
f.getCanonicalPath.contains("io/delta/storage") &&
(f.getName == "LogStore.java" || f.getName == "CloseableIterator.java")
}
}
},

// Configure Java unidoc
JavaUnidoc / unidoc / javacOptions := Seq(
"-public",
@@ -325,6 +358,24 @@ lazy val skipReleaseSettings = Seq(
publish / skip := true
)

/**
* Release settings for artifact that contains only Java source code
*/
lazy val javaOnlyReleaseSettings = releaseSettings ++ Seq(
// drop off Scala suffix from artifact names
crossPaths := false,

// we publish jars for each scalaVersion in crossScalaVersions. however, we only need to publish
// one java jar. thus, only do so when the current scala version == default scala version
publishArtifact := {
val (expMaj, expMin, _) = getMajorMinorPatch(default_scala_version)
s"$expMaj.$expMin" == scalaBinaryVersion.value
},

// exclude scala-library from dependencies in generated pom.xml
autoScalaLibrary := false,
)

lazy val releaseSettings = Seq(
publishMavenStyle := true,
publishArtifact := true,
9 changes: 4 additions & 5 deletions run-integration-tests.py
@@ -107,16 +107,15 @@ def run_dynamodb_logstore_integration_tests(root_dir, version, test_name, extra_
if use_local:
run_cmd(["build/sbt", "publishM2"])

test_dir = path.join(root_dir, path.join("storage-dynamodb", "integration_tests"))
test_dir = path.join(root_dir, path.join("storage-s3-dynamodb", "integration_tests"))
test_files = [path.join(test_dir, f) for f in os.listdir(test_dir)
if path.isfile(path.join(test_dir, f)) and
f.endswith(".py") and not f.startswith("_")]

python_root_dir = path.join(root_dir, "python")
extra_class_path = path.join(python_root_dir, path.join("delta", "testing"))
packages = "io.delta:delta-core_2.12:" + version
# TODO: update this with proper delta-storage artifact ID (i.e. no _2.12 scala version)
packages += "," + "io.delta:delta-storage-dynamodb_2.12:" + version
packages += "," + "io.delta:delta-storage-s3-dynamodb:" + version
if extra_packages:
packages += "," + extra_packages

@@ -293,7 +292,7 @@ def __exit__(self, tpe, value, traceback):
action="store_true",
help="Generate JARs from local source code and use to run tests")
parser.add_argument(
"--run-storage-dynamodb-integration-tests",
"--run-storage-s3-dynamodb-integration-tests",
required=False,
default=False,
action="store_true",
@@ -326,7 +325,7 @@ def __exit__(self, tpe, value, traceback):
run_scala = not args.python_only and not args.pip_only
run_pip = not args.python_only and not args.scala_only and not args.no_pip

if args.run_storage_dynamodb_integration_tests:
if args.run_storage_s3_dynamodb_integration_tests:
run_dynamodb_logstore_integration_tests(root_dir, args.version, args.test, args.maven_repo,
args.dbb_packages, args.dbb_conf, args.use_local)
quit()
Original file line number Diff line number Diff line change
@@ -41,10 +41,10 @@
export DELTA_TABLE_PATH=s3a://test-bucket/delta-test/
export DELTA_DYNAMO_TABLE=delta_log_test
export DELTA_DYNAMO_REGION=us-west-2
export DELTA_STORAGE=io.delta.storage.DynamoDBLogStore
export DELTA_STORAGE=io.delta.storage.S3DynamoDBLogStore
export DELTA_NUM_ROWS=16
./run-integration-tests.py --run-storage-dynamodb-integration-tests \
./run-integration-tests.py --run-storage-s3-dynamodb-integration-tests \
--dbb-packages org.apache.hadoop:hadoop-aws:3.3.1,com.amazonaws:aws-java-sdk-bundle:1.12.142 \
--dbb-conf spark.jars.ivySettings=/workspace/ivy.settings \
spark.driver.extraJavaOptions=-Dlog4j.configuration=file:debug/log4j.properties
@@ -56,8 +56,8 @@
concurrent_readers = int(os.environ.get("DELTA_CONCURRENT_READERS", 2))
num_rows = int(os.environ.get("DELTA_NUM_ROWS", 16))

# className to instantiate. io.delta.storage.DynamoDBLogStore or .FailingDynamoDBLogStore
delta_storage = os.environ.get("DELTA_STORAGE", "io.delta.storage.DynamoDBLogStore")
# className to instantiate. io.delta.storage.S3DynamoDBLogStore or .FailingS3DynamoDBLogStore
delta_storage = os.environ.get("DELTA_STORAGE", "io.delta.storage.S3DynamoDBLogStore")
dynamo_table_name = os.environ.get("DELTA_DYNAMO_TABLE", "delta_log_test")
dynamo_region = os.environ.get("DELTA_DYNAMO_REGION", "us-west-2")
dynamo_error_rates = os.environ.get("DELTA_DYNAMO_ERROR_RATES", "")
Original file line number Diff line number Diff line change
@@ -67,8 +67,8 @@
* -- complete (STRING, representing boolean, "true" or "false")
* -- commitTime (NUMBER, epoch seconds)
*/
public class DynamoDBLogStore extends BaseExternalLogStore {
private static final Logger LOG = LoggerFactory.getLogger(DynamoDBLogStore.class);
public class S3DynamoDBLogStore extends BaseExternalLogStore {
private static final Logger LOG = LoggerFactory.getLogger(S3DynamoDBLogStore.class);

/**
* Configuration keys for the DynamoDB client
@@ -95,7 +95,7 @@ public class DynamoDBLogStore extends BaseExternalLogStore {
private final String credentialsProviderName;
private final String regionName;

public DynamoDBLogStore(Configuration hadoopConf) throws IOException {
public S3DynamoDBLogStore(Configuration hadoopConf) throws IOException {
super(hadoopConf);

tableName = getParam(hadoopConf, DBB_CLIENT_TABLE, "delta_log");
@@ -105,9 +105,9 @@ public DynamoDBLogStore(Configuration hadoopConf) throws IOException {
"com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
);
regionName = getParam(hadoopConf, DBB_CLIENT_REGION, "us-east-1");
LOG.info("DynamoDBLogStore using tableName {}", tableName);
LOG.info("DynamoDBLogStore using credentialsProviderName {}", credentialsProviderName);
LOG.info("DynamoDBLogStore using regionName {}", regionName);
LOG.info("using tableName {}", tableName);
LOG.info("using credentialsProviderName {}", credentialsProviderName);
LOG.info("using regionName {}", regionName);

client = getClient();
tryEnsureTableExists(hadoopConf);
Original file line number Diff line number Diff line change
@@ -27,15 +27,15 @@
* An ExternalLogStore implementation that allows for easy, probability-based error injection during
* runtime.
*
* This is used to test the error handling capabilities of DynamoDBLogStore during integration
* This is used to test the error-handling capabilities of S3DynamoDBLogStore during integration
* tests.
*/
public class FailingDynamoDBLogStore extends DynamoDBLogStore {
public class FailingS3DynamoDBLogStore extends S3DynamoDBLogStore {

private static java.util.Random rng = new java.util.Random();
private final ConcurrentHashMap<String, Float> errorRates;

public FailingDynamoDBLogStore(Configuration hadoopConf) throws IOException {
public FailingS3DynamoDBLogStore(Configuration hadoopConf) throws IOException {
super(hadoopConf);
errorRates = new ConcurrentHashMap<>();

Original file line number Diff line number Diff line change
@@ -16,8 +16,6 @@

package io.delta.storage.internal;

import scala.util.control.ControlThrowable;

import java.io.IOException;

public class LogStoreErrors {
@@ -31,8 +29,7 @@ public static boolean isNonFatal(Throwable t) {
if (t instanceof VirtualMachineError ||
t instanceof ThreadDeath ||
t instanceof InterruptedException ||
t instanceof LinkageError ||
t instanceof ControlThrowable) {
t instanceof LinkageError) {
return false;
}
