Apply data source v2 changes #576

Closed
wants to merge 71 commits into from

Conversation

@mccheah commented Jun 6, 2019

Applies the series of patches that bring in the latest data source v2 API, particularly the table catalog and the v2 logical plans.

dongjoon-hyun and others added 30 commits May 15, 2019 13:30
## What changes were proposed in this pull request?

This PR aims to make `DataSourceV2Strategy` normalize filters like [FileSourceStrategy](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L150-L158) when it pushes them into `SupportsPushDownFilters.pushFilters`.
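A minimal sketch of the kind of normalization involved (not the exact Spark code): replace attribute references in the pushed filters with the relation's output attributes, so the names offered to the source match the table schema exactly.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression}

// Rewrites each attribute in the filters to the corresponding relation output
// attribute (looked up by expression id), so pushed-down names match the schema.
def normalizeFilters(filters: Seq[Expression], output: Seq[AttributeReference]): Seq[Expression] = {
  val byExprId = output.map(a => a.exprId -> a).toMap
  filters.map(_.transform {
    case a: AttributeReference => byExprId.getOrElse(a.exprId, a)
  })
}
```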

## How was this patch tested?

Pass the Jenkins with the newly added test case.

Closes apache#23770 from dongjoon-hyun/SPARK-26865.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…rite.

## What changes were proposed in this pull request?

This adds two logical plans that implement the ReplaceData operation from the [logical plans SPIP](https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5a987801#heading=h.m45webtwxf2d). These two plans will be used to implement Spark's `INSERT OVERWRITE` behavior for v2.

Specific changes:
* Add `SupportsTruncate`, `SupportsOverwrite`, and `SupportsDynamicOverwrite` to the DSv2 write API (sketched after this list)
* Add `OverwriteByExpression` and `OverwritePartitionsDynamic` plans (logical and physical)
* Add new plans to DSv2 write validation rule `ResolveOutputRelation`
* Refactor `WriteToDataSourceV2Exec` into trait used by all DSv2 write exec nodes
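A hedged sketch of the write-builder mix-ins named above; the trait and method names follow this change, but the signatures here are illustrative rather than the exact Spark API.

```scala
import org.apache.spark.sql.sources.Filter

trait WriteBuilder {
  def buildForBatch(): AnyRef // placeholder for the real batch write type
}

// Replace all existing table data before writing (SaveMode-free truncation).
trait SupportsTruncate extends WriteBuilder {
  def truncate(): WriteBuilder
}

// Overwrite only the rows matching the given filters, then write the new data.
trait SupportsOverwrite extends WriteBuilder {
  def overwrite(filters: Array[Filter]): WriteBuilder
}

// Overwrite the partitions that contain rows in the data being written.
trait SupportsDynamicOverwrite extends WriteBuilder {
  def overwriteDynamicPartitions(): WriteBuilder
}
```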

## How was this patch tested?

* The v2 analysis suite has been updated to validate the new overwrite plans
* The analysis suite for `OverwriteByExpression` checks that the delete expression is resolved using the table's columns
* Existing tests validate that overwrite exec plan works
* Updated existing v2 test because schema is used to validate overwrite

Closes apache#23606 from rdblue/SPARK-26666-add-overwrite.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?

Continue the API refactor for streaming write, according to the [doc](https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing).

The major changes:
1. rename `StreamingWriteSupport` to `StreamingWrite`
2. add `WriteBuilder.buildForStreaming` (see the sketch after this list)
3. update existing sinks, moving the creation of `StreamingWrite` into `Table`
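A minimal sketch of the resulting builder shape; `StreamingWrite` and `buildForStreaming` follow this change, while the other members are placeholders.

```scala
// Illustrative only: the streaming write created once per query run.
trait StreamingWrite {
  def commit(epochId: Long, messages: Array[AnyRef]): Unit
  def abort(epochId: Long, messages: Array[AnyRef]): Unit
}

// The table's write builder now exposes a streaming variant next to the batch one.
trait WriteBuilder {
  def buildForBatch(): AnyRef // placeholder for the batch write type
  def buildForStreaming(): StreamingWrite
}
```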

## How was this patch tested?

existing tests

Closes apache#23702 from cloud-fan/stream-write.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?

This adds a v2 API for adding new catalog plugins to Spark.

* Catalog implementations extend `CatalogPlugin` and are loaded via reflection, similar to data sources
* `Catalogs` loads and initializes catalogs using configuration from a `SQLConf`
* `CaseInsensitiveStringMap` is used to pass configuration to `CatalogPlugin` via `initialize`

Catalogs are configured by adding config properties starting with `spark.sql.catalog.(name)`. The name property must specify a class that implements `CatalogPlugin`. Other properties under the namespace (`spark.sql.catalog.(name).(prop)`) are passed to the provider during initialization along with the catalog name.
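For example, a catalog could be wired up like this (the plugin class and catalog name are hypothetical, and an active SparkSession named `spark` is assumed):

```scala
// "my_catalog" becomes the catalog name; the value must name a CatalogPlugin class.
spark.conf.set("spark.sql.catalog.my_catalog", "com.example.MyCatalogPlugin")
// Any property under the catalog's namespace is passed to initialize().
spark.conf.set("spark.sql.catalog.my_catalog.warehouse", "s3://bucket/warehouse")
```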

This replaces apache#21306, which will be implemented in two parts: the catalog plugin system (this commit) and specific catalog APIs, like `TableCatalog`.

## How was this patch tested?

Added test suites for `CaseInsensitiveStringMap` and for catalog loading.

Closes apache#23915 from rdblue/SPARK-24252-add-v2-catalog-plugins.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?

- Support N-part identifiers in SQL (illustrated below)
- An N-part identifier extractor in the Analyzer
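For illustration, a multi-part identifier names the catalog, namespace, and table (the names below are hypothetical, and a SparkSession named `spark` is assumed):

```scala
// catalog.namespace.table resolved through the new N-part identifier support
spark.sql("SELECT * FROM my_catalog.db.events").show()
```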

## How was this patch tested?

- A new unit test suite ResolveMultipartRelationSuite
- CatalogLoadingSuite

rblue cloud-fan mccheah

Closes apache#23848 from jzhuge/SPARK-26946.

Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…et Java 1.8

## What changes were proposed in this pull request?

Fix Scala 2.11 maven build issue after merging SPARK-26946.

## How was this patch tested?

Maven Scala 2.11 and 2.12 builds with `-Phadoop-provided -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver`.

Closes apache#24184 from jzhuge/SPARK-26946-1.

Authored-by: John Zhuge <jzhuge@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
…t path before delete it

## What changes were proposed in this pull request?
This is a follow-up PR to resolve a review comment on apache#23601 (review).

When Spark writes a DataFrame with "overwrite" mode, it deletes the output path before the actual write. To safely handle the case where the output path doesn't exist, this follows the V1 code and checks for the path's existence first.
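A hedged sketch of that existence check, using the Hadoop FileSystem API (the path is an example):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val path = new Path("/tmp/output")               // example output path
val fs = path.getFileSystem(new Configuration())
// Mirror the V1 behavior: only delete the output path if it actually exists.
if (fs.exists(path)) {
  fs.delete(path, true)
}
```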

## How was this patch tested?

Apply apache#23836 and run unit tests

Closes apache#23889 from gengliangwang/checkFileBeforeOverwrite.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
… by data source

## What changes were proposed in this pull request?

In data source v2, if the data source scan implements `SupportsReportStatistics`, `DataSourceV2Relation` should respect the row count reported by the data source.

## How was this patch tested?

New unit test.

Closes apache#23853 from ConeyLiu/report-row-count.

Authored-by: Xianyang Liu <xianyang.liu@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ex in the write path

## What changes were proposed in this pull request?

In apache#23383, the file source V2 framework was implemented. In that PR, `FileIndex` is created as a member of `FileTable`, so that we can implement partition pruning like apache@0f9fcab in the future (as the data source V2 catalog is under development, partition pruning was removed from that PR).

However, after the write path of file source V2 was implemented, I found that a simple write creates an unnecessary `FileIndex`, which is required by `FileTable`. This is a sort of regression, and we can see a warning message when writing to ORC files:
```
WARN InMemoryFileIndex: The directory file:/tmp/foo was not found. Was it deleted very recently?
```
This PR makes `FileIndex` a lazy value in `FileTable`, so that we avoid creating an unnecessary `FileIndex` in the write path.
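A minimal sketch of the laziness change (the types here are placeholders, not the real `FileTable` members):

```scala
abstract class FileTableLike {
  // Stands in for building a PartitioningAwareFileIndex, which may list files eagerly.
  protected def createFileIndex(): AnyRef

  // Deferred until first use, so a pure write never pays the listing cost.
  lazy val fileIndex: AnyRef = createFileIndex()
}
```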

## How was this patch tested?

Existing unit test

Closes apache#23774 from gengliangwang/moveFileIndexInV2.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…ework

## What changes were proposed in this pull request?

The file source has a schema validation feature, which validates 2 schemas:
1. the user-specified schema when reading.
2. the schema of input data when writing.

If a file source doesn't support the schema, we can fail the query earlier.

This PR implements the same feature in the `FileDataSourceV2` framework. Compared to `FileFormat`, `FileDataSourceV2` has multiple layers. The API is added in two places (a generic sketch follows the list):
1. Read path: the table schema is determined in `TableProvider.getTable`. The actual read schema can be a subset of the table schema. This PR proposes to validate the actual read schema in `FileScan`.
2. Write path: validate the actual output schema in `FileWriteBuilder`.
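A generic sketch of the kind of validation described above; `supportsDataType` is a hypothetical predicate standing in for the source-specific check:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Fail the query early if any column's type is not supported by the file source.
def verifySchema(schema: StructType, supportsDataType: DataType => Boolean): Unit = {
  schema.fields.foreach { field =>
    if (!supportsDataType(field.dataType)) {
      throw new UnsupportedOperationException(
        s"Column ${field.name} has unsupported data type ${field.dataType.catalogString}")
    }
  }
}
```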

## How was this patch tested?

Unit test

Closes apache#23714 from gengliangwang/schemaValidationV2.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?

Similar to `SaveMode`, we should remove streaming `OutputMode` from the data source v2 API, and use operations that have clear semantics.

The changes are (a rough sketch follows the list):
1. append mode: create `StreamingWrite` directly. By default, the `WriteBuilder` will create a `Write` that appends data.
2. complete mode: call `SupportsTruncate#truncate`. Complete mode means truncating all the old data and appending the new data of the current epoch; `SupportsTruncate` has exactly the same semantics.
3. update mode: fail. The current streaming framework can't propagate the update keys, so v2 sinks are not able to implement update mode. In the future we can introduce a `SupportsUpdate` trait.
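A rough sketch of that mapping (not the exact Spark code; `WriteBuilder` and `SupportsTruncate` are the DSv2 write interfaces, with their packages omitted here):

```scala
import org.apache.spark.sql.streaming.OutputMode

def configureBuilder(outputMode: OutputMode, builder: WriteBuilder): WriteBuilder = {
  if (outputMode == OutputMode.Append()) {
    builder                                      // append is the default behavior
  } else if (outputMode == OutputMode.Complete()) {
    builder match {
      case t: SupportsTruncate => t.truncate()   // complete == truncate + append
      case _ => throw new IllegalArgumentException("Sink does not support Complete mode")
    }
  } else {
    throw new IllegalArgumentException("Update mode is not supported by v2 sinks")
  }
}
```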

The behavior changes:
1. all the v2 sinks (foreach, console, memory, kafka, noop) no longer support update mode. In fact, all the v2 sinks previously implemented update mode incorrectly; none of them could really support it.
2. the kafka sink no longer supports complete mode. In fact, the kafka sink can only append data.

## How was this patch tested?

existing tests

Closes apache#23859 from cloud-fan/update.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
## What changes were proposed in this pull request?

Not all users want to keep temporary checkpoint directories, and it is hard to restore from them.

In this PR I've added a force-delete flag which defaults to `false`. Additionally, it is not clear to users when a temporary checkpoint directory is deleted, so I've added log messages to explain this a bit more.

## How was this patch tested?

Existing + additional unit tests.

Closes apache#23732 from gaborgsomogyi/SPARK-26389.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…n it contains special chars

## What changes were proposed in this pull request?

When a user specifies a checkpoint location or a file sink output using a path containing special chars that need to be escaped, the streaming query will store checkpoint and file sink metadata in the wrong place. In this PR, I uploaded a checkpoint that was generated by the following code using Spark 2.4.0 to show this issue:

```
implicit val s = spark.sqlContext
val input = org.apache.spark.sql.execution.streaming.MemoryStream[Int]
input.addData(1, 2, 3)
val q = input.toDF.writeStream.format("parquet").option("checkpointLocation", ".../chk %#chk").start(".../output %#output")
q.stop()
```
Here is the structure of the directory:
```
sql/core/src/test/resources/structured-streaming/escaped-path-2.4.0
├── chk%252520%252525%252523chk
│   ├── commits
│   │   └── 0
│   ├── metadata
│   └── offsets
│       └── 0
├── output %#output
│   └── part-00000-97f675a2-bb82-4201-8245-05f3dae4c372-c000.snappy.parquet
└── output%20%25%23output
    └── _spark_metadata
        └── 0
```

In this checkpoint, the user specified checkpoint location is `.../chk %#chk` but the real path to store the checkpoint is `.../chk%252520%252525%252523chk` (this is generated by escaping the original path three times). The user specified output path is `.../output %#output` but the path to store `_spark_metadata` is `.../output%20%25%23output/_spark_metadata` (this is generated by escaping the original path once). The data files are still in the correct path (such as `.../output %#output/part-00000-97f675a2-bb82-4201-8245-05f3dae4c372-c000.snappy.parquet`).

This checkpoint will be used in unit tests in this PR.

The fix simply removes the improper `Path.toUri` calls.

However, as the user may not read the release note and may not be aware of this checkpoint location change, if they upgrade Spark without moving the checkpoint to the new location, their query will just start from scratch. In order to not surprise the users, this PR also adds a check to **detect the impacted paths and throw an error** that points to the migration guide. This check can be turned off by an internal sql conf `spark.sql.streaming.checkpoint.escapedPathCheck.enabled`. Here are examples of errors that will be reported:

- Streaming checkpoint error:
```
Error: we detected a possible problem with the location of your checkpoint and you
likely need to move it before restarting this query.

Earlier version of Spark incorrectly escaped paths when writing out checkpoints for
structured streaming. While this was corrected in Spark 3.0, it appears that your
query was started using an earlier version that incorrectly handled the checkpoint
path.

Correct Checkpoint Directory: /.../chk %#chk
Incorrect Checkpoint Directory: /.../chk%252520%252525%252523chk

Please move the data from the incorrect directory to the correct one, delete the
incorrect directory, and then restart this query. If you believe you are receiving
this message in error, you can disable it with the SQL conf
spark.sql.streaming.checkpoint.escapedPathCheck.enabled.
```

- File sink error (`_spark_metadata`):
```
Error: we detected a possible problem with the location of your "_spark_metadata"
directory and you likely need to move it before restarting this query.

Earlier version of Spark incorrectly escaped paths when writing out the
"_spark_metadata" directory for structured streaming. While this was corrected in
Spark 3.0, it appears that your query was started using an earlier version that
incorrectly handled the "_spark_metadata" path.

Correct "_spark_metadata" Directory: /.../output %#output/_spark_metadata
Incorrect "_spark_metadata" Directory: /.../output%20%25%23output/_spark_metadata

Please move the data from the incorrect directory to the correct one, delete the
incorrect directory, and then restart this query. If you believe you are receiving
this message in error, you can disable it with the SQL conf
spark.sql.streaming.checkpoint.escapedPathCheck.enabled.
```

## How was this patch tested?

The new unit tests.

Closes apache#23733 from zsxwing/path-fix.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
…terruptedException

## What changes were proposed in this pull request?

Before a Kafka consumer gets assigned partitions, its offset will contain 0 partitions. However, runContinuous will still run and launch a Spark job having 0 partitions. In this case, there is a race in which an epoch may interrupt the query execution thread after `lastExecution.toRdd`, and either `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` or the next `runContinuous` will get interrupted unintentionally.

To handle this case, this PR has the following changes:

- Clean up the resources in `queryExecutionThread.runUninterruptibly`. This may increase the waiting time of `stop` but should be minor because the operations here are very fast (just sending an RPC message in the same process and stopping a very simple thread).
- Clear the interrupted status at the end so that it won't impact the `runContinuous` call. We may clear the interrupted status set by `stop`, but it doesn't affect the query termination because `runActivatedStream` will check `state` and exit accordingly.

I also updated the clean up codes to make sure exceptions thrown from `epochEndpoint.askSync[Unit](StopContinuousExecutionWrites)` won't stop the clean up.

## How was this patch tested?

Jenkins

Closes apache#24034 from zsxwing/SPARK-27111.

Authored-by: Shixiong Zhu <zsxwing@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
…cution

## What changes were proposed in this pull request?

Continuous processing waits on epochs which are not yet complete (for example, one partition is not making progress) and stores pending items in queues. These queues are unbounded and can easily consume all the memory. In this PR I've added a `spark.sql.streaming.continuous.epochBacklogQueueSize` configuration option to make them bounded. If the related threshold is reached, the query stops with an `IllegalStateException`.
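For example, the new bound can be set like this (the value is illustrative; an active SparkSession named `spark` is assumed):

```scala
// Limit the per-query epoch backlog queues in continuous processing.
spark.conf.set("spark.sql.streaming.continuous.epochBacklogQueueSize", "10000")
```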

## How was this patch tested?

Existing + additional unit tests.

Closes apache#23156 from gaborgsomogyi/SPARK-24063.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
… execution

## What changes were proposed in this pull request?

According to the [design](https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing), the life cycle of `StreamingWrite` should be the same as the read side `MicroBatch/ContinuousStream`, i.e. each run of the stream query, instead of each epoch.

This PR fixes it.

## How was this patch tested?

existing tests

Closes apache#23981 from cloud-fan/dsv2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
It's a little awkward to have 2 different classes (`CaseInsensitiveStringMap` and `DataSourceOptions`) to represent the options in the data source and catalog APIs.

This PR merges these 2 classes, keeping the name `CaseInsensitiveStringMap`, which is more precise.
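A small usage example of the surviving class; the package and constructor shown here match later Spark versions and may differ slightly at this point in the history.

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.util.CaseInsensitiveStringMap

val options = new CaseInsensitiveStringMap(Map("Path" -> "/tmp/data").asJava)
// Lookups are case-insensitive, which is why the merged class keeps this name.
assert(options.get("path") == "/tmp/data")
```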

existing tests

Closes apache#24025 from cloud-fan/option.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This adds a new method, `capabilities`, to `v2.Table` that returns a set of `TableCapability`. Capabilities are used to fail queries during analysis checks (`V2WriteSupportCheck`) when the table does not support an operation, like truncation.
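A hedged sketch of a table advertising its capabilities; the package and the specific enum values are illustrative and have moved around between Spark versions.

```scala
import java.util
import org.apache.spark.sql.sources.v2.{Table, TableCapability}
import org.apache.spark.sql.types.StructType

class ExampleTable extends Table {
  override def name(): String = "example"
  override def schema(): StructType = new StructType().add("id", "long")
  // Analysis checks such as V2WriteSupportCheck consult this set to fail early.
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ, TableCapability.TRUNCATE)
}
```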

Existing tests for regressions, added new analysis suite, `V2WriteSupportCheckSuite`, for new capability checks.

Closes apache#24012 from rdblue/SPARK-26811-add-capabilities.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…vel rules in the grammar file.

Currently, in the grammar file, the rule `query` is responsible for parsing both select and insert statements. As a result, we need more semantic checks in the code to guard against invalid insert constructs in a query; a couple of examples are in the `visitCreateView` and `visitAlterView` functions. Another issue is that we don't catch invalid insert constructs everywhere until checkAnalysis (and the errors we raise can be confusing as well). Here are a couple of examples:

```SQL
select * from (insert into bar values (2));
```
```
Error in query: unresolved operator 'Project [*];
'Project [*]
+- SubqueryAlias `__auto_generated_subquery_name`
   +- InsertIntoHiveTable `default`.`bar`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [c1]
      +- Project [cast(col1#18 as int) AS c1#20]
         +- LocalRelation [col1#18]
```

```SQL
select * from foo where c1 in (insert into bar values (2))
```
```
Error in query: cannot resolve '(default.foo.`c1` IN (listquery()))' due to data type mismatch:
The number of columns in the left hand side of an IN subquery does not match the
number of columns in the output of subquery.

Left side columns:
[default.foo.`c1`].
Right side columns:
[].;;
'Project [*]
+- 'Filter c1#6 IN (list#5 [])
   :  +- InsertIntoHiveTable `default`.`bar`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [c1]
   :     +- Project [cast(col1#7 as int) AS c1#9]
   :        +- LocalRelation [col1#7]
   +- SubqueryAlias `default`.`foo`
      +- HiveTableRelation `default`.`foo`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#6]
```

For both the cases above, we should reject the syntax at parser level.

In this PR, we create two top-level parser rules to parse `SELECT` and `INSERT` respectively.
I will create a small PR to allow CTEs in DESCRIBE QUERY after this PR is in.
Added tests to PlanParserSuite and removed the semantic check tests from SparkSqlParserSuites.

Closes apache#24150 from dilipbiswal/split-query-insert.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…o top-level rules in the grammar file."

This reverts commit c46db75.
…vel rules in the grammar file.

Currently, in the grammar file, the rule `query` is responsible for parsing both select and insert statements. As a result, we need more semantic checks in the code to guard against invalid insert constructs in a query; a couple of examples are in the `visitCreateView` and `visitAlterView` functions. Another issue is that we don't catch invalid insert constructs everywhere until checkAnalysis (and the errors we raise can be confusing as well). Here are a couple of examples:

```SQL
select * from (insert into bar values (2));
```
```
Error in query: unresolved operator 'Project [*];
'Project [*]
+- SubqueryAlias `__auto_generated_subquery_name`
   +- InsertIntoHiveTable `default`.`bar`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [c1]
      +- Project [cast(col1#18 as int) AS c1#20]
         +- LocalRelation [col1#18]
```

```SQL
select * from foo where c1 in (insert into bar values (2))
```
```
Error in query: cannot resolve '(default.foo.`c1` IN (listquery()))' due to data type mismatch:
The number of columns in the left hand side of an IN subquery does not match the
number of columns in the output of subquery.

Left side columns:
[default.foo.`c1`].
Right side columns:
[].;;
'Project [*]
+- 'Filter c1#6 IN (list#5 [])
   :  +- InsertIntoHiveTable `default`.`bar`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [c1]
   :     +- Project [cast(col1#7 as int) AS c1#9]
   :        +- LocalRelation [col1#7]
   +- SubqueryAlias `default`.`foo`
      +- HiveTableRelation `default`.`foo`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#6]
```

For both the cases above, we should reject the syntax at parser level.

In this PR, we create two top-level parser rules to parse `SELECT` and `INSERT` respectively.
I will create a small PR to allow CTEs in DESCRIBE QUERY after this PR is in.
Added tests to PlanParserSuite and removed the semantic check tests from SparkSqlParserSuites.

Closes apache#24150 from dilipbiswal/split-query-insert.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…o top-level rules in the grammar file."

This reverts commit b9a2061.
… ANSI SQL standard

## What changes were proposed in this pull request?
This PR aims to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but they differ slightly from each other. Therefore, this PR documents all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`.

NOTE: This PR only adds a small set of keywords as reserved ones, and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keywords should be reserved or not; e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards but not in PostgreSQL. Therefore, we need more research on other database-like systems (e.g., Oracle Database, DB2, SQL Server) in follow-up activities.

References:
 - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf
 - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html

## How was this patch tested?
Added tests in `TableIdentifierParserSuite`.

Closes apache#23259 from maropu/SPARK-26215-WIP.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
## What changes were proposed in this pull request?
I see the following new warning from ANTLR4 after SPARK-26215 added the `SCHEMA` keyword to the reserved/unreserved list. This is a minor PR to clean up the warning.

```
[WARNING] warning(125): org/apache/spark/sql/catalyst/parser/SqlBase.g4:784:90: implicit definition of token SCHEMA in parser
[WARNING] .../apache/spark/org/apache/spark/sql/catalyst/parser/SqlBase.g4 [784:90]: implicit definition of token SCHEMA in parser
```
## How was this patch tested?
Manually built catalyst after the fix to verify

Closes apache#23897 from dilipbiswal/minor_parser_token.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…of a query.

Currently we can use `df.printSchema` to discover the schema information of a query. We should have a way to describe the output schema of a query using the SQL interface.

Example:

```SQL
DESCRIBE SELECT * FROM desc_table
DESCRIBE QUERY SELECT * FROM desc_table

spark-sql> create table desc_table (c1 int comment 'c1-comment', c2 decimal comment 'c2-comment', c3 string);

spark-sql> desc select * from desc_table;
c1	int	        c1-comment
c2	decimal(10,0)	c2-comment
c3	string	        NULL

```
Added a new test under SQLQueryTestSuite and SparkSqlParserSuite

Closes apache#23883 from dilipbiswal/dkb_describe_query.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This moves parsing `CREATE TABLE ... USING` statements into catalyst. Catalyst produces logical plans with the parsed information and those plans are converted to v1 `DataSource` plans in `DataSourceAnalysis`.

This prepares for adding v2 create plans that should receive the information parsed from SQL without being translated to v1 plans first.

This also makes it possible to parse in catalyst instead of breaking the parser across the abstract `AstBuilder` in catalyst and `SparkSqlParser` in core.

For more information, see the [mailing list thread](https://lists.apache.org/thread.html/54f4e1929ceb9a2b0cac7cb058000feb8de5d6c667b2e0950804c613%3Cdev.spark.apache.org%3E).

This uses existing tests to catch regressions. This introduces no behavior changes.

Closes apache#24029 from rdblue/SPARK-27108-add-parsed-create-logical-plans.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?

This adds a public Expression API that can be used to pass partition transformations to data sources.
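For illustration, these are the kinds of partition transforms the Expression API is meant to carry, shown via hypothetical DDL (the catalog and table names, and the transform syntax, are examples rather than part of this change):

```scala
spark.sql("""
  CREATE TABLE my_catalog.db.events (ts TIMESTAMP, id BIGINT)
  USING parquet
  PARTITIONED BY (days(ts), bucket(16, id))
""")
```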

## How was this patch tested?

Existing tests to validate no regressions. Added transform cases to DDL suite and v1 conversions suite.

Closes apache#24117 from rdblue/add-public-transform-api.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?

This adds the TableCatalog API proposed in the [Table Metadata API SPIP](https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d).

For `TableCatalog` to use `Table`, it needed to be moved into the catalyst module where the v2 catalog API is located. This also required moving `TableCapability`. Most of the files touched by this PR are import changes needed by this move.

## How was this patch tested?

This adds a test implementation and contract tests.

Closes apache#24246 from rdblue/SPARK-24252-add-table-catalog-api.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This adds a v2 implementation for CTAS queries.

* Update the SQL parser to parse CREATE queries using multi-part identifiers
* Update `CheckAnalysis` to validate partitioning references with the CTAS query schema
* Add `CreateTableAsSelect` v2 logical plan and `CreateTableAsSelectExec` v2 physical plan
* Update create conversion from `CreateTableAsSelectStatement` to support the new v2 logical plan
* Update `DataSourceV2Strategy` to convert v2 CTAS logical plan to the new physical plan
* Add `findNestedField` to `StructType` to support reference validation
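For illustration, a v2 CTAS against a configured catalog might look like this (hypothetical names, with an active SparkSession named `spark`):

```scala
spark.sql("""
  CREATE TABLE my_catalog.db.events_copy
  USING parquet
  AS SELECT * FROM my_catalog.db.events
""")
```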

We have been running these changes in production for several months. Also:

* Add a test suite `CreateTablePartitioningValidationSuite` for new analysis checks
* Add a test suite for v2 SQL, `DataSourceV2SQLSuite`
* Update catalyst `DDLParserSuite` to use multi-part identifiers (`Seq[String]`)
* Add test cases to `PlanResolutionSuite` for v2 CTAS: known catalog and v2 source implementation

Closes apache#24570 from rdblue/SPARK-24923-add-v2-ctas.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@mccheah (author) commented Jun 6, 2019

@robert3005 @vinooganesh @yifeih

mccheah and others added 18 commits June 6, 2019 14:47
## What changes were proposed in this pull request?

To move the DS v2 API to the catalyst module, we can't refer to an internal class (`MutableColumnarRow`) in `ColumnarBatch`.

This PR creates a read-only version of `MutableColumnarRow` and uses it in `ColumnarBatch`.

close apache#24546

## How was this patch tested?

existing tests

Closes apache#24581 from cloud-fan/mutable-row.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
## What changes were proposed in this pull request?

Move
```scala
org.apache.spark.sql.execution.streaming.BaseStreamingSource
org.apache.spark.sql.execution.streaming.BaseStreamingSink
```
to the java directory.

## How was this patch tested?

Existing UT.

Closes apache#24222 from ConeyLiu/move-scala-to-java.

Authored-by: Xianyang Liu <xianyang.liu@intel.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
This is a followup of apache#24012, adding the corresponding capabilities for streaming.

existing tests

Closes apache#24129 from cloud-fan/capability.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
There is already a MemorySink v2, so v1 can be removed. In this PR I've removed it completely.
What this PR contains:
* V1 memory sink removal
* V2 memory sink renamed to become the only implementation
* Since DSv2 sends exceptions in a chained format (linking them with the cause field), I've made the Python side compliant
* Adapted all the tests

Existing unit tests.

Closes apache#24403 from gaborgsomogyi/SPARK-23014.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
## What changes were proposed in this pull request?

`BaseStreamingSource` and `BaseStreamingSink` are used to unify the v1 and v2 streaming data source APIs in some code paths.

This PR removes these 2 interfaces, and lets the v1 API extend the v2 API to keep API compatibility.

The motivation is apache#24416 . We want to move data source v2 to catalyst module, but `BaseStreamingSource` and `BaseStreamingSink` are in sql/core.

## How was this patch tested?

existing tests

Closes apache#24471 from cloud-fan/streaming.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
## What changes were proposed in this pull request?

To move DS v2 to the catalyst module, we can't make the v2 offset rely on the v1 offset, as the v1 offset is in sql/core.

## How was this patch tested?

existing tests

Closes apache#24538 from cloud-fan/offset.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Add a SQL config property for the default v2 catalog.

Existing tests for regressions.

Closes apache#24594 from rdblue/SPARK-27693-add-default-catalog-config.

Authored-by: Ryan Blue <blue@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@vinooganesh commented Jun 6, 2019

@mccheah should we just take Scala 2.12 rather than do the hacks?

francis0407 and others added 3 commits June 6, 2019 18:14
In DataSourceV2Strategy, it seems we eliminate the subqueries by mistake after normalizing filters.
We have a SQL query with a scalar subquery:

```scala
val plan = spark.sql("select * from t2 where t2a > (select max(t1a) from t1)")
plan.explain(true)
```

And we get the log info of DataSourceV2Strategy:
```
Pushing operators to csv:examples/src/main/resources/t2.txt
Pushed Filters:
Post-Scan Filters: isnotnull(t2a#30)
Output: t2a#30, t2b#31
```

The `Post-Scan Filters` should contain the scalar subquery, but we eliminate it by mistake.
```
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('t2a > scalar-subquery#56 [])
   :  +- 'Project [unresolvedalias('max('t1a), None)]
   :     +- 'UnresolvedRelation `t1`
   +- 'UnresolvedRelation `t2`

== Analyzed Logical Plan ==
t2a: string, t2b: string
Project [t2a#30, t2b#31]
+- Filter (t2a#30 > scalar-subquery#56 [])
   :  +- Aggregate [max(t1a#13) AS max(t1a)#63]
   :     +- SubqueryAlias `t1`
   :        +- RelationV2[t1a#13, t1b#14] csv:examples/src/main/resources/t1.txt
   +- SubqueryAlias `t2`
      +- RelationV2[t2a#30, t2b#31] csv:examples/src/main/resources/t2.txt

== Optimized Logical Plan ==
Filter (isnotnull(t2a#30) && (t2a#30 > scalar-subquery#56 []))
:  +- Aggregate [max(t1a#13) AS max(t1a)#63]
:     +- Project [t1a#13]
:        +- RelationV2[t1a#13, t1b#14] csv:examples/src/main/resources/t1.txt
+- RelationV2[t2a#30, t2b#31] csv:examples/src/main/resources/t2.txt

== Physical Plan ==
*(1) Project [t2a#30, t2b#31]
+- *(1) Filter isnotnull(t2a#30)
   +- *(1) BatchScan[t2a#30, t2b#31] class org.apache.spark.sql.execution.datasources.v2.csv.CSVScan
```
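A hedged sketch of the fix this change suggests: keep filters that contain subqueries out of the normalization/pushdown path and retain them as post-scan filters.

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, SubqueryExpression}

def splitFilters(filters: Seq[Expression]): (Seq[Expression], Seq[Expression]) = {
  // Filters containing subqueries must stay as post-scan filters; only the rest
  // are normalized and offered to pushFilters.
  val (withSubquery, withoutSubquery) = filters.partition(SubqueryExpression.hasSubquery)
  (withSubquery, withoutSubquery)
}
```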

Unit test.

Closes apache#24321 from francis0407/SPARK-27411.

Authored-by: francis0407 <hanmingcong123@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
```scala
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.sources.StringContains$"),

// [SPARK-26216][SQL] Do not use case class as public API (UserDefinedFunction)
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.expressions.UserDefinedFunction$"),
```

You likely need to revert changes below

@robert3005

I think if you revert the SQL parser changes then this is fine.

bulldozer-bot bot pushed a commit that referenced this pull request Nov 7, 2019
###### _excavator_ is a bot for automating changes across repositories.

Changes produced by the roomba/latest-baseline check.

# Release Notes
## 0.50.0
[feature] Warn against .parallel() calls on Java streams (#537) 
[fix] Correct prioritisation of versions.props to match nebula logic (#533)


## 0.51.0
- [feature] New 'com.palantir.baseline-reproducibility' plugin (#539) 
- [improvement] `./gradlew idea` deletes redundant ipr files (#550) 
- [fix] ValidateConstantMessage error-prone check is more accurate. (#546)


## 0.51.1
- [fix] Fix cleanup of old idea project files (#559) 
- [fix] Remove stale references to no longer existing puppycrawl checkstyle DTDs (#556)

## 0.52.0
- [improvement] errorprone 2.3.2 -> 2.3.3 (#561) 
- [feature] new plugin: `com.palantir.baseline-exact-dependencies` helps declare necessary and sufficient dependencies (#548)
- [improvement] Split out circle style plugin into generic junit reports plugin #564

## 0.53.0
- [improvement] Disallow javafx imports with checkstyle #569
- [fix] Avoid lambda to allow build caching of checkstyle results #576

## 0.54.0
- [feature] New `com.palantir.baseline-release-compatibility` plugin (#582)

## 0.55.0
[break] Enable running of unique class check on multiple configurations (#583)

## 0.55.1
[fix] checkImplicitDependencies shouldn't count ignored artifacts (#601) 

## 0.55.2
[fix] BaselineReleaseCompatibility up-to-date checking of compile tasks (#605)

## 0.56.0
[feature] Add an errorprone rule GradleCacheableTaskAction that prevents passing a lambda to Task.doFirst or Task.doLast when implementing gradle plugins (#608)

## 0.57.0
* [feature] Error prone rule to replace `Iterables.partition(List, int)` with `Lists.partition(List, int)` (#622)
* [feature] Error prone rule to prefer `Lists` or `Collections2` `transform` over `Iterables.transform` (#623)

## 0.58.0
[improvement] make CheckClassUniquenessTask cacheable (#637)
[fix] Add Javac Settings to uncheck "Use compiler from module target JDK when possible" (#629)
[fix] class uniqueness rule must have a config (#638) 

## 0.59.0
[improvement] Spotless to remove blank lines at start of methods (#641) 

## 0.60.0
* [improvement] New PreferBuiltInConcurrentKeySet suggestion (#649) 
* [improvement] Start publishing plugin to the [Gradle plugin portal](https://plugins.gradle.org/plugin/com.palantir.baseline) (#613)

## 0.61.0
- [improvement] Sensible defaults for test tasks (HeapDumpOnOutOfMemory) (#652)

## 0.62.0
* [improvement] Ensure Optional#orElse argument is not method invocation (#655)


## 0.62.1
[fix] Revert "[improvement] Ensure Optional#orElse argument is not method invocation" (#659)

## 0.63.0
[improvement] Support auto-applying error-prone suggested fixes (#660) 

## 0.64.0
* [improvement] Refaster rule compilation (#661)

## 0.64.1
- [improvement] JUnit 5 boilerplate #666

## 0.65.0
[improvement] Error-prone check to help prevent logging AuthHeader and BearerToken (#654)
[fix] fix potential NPE when configuring testing (#669) 
[fix] Fix refaster compilation to support version recommendations (#667)

## 0.66.0
[improvement] Ignore DesignForExtension for ParameterizedTest (#673) 

## 0.66.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | The PreventTokenLogging error-prone check will now correctly handle null use in SLF4J and Safe/Unsafe Arg functions. | palantir/gradle-baseline#674 |


## 1.0.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Add refaster rule to migrate away from optional.orElse(supplier.get()) | palantir/gradle-baseline#679 |
| Fix | Projects can now compile using Java12, because the one errorprone check that breaks (Finally) is now disabled when you use this toolchain. It remains enabled when compiling against earlier JDKs. | palantir/gradle-baseline#681 |


## 1.1.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Ensure that format tasks execute after compilation | palantir/gradle-baseline#688 |


## 1.1.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Auto-fix OptionalOrElseMethodInvocation using `-PerrorProneApply`. | palantir/gradle-baseline#690 |


## 1.2.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Spotless check for disallowing dangling parenthesis. | palantir/gradle-baseline#687 |


## 1.3.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Don't cache test tasks in the build cache by default.<br>It's possible to restore caching by adding `com.palantir.baseline.restore-test-cache = true` to your `gradle.properties`. | palantir/gradle-baseline#694 |


## 1.4.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | No longer cache javaCompile tasks when applying errorprone or refaster checks. | palantir/gradle-baseline#696 |
| Feature | Test helper for refaster checks. | palantir/gradle-baseline#697 |


## 1.5.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Determine whether to use junitPlatform on a per source set basis | palantir/gradle-baseline#701 |
| Feature | OptionalOrElseMethodInvocation now checks for constructor invocations. | palantir/gradle-baseline#702 |


## 1.6.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | The severity of PreferSafeLoggableExceptions and PreferSafeLoggingPreconditions is now WARNING. | palantir/gradle-baseline#704 |
| Fix | OptionalOrElseMethodInvocation now allows method references in orElse. | palantir/gradle-baseline#709 |


## 1.6.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Do not overwrite user provided test configure when using junit5 | palantir/gradle-baseline#712 |


## 1.7.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Baseline can now re-format all your Java files using the Eclipse formatter. This is currently an opt-in preview, try it out by running `./gradlew format -Pcom.palantir.baseline-format.eclipse`. | palantir/gradle-baseline#707 |
| Improvement | Add errorprone check to ensure junit5 tests are not used with junit4 Rule/ClassRule | palantir/gradle-baseline#714 |


## 1.8.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Checkstyle now tolerates empty lambda bodies (e.g. `() -> {}`) | palantir/gradle-baseline#715 |


## 1.8.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Correctly set dependency between spotlessApply and baselineUpdateConfig to prevent a race | palantir/gradle-baseline#724 |


## 1.8.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Correctly handle `EnableRuleMigrationSupport` in `JUnit5RuleUsage` errorprone-rule | palantir/gradle-baseline#725 |


## 1.9.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Wrap long parameterized types where necessary | palantir/gradle-baseline#716 |
| Improvement | Allow suppression of the TODO checkstyle check by giving it an ID. Clarify its comment to allow // TODO(username): ... | palantir/gradle-baseline#727 |
| Improvement | IntelliJ GitHub issue navigation | palantir/gradle-baseline#729 |
| Improvement | print out suggestion for module dependencies inclusion in useful format | palantir/gradle-baseline#733 |
| Fix | The `checkImplicitDependencies` task will no longer suggest a fix of the current project. | palantir/gradle-baseline#736, palantir/gradle-baseline#567 |
| Improvement | Implement DangerousCompletableFutureUsage errorprone check | palantir/gradle-baseline#740 |


## 1.10.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster to use `execute` over `submit` when the result is ignored | palantir/gradle-baseline#741 |


## 1.10.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Enable applying refaster rules for repos with -Xlint:deprecation | palantir/gradle-baseline#742 |


## 1.11.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Apply `InputStreamSlowMultibyteRead` error prone check at ERROR severity | palantir/gradle-baseline#749 |


## 1.12.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | The `baseline-idea` plugin now generates configuration more closely aligned with Gradle defaults. | palantir/gradle-baseline#718 |
| Improvement | Apply the suggested fixes for `UnusedMethod` and `UnusedVariable`. | palantir/gradle-baseline#751 |
| Improvement | Refaster `stream.sorted().findFirst()` into `stream.min(Comparator.naturalOrder())` | palantir/gradle-baseline#752 |
| Improvement | Error prone validation that Stream.sort is invoked on comparable streams | palantir/gradle-baseline#753 |
| Improvement | `DangerousStringInternUsage`: Disallow String.intern() invocations | palantir/gradle-baseline#754 |


## 1.12.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Do not apply the suggested fixes for `UnusedMethod` and `UnusedVariable` which automatically remove code with side effects. | palantir/gradle-baseline#757 |


## 1.13.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Remove errorprone `LogSafePreconditionsConstantMessage` | palantir/gradle-baseline#755 |
| Improvement | Disable errorprone `Slf4jLogsafeArgs` in test code | palantir/gradle-baseline#756 |
| Improvement | error-prone now detects `Duration#getNanos` mistakes and bans URL in equals methods | palantir/gradle-baseline#758 |


## 1.14.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement `OptionalOrElseThrowThrows` to prevent throwing from orElseThrow | palantir/gradle-baseline#759 |


## 1.15.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | LogSafePreconditionsMessageFormat disallows slf4j-style format characters | palantir/gradle-baseline#761 |
| Improvement | Error Prone LambdaMethodReference check | palantir/gradle-baseline#763 |


## 1.16.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | baseline-circleci no longer integrates with the (deprecated) FindBugs plugin, as a pre-requisite for Gradle 6.0 compatibility. | palantir/gradle-baseline#766 |


## 1.17.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | The `TypeParameterUnusedInFormals` errorprone check is disabled when compiling on Java 13, to workaround an error-prone bug. | palantir/gradle-baseline#767 |
| Improvement | Publish scm information within POM | palantir/gradle-baseline#769 |


## 1.17.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | LambdaMethodReference avoids suggestions for non-static methods | palantir/gradle-baseline#771 |


## 1.17.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Remove pom only dependencies from analysis in checkUnusedDependencies | palantir/gradle-baseline#773 |


## 1.18.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | When computing unused dependencies, compileOnly and annotationProcessor<br>dependencies are ignored due to false positives as these dependencies<br>will not appear as dependencies in the generated byte-code, but are in<br>fact necessary dependencies to compile a given module. | palantir/gradle-baseline#783 |


## 1.19.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Disable `PreconditionsConstantMessage` on gradle plugins | palantir/gradle-baseline#790 |


## 2.0.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Break | Add gradle 6.0-20190904072820+0000 compatibility. This raises the minimum required version of gradle for plugins from this repo to 5.0. | palantir/gradle-baseline#791 |


## 2.1.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Feature | Automatically configure the [Intellij Eclipse format plugin](https://plugins.jetbrains.com/plugin/6546-eclipse-code-formatter) to use the eclipse formatter | palantir/gradle-baseline#794 |


## 2.1.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Stop applying error prone patches for checks that have been turned off. | palantir/gradle-baseline#793 |


## 2.2.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | baseline-circleci now validates that the rootProject.name isn't the CircleCI default (`project`) as it can interfere with publishing. | palantir/gradle-baseline#775 |
| Improvement | Remove JGit dependency | palantir/gradle-baseline#798 |


## 2.2.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Don't add whitespace to blank lines inside comments. Fixes apache#799 | palantir/gradle-baseline#800 |
| Fix | Eclipse formatter now aligns multicatch so that it passes checkstyle. | palantir/gradle-baseline#807 |


## 2.2.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | ClassUniquenessPlugin now checks the `runtimeClasspath` configuration by default. | palantir/gradle-baseline#810 |


## 2.3.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | SafeLoggingExceptionMessageFormat disallows `{}` in safelog exception messages | palantir/gradle-baseline#815 |


## 2.4.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | A new `StrictUnusedVariable` check will catch any unused arguments (e.g. AuthHeaders) to public methods. If you need to suppress this, rename your variable to have an underscore prefix (e.g. `s/foo/_foo/`) or just run `./gradlew classes -PerrorProneApply` to auto-fix | palantir/gradle-baseline#819 |
| Improvement | Message format checks use instanceof rather than catching | palantir/gradle-baseline#821 |


## 2.4.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Avoid false positives caused by `module-info.class` when checking class uniqueness | palantir/gradle-baseline#823 |


## 2.4.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Checkstyle tasks only check their own source set and only actual java sources. They don't look in your `src/*/resources` directory anymore. | palantir/gradle-baseline#830 |


## 2.4.3
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Add link to StrictUnusedVariable that directs users to baseline repo. | palantir/gradle-baseline#829 |
| Fix | Long try-with-resources statements are now aligned such that the first assignment stays on the first line. | palantir/gradle-baseline#835 |


## 2.5.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Error Prone StringBuilderConstantParameters. StringBuilder with a constant number of parameters should be replaced by simple concatenation. The Java compiler (jdk8) replaces concatenation of a constant number of arguments with a StringBuilder, while jdk 9+ take advantage of JEP 280 (https://openjdk.java.net/jeps/280) to efficiently pre-size the result for better performance than a StringBuilder. | palantir/gradle-baseline#832 |


## 2.6.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Excavator PRs that apply other refaster rules (e.g. Witchcraft ones) will not also apply baseline refaster rules. | palantir/gradle-baseline#827 |
| Improvement | Added a new ErrorProne check `PreferAssertj` to assist migration to AssertJ from legacy test frameworks. It may be necessary to add a dependency on `org.assertj:assertj-core` in modules which do not already depend on AssertJ. If there's a technical reason that AssertJ cannot be used, `PreferAssertj` may be explicitly disabled to prevent future upgrades from attempting to re-run the migration. | palantir/gradle-baseline#841 |


## 2.7.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | `StrictUnusedVariable` now ignores variables prefixed with `_` and the suggested fix will rename all unused parameters in public methods instead of removing them | palantir/gradle-baseline#833 |
| Improvement | ErrorProne will now detect dangerous usage of `@RunWith(Suite.class)` that references JUnit5 classes, as this can cause tests to silently not run! | palantir/gradle-baseline#843 |


## 2.8.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | PreferAssertj provides better replacements fixes | palantir/gradle-baseline#850 |
| Improvement | Do not run error prone on any code in the build directory | palantir/gradle-baseline#853 |


## 2.8.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix hamcrest arrayContainingInAnyOrder conversion | palantir/gradle-baseline#859 |


## 2.9.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | StrictUnusedVariable can only be suppressed with `_` prefix | palantir/gradle-baseline#854 |
| Improvement | StrictUnusedVariable is now an error by default | palantir/gradle-baseline#855 |
| Fix | The PreferAssertj refactoring will only be applied if you have explicitly opted in (e.g. using `baselineErrorProne { patchChecks += 'PreferAssertj' }` | palantir/gradle-baseline#861 |


## 2.9.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Error prone will correctly ignore all source files in the build directory and in any generated source directory | palantir/gradle-baseline#864 |
| Fix | Ensure that `StrictUnusedVariable` correctly converts previously suppressed variables `unused` to `_` | palantir/gradle-baseline#865 |


## 2.9.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | When removing unused variables, `StrictUnusedVariable` will preserve side effects | palantir/gradle-baseline#870 |


## 2.10.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | A new `checkJUnitDependencies` task detects misconfigured JUnit dependencies which could result in some tests silently not running. | palantir/gradle-baseline#837 |
| Improvement | Some AssertJ assertions can now be automatically replaced with more idiomatic ones using refaster. | palantir/gradle-baseline#851 |
| Fix | PreferAssertj check avoids ambiguity in assertThat invocations | palantir/gradle-baseline#874 |
| Improvement | Improve performance of error prone PreferAssertj check | palantir/gradle-baseline#875 |
| Improvement | StringBuilderConstantParameters suggested fix doesn't remove comments | palantir/gradle-baseline#877 |


## 2.10.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Allow junit4 dependencies to exist without junit4 tests | palantir/gradle-baseline#880 |


## 2.11.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | PreferAssertj supports migration of zero-delta floating point array asserts | palantir/gradle-baseline#883 |


## 2.11.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | checkJunitDependencies only checks Java source | palantir/gradle-baseline#885 |


## 2.11.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | AssertJ Refaster fixes use static `assertThat` imports | palantir/gradle-baseline#887 |


## 2.12.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Disable `UnusedVariable` error prone rule by default | palantir/gradle-baseline#888 |


## 2.13.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster for AssertJ isZero/isNotZero/isOne and collections | palantir/gradle-baseline#881 |
| Improvement | AssertJ refaster migrations support string descriptions | palantir/gradle-baseline#891 |
| Fix | Certain error-prone checks are disabled in test code, and the presence of JUnit5's `@TestTemplate` annotation is now used to detect whether a class is test code. | palantir/gradle-baseline#892 |
| Fix | BaselineFormat task exclude generated code on Windows | palantir/gradle-baseline#896 |


## 2.14.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster rules for AssertJ tests | palantir/gradle-baseline#898 |
| Improvement | refaster replacement for assertj hasSize(foo.size) -> hasSameSizeAs | palantir/gradle-baseline#900 |
| Fix | Keep spotless plugin from eagerly configuring all tasks | diffplug/spotless#444 |
| Fix | Continue when RefasterRuleBuilderScanner throws | palantir/gradle-baseline#904 |
| Improvement | Refaster now works on repos using Gradle 6.0 | palantir/gradle-baseline#804, palantir/gradle-baseline#906 |


## 2.15.0
_No documented user facing changes_

## 2.16.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Rewrite ImmutableCollection#addAll to add for arrays | palantir/gradle-baseline#743 |
| Improvement | Add refaster rule to simplify empty optional asserts | palantir/gradle-baseline#911 |
| Improvement | Baseline now allows static imports of AssertJ and Mockito methods. | palantir/gradle-baseline#915 |
| Improvement | Remove refaster AssertjIsOne rule. | palantir/gradle-baseline#917 |
| Improvement | Add assertj refaster rules for map size asserts | palantir/gradle-baseline#919 |
| Improvement | Added a Refaster rule to change `isEqualTo` checks into `hasValue` checks |  |


## 2.17.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement AssertjCollectionHasSameSizeAsArray | palantir/gradle-baseline#922 |
| Improvement | Implement assertj map refactors for containsKey and containsEntry | palantir/gradle-baseline#925 |
| Improvement | Refaster assertj migrations support descriptions with format args | palantir/gradle-baseline#926 |
| Improvement | Refaster out String.format from describedAs | palantir/gradle-baseline#927 |


## 2.18.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Refaster rules to simplify negated boolean expressions and extract null checks. | palantir/gradle-baseline#935 |
| Improvement | Refaster rules for checks that maps do not contain a specific key | palantir/gradle-baseline#935 |
| Improvement | Refaster rule 'CollectionStreamForEach' | palantir/gradle-baseline#942 |
| Improvement | ExecutorSubmitRunnableFutureIgnored as error prone ERROR | palantir/gradle-baseline#943 |


## 2.19.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | checkJUnitDependencies detects a possible misconfiguration with spock and JUnit5 which could lead to tests silently not running. | palantir/gradle-baseline#951 |


## 2.20.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Use Mockito verifyNoInteractions over deprecated verifyZeroInteractions | palantir/gradle-baseline#924 |
| Improvement | Errorprone rules for usage of Guava static factory methods | palantir/gradle-baseline#941 |
| Improvement | Fix error-prone `UnnecessaryParentheses` by default | palantir/gradle-baseline#952 |
| Improvement | Implement Error Prone `ThrowError` to discourage throwing Errors in production code<br>Errors are often handled poorly by libraries resulting in unexpected<br>behavior and resource leaks. It's not obvious that 'catch (Exception e)'<br>does not catch Error.<br>This check  is intended to be advisory - it's fine to<br>`@SuppressWarnings("ThrowError")` in certain cases, but is usually not<br>recommended unless you are writing a testing library that throws<br>AssertionError. | palantir/gradle-baseline#957 |
| Improvement | Improve TestCheckUtils.isTestCode test detection | palantir/gradle-baseline#958 |
| Improvement | Implement Error Prone `Slf4jLevelCheck` to validate that slf4j level checks agree with contained logging. | palantir/gradle-baseline#960 |


## 2.20.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Suppress error-prone PreferCollectionConstructors on jdk13 | palantir/gradle-baseline#968 |


## 2.21.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Feature | Users can opt-in to format their files using our fork of google-java-format (palantir-java-format) | palantir/gradle-baseline#936 |


## 2.22.0
_Automated release, no documented user facing changes_

## 2.23.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement error prone ReverseDnsLookup for unexpected reverse dns lookups<br><br>Calling address.getHostName may result in a DNS lookup which is a network request,<br>making the invocation significantly more expensive than expected depending on the<br>environment.<br>This check  is intended to be advisory - it's fine to<br>@SuppressWarnings("ReverseDnsLookup") in certain cases, but is usually not<br>recommended. | palantir/gradle-baseline#970 |


## 2.24.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | The deprecated `verifyZeroInteractions` now gets rewritten to `verifyNoMoreInteractions`, which has the same behaviour. | palantir/gradle-baseline#975 |
| Improvement | ReadReturnValueIgnored: Check that read operation results are not ignored | palantir/gradle-baseline#978 |
| Improvement | Stop migrating source sets to safe-logging, unless they already have the requisite library (`com.palantir.safe-logging:preconditions`). | palantir/gradle-baseline#981 |
| Improvement | For users who opted into palantir-java-format, we now reflow strings and reorder imports. | palantir/gradle-baseline#982 |


## 2.25.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | checkstyle Indentation rule is disabled when palantir-java-format is enabled | palantir/gradle-baseline#987 |
| Improvement | Load palantir-java-format dynamically from the same configuration set up by `com.palantir-java-format` which is also used to determine the version used by IntelliJ. | palantir/gradle-baseline#989 |


## 2.26.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Run `./gradlew formatDiff` to reformat the relevant sections of any uncommitted changed Java files (relies on `git diff -U0 HEAD` under the hood) | palantir/gradle-baseline#988 |


## 2.27.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Slf4jLogsafeArgs fixes safe-log wrapped throwables | palantir/gradle-baseline#1001 |
| Improvement | `DangerousParallelStreamUsage` checks for `Collection.parallelStream()` and `StreamSupport` utility methods with parallel=true. | palantir/gradle-baseline#1005 |
| Improvement | DangerousThrowableMessageSafeArg disallows Throwables in SafeArg values.<br>Throwables must be logged without an Arg wrapper as the last parameter, otherwise unsafe data may be leaked from the unsafe message or the unsafe message of a cause. | palantir/gradle-baseline#997 |
| Improvement | Implement a suggested fix for CatchBlockLogException | palantir/gradle-baseline#998 |


## 2.28.0
| Type | Description | Link |
| ---- | ----------- | ---- |
| Improvement | Implement `FinalClass` error prone check, replacing the checkstyle implementation | palantir/gradle-baseline#1008 |
| Improvement | Error prone validation to avoid redundant modifiers | palantir/gradle-baseline#1010 |


## 2.28.1
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix `RedundantModifier` interpretation of implicit modifiers | palantir/gradle-baseline#1014 |


## 2.28.2
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix RedundantModifier failures types nested in interfaces | palantir/gradle-baseline#1017 |


## 2.28.3
| Type | Description | Link |
| ---- | ----------- | ---- |
| Fix | Fix error-prone matching literal null as a subtype.<br>The most common issue this fixes is failures on `SafeArg.of("name", null)`<br>assuming that the null literal value parameter may be a throwable. | palantir/gradle-baseline#1020 |



To enable or disable this check, please contact the maintainers of Excavator.
@robert3005 closed this Feb 28, 2020
@robert3005 deleted the stage-dsv2-changes branch February 28, 2020 11:34