
INSERT INTO fails to append data [Spark] #213

Closed
osopardo1 opened this issue Aug 31, 2023 · 1 comment · Fixed by #214

@osopardo1
Member

What went wrong?

After a few tests with types and SQL statements for #211, we found that INSERT INTO was not behaving as expected: it either fails to write the data or does not find the expected columns of the schema.

How to reproduce?

Steps to reproduce the problem:

1. Code that triggered the bug, or steps to reproduce:

spark.sql( "CREATE TABLE tbl(c1 STRING, c2 TIMESTAMP) " +
    	"USING qbeast OPTIONS ('columnsToIndex’=‘c1’)”)

spark.sql("""INSERT INTO tbl VALUES('foo','2022-01-02 03:04:05.123456')""".stripMargin)

The test throws the following error (see the note after this list on where those default column names come from):

c1 does not exist. Available: col1, col2, col3

2. Branch and commit id:

main at commit f9c7ab0

3. Spark version:

3.2.1

4. Hadoop version:

3.4.0

5. How are you running Spark?

Running Spark on a local machine

6. Stack trace:

Described in 1.
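
Note: the "col1, col2, col3" names in that error are the column names Spark assigns automatically to an untyped VALUES relation. A quick check of that behavior (illustrative only, not part of the repro; the printed schema is roughly what I would expect):

// An untyped VALUES relation has no user-defined schema, so Spark names its
// columns col1, col2, ... automatically; the connector then cannot find the
// table column c1 among them.
spark.sql("SELECT * FROM VALUES ('foo', TIMESTAMP'2022-01-02 03:04:05.123456')").printSchema()
// root
//  |-- col1: string (nullable = true)
//  |-- col2: timestamp (nullable = true)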

@osopardo1 osopardo1 added the type: bug Something isn't working label Aug 31, 2023
@osopardo1
Member Author

After some analysis of the way Delta Lake handles INSERT INTO, I've found an explanation:

  • The command does not load the schema of the existing table. It tries to write the data as it arrives, and since the incoming data carries no user-defined schema, Spark generates one automatically with default column names: "col1, col2, col3".
  • According to comments on code in the Delta Lake project:
  /**
   * With Delta, we ACCEPT_ANY_SCHEMA, meaning that Spark doesn't automatically adjust the schema
   * of INSERT INTO. Here we check if we need to perform any schema adjustment for INSERT INTO by
   * name queries. We also check that any columns not in the list of user-specified columns must
   * have a default expression.
   */
  • A solution would be to delegate to the code in DeltaAnalysis so that we do not duplicate the same behavior; a minimal sketch of the kind of schema adjustment involved is shown after this list.
  • Since many of the methods that reconstruct and check the schema are complex, I would encourage us not to develop the same solution ourselves. But if there is no easy way to delegate, reimplementing it could be another possibility.
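
A minimal sketch of the kind of adjustment involved, assuming we align the INSERT INTO query to the target table's schema by position (illustrative only, not the actual DeltaAnalysis code; the helper name is hypothetical):

  import org.apache.spark.sql.catalyst.expressions.{Alias, Cast}
  import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}
  import org.apache.spark.sql.types.StructType

  // Cast every incoming column to the corresponding table column's type and
  // rename it, so later stages see c1, c2, ... instead of the auto-generated
  // col1, col2, ... names.
  def alignToTableSchema(query: LogicalPlan, tableSchema: StructType): LogicalPlan = {
    require(query.output.length == tableSchema.length,
      s"INSERT INTO expects ${tableSchema.length} columns, got ${query.output.length}")
    val aligned = query.output.zip(tableSchema.fields.toSeq).map { case (attr, field) =>
      Alias(Cast(attr, field.dataType), field.name)()
    }
    Project(aligned, query)
  }

Delegating to DeltaAnalysis would presumably give us this plus the by-name resolution and default-expression checks mentioned in the code comment above.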
