What went wrong?
We create redundant metadata entries for each write operation, including appends that update neither the schema nor the space Revision.
On top of that, creating metadata when it is not required prevents interleaved concurrent writes from committing (see ConflictChecker.checkNoMetadataUpdates in Delta Lake).
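As a sketch of the expected behaviour (a hypothetical helper, not the current qbeast-spark code, and assuming that a Revision change surfaces in the Metadata schema or configuration), the writer would include the Metadata action only when it actually changed:

import org.apache.spark.sql.delta.actions.{Action, AddFile, Metadata}

// Hypothetical guard, for illustration only: commit the Metadata action only
// when the schema or the table configuration changed, so plain appends commit
// just their AddFiles and pass ConflictChecker.checkNoMetadataUpdates.
def actionsToCommit(
    currentMetadata: Metadata,
    newMetadata: Metadata,
    addFiles: Seq[AddFile]): Seq[Action] = {
  val metadataChanged =
    newMetadata.schemaString != currentMetadata.schemaString ||
      newMetadata.configuration != currentMetadata.configuration
  if (metadataChanged) newMetadata +: addFiles else addFiles
}

With a guard like this, a plain append would commit only its AddFile actions and would not conflict with other concurrent writers.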
How to reproduce?
Append some data to an existing table. The data used should not cause a schema change or the creation of a new Revision.
Check the existence of a metadata entry in the _delta_log for this append.
1. Code that triggered the bug, or steps to reproduce:
import org.apache.spark.sql.delta.DeltaLog
import org.apache.spark.sql.delta.actions.{Action, Metadata}
import org.apache.spark.sql.delta.util.FileNames

// Create the table
df.write.mode("overwrite").format("qbeast").option("columnsToIndex", "col_1,col_2").save(tmpDir)
// Append with no schema change and no Revision update
df.write.mode("append").format("qbeast").save(tmpDir)
// Check whether the commit for the append (version 1) contains a Metadata action
val deltaLog = DeltaLog.forTable(spark, tmpDir)
val noMetadataForAppend = deltaLog.store
  .read(FileNames.deltaFile(deltaLog.logPath, 1L), deltaLog.newDeltaHadoopConf())
  .map(Action.fromJson)
  .collect { case a: Metadata => a }
  .isEmpty
assert(noMetadataForAppend, "Redundant metadata detected!")
2. Branch and commit id: 6a780ea
3. Spark version: 3.5.0
4. Hadoop version: 3.3.4
5. How are you running Spark?: Locally and on AWS EMR
6. Stack trace:
The redundant metadata prevents concurrent writes:
io.delta.exceptions.MetadataChangedException: The metadata of the Delta table has been changed by a concurrent update. Please try the operation again.
Conflicting commit: {"timestamp":...,"operation":"WRITE","operationParameters":{"mode":Append},"readVersion":...,"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{"numFiles":"...","numOutputRows":"...","numOutputBytes":"..."},"engineInfo":"Apache-Spark/3.4.2 Delta-Lake/2.4.0","txnId":"..."}
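As a minimal sketch of how the conflict shows up (reusing df and tmpDir from the snippet above), two interleaved appends can race, and one of them may be rejected even though neither touches the schema or the Revision:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Two concurrent blind appends to the same qbeast table. Because every write
// currently commits a Metadata action, one of the two can fail Delta's
// checkNoMetadataUpdates conflict check with MetadataChangedException.
val concurrentAppends = Seq(
  Future(df.write.mode("append").format("qbeast").save(tmpDir)),
  Future(df.write.mode("append").format("qbeast").save(tmpDir)))
concurrentAppends.foreach(Await.result(_, Duration.Inf))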