shangyuantech/hudi-delete-view

Delete processing

Delete processing works as follows:

  • Build a delete-view Java object that records the deleted files and, for each one, the corresponding last committed file. (COW tables are currently supported.)

  • Construct an RDD from the deleted files involved; only the record keys need to be recorded.

  • Process the RDD: load the keys of the data into a set, then read the file with the Parquet reader. Any record that is read but whose key is not in the set is marked as deleted.

    Note: if a delete query has already been run, the saved history file can be read directly and the save step below can be skipped.

  • Save the deleted data (only the first time the query is run).

  • Query the deleted data as a Spark view, for example:

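import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Path of the Hudi table and the timestamp passed to DeleteSupport
String hudiPath = "/hive/warehouse/test.db/test/";
String timestamp = "202012121212";

// Load the deleted rows as a Spark Dataset and print them
DeleteSupport deleteSupport = new DeleteSupport(hudiPath, timestamp);
Dataset<Row> deleteRows = deleteSupport.getDeleteDataset();
deleteRows.show();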
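
The key comparison described in the processing step can be illustrated without Spark or Hudi. The following is a minimal sketch, not the project's implementation. It assumes that the set holds the record keys still present in the current file version, and that the keys read back come from the last committed version of the same file. The class `DeleteDiffSketch` and the method `findDeletedKeys` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the "not in the set => deleted" rule.
class DeleteDiffSketch {

    /**
     * Returns the keys that appear in the previous file version
     * but are missing from the set of current keys.
     */
    static List<String> findDeletedKeys(List<String> previousKeys, Set<String> currentKeys) {
        List<String> deleted = new ArrayList<>();
        for (String key : previousKeys) {
            // A key read from the older file that is absent from the
            // current key set is treated as a deleted record.
            if (!currentKeys.contains(key)) {
                deleted.add(key);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Assumed sample data: keys in the last committed file version.
        List<String> previousKeys = Arrays.asList("id-1", "id-2", "id-3", "id-4");
        // Assumed sample data: keys that survive in the current version.
        Set<String> currentKeys = new HashSet<>(Arrays.asList("id-1", "id-3"));

        System.out.println(findDeletedKeys(previousKeys, currentKeys)); // prints [id-2, id-4]
    }
}
```

Holding only the keys in a hash set keeps each membership check cheap, which matches the note above that only the keys need to be recorded.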

About

Supports querying rows deleted from an Apache Hudi table across the timeline.
