The logic of delete processing:
- Build a delete view Java object that records the deleted files and their corresponding last committed files. (Only COW tables are supported for now.) A sketch of such an object is given after this list.
- Construct an RDD from the deleted files involved; only the record keys need to be recorded (see the key-extraction sketch below).
- Process the RDD: load the keys of the data into a set, then read the file with the Parquet reader. Any record whose key is not in the set is marked as deleted data (a join-based sketch follows the list).
Note: if the delete query has already been run before, the history file can be read directly and the save step below can be skipped.
- Save the delete data (if this is the first time it is queried); see the write sketch below.
- Query the delete data as a Spark view. A concrete query looks like this:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Path of the Hudi table and the commit timestamp whose deletes we want.
String hudiPath = "/hive/warehouse/test.db/test/";
String timestamp = "202012121212";
DeleteSupport deleteSupport = new DeleteSupport(hudiPath, timestamp);
Dataset<Row> deleteRows = deleteSupport.getDeleteDataset();
deleteRows.show();
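A minimal sketch of the delete view object from the first step. The class name DeleteView, its fields, and its methods are assumptions made for illustration, not the actual design.

import java.util.ArrayList;
import java.util.List;

// Hypothetical delete view: pairs each file involved in a delete commit with
// the last committed version of the same file group (COW only).
public class DeleteView {
    public static class FilePair {
        public final String deletedFile;        // file involved in the delete commit
        public final String lastCommittedFile;  // previous committed file slice
        public FilePair(String deletedFile, String lastCommittedFile) {
            this.deletedFile = deletedFile;
            this.lastCommittedFile = lastCommittedFile;
        }
    }

    private final List<FilePair> filePairs = new ArrayList<>();

    public void add(String deletedFile, String lastCommittedFile) {
        filePairs.add(new FilePair(deletedFile, lastCommittedFile));
    }

    public List<FilePair> getFilePairs() {
        return filePairs;
    }
}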
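For the key-only RDD in the second step, a sketch assuming the files are regular Hudi parquet files carrying the _hoodie_record_key meta column; the SparkSession setup and the file path are placeholders.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().getOrCreate();

// Read only the Hudi record-key meta column, so the RDD carries the keys and
// nothing else.
JavaRDD<String> keys = spark.read()
        .parquet("/hive/warehouse/test.db/test/some-file.parquet") // placeholder path
        .select("_hoodie_record_key")
        .as(Encoders.STRING())
        .toJavaRDD();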
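The comparison in the processing step can also be expressed as a Spark left-anti join rather than a manual key-set lookup: rows of the last committed file whose key no longer appears in the current file are exactly the deleted rows. The two path variables and the reliance on _hoodie_record_key are assumptions.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> lastCommitted = spark.read().parquet(lastCommittedFile); // previous file slice (placeholder path)
Dataset<Row> currentKeys = spark.read().parquet(currentFile)          // file written by the delete (placeholder path)
        .select("_hoodie_record_key");

// left_anti keeps the rows of lastCommitted that have no matching key in
// currentKeys, i.e. the records removed by the delete commit.
Dataset<Row> deleted = lastCommitted.join(
        currentKeys,
        lastCommitted.col("_hoodie_record_key")
                .equalTo(currentKeys.col("_hoodie_record_key")),
        "left_anti");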
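For the save step, a sketch that persists the computed delete rows so a later run can read them directly instead of recomputing them (the "Note" above). The .delete_history layout keyed by commit timestamp is an invented convention, not part of the proposal.

import org.apache.spark.sql.SaveMode;

// Persist the delete rows under a per-timestamp history location.
deleted.write()
        .mode(SaveMode.Overwrite)
        .parquet(hudiPath + ".delete_history/" + timestamp);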
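The deleteRows dataset returned above can then be registered as a temporary view and queried with plain SQL, which appears to be what "query the delete data as a Spark view" refers to; the view name is a placeholder.

deleteRows.createOrReplaceTempView("deleted_records");
spark.sql("SELECT COUNT(*) FROM deleted_records").show();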