Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-1347]fix Hbase index partition changes cause data duplication p… #2188
[HUDI-1347]fix Hbase index partition changes cause data duplication p… #2188
Changes from all commits
d7ea9cd
be01143
e9bef5a
0c3a394
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove comment in line 488.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hj2016 Can you please make the following changes :
`public boolean rollbackCommit(String instantTime) {
if (config.getHBaseIndexRollbackSync()) {
//
}
return true;
`
This keeps old behavior unchanged and safe and allows you to control deletes via the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be problems with the hbase index. The scenario that needs to be rolled back is that the hbase partition change is turned on and an error is reported after the hbase index is written for some reasons (some reasons may be due to jvm memory overflow, hbase suddenly crashes), for example, At the beginning, the data was id:1 partition:2019, and then another commit failed and the index was written to hbase. At this time, the index partition was changed to 2020. So the next time the data is written, it will only be written to In the 2020 partition, resulting in data duplication. After judging based on the rollbackSync parameter, the following logic will not be executed. If you set hbase.index.rollback.sync = false, hoodie.hbase.index.update.partition.path = true, there will still be problems. I think it would be more reasonable to write like this:
if (!config.getHbaseIndexUpdatePartitionPath()){
return true;
}
synchronized (SparkHoodieHBaseIndex.class) {
....
}
return true;
Because only when the partition changes, problems may occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@n3nash : looks like author is waiting for your response.