-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-37894][SQL] Add trash feature to FileCommitProtocol.deleteWithJob #35188
Conversation
Can one of the admins verify this patch? |
@dongjoon-hyun Could you help take a look when you have time? Thanks. |
@@ -178,7 +178,12 @@ abstract class FileCommitProtocol extends Logging { | |||
* implementation deletes the file immediately. | |||
*/ | |||
def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean = { | |||
fs.delete(path, recursive) | |||
if (fs.getConf.getInt("fs.trash.interval", 0) > 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where can I find the document of fs.trash.interval
? It's better to add a comment to link to the hadoop conf doc page, to explain that this conf indicates enabling the trash feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had added a comment to link hadoop conf doc page. Thanks.
*/ | ||
def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean = { | ||
fs.delete(path, recursive) | ||
if (fs.getConf.getInt("fs.trash.interval", 0) > 0 && | ||
Trash.moveToAppropriateTrash(fs, path, fs.getConf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this function call fail? if it will then it's better to try-catch it, log the error and fallback to non-trash behavior.
*/ | ||
def deleteWithJob(fs: FileSystem, path: Path, recursive: Boolean): Boolean = { | ||
fs.delete(path, recursive) | ||
if (fs.getConf.getInt("fs.trash.interval", 0) > 0 && | ||
Trash.moveToAppropriateTrash(fs, path, fs.getConf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me of #29552. I think at least the concerns raised there should be addressed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the trash feature was merged and reverted historically from Apache Spark repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trash is a feature of Hadoop file system, and we need to think about how to integrate it with Spark. This may not be the only place to consider the trash feature.
What changes were proposed in this pull request?
Add trash feature to function
FileCommitProtocol.deleteWithJob
. When enable trash, useTrash.moveToAppropriateTrash
instead offs.delete
Why are the changes needed?
For some reason (e.g. analyze or roll back data) we need keep old data to trash.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing tests.