Fix cleanup with remote work dir #3836
Conversation
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
I think the issue is that closing the cache DB also blocks until it finishes writing any pending records. So I will try closing and re-opening the cache like before.
That did the trick. Cleanup works! So now the Session destroy is as follows:
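For reference, here is a minimal sketch of that ordering as I understand it from the discussion; every name below is an illustrative stand-in, not the actual Nextflow `Session` API:

```groovy
// Minimal sketch of the resulting destroy ordering -- all method and field
// names are illustrative stand-ins, not the real Nextflow Session API.
class SessionSketch {
    def cache
    def plugins

    void destroy() {
        awaitFileTransfers()       // 1. wait for publishDir / Tower archiver uploads
        cache.close()              // 2. flush and close the cache DB first...
        cleanup()                  // 3. ...then clean the work dir, re-opening the cache read-only
        plugins.stop()             // 4. plugin manager shuts down only after cleanup
        deleteScriptClassesDir()   // 5. unrelated teardown stays at the end
        updateHistoryFile()
    }

    void awaitFileTransfers() { /* wait for pending uploads */ }
    void cleanup() { /* re-open cache read-only, delete task dirs */ }
    void deleteScriptClassesDir() { }
    void updateHistoryFile() { }
}
```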
@pditommaso all yours to review and merge
@pditommaso bump
@pditommaso here are the recent commits I found related to the shutdown logic:

The first commit moved the file transfer shutdown from the start of session destroy to after it. This change caused issues when publishing to S3, because the plugin manager was being shut down before all the files had been published. The second commit fixed this problem by creating separate thread pools for publishing files and for the Tower archiver.

Basically, I think the introduction of the Tower archiver created a circular dependency between waiting for file transfers and shutting down the plugin system, and it was resolved by giving the Tower archiver its own file transfer pool.

This PR does not change any of the file transfer shutdown logic. It only moves the work dir cleanup to before the plugin manager shutdown, which is necessary to make it work with cloud storage. This change also puts it before deleting the script classes dir and updating the history file, neither of which are related to the cache DB.

I feel confident that we can merge this change. A good stress test would be to run the nf-core rnaseq test on Tower with archiving and cleanup enabled; that should exercise all of the shutdown logic. Why don't we merge it, run the test, and if we find an error then we'll fix it?
Force-pushed from 81f7cb7 to 8a43489
Not inclined to merge this. The cleanup has been implemented as an experimental feature. When used with object storage on large real-world pipelines, it can hang the execution for a long time and result in extra charges. I don't think we should promote this behaviour.
Fair enough. It will be replaced by automatic task cleanup anyway.
I would like this to be reconsidered. I am using AWS Batch with Fargate and S3 to process some large VCF files into smaller files that I pull down for local use. The storage costs for the files in the staging areas far exceed the cost of waiting for them to be cleaned up. I think it should be left to the users whether they want to incur additional execution-time costs to reduce storage costs.
@Shians I'm in the same boat and would love to see this functionality implemented (as would many other people, from what I can see), but I don't think it's likely in the near future.

If you use one main folder in S3 as the Nextflow work directory, you could create a Lifecycle Rule for the bucket (on the AWS side) that automatically deletes files with a given prefix after some period of time (the minimum is 1 day). This would even work if you use a versioned S3 bucket (which probably adds another layer of complexity for a Nextflow-based implementation).

In my case, I create a Nextflow work folder in each project folder, so there isn't a single prefix I can use to limit the application of a lifecycle rule. Consequently, I'd love to be able to refer to Nextflow work files using object tags. It seems that Nextflow already automatically tags the metadata files it creates. However (and this seems like an oversight), Nextflow fails to tag the non-metadata files (i.e. the process input files) it creates in the work folder. If this were fixed/updated, I'd recommend tagging the non-metadata files differently so that they can be handled distinctly from the metadata files by lifecycle rules.

@bentsherman or @pditommaso, do you have any insight into how Nextflow tags the S3 objects that it does tag, and what code would need to be modified so that it would also tag the non-metadata files in the work folder?
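For illustration, a tag-based lifecycle rule could look roughly like this, assuming a hypothetical tag key `nextflow-work` (Nextflow does not currently apply such a tag; the bucket name and tag are placeholders):

```sh
# Hypothetical example: expire objects carrying the made-up tag
# nextflow-work=true one day after creation. Bucket and tag are placeholders.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-nextflow-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "ExpireNextflowWorkFiles",
      "Status": "Enabled",
      "Filter": { "Tag": { "Key": "nextflow-work", "Value": "true" } },
      "Expiration": { "Days": 1 }
    }]
  }'
```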
S3 tagging is supported via the Fusion file system; see the Fusion tags documentation.
Closes #3645 and #2683
Fixes an issue with cleanup on a remote work directory, which is caused by the plugin manager being shut down before the cleanup.
I moved the cleanup code to happen after publishing is done but before the plugin manager and cache DB are shut down. However, I haven't been able to verify that the cleanup works. When I run the `hello` pipeline on AWS Batch with `-trace nextflow`, I don't get the error message anymore, but the task directories also aren't deleted. Currently I have this code:
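The snippet itself didn't survive here; below is a hypothetical reconstruction of the shape of that code, with the outer and inner trace calls mirroring the log behaviour described next. The `eachRecord` iteration name is an assumption, not the actual cache API.

```groovy
import groovy.util.logging.Slf4j

// Hypothetical reconstruction of the cleanup loop being described -- the
// original snippet was not captured. `cache.eachRecord` stands in for
// whatever record-iteration API the cache DB exposes.
@Slf4j
class CleanupSketch {
    def cache
    def workDir

    void cleanup() {
        log.trace "Cleaning up work dir: $workDir"            // outer message: appears in the log
        cache.eachRecord { hash, record ->
            log.trace "Deleting task dir: ${record.workDir}"  // inner message: never appears
            record.workDir?.deleteDir()
        }
        log.trace "Work dir cleanup complete"                 // outer message: appears in the log
    }
}
```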
The log shows the outer trace messages, but not the inner message for each task. It's like the task records aren't there during the iteration.
The only difference from the old code is that the cleanup now uses the session's cache DB reference rather than opening the cache itself. That was necessary because at that point the session's cache had already been closed. Also, the old version used `openForRead()` instead of `open()`. But that shouldn't matter, right?
@pditommaso what do you suggest?