Add backup to bucket functionality #33559
Conversation
Adds functionality to back up processed files to another bucket (or the same bucket) with an optional prefix. If enabled, it will delete files from the source bucket.
This pull request does not have a backport label. To fix up this pull request, you need to add the backport labels for the needed branches.
This reverts commit da59387.
(force-pushed from c04766c to 58d7248)
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
hello @jniebuhr, thanks for the contribution, the feature you added is very interesting indeed :) it needs some refinement though. the main problem is that having the backup-and-delete strategy in the S3 processor can lead to issues when the input is set up with SQS+S3: a single SQS message can reference multiple S3 objects, and if processing one of those objects returns an error, the message goes back to the queue while the other objects may already have been deleted. once we process the SQS message again, the previously backed-up-and-deleted S3 objects will return an error, and the message will go back to the queue again, and so on until the max number of retries is reached.

I think the best approach would be a finalise step where we collect the S3 objects in a batch, as referenced by the content of the SQS message or the S3 listing, and run the backup and deletion only once the listing or the SQS message has been fully processed without errors. it requires a little more complexity, but it gives us the stability and reliability we need.

also, I'd prefer to force a different bucket as the backup destination: while using the same bucket is not a problem with the correct SQS notification settings and prefix path listing, it requires prior knowledge and a proper setup. done wrong, it produces an endless loop where an S3 object is processed and backed up, then the backup is processed and backed up in turn, then the backup of the backup is processed, etc. if possible I would avoid putting users in a situation where they can shoot themselves in the foot like this :)
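To make the finalise idea concrete, here is a rough Go sketch of deferring backup and deletion until every object referenced by one SQS message has been processed; all names here (`s3Object`, `processMessage`, `process`, `finalize`) are illustrative, not the actual beats API:

```go
package main

import (
	"context"
	"fmt"
)

type s3Object struct {
	Bucket, Key string
}

// processMessage handles every S3 object referenced by one SQS message.
// Backup and deletion are deferred until all objects succeed, so a partial
// failure leaves the source objects in place and the retry stays clean.
func processMessage(ctx context.Context, objects []s3Object,
	process func(context.Context, s3Object) error,
	finalize func(context.Context, s3Object) error) error {

	for _, obj := range objects {
		if err := process(ctx, obj); err != nil {
			// Return the SQS message to the queue; nothing was deleted yet.
			return fmt.Errorf("processing %s/%s: %w", obj.Bucket, obj.Key, err)
		}
	}
	// Only now back up and delete: every object was processed successfully.
	for _, obj := range objects {
		if err := finalize(ctx, obj); err != nil {
			return fmt.Errorf("finalizing %s/%s: %w", obj.Bucket, obj.Key, err)
		}
	}
	return nil
}

func main() {
	objs := []s3Object{{Bucket: "src-bucket", Key: "logs/a.json"}}
	err := processMessage(context.Background(), objs,
		func(ctx context.Context, o s3Object) error { return nil }, // pretend-process
		func(ctx context.Context, o s3Object) error { return nil }, // pretend backup+delete
	)
	fmt.Println("done, err =", err)
}
```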
…into feature/backup-to-bucket
@aspacca sorry, added it
@jniebuhr all good, please add an entry in `CHANGELOG.next.asciidoc`
done :)
This pull request is now in conflicts. Could you fix it? 🙏
/test
This pull request is now in conflicts. Could you fix it? 🙏
@jniebuhr, can you resolve the conflicts? thanks
Hi @aspacca, all resolved
@jniebuhr thanks, I will run integration tests locally, since they are currently skipped in CI, and if everything is good I will merge
@@ -437,6 +437,34 @@ This is only supported with 3rd party S3 providers. AWS does not support path s
In order to make AWS API calls, `aws-s3` input requires AWS credentials. Please
see <<aws-credentials-config,AWS credentials options>> for more details.

[float]
==== `backup_to_bucket_arn`
Do we need to add `s3:PutObject` into the AWS Permissions section of this documentation at line 469?
good catch! also `s3:DeleteObject`
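For reference, the extra statements could look roughly like this in an IAM policy. This is a sketch only: the bucket names are placeholders, and the exact resources depend on the chosen source and backup buckets:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BackupToBucket",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    },
    {
      "Sid": "DeleteAfterBackup",
      "Effect": "Allow",
      "Action": ["s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-source-bucket/*"
    }
  ]
}
```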
SonarCloud Quality Gate failed. 0 Bugs. No Coverage information.
* Add backup to bucket functionality

  Adds a functionality to backup processed files to another or the same bucket with an optional prefix. If enabled it will delete files from the source bucket.

* Add documentation for backup_to_bucket configuration parameters
* Add configuration to reference config file
* Revert "Add configuration to reference config file"

  This reverts commit da59387.

* Add back reference config changes without whitespace changes
* fix typo that makes linter fail
* change reference config the right way
* Add later finalizing, missing tests for now
* Add code review feedback & unit tests
* Try fix G601 error
* Fix last code review feedback
* Add missing unit test
* add entry to changelog
* rename to , add permissions required for backup feature in docs
* fix integration tests

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
(cherry picked from commit 5df1895)

# Conflicts:
#   x-pack/filebeat/input/awss3/input.go
#   x-pack/filebeat/input/awss3/metrics.go
This merges the changes from main minus the conflicting changes from 065307c in elastic#33559, which fixed the tests in a manner that conflicts with these changes. That change also reverted previous enhancements to report metrics under the "inputs" dataset.
* Add backup to bucket functionality (#33559)

  (same commit list as above, cherry picked from commit 5df1895, with conflicts in x-pack/filebeat/input/awss3/input.go and x-pack/filebeat/input/awss3/metrics.go)

* resolve merge conflict
* remove CreateWithoutClosingMetrics from backport

Co-authored-by: Jochen Ullrich <kontakt@ju-hh.de>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
What does this PR do?
Adds functionality to back up processed files to another bucket (or the same bucket) with an optional prefix. If enabled, it will delete files from the source bucket. This functionality is influenced by the same feature in the Logstash S3 plugin.
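A rough configuration sketch of what this looks like in practice: `backup_to_bucket_arn` appears in this PR's docs change, but the companion option names and the queue URL below are assumptions for illustration and may differ from the merged names:

```yaml
filebeat.inputs:
  - type: aws-s3
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/my-queue
    # Copy each processed object to the backup bucket, under an optional prefix,
    # then remove it from the source bucket.
    backup_to_bucket_arn: "arn:aws:s3:::my-backup-bucket"
    backup_to_bucket_prefix: "processed/"
    delete_after_backup: true
```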
Why is it important?
This is important to signal that specific files in S3 buckets have been processed and to perform actions on them, such as archiving or deleting. It also helps avoid processing files again when state has been lost.
Checklist
I have added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
(Changelog files seem to contain a lot of old stuff, how do those work?)
Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs