Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add backup to bucket functionality #33559

Merged
merged 28 commits into from
Dec 22, 2022

Conversation

jniebuhr
Copy link
Contributor

@jniebuhr jniebuhr commented Nov 2, 2022

  • Enhancement

What does this PR do?

Adds a functionality to backup processed files to another or the same bucket with an optional prefix. If enabled it will delete files from the source bucket. This functionality is influenced by the same feature in the logstash s3 plugin.

Why is it important?

This is important to signal that specific files are processed in S3 buckets and perform actions such as archiving / deleting those. It will also help not process files again when state has been lost.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Changelog files seem to contain a lot of old stuff, how do those work?

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Adds a functionality to backup processed files to another or the same bucket with an optional prefix.
If enabled it will delete files from the source bucket.
@jniebuhr jniebuhr requested a review from a team as a code owner November 2, 2022 13:05
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 2, 2022
@mergify
Copy link
Contributor

mergify bot commented Nov 2, 2022

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @jniebuhr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Nov 2, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-12-22T11:00:41.855+0000

  • Duration: 132 min 26 sec

Test stats 🧪

Test Results
Failed 0
Passed 7718
Skipped 513
Total 8231

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@jniebuhr jniebuhr requested a review from a team as a code owner November 2, 2022 13:23
@jniebuhr jniebuhr requested review from fearful-symmetry and faec and removed request for a team November 2, 2022 13:23
@jniebuhr jniebuhr force-pushed the feature/backup-to-bucket branch from c04766c to 58d7248 Compare November 2, 2022 13:36
@criamico criamico added the Team:Elastic-Agent Label for the Agent team label Nov 7, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 7, 2022
@cmacknz cmacknz added Team:Cloud-Monitoring Label for the Cloud Monitoring team and removed Team:Elastic-Agent Label for the Agent team labels Nov 7, 2022
@girodav girodav added enhancement aws Enable builds in the CI for aws cloud testing labels Nov 7, 2022
@aspacca aspacca self-requested a review November 14, 2022 11:22
@aspacca
Copy link

aspacca commented Nov 14, 2022

hello @jniebuhr , thanks for the contribution

the feature you added is very interesting indeed :)

it needs some refinement anyway: the main problem is that having the backup and delete strategy in the s3 processor can lead to some issue when the input is set up with sqs+s3

a single sqs message could reference multiple s3 objects: if one of this objects processing returns an error the message will go back to the queue, but in the meantime the other objects could be already deleted. once we process the sqs message again now the previous backed up and deleted s3 objects will return an error, and the message will go back to the queue again. and so on untile the max number of retries is reached.

I think the best we be to have a finalise processor where we reference the s3 objects in batch, as required by the content of the sqs message or the s3 listing, and proceed with running the backup and deletion only if when the listing or the sqs message are fully processed without errors.

it requires a little more complexity but it will have the stability and reliance we need.

also I'd prefer to force a different bucket as backup destination: while with the correct setting of sqs notification and prefix path listing using the same bucket is not a problem, it requires prior knowledge a proper setup. doing differently will produce and endless loop where an s3 object is processed, backed up, then the backup will be processed, backup ed on its own, and the backup of the backup processed etc etc

if possible I would avoid user to be put in the situation to fire themselves in the foot like this :)

@jniebuhr
Copy link
Contributor Author

@aspacca sorry, added it

@aspacca aspacca added the backport-v8.6.0 Automated backport with mergify label Nov 24, 2022
@aspacca
Copy link

aspacca commented Nov 24, 2022

@jniebuhr all good

please: add an entry in CHANGELOG.next.asciidoc, thanks

@jniebuhr
Copy link
Contributor Author

@jniebuhr all good

please: add an entry in CHANGELOG.next.asciidoc, thanks

done :)

@mergify
Copy link
Contributor

mergify bot commented Nov 24, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/backup-to-bucket upstream/feature/backup-to-bucket
git merge upstream/main
git push upstream feature/backup-to-bucket

@aspacca
Copy link

aspacca commented Nov 28, 2022

/test

@mergify
Copy link
Contributor

mergify bot commented Nov 29, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/backup-to-bucket upstream/feature/backup-to-bucket
git merge upstream/main
git push upstream feature/backup-to-bucket

@aspacca
Copy link

aspacca commented Nov 29, 2022

@jniebuhr , can you resolve the conflicts?

thanks

@jniebuhr
Copy link
Contributor Author

jniebuhr commented Dec 5, 2022

Hi @aspacca, all resolved

@aspacca
Copy link

aspacca commented Dec 5, 2022

@jniebuhr thanks

I will run integration tests locally, since they are currently skipped in CI, and if everything is good I will merge

@@ -437,6 +437,34 @@ This is only supported with 3rd party S3 providers. AWS does not support path s
In order to make AWS API calls, `aws-s3` input requires AWS credentials. Please
see <<aws-credentials-config,AWS credentials options>> for more details.

[float]
==== `backup_to_bucket_arn`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add s3:PutObject into the AWS Permissions section of this documentation at line 469?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! also s3:DeleteObject

@aspacca
Copy link

aspacca commented Dec 22, 2022

Hi @aspacca, all resolved

hello @jniebuhr , sorry, I've missed the notification about conflicts resolved.

I'm going to add the documentation about the permission required by the back feature and took the liberty to rename delete to delete_after_backup

@sonarqubecloud
Copy link

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 19 Code Smells

No Coverage information No Coverage information
9.8% 9.8% Duplication

@aspacca aspacca merged commit 5df1895 into elastic:main Dec 22, 2022
mergify bot pushed a commit that referenced this pull request Dec 22, 2022
* Add backup to bucket functionality

Adds a functionality to backup processed files to another or the same bucket with an optional prefix.
If enabled it will delete files from the source bucket.

* Add documentation for backup_to_bucket configuration parameters

* Add configuration to reference config file

* Revert "Add configuration to reference config file"

This reverts commit da59387.

* Add back reference config changes without whitespace changes

* fix typo that makes linter fail

* change reference config the right way

* Add later finalizing, missing tests for now

* Add code review feedback & unit tests

* Try fix G601 error

* Fix last code review feedback

* Add missing unit test

* add entry to changelog

* rename  to , add permissions required for backup feature in docs

* fix integration tests

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
(cherry picked from commit 5df1895)

# Conflicts:
#	x-pack/filebeat/input/awss3/input.go
#	x-pack/filebeat/input/awss3/metrics.go
andrewkroh added a commit to andrewkroh/beats that referenced this pull request Jan 3, 2023
This merges the changes from main minus the conflicting changes
from 065307c in elastic#33559 which fixed the
tests in a manner that conflicts with these changes. That change also
revert previous enhancements to report metrics under the "inputs" dataset.
aspacca pushed a commit that referenced this pull request Mar 2, 2023
* Add backup to bucket functionality (#33559)

* Add backup to bucket functionality

Adds a functionality to backup processed files to another or the same bucket with an optional prefix.
If enabled it will delete files from the source bucket.

* Add documentation for backup_to_bucket configuration parameters

* Add configuration to reference config file

* Revert "Add configuration to reference config file"

This reverts commit da59387.

* Add back reference config changes without whitespace changes

* fix typo that makes linter fail

* change reference config the right way

* Add later finalizing, missing tests for now

* Add code review feedback & unit tests

* Try fix G601 error

* Fix last code review feedback

* Add missing unit test

* add entry to changelog

* rename  to , add permissions required for backup feature in docs

* fix integration tests

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
(cherry picked from commit 5df1895)

# Conflicts:
#	x-pack/filebeat/input/awss3/input.go
#	x-pack/filebeat/input/awss3/metrics.go

* resolve merge conflict

* remove CreateWithoutClosingMetrics from backport

---------

Co-authored-by: Jochen Ullrich <kontakt@ju-hh.de>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
* Add backup to bucket functionality

Adds a functionality to backup processed files to another or the same bucket with an optional prefix.
If enabled it will delete files from the source bucket.

* Add documentation for backup_to_bucket configuration parameters

* Add configuration to reference config file

* Revert "Add configuration to reference config file"

This reverts commit da59387.

* Add back reference config changes without whitespace changes

* fix typo that makes linter fail

* change reference config the right way

* Add later finalizing, missing tests for now

* Add code review feedback & unit tests

* Try fix G601 error

* Fix last code review feedback

* Add missing unit test

* add entry to changelog

* rename  to , add permissions required for backup feature in docs

* fix integration tests

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Enable builds in the CI for aws cloud testing backport-v8.6.0 Automated backport with mergify enhancement Team:Cloud-Monitoring Label for the Cloud Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backup to bucket feature in filebeat aws-s3 input
7 participants