Add backup to bucket functionality #33559

jniebuhr · 2022-11-02T13:05:19Z

Enhancement

What does this PR do?

Adds functionality to back up processed files to another bucket or to the same bucket, with an optional prefix. If enabled, it also deletes the files from the source bucket. The feature is modeled on the equivalent feature in the Logstash S3 plugin.

Why is it important?

This is important for signaling that specific files in S3 buckets have been processed, so that follow-up actions such as archiving or deleting them can be performed. It also helps avoid reprocessing files when state has been lost.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Changelog files seem to contain a lot of old stuff, how do those work?

Author's Checklist

How to test this PR locally

Related issues

Closes Backup to bucket feature in filebeat aws-s3 input #30696

Use cases

Screenshots

Logs

mergify · 2022-11-02T13:06:18Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @jniebuhr? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

elasticmachine · 2022-11-02T13:11:18Z

💚 Build Succeeded

Build stats

Start Time: 2022-12-22T11:00:41.855+0000
Duration: 132 min 26 sec

Test stats 🧪

Test	Results
Failed	0
Passed	7718
Skipped	513
Total	8231

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

elasticmachine · 2022-11-07T13:54:46Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

aspacca · 2022-11-14T11:32:57Z

hello @jniebuhr , thanks for the contribution

the feature you added is very interesting indeed :)

it needs some refinement though: the main problem is that having the backup and delete strategy in the s3 processor can lead to issues when the input is set up with sqs+s3

a single sqs message could reference multiple s3 objects: if processing one of these objects returns an error, the message will go back to the queue, but in the meantime the other objects could already have been deleted. once we process the sqs message again, the previously backed up and deleted s3 objects will return an error, and the message will go back to the queue again, and so on until the max number of retries is reached.

I think the best would be to have a finalise processor where we reference the s3 objects in batch, as required by the content of the sqs message or the s3 listing, and proceed with running the backup and deletion only when the listing or the sqs message is fully processed without errors.

it adds a little more complexity but it will have the stability and reliability we need.

also I'd prefer to force a different bucket as backup destination: while with the correct setting of sqs notification and prefix path listing using the same bucket is not a problem, it requires prior knowledge and a proper setup. doing otherwise will produce an endless loop where an s3 object is processed, backed up, then the backup is processed and backed up in turn, then the backup of the backup is processed, and so on

if possible I would avoid putting users in a position to shoot themselves in the foot like this :)

jniebuhr · 2022-11-23T14:03:36Z

@aspacca sorry, added it

aspacca · 2022-11-24T04:39:06Z

@jniebuhr all good

please: add an entry in CHANGELOG.next.asciidoc, thanks

jniebuhr · 2022-11-24T10:36:23Z

> @jniebuhr all good
>
> please: add an entry in CHANGELOG.next.asciidoc, thanks

done :)

mergify · 2022-11-24T10:36:43Z

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/backup-to-bucket upstream/feature/backup-to-bucket
git merge upstream/main
git push upstream feature/backup-to-bucket

aspacca · 2022-11-28T03:11:13Z

/test

mergify · 2022-11-29T03:20:05Z

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feature/backup-to-bucket upstream/feature/backup-to-bucket
git merge upstream/main
git push upstream feature/backup-to-bucket

aspacca · 2022-11-29T03:55:13Z

@jniebuhr , can you resolve the conflicts?

thanks

jniebuhr · 2022-12-05T09:38:25Z

Hi @aspacca, all resolved

aspacca · 2022-12-05T10:08:11Z

@jniebuhr thanks

I will run integration tests locally, since they are currently skipped in CI, and if everything is good I will merge

kaiyan-sheng · 2022-12-05T20:57:15Z

x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc

@@ -437,6 +437,34 @@ This is only supported with 3rd party S3 providers.  AWS does not support path s
 In order to make AWS API calls, `aws-s3` input requires AWS credentials. Please
 see <<aws-credentials-config,AWS credentials options>> for more details.

+[float]
+==== `backup_to_bucket_arn`


Do we need to add s3:PutObject into the AWS Permissions section of this documentation at line 469?

aspacca · 2022-12-06

good catch! also s3:DeleteObject

aspacca · 2022-12-22T08:09:40Z

> Hi @aspacca, all resolved

hello @jniebuhr , sorry, I've missed the notification about conflicts resolved.

I'm going to add the documentation about the permissions required by the backup feature, and I took the liberty of renaming delete to delete_after_backup

sonarqubecloud · 2022-12-22T11:10:11Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
19 Code Smells

No Coverage information
9.8% Duplication

* Add backup to bucket functionality

  Adds a functionality to backup processed files to another or the same bucket with an optional prefix. If enabled it will delete files from the source bucket.
* Add documentation for backup_to_bucket configuration parameters
* Add configuration to reference config file
* Revert "Add configuration to reference config file"

  This reverts commit da59387.
* Add back reference config changes without whitespace changes
* fix typo that makes linter fail
* change reference config the right way
* Add later finalizing, missing tests for now
* Add code review feedback & unit tests
* Try fix G601 error
* Fix last code review feedback
* Add missing unit test
* add entry to changelog
* rename to , add permissions required for backup feature in docs
* fix integration tests

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
(cherry picked from commit 5df1895)

# Conflicts:
#	x-pack/filebeat/input/awss3/input.go
#	x-pack/filebeat/input/awss3/metrics.go

This merges the changes from main, minus the conflicting changes from 065307c in elastic#33559, which fixed the tests in a manner that conflicts with these changes. That change also reverted previous enhancements to report metrics under the "inputs" dataset.

* Add backup to bucket functionality (#33559)

* Add backup to bucket functionality

  Adds a functionality to backup processed files to another or the same bucket with an optional prefix. If enabled it will delete files from the source bucket.
* Add documentation for backup_to_bucket configuration parameters
* Add configuration to reference config file
* Revert "Add configuration to reference config file"

  This reverts commit da59387.
* Add back reference config changes without whitespace changes
* fix typo that makes linter fail
* change reference config the right way
* Add later finalizing, missing tests for now
* Add code review feedback & unit tests
* Try fix G601 error
* Fix last code review feedback
* Add missing unit test
* add entry to changelog
* rename to , add permissions required for backup feature in docs
* fix integration tests

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
(cherry picked from commit 5df1895)

# Conflicts:
#	x-pack/filebeat/input/awss3/input.go
#	x-pack/filebeat/input/awss3/metrics.go

* resolve merge conflict
* remove CreateWithoutClosingMetrics from backport

---------

Co-authored-by: Jochen Ullrich <kontakt@ju-hh.de>
Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>

Add backup to bucket functionality

6ae9c77

Adds a functionality to backup processed files to another or the same bucket with an optional prefix. If enabled it will delete files from the source bucket.

jniebuhr requested a review from a team as a code owner November 2, 2022 13:05

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 2, 2022

mergify bot assigned jniebuhr Nov 2, 2022

Merge branch 'main' into feature/backup-to-bucket

48dbfaa

jniebuhr added 3 commits November 2, 2022 14:19

Add documentation for backup_to_bucket configuration parameters

962fc91

Merge branch 'feature/backup-to-bucket' of github.com:jniebuhr/beats …

a2d47f9

…into feature/backup-to-bucket

Add configuration to reference config file

da59387

jniebuhr requested a review from a team as a code owner November 2, 2022 13:23

jniebuhr requested review from fearful-symmetry and faec and removed request for a team November 2, 2022 13:23

Revert "Add configuration to reference config file"

58d7248

This reverts commit da59387.

jniebuhr force-pushed the feature/backup-to-bucket branch from c04766c to 58d7248 Compare November 2, 2022 13:36

jniebuhr added 3 commits November 2, 2022 14:39

Add back reference config changes without whitespace changes

e1d5089

fix typo that makes linter fail

183f908

change reference config the right way

7bcd68e

criamico added the Team:Elastic-Agent Label for the Agent team label Nov 7, 2022

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 7, 2022

cmacknz added Team:Cloud-Monitoring Label for the Cloud Monitoring team and removed Team:Elastic-Agent Label for the Agent team labels Nov 7, 2022

cmacknz removed request for fearful-symmetry and faec November 7, 2022 14:25

girodav added enhancement aws Enable builds in the CI for aws cloud testing labels Nov 7, 2022

aspacca self-requested a review November 14, 2022 11:22

jniebuhr added 2 commits November 23, 2022 15:02

Add missing unit test

4f1a2d0

Merge branch 'feature/backup-to-bucket' of github.com:jniebuhr/beats …

301c9d0

…into feature/backup-to-bucket

aspacca approved these changes Nov 24, 2022

aspacca added the backport-v8.6.0 Automated backport with mergify label Nov 24, 2022

add entry to changelog

8bd859a

Merge branch 'main' into feature/backup-to-bucket

1d5cc8d

Andrea Spacca added 2 commits November 28, 2022 12:11

Merge branch 'main' into feature/backup-to-bucket

0ec3d61

Merge branch 'main' into feature/backup-to-bucket

c02e5aa

Merge branch 'main' into feature/backup-to-bucket

508531b

kaiyan-sheng reviewed Dec 5, 2022

Merge branch 'main' into feature/backup-to-bucket

7e4fc9a

Andrea Spacca added 2 commits December 22, 2022 17:10

rename to , add permissions required for backup feature in docs

c48fd17

fix integration tests

065307c

aspacca merged commit 5df1895 into elastic:main Dec 22, 2022

mergify bot mentioned this pull request Dec 22, 2022

[8.6](backport #33559) Add backup to bucket functionality #34098

Merged
