[aws] [s3] Introduce ignore_older & start_timestamp for S3 input allowing better registry cleanups #41817
LGTM, but I'd like @faec to take a look.
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
…wing better registry cleanups (#41817)
* add changelog entry
* sort config entries
* introduce ignore old and start timestamp configurations and document them
* add filtering logic
* filter tests
* add component test for filtering and fix lint issues (Conflicts: x-pack/filebeat/input/awss3/s3_test.go)
* add changelog entry
* improve documentation
* review changes - improve naming, change signature and improve documentation
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
(cherry picked from commit 4ba7d1c)
…wing better registry cleanups (#41817) (#42246)
(cherry picked from commit 4ba7d1c)
Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
After discussing this with the team, and with the background from this conversation when upgrading the Integration, I have decided to backport the change to the 8.16.x and 8.17.x release tracks.
…estamp for S3 input allowing better registry cleanups (#42716)
(cherry picked from commit 4ba7d1c)
* fix backport conflicts
Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Co-authored-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
…estamp for S3 input allowing better registry cleanups (#42717)
(cherry picked from commit 4ba7d1c)
* fix backport conflicts
Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Co-authored-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>
Proposed commit message
Introduce ignore_older and start_timestamp properties to the AWS S3 input. This is a follow-up for #41694.
The configurations introduced here act as input object filters. If an object fails to match the derived filters, its entry is cleaned up from the registry, reducing filebeat memory consumption.
The introduced configurations are ignore_older and start_timestamp. For both, the object's last modified timestamp is used for the comparison. See the Use cases section for further explanation.
Checklist
Entry added in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
None, as the defaults are disabled. However, when the configurations introduced here are used, the following can have an impact on the user:
* If start_timestamp is defined, objects with last modified timestamps prior to the timestamp are ignored from processing (documented, see footnote 1).
* If ignore_older is defined, objects that do not fall within the look-back period when processing starts (polling run) are ignored (documented, see footnote 1).
* If both start_timestamp & ignore_older are defined, the initial run will process all entries up to start_timestamp. Subsequent runs will not include entries that do not fall within ignore_older, even if processing failed for an object (documented, see footnote 1).
How to test this PR locally
Set ignore_older & start_timestamp to different values to see how data ingestion changes. See the Use cases section for further explanation.
Related issues
aws-s3 input's bucket polling accumulates state in the registry #39116
Use cases
Consider the diagrams below, where there are three objects, Object A, Object B and Object C, with last modified timestamps of t1, t2 and t3 respectively.
Then consider how filebeat processes objects and tracks registry entries in the following scenarios.
Default behavior
If neither configuration is used, filebeat processes all objects, and the internal registry keeps tracking all of them unless they are removed from the bucket.
Use start_timestamp
If start_timestamp is used, objects newer than the timestamp are accepted for processing. The registry will grow unless objects are removed from the bucket by other means (e.g. a lifecycle policy).
Use ignore_older
If ignore_older is defined, the input will process objects within the provided duration, calculated from the current time. The registry will track objects within the current timeframe, and the others will eventually be cleaned up by subsequent runs.
Use both ignore_older & start_timestamp
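If both properties are defined,
* the initial run processes all entries up to start_timestamp (regardless of the ignore_older duration).
* subsequent runs only include entries that fall within the ignore_older duration.
Footnotes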
1. https://github.com/elastic/beats/pull/41817/files#diff-422765b7341c5bbf6de7af38927e34e00a5073b188585a7af3c4fee1175b64a6
2. https://github.com/Kavindu-Dodan/data-gen