Detect new files under known paths in filestream input #31268

Merged


kvch
Contributor

@kvch kvch commented Apr 12, 2022

What does this PR do?

This PR fixes the FileWatcher of the filestream input. A path is now treated as a new file when the underlying file has changed, even if the scanner already found that path in the previous iteration.

The file comparator function is now passed in as a parameter to make unit testing easier.
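
To illustrate why passing the comparator in helps with testing, here is a minimal sketch. It is not the actual Filebeat code: all names (`fileInfo`, `sameFileFunc`, `watcher`, `newWatcher`, `scan`) are hypothetical, and the numeric `id` field is an assumed stand-in for whatever identifies the underlying file on a given platform.

```go
package main

// fileInfo is a hypothetical snapshot of what a scan found under a path.
type fileInfo struct {
	id   uint64 // assumed identity of the underlying file (e.g. an inode number)
	size int64
}

// sameFileFunc reports whether two snapshots describe the same underlying file.
type sameFileFunc func(prev, cur fileInfo) bool

// watcher remembers what the previous scan found under each path.
type watcher struct {
	prev     map[string]fileInfo
	sameFile sameFileFunc // injected so tests can substitute a stub
}

// newWatcher takes the comparator as a parameter instead of hard-coding it.
func newWatcher(cmp sameFileFunc) *watcher {
	return &watcher{prev: map[string]fileInfo{}, sameFile: cmp}
}

// scan stores the current snapshot and returns the paths to treat as new
// files: paths not seen before, and known paths whose underlying file differs.
// The order of the returned paths is unspecified.
func (w *watcher) scan(cur map[string]fileInfo) []string {
	var created []string
	next := make(map[string]fileInfo, len(cur))
	for path, info := range cur {
		old, known := w.prev[path]
		if !known || !w.sameFile(old, info) {
			created = append(created, path)
		}
		next[path] = info
	}
	w.prev = next
	return created
}
```

In a unit test the comparator can be a stub such as `func(a, b fileInfo) bool { return a.id == b.id }`, so no real files have to be created or renamed. A comparator for real files could be built on Go's `os.SameFile`, assuming the snapshots carried `os.FileInfo` values instead.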

Why is it important?

The problem was that when an input file was renamed and a new file appeared under the original path, Filebeat did not register it as a new file. Instead, the new file was considered updated or, if it was smaller than the previous file, truncated, in which case the complete contents of the previous file were reread from the beginning.
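
The misclassification can be reproduced with a simplified model. The sketch below is an illustration under stated assumptions rather than the real FileWatcher logic: `snapshot`, `change`, `classifyBySizeOnly`, and `classify` are hypothetical names, and change detection is reduced to comparing file sizes.

```go
package main

// snapshot is a hypothetical record of what was found under a path.
type snapshot struct {
	fileID uint64 // assumed identity of the underlying file
	size   int64
}

// change is the outcome of comparing two snapshots of the same path.
type change string

const (
	unchanged change = "unchanged"
	updated   change = "updated"
	truncated change = "truncated"
	created   change = "created"
)

// classifyBySizeOnly models the behaviour described as the bug: a known path
// is never considered new, so a rotated-in file looks updated or truncated.
func classifyBySizeOnly(prev, cur snapshot) change {
	switch {
	case cur.size < prev.size:
		return truncated
	case cur.size > prev.size:
		return updated
	default:
		return unchanged
	}
}

// classify checks the identity of the underlying file first and only then
// falls back to the size comparison.
func classify(prev, cur snapshot) change {
	if prev.fileID != cur.fileID {
		return created
	}
	return classifyBySizeOnly(prev, cur)
}
```

With a previous snapshot `{fileID: 1, size: 1000}` and a freshly rotated-in file `{fileID: 2, size: 10}` under the same path, `classifyBySizeOnly` returns `truncated`, while `classify` returns `created`.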

The issue was reported initially on Discuss: https://discuss.elastic.co/t/filebeat-filestream-input-rereading-rotated-log-files/300038

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@kvch kvch requested a review from a team as a code owner April 12, 2022 13:38
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 12, 2022
@kvch kvch added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Apr 12, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 12, 2022
@kvch kvch added the bug label Apr 12, 2022
@mergify mergify bot assigned kvch Apr 12, 2022
@kvch kvch added backport-v8.1.0 Automated backport with mergify backport-v8.2.0 Automated backport with mergify backport-7.17 Automated backport to the 7.17 branch with mergify labels Apr 12, 2022
@kvch kvch requested a review from belimawr April 12, 2022 13:43
@mergify
Contributor

mergify bot commented Apr 12, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-filebeat-filestream-new-file-under-existing-path upstream/fix-filebeat-filestream-new-file-under-existing-path
git merge upstream/main
git push upstream fix-filebeat-filestream-new-file-under-existing-path

@elasticmachine
Collaborator

elasticmachine commented Apr 12, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2022-04-12T17:57:49.089+0000

  • Duration: 67 min 42 sec

Test stats 🧪

Test Results

  • Failed: 0

  • Passed: 6200

  • Skipped: 728

  • Total: 6928

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Contributor

@belimawr belimawr left a comment

Looks good, just some small details in the tests.

Three review threads on filebeat/input/filestream/fswatch_test.go (outdated, resolved).
Contributor

@belimawr belimawr left a comment

🎉

@kvch
Contributor Author

kvch commented Apr 12, 2022

The test does not pass on Windows for some reason, so I moved it to the non-Windows tests for now. I am comfortable merging this because we are not breaking anything for Windows users; we simply still do not support something that was already broken. The Windows test failure needs more investigation.

mergify bot pushed a commit that referenced this pull request Apr 13, 2022
(cherry picked from commit 54997ac)
kvch added a commit that referenced this pull request Apr 13, 2022
(cherry picked from commit 54997ac)
kvch added a commit that referenced this pull request Apr 13, 2022
(cherry picked from commit 54997ac)

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
kvch added a commit that referenced this pull request Apr 13, 2022
(cherry picked from commit 54997ac)
kvch added a commit that referenced this pull request Apr 13, 2022
(cherry picked from commit 54997ac)

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
kvch added a commit that referenced this pull request Apr 13, 2022
(cherry picked from commit 54997ac)

Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
v1v added a commit to v1v/beats that referenced this pull request Apr 18, 2022
…er-tar-gz

* upstream/main: (139 commits)
  [Automation] Update elastic stack version to 8.3.0-c655cda8 for testing (elastic#31322)
  Define a queue metrics reporter interface  (elastic#31289)
  [Oracle Module] Change tablespace metricset collection period (elastic#31259)
  libbeat/reader/syslog: relax timestamp parsing to allow leading zero (elastic#31254)
  [Automation] Update elastic stack version to 8.3.0-55ba6f37 for testing (elastic#31311)
  [libbeat] Remove unused fields and functions in the memory queue (elastic#31302)
  [libbeat] Cleaning up some unneeded helper types (elastic#31290)
  Readme for kibana module (elastic#31276)
  [Automation] Update elastic stack version to 8.3.0-4be61f32 for testing (elastic#31296)
  x-pack/winlogbeat/module/routing/ingest: fix typo for channel name (elastic#31291)
  Small pipeline cleanup removing some unused data fields (elastic#31288)
  removing info log (elastic#30971)
  Simplify TLS config deserialization (elastic#31168)
  Detect new files under known paths in filestream input (elastic#31268)
  Add support for port mapping in docker hints (elastic#31243)
  Update qa-labels.yml (elastic#31260)
  libbeat: log debug for `proxy_url` and fixed docs (elastic#31130)
  [heartbeat][docs] Add note about ensuring correct index settings for uptime (elastic#31146)
  [Automation] Update elastic stack version to 8.3.0-2c8f9574 for testing (elastic#31256)
  [Filebeat] fix m365_defender pipeline bug (elastic#31227)
  ...
kush-elastic pushed a commit to kush-elastic/beats that referenced this pull request May 2, 2022
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Labels
backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.1.0 Automated backport with mergify backport-v8.2.0 Automated backport with mergify bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
3 participants