Remove erroneous status reporting #42435

Merged: 3 commits, Jan 28, 2025

Conversation

belimawr
Contributor

@belimawr belimawr commented Jan 27, 2025

Proposed commit message

This commit removes erroneous status reporting from the Filestream input. inp.readFromSource can only return the error from the canceler, and this error should not be reported to the manager/Elastic-Agent.

inp.readFromSource is called by filestream.Run, which is called by the startHarvester function. This function already reports the error returned by filestream.Run and correctly filters out 'context cancelled' errors.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

## Disruptive User Impact
## Author's Checklist

How to test this PR locally

This PR is tricky to test because a timing issue is involved. Essentially: using a local Kubernetes cluster, deploy Elastic-Agent v8.17.1 collecting logs, then make the containers generate enough logs to push the host machine's CPU to 100%. Without the changes in this PR, the Filestream input will start reporting unhealthy.

There are some more details of how I reproduced/tested it here: elastic/elastic-agent#6596 (comment)

Related issues

## Use cases
## Screenshots
## Logs

@belimawr belimawr added bug skip-ci Skip the build in the CI but linting Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team skip docs-build Skips docs build CI labels Jan 27, 2025
@belimawr belimawr self-assigned this Jan 27, 2025
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 27, 2025
@belimawr belimawr force-pushed the fix-filestream-status-reporting branch from 2d811ee to 43d111b Compare January 27, 2025 16:08
Contributor

mergify bot commented Jan 27, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Jan 27, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 27, 2025
@belimawr belimawr removed skip-ci Skip the build in the CI but linting skip docs-build Skips docs build CI labels Jan 27, 2025
This commit removes redundant status reporting from the Filestream
input. `inp.readFromSource` can only return the error from the
canceler, and this error should not be reported to the
manager/Elastic-Agent.

`inp.readFromSource` is called by `filestream.Run`, which is called by
the `startHarvester` function. This function already reports the error
returned by `filestream.Run` and correctly filters out 'context
cancelled' errors.
@belimawr belimawr force-pushed the fix-filestream-status-reporting branch from 43d111b to bb3fb08 Compare January 27, 2025 16:09
@belimawr belimawr marked this pull request as ready for review January 27, 2025 16:10
@belimawr belimawr requested a review from a team as a code owner January 27, 2025 16:10
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr belimawr changed the title Remove redundant status reporting Remove erroneous status reporting Jan 27, 2025
// there is no point in reporting it to the Manager (aka Elastic-Agent).
// Also, the caller of Run, will correctly report the error and filter
// out 'context cancelled'.
return inp.readFromSource(ctx, log, r, fs.newPath, state, publisher, metrics)
Member

That is the case today, but what if someone changes the logic inside readFromSource to return another kind of error? This promise would then become invalid and could cause silent failures. If you mean that Run up the chain has logic to report the errors then it is fine, then disregard my comment.

Contributor Author

@belimawr belimawr Jan 27, 2025

If you mean that Run up the chain has logic to report the errors then it is fine, then disregard my comment.

Yes, that's what I meant.

However if it wasn't clear when you read the comment, this means it can be improved ;)

Do you have any suggestions on how I can make it clear? So future changes won't fall into the trap of trying to report it?

Should I go with something simpler like:

The caller of Run already reports the error and filters out errors that must not be reported, like 'context cancelled'.

Member

Thanks, I like your suggestion, it is both clearer and shorter.

Member

UpdateStatus could probably just unconditionally filter out context.Canceled and the Beats context equivalent.

context.Canceled is not an actionable user error; it's something that would only ever arise because of a bug. We could log it but perhaps not change the agent state.

Contributor Author

I agree that if the error is context.Canceled it's coming from a bug. However, it could also break/stop something essential (like an input), and if that affects data ingestion or the overall behaviour of the Elastic-Agent, it should be reported to the user so they know something is wrong.

Ideally there would be no bug, but if I have to choose between a silent bug and a verbose/noisy one, I'll take the noisy one, it's gonna be much easier to catch and less likely to make it into a final release.

@belimawr belimawr requested review from mauri870 and cmacknz January 27, 2025 21:08
@belimawr belimawr merged commit 1a0a732 into elastic:main Jan 28, 2025
31 checks passed
mergify bot pushed a commit that referenced this pull request Jan 28, 2025
(cherry picked from commit 1a0a732)
belimawr added a commit that referenced this pull request Jan 30, 2025
(cherry picked from commit 1a0a732)

Co-authored-by: Tiago Queiroz <tiago.queiroz@elastic.co>
@belimawr belimawr added backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify labels Jan 31, 2025
mergify bot pushed a commit that referenced this pull request Jan 31, 2025
(cherry picked from commit 1a0a732)
mergify bot pushed a commit that referenced this pull request Jan 31, 2025
(cherry picked from commit 1a0a732)
@cmacknz cmacknz added the backport-8.18 Automated backport to the 8.18 branch label Jan 31, 2025
@cmacknz
Member

cmacknz commented Jan 31, 2025

Needs to go in 8.18

belimawr pushed a commit that referenced this pull request Feb 3, 2025
(cherry picked from commit 1a0a732)
belimawr pushed a commit that referenced this pull request Feb 3, 2025
(cherry picked from commit 1a0a732)
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify backport-8.18 Automated backport to the 8.18 branch bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Elastic Agent higher "Unhealthy" rates of Kubernetes Agents in 8.17.1
4 participants