-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Fleet Enrollment Handling for Containerized Agent #6568
Fix Fleet Enrollment Handling for Containerized Agent #6568
Conversation
f6d06ec
to
ab63682
Compare
// this elastic-agent has not valid api token thus it should enroll | ||
return true, nil | ||
case err != nil: | ||
return false, fmt.Errorf("failed to validate api token: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple network error could cause this, should there be some retries in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a really simple retry logic, let me know if that's aligned with what you had on mind 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blakerouse any objections merging this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy with the added retry loop. Would have been a little cleaner to place it in its own function, but that is okay. Feel free to merge.
bd3acad
to
db8078b
Compare
db8078b
to
f7fd4cb
Compare
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just a couple of nitpicks on the usage of mocks in tests but that's not a blocker for merging
...ontainerised-agent-when-there-is-an-enrollement-token-change-or-the-agent-is-unenrolled.yaml
Outdated
Show resolved
Hide resolved
// this elastic-agent has not valid api token thus it should enroll | ||
return true, nil | ||
case err != nil: | ||
return false, fmt.Errorf("failed to validate api token: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy with the added retry loop. Would have been a little cleaner to place it in its own function, but that is okay. Feel free to merge.
|
* feat: add k8s integration test to check fleet enrollment * fix: container correct fleet enrollment when token changes or the agent is unenrolled * fix: update TestDiagnosticLocalConfig to include enrollment_token_hash * feat: add a simple retry logic while validate the stored agent api token * feat: add unit-test for shouldFleetEnroll * fix: improve unit-test explicitness and check for expected number of calls * fix: kind in changelog fragment * fix: split up ack-ing fleet in a separate function (cherry picked from commit 17814cc)
* feat: add k8s integration test to check fleet enrollment * fix: container correct fleet enrollment when token changes or the agent is unenrolled * fix: update TestDiagnosticLocalConfig to include enrollment_token_hash * feat: add a simple retry logic while validate the stored agent api token * feat: add unit-test for shouldFleetEnroll * fix: improve unit-test explicitness and check for expected number of calls * fix: kind in changelog fragment * fix: split up ack-ing fleet in a separate function (cherry picked from commit 17814cc) Co-authored-by: Panos Koutsovasilis <panos.koutsovasilis@elastic.co>
What does this PR do?
This PR introduces a fix for containerized Fleet-managed Elastic Agents to handle scenarios where:
It achieves the above by enhancing the agent's logic inside the
container
cmd to:The PR also adds a Kubernetes integration tests to verify the proper behaviour of enrollment and re-enrollment under various scenarios, including:
Key changes include:
shouldFleetEnroll
to centralize logic for fleet enrollment decisions.agent/{id}/acks
path of Fleet server with empty events to check if the agent API token is still valid. More than happy to introduce a separate path just for this cause on Fleet server, although it will be the same the ACKs one with empty events (as of now at least)You can easily see here in the CI run of the first commit in this PR that the Elastic Agent, before this PR, doesn't handle enrollment correctly and always resorts to using the Fleet token and URL stored in its state. This PR addresses that issue and ensures enrollment uses the correct configuration and token.
PS: the actual changes of this PR are this commit b6596d0 which is
+305 -25
thus I consider this PR aligned with the team policies 🙂Why is it important?
This fix ensures robust and predictable behavior for containerized Fleet-managed Elastic Agents and enhance user experience of managing elastic-agent in Kubernetes.
Checklist
./changelog/fragments
using the changelog toolDisruptive User Impact
The changes introduced in this PR are non-disruptive. They improve fleet enrollment handling and maintain backward compatibility.
How to test this PR locally
Related issues