Remove automatic unenrollment after 7 Fleet authentication failures #5428

Closed
cmacknz opened this issue Sep 4, 2024 · 1 comment · Fixed by #6619

cmacknz commented Sep 4, 2024

Today, Elastic Agent unenrolls itself automatically after receiving 7 consecutive 401 responses from Fleet when checking in. This was done to stop agents that have been force unenrolled (which revokes their API key) from checking in indefinitely until they are reinstalled.

// Max number of times an invalid API Key is checked
const maxUnauthCounter int = 6

// shouldUnenroll checks if the max number of trying an invalid key is reached
func (f *FleetGateway) shouldUnenroll() bool {
	return f.unauthCounter > maxUnauthCounter
}

resp, took, err := cmd.Execute(ctx, req)
if isUnauth(err) {
	f.unauthCounter++
	if f.shouldUnenroll() {
		f.log.Warnf("retrieved an invalid api key error '%d' times. Starting to unenroll the elastic agent.", f.unauthCounter)
		return &fleetapi.CheckinResponse{
			Actions: []fleetapi.Action{&fleetapi.ActionUnenroll{ActionID: "", ActionType: "UNENROLL", IsDetected: true}},
		}, took, nil
	}
	return nil, took, err
}

This prevents force unenrolled agents from continuing to contact Fleet Server, but the automatic unenrollment is also an edge case that can be hit in disaster recovery situations. To eliminate the chance that users recovering their cluster need to manually intervene on machines, we should stop unenrolling and instead greatly increase the checkin interval.

The initial proposal is that instead of unenrolling, we should switch to checking in once per hour. A successful checkin must return the agent to its original checkin interval.

cmacknz added the Team:Elastic-Agent-Control-Plane label Sep 4, 2024

@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
