This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

A failed or panicked available plugin can leave an apPool with not enough resources. #497

Closed
pittma opened this issue Nov 12, 2015 · 4 comments · Fixed by #688

Comments

@pittma
Contributor

pittma commented Nov 12, 2015

When a plugin panics while it is in use by a task, it emits a DeadAvailablePlugin event. The runner should handle this event and apply the same eligibility rules to the affected apPool, i.e., start a new available plugin and insert it in the dead plugin's place.
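
A minimal sketch of the proposed handling follows. It assumes hypothetical `Runner`, `APPool`, `AvailablePlugin` and `DeadAvailablePlugin` types, which borrow the names used in this issue and are not Snap's actual code. The runner looks up the affected pool, starts a replacement, and puts it in the dead plugin's slot. The full eligibility check is reduced here to "always replace".

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical types: they mirror the terms in this issue, not Snap's real API.
_ids = count(1)


@dataclass
class AvailablePlugin:
    plugin_key: str
    id: int = field(default_factory=lambda: next(_ids))


@dataclass
class DeadAvailablePlugin:
    """Event emitted when an available plugin panics while in use."""
    plugin_key: str
    ap_id: int


@dataclass
class APPool:
    plugin_key: str
    plugins: list = field(default_factory=list)


class Runner:
    def __init__(self):
        self.pools = {}  # plugin_key -> APPool

    def start_available_plugin(self, plugin_key):
        # Placeholder for actually launching a new plugin process.
        return AvailablePlugin(plugin_key)

    def handle_dead_available_plugin(self, event):
        """Replace the dead AP in its pool; return the replacement or None."""
        pool = self.pools.get(event.plugin_key)
        if pool is None:
            return None
        for i, ap in enumerate(pool.plugins):
            if ap.id == event.ap_id:
                replacement = self.start_available_plugin(event.plugin_key)
                pool.plugins[i] = replacement  # same slot as the dead plugin
                return replacement
        return None
```

Keeping the replacement in the same slot means the pool size is unchanged, so the pool never silently drops below the capacity it had before the panic.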

@lynxbat
Contributor

lynxbat commented Nov 12, 2015

Should the number of times this occurs be recorded? And should that count be used to disable a consistently panicking plugin, so that it is removed from the active list?

@pittma
Contributor Author

pittma commented Nov 12, 2015

Yeah, there should be an upper bound on the number of times the runner restarts a plugin. However, if the task keeps failing because the plugin keeps failing, both will stop once the task's failure count reaches its own upper bound.
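
One way to express that upper bound is sketched below. It is a hypothetical per-plugin restart counter that the runner would consult before starting a replacement. The `RestartPolicy` name and the fixed `max_restarts` value are assumptions for illustration.

```python
class RestartPolicy:
    """Hypothetical cap on how many times one plugin may be restarted."""

    def __init__(self, max_restarts):
        self.max_restarts = max_restarts
        self.restarts = {}  # plugin_key -> restarts performed so far

    def allow_restart(self, plugin_key):
        """Return True and count the restart if the plugin is under its cap."""
        n = self.restarts.get(plugin_key, 0)
        if n >= self.max_restarts:
            return False
        self.restarts[plugin_key] = n + 1
        return True
```

With `max_restarts=3`, the fourth panic of the same plugin is not restarted. Other plugins are counted independently.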

Not sure about the second thing. Rather than being removed, maybe loaded plugins should have states akin to Pulse's tasks, and when a plugin fails consistently we set its state to disabled.

@lynxbat
Contributor

lynxbat commented Nov 12, 2015

My only worry is if the error is environmental and long-tail. Let's say it fails once a month and restarts. To the user this is a minor disturbance caused by something on the system. It would be nice if we could record failures but have them age out, maybe allowing N failures per T time window.
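
The "N failures per T time window" idea could be sketched as a sliding window of failure timestamps, so that old failures age out. `FailureWindow` is a hypothetical name. The clock is passed in explicitly (in seconds) to keep the example deterministic.

```python
from collections import deque


class FailureWindow:
    """Hypothetical sliding window: trips when max_failures occur within window_seconds."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures = deque()  # timestamps of recent failures, oldest first

    def record(self, now):
        """Record a failure at time `now`; return True if the limit is reached."""
        self.failures.append(now)
        # Age out failures older than the window (a failure exactly T ago is kept).
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        return len(self.failures) >= self.max_failures
```

With N=3 and T=1 hour, three panics within 20 seconds trip the limit. A single failure once a month never does, because each earlier failure has aged out by the time the next one arrives.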

@lynxbat
Contributor

lynxbat commented Nov 12, 2015

And yes, I actually had a story a while back about a "Disabled" plugin state. It would allow the user to enable or unload/load the plugin, but prevent it from being considered for the Metric Catalog and remove all of its running APs.
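
Below is a rough sketch of that "Disabled" state. It assumes a hypothetical `LoadedPlugin` with a state field and a hypothetical `catalog_metrics` helper. Disabling stops all running APs and hides the plugin's metrics from the catalog; enabling makes the metrics visible again. Unload/load is not modeled, and the metric names are made up.

```python
from enum import Enum


class PluginState(Enum):
    LOADED = "loaded"
    DISABLED = "disabled"


class LoadedPlugin:
    """Hypothetical loaded plugin with a task-like state."""

    def __init__(self, name, metrics):
        self.name = name
        self.metrics = list(metrics)
        self.state = PluginState.LOADED
        self.running_aps = []  # running available plugins for this plugin

    def disable(self):
        self.state = PluginState.DISABLED
        self.running_aps.clear()  # remove all running APs

    def enable(self):
        # APs are not restarted here; they would be started again on demand.
        self.state = PluginState.LOADED


def catalog_metrics(plugins):
    """Only plugins that are not disabled contribute to the metric catalog."""
    return sorted(
        m for p in plugins if p.state is not PluginState.DISABLED for m in p.metrics
    )
```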

2 participants