(SDI-2393) Fix intelsdi-x#1448 log and document maxpluginrestart
* support -1 for max_plugin_restarts, so it works better with max-failure:
-1.
* improve logging of MaxPluginRestarts so users are aware why a plugin is
disabled.
* document changes and relationship to max-failure settings.
nanliu committed Jan 5, 2017
1 parent 4d117c6 commit 8eee8eb
Showing 4 changed files with 17 additions and 6 deletions.
12 changes: 8 additions & 4 deletions control/runner.go
@@ -244,7 +244,7 @@ func (r *runner) HandleGomitEvent(e gomit.Event) {
}

if pool.Eligible() {
- if pool.RestartCount() < MaxPluginRestartCount {
+ if pool.RestartCount() < MaxPluginRestartCount || MaxPluginRestartCount == -1 {
e := r.restartPlugin(v.Key)
if e != nil {
runnerLog.WithFields(log.Fields{
@@ -257,9 +257,8 @@ func (r *runner) HandleGomitEvent(e gomit.Event) {

runnerLog.WithFields(log.Fields{
"_block": "handle-events",
- "event": v.Name,
- "aplugin": v.Version,
- "restart_count": pool.RestartCount(),
+ "aplugin": v.String,
+ "restart-count": pool.RestartCount(),
}).Warning("plugin restarted")

r.emitter.Emit(&control_event.RestartedAvailablePluginEvent{
@@ -270,6 +269,11 @@ func (r *runner) HandleGomitEvent(e gomit.Event) {
Type: v.Type,
})
} else {
+ runnerLog.WithFields(log.Fields{
+ "_block": "handle-events",
+ "aplugin": v.String,
+ }).Warning("plugin disabled due to exceeding restart limit: ", MaxPluginRestartCount)
+
r.emitter.Emit(&control_event.MaxPluginRestartsExceededEvent{
Id: v.Id,
Name: v.Name,
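The changed condition in `HandleGomitEvent` treats `-1` as "no restart limit". A minimal sketch of that rule in isolation is shown here; `canRestart` and `unlimitedRestarts` are hypothetical names used only for illustration and are not part of Snap.

```go
package main

import "fmt"

// unlimitedRestarts is an illustrative name for the -1 sentinel that this
// commit accepts for max_plugin_restarts.
const unlimitedRestarts = -1

// canRestart is a hypothetical helper (not part of Snap) that mirrors the
// condition in the diff: a plugin may be restarted while its restart count
// is below the limit, or always when the limit is -1.
func canRestart(restartCount, maxRestarts int) bool {
	return maxRestarts == unlimitedRestarts || restartCount < maxRestarts
}

func main() {
	fmt.Println(canRestart(9, 10))   // true: still below the limit
	fmt.Println(canRestart(10, 10))  // false: limit reached, plugin would be disabled
	fmt.Println(canRestart(500, -1)) // true: -1 disables the limit
}
```

Checking the sentinel first makes the intent explicit, but the result is the same as the order used in the diff because both operands are side-effect free.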
4 changes: 4 additions & 0 deletions docs/SNAPTELD_CONFIGURATION.md
@@ -101,6 +101,10 @@ control:
# not be loaded. Valid values are 0 - Off, 1 - Enabled, 2 - Warning
plugin_trust_level: 1

+ # max_plugin_restarts controls how many times a plugin is allowed to be restarted
+ # before failing. Snap will not disable a plugin due to failures when this value is -1.
+ max_plugin_restarts: 10
+
# plugins section contains plugin config settings that will be applied for
# plugins across tasks.
plugins:
5 changes: 4 additions & 1 deletion docs/TASKS.md
@@ -85,11 +85,14 @@ or without time zone offset (in that cases uppercase'Z' must be present):
More on cron expressions can be found here: https://godoc.org/github.com/robfig/cron

#### Max-Failures
+
By default, Snap will disable a task if there are 10 consecutive errors from any plugins within the workflow. The configuration
- can be changed by specifying the number of failures value in the task header. If the max-failures value is -1, Snap will
+ can be changed by specifying the number of failures value in the task header. If the `max-failures` value is -1, Snap will
not disable a task with consecutive failure. Instead, Snap will sleep for 1 second for every 10 consecutive failures
and retry again.

+ If you intend to run tasks with `max-failures: -1`, please also configure `max_plugin_restarts: -1` in [snap daemon control configuration section](SNAPTELD_CONFIGURATION.md).
+
For more on tasks, visit [`SNAPTEL.md`](SNAPTEL.md).

### The Workflow
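The documentation change above recommends pairing `max-failures: -1` on a task with `max_plugin_restarts: -1` in the daemon configuration. As a rough illustration of that pairing, the sketch below flags the mismatched combination. `checkRestartSettings` is a hypothetical helper, not a Snap API, and Snap itself is not known to perform this check.

```go
package main

import "fmt"

// checkRestartSettings is a hypothetical helper (not part of Snap). It
// returns an error when a task is configured never to be disabled
// (max-failures: -1) while the daemon still caps plugin restarts, which is
// the combination the TASKS.md note advises against.
func checkRestartSettings(taskMaxFailures, maxPluginRestarts int) error {
	if taskMaxFailures == -1 && maxPluginRestarts != -1 {
		return fmt.Errorf(
			"max-failures is -1 but max_plugin_restarts is %d; set max_plugin_restarts to -1 as well",
			maxPluginRestarts)
	}
	return nil
}

func main() {
	fmt.Println(checkRestartSettings(-1, 10)) // mismatched: returns an error
	fmt.Println(checkRestartSettings(-1, -1)) // recommended pairing
	fmt.Println(checkRestartSettings(10, 10)) // both limits finite
}
```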
2 changes: 1 addition & 1 deletion examples/configs/snap-config-sample.yaml
@@ -66,7 +66,7 @@ control:
plugin_trust_level: 0

# max_plugin_restarts controls how many times a plugin is allowed to be restarted
- # before failing.
+ # before failing. Snap will not disable a plugin due to failures when this value is -1.
max_plugin_restarts: 10

# plugins section contains plugin config settings that will be applied for