Allow to abort on fatal errors #997

Merged

danielmitterdorfer
Member

With this commit we add a third option to the command line parameter
`on-error`. So far it has been possible to either continue or abort on
error. Now we also allow continuing on all errors except those that indicate
that the cluster is unreachable (e.g. because it died with an OOME).
@danielmitterdorfer added the `enhancement` (Improves the status quo) and `:Load Driver` (Changes that affect the core of the load driver such as scheduling, the measurement approach etc.) labels May 19, 2020
@danielmitterdorfer added this to the 2.0.1 milestone May 19, 2020
@danielmitterdorfer self-assigned this May 19, 2020
Contributor

@dliappis left a comment

LGTM

This option controls how Rally behaves when a response error occurs. The following values are possible:

* ``continue``: only records that an error has happened and will continue with the benchmark. At the end of a race, errors show up in the "error rate" metric.
* ``continue-on-non-fatal`` (default): Behaves as ``continue`` but aborts the benchmark immediately on all fatal errors. At the moment a refused connection is considered fatal. All other errors are considered non-fatal.
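
To make the distinction between the options concrete, here is a minimal Python sketch of the decision described above. It is not taken from Rally's source: the names `ON_ERROR_CHOICES`, `is_fatal` and `should_abort` are hypothetical, and the value `abort` is assumed from the PR description ("either continue or abort on error").

```python
import errno

# Hypothetical names for illustration only; not Rally's actual implementation.
# "abort" is assumed from the PR description; the other two values are quoted
# from the documentation snippet under review.
ON_ERROR_CHOICES = ("continue", "continue-on-non-fatal", "abort")


def is_fatal(error: Exception) -> bool:
    """Treat only a refused connection (ECONNREFUSED) as fatal."""
    if isinstance(error, ConnectionRefusedError):
        return True
    # Some libraries raise a plain OSError and only set errno.
    return isinstance(error, OSError) and error.errno == errno.ECONNREFUSED


def should_abort(on_error: str, error: Exception) -> bool:
    """Decide whether the benchmark should stop after `error` occurred."""
    if on_error not in ON_ERROR_CHOICES:
        raise ValueError(f"unknown on-error value: {on_error!r}")
    if on_error == "abort":
        return True   # stop on any error
    if on_error == "continue":
        return False  # only record the error and carry on
    # "continue-on-non-fatal": stop only when the error is fatal
    return is_fatal(error)
```

With this sketch, `should_abort("continue-on-non-fatal", ConnectionRefusedError())` asks for an abort, while a timeout under the same setting would only be recorded.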
Contributor

I had to reread the second sentence a few times ("At the moment a refused connection is considered fatal."). I don't think we need to explain that this may (or may not) change in the future.

I suggest we just state that the only fatal error is receiving "Connection Refused" (ECONNREFUSED, see http://man7.org/linux/man-pages/man2/connect.2.html).

Member Author

Thanks! Addressed in 9925e35.

@danielmitterdorfer merged commit 1ca68e9 into elastic:master May 19, 2020
@danielmitterdorfer deleted the on-non-fatal-continue branch May 19, 2020 12:58