Fix exception in task executor where task fails but doesn't set request_context.request_end #1864
Under "high" Rally load (~10 Gbit/s using the elastic/logs track with 1000 bulk indexing clients), I can consistently get Rally to crash within a few minutes. Judging from the logs, it seems to happen when the target ES cluster momentarily cannot accommodate the load (e.g., returns 429s). The logs prior to the exception look like:
Thereafter, Rally stops with an exception:
Looking at the stack trace, this exception is raised because:

where `request_end` is unexpectedly `None`.

Looking at the code, I think the intent is that `request_context.request_end` should always be set, even in error scenarios. Perhaps there is a race condition in the HTTP request library, though, where for some reason `on_request_end()` is never called, leaving `request_context.request_end` unset.

This PR doesn't try to address the root cause (why `request_context.request_end` is unset), but rather acknowledges that it is unset (only in already-acknowledged error scenarios, i.e., `request_meta_data["success"] == False`) and sets it. My assumption is that because `request_meta_data["success"] == False`, and because it can already be False at this point (with `request_end` set to something), some other code down the line will handle counting the error. What we need to focus on here is simply not stopping overall execution with an unplanned exception on an otherwise non-fatal error.

Attachments: rally_log.txt, rally_config.txt
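The fallback described above can be sketched in isolation. This is a minimal sketch, not Rally's actual driver code: `RequestContext`, `service_time`, and the injectable `clock` parameter are hypothetical names introduced for illustration; only `request_start`, `request_end`, and `request_meta_data["success"]` come from the description. It assumes the missing end timestamp should be taken from the same monotonic clock as the start (here `time.perf_counter`).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestContext:
    # Hypothetical stand-in for the per-request timing context; field names
    # mirror request_context.request_start / request_end from the PR.
    request_start: Optional[float] = None
    request_end: Optional[float] = None


def service_time(
    ctx: RequestContext,
    request_meta_data: dict,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return request_end - request_start, tolerating a missing end time on failed requests."""
    if ctx.request_end is None and not request_meta_data.get("success", True):
        # Failed request whose end callback apparently never fired: use "now" as
        # the end time so the error can be recorded downstream instead of the
        # subtraction below raising a TypeError on None.
        ctx.request_end = clock()
    # Successful requests are left untouched, so an unset request_end there
    # still raises, exactly as before this change.
    return ctx.request_end - ctx.request_start


# Usage: a failed request whose end timestamp was never recorded.
ctx = RequestContext(request_start=100.0)
print(service_time(ctx, {"success": False}, clock=lambda: 100.25))
```

Limiting the patch to `success == False` keeps the scope the PR describes: already-acknowledged failures get a usable end time, while an unset `request_end` on a successful request is still surfaced rather than silently papered over.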