Call the correct functions for `scroll-search` and `paginated-search` operations #1368

michaelbaamonde · 2021-10-26T19:51:52Z

In the course of getting started on #1215, I discovered that scroll-search operations don't default to the index (or data stream) defined at the top level of the track under the indices key. The pmc and geonames tracks expose this. But if you explicitly provide an index parameter to the operation (as http_logs does)¹ it works...

...by accident, it turns out. After digging into it, the core issue seems to be that there was no parameter source registered for scroll-search or paginated-search operations. So, neither the track object's indices attribute (necessary for choosing a default index parameter for the operation) nor the operation-type parameter (necessary for choosing the right function to call) was getting passed down the chain. But, the way the control flow works for dispatching to the appropriate function in the in the Query.__call__ method, it wound up just issuing a request body query with a pages parameter, which happens to be the legacy implementation of scroll searches.

So, this commit addresses this by doing three things:

Augmenting SearchParamSource to pass the operation-type it receives through to its callers.
Registering SearchParamSource for both scrolls and paginated queries.
Enforcing operation-type as a mandatory paramater when calling the Query runner (necessary for custom parameter sources to avoid unexpected behavior.)

That said, I do wonder if it'd make sense to have separate classes (perhaps subclasses of Query) for each of these three operations. Their logic is distinct, it's possible that they should have their own parameter sources, and dispatching on the operation type within one large __call__ method feels a little indirect given that we already have a mechanism for explicitly registering runners on a per-operation basis. I worry there may be some other trappy behavior lurking in here somewhere, but I opted to fix the bug within the current implementation instead of doing a larger refactoring along those lines. Happy to hear any thoughts on that front.

https://github.com/michaelbaamonde/rally-tracks/commit/b700eac059ab42246768f44d0e656caf144a0269 contains the relevant changes that sparked this investigation. ↩

The `Query` runner services three operation types: `search`, `paginated-search`, and `scroll-search`. Its `__call__` method dispatches to the appropriate function for one of these operations based on the `operation-type` passed in via the `params` dict that it accepts as an argument. The `operation-type` actually never got passed to this method, however. This is because the `SearchParamSource` did not include the `operation-type` key in the dict ultimately returned by its `params` method. The upshot is that all three operation types wound up ultimately calling the same function (`_request_body_query`), which was not intentional. This commit addresses this by: 1. Augmenting `SearchParamSource` to pass the `operation-type` it receives through to its callers. 2. Registering `SearchParamSource` for both scrolls and paginated queries. 3. Enforcing `operation-type` as a mandatory paramater when calling the `Query` runner (necessary for custom parameter sources to avoid unexpected behavior.)

esrally/driver/runner.py

danielmitterdorfer

Great finding and thanks for addressing this. Looks pretty good already but I have a few comments.

esrally/driver/runner.py

danielmitterdorfer · 2021-10-27T07:41:20Z

I do wonder if it'd make sense to have separate classes (perhaps subclasses of Query) for each of these three operations. Their logic is distinct, it's possible that they should have their own parameter sources, and dispatching on the operation type within one large call method feels a little indirect given that we already have a mechanism for explicitly registering runners on a per-operation basis.

I agree on the refactoring and I also agree that it's good to separate fixing this bug from the refactoring. Personally I'd try to avoid subclasses if we don't absolutely need them.

I worry there may be some other trappy behavior lurking in here somewhere [...]

I don't think there is anything that is prohibiting us from refactoring this (in a separate PR). The current implementation is due to the fact that historically this supported regular search and scrolls felt too similar to put it in a different class. But now that we added more functionality I agree that it makese more sense to separate them.

- Use fstrings in exception messages - Ensure that a warning is logged but execution still proceeds when searches include a `pages` parameter - Document that the `pages` parameter is deprecated - Test that unknown operation types passed to the Query runner cause an exception to be raised

michaelbaamonde · 2021-10-27T20:00:07Z

Thanks for the feedback @danielmitterdorfer. See 313408f for the relevant changes.

I don't think there is anything that is prohibiting us from refactoring this (in a separate PR).

Sounds good. I'll do that when I have some time.

esrally/driver/runner.py

danielmitterdorfer · 2021-10-28T07:17:31Z

The CI (test) failure appears unrelated to me.

@elasticmachine test this please

danielmitterdorfer · 2021-10-28T07:30:17Z

The RTD build failure is due to readthedocs/readthedocs.org#8616 and should be addressed separately.

DJRickyB · 2021-10-28T12:02:14Z

does this work here?

@elasticmachine update branch

danielmitterdorfer · 2021-10-28T12:11:31Z

@elasticmachine update branch

DJRickyB · 2021-10-28T13:21:57Z

There's a hint in the logs of the CI failure that this change breaks the deprecated behavior:

2021-10-26 20:16:15,181 ActorAddr-(T|:41205)/PID:24535 esrally.driver.runner WARNING Invoking a scroll search with the 'search' operation is deprecated and will be removed in a future release. Use 'scroll-search' instead.
2021-10-26 20:16:15,176 ActorAddr-(T|:36999)/PID:24402 esrally.driver.driver INFO Scheduling next task for worker id [2] at their timestamp [395.707632] (master timestamp [395.715152])
2021-10-26 20:16:15,181 ActorAddr-(T|:41205)/PID:24535 esrally.driver.driver ERROR Could not execute schedule
Traceback (most recent call last):

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/driver/driver.py", line 1782, in __call__
    request_start = request_context.request_start

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/client.py", line 47, in request_start
    return self.ctx["request_start"]

KeyError: 'request_start'


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/driver/driver.py", line 1783, in __call__
    request_end = request_context.request_end

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/client.py", line 55, in __aexit__
    request_start = self.request_start

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/client.py", line 47, in request_start
    return self.ctx["request_start"]

KeyError: 'request_start'

DJRickyB · 2021-10-28T13:30:24Z

confirmed I can reproduce the failure locally with

esrally race --test-mode --kill-running-processes --track geonames --challenge append-no-conflicts --distribution-version 7.13.3

michaelbaamonde · 2021-10-28T16:15:35Z

There are two failures. The test_sources[in-memory-it] failure reproduces reliably for me on master, so it's unrelated to this change. I'll look into that.

I was also able to reproduce this failure and it is indeed a problem related to this PR.

The issue is that the geonames track defines and registers a few custom parameter sources that do not pass through the operation-type parameter defined for the operation, which this PR introduces as mandatory. The nested track will also surface this problem.

So, while there may be an argument to be made that this leniency was always a bug, this PR as it currently stands introduces a breaking change to the API for parameter sources registered for any operations that ultimately call the Query runner.

@danielmitterdorfer @DJRickyB Let's discuss options offline. I have a few questions that will be easier to talk through.

michaelbaamonde · 2021-10-28T20:53:32Z

I gave this some more thought (and testing) and I think the best option is to simply not make operation-type manadatory in parameter sources, and default to search if another one isn't provided. My rationale is:

Parameter sources shouldn't need to define an operation type. That's simply part of what's required to define an operation, and we enforce that operations have both names and types when the track is loaded and validated against the schema. The only reason we need the operation type to make its way into this specific runner is that we're using the same class for multiple operation types, which we've agreed should be refactored into the class-per-operation model in the future, anyway.

My latest commit should fix the bug that motivated this PR in the first place in a way that preserves backwards compatibility.

Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>

danielmitterdorfer · 2021-10-29T08:22:07Z

I gave this some more thought (and testing) and I think the best option is to simply not make operation-type manadatory in parameter sources, and default to search if another one isn't provided.

FWIW, I agree. Thanks for looking into this.

There is one test failure left but as far as I understood your comment above you're looking into this.

michaelbaamonde · 2021-11-01T18:52:43Z

@elasticmachine test this please

michaelbaamonde · 2021-11-01T20:00:26Z

@danielmitterdorfer @DJRickyB The unrelated test failures are resolved now, so this is ready for another review. Thanks!

DJRickyB · 2021-11-01T20:21:29Z

thanks. now that operation-type is not mandatory, can you undo changes to the tests here?

michaelbaamonde · 2021-11-02T19:45:45Z

now that operation-type is not mandatory, can you undo changes to the tests here?

It's not mandatory in the sense that we don't require custom parameter sources to provide an operation type, but it is required that calls to the Query runner include one, somehow. The runner needs to have some awareness of the operation type in order to avoid the bug that motivated this PR initially.

That said, your comment did make me consider another problem, which is that defaulting to search as the operation type in the absence of a parameter source providing one means that custom parameter sources for scroll-search or paginated-search operations won't work as expected; they'll wind up ultimately calling the wrong function. So, this PR fixes the bug in the default case, but it's still lurking there for more advanced usage.

Ultimately, we really do rely on a 1:1 mapping between operation types and runners, and the registration mechanism provides a straightforward way to look up the exact runner for a given operation. The Query runner breaks this model in a way that forces us to tell it what type of operation it needs to execute. Parameter sources give us a way to do that, but this is a bit of a hack on my part, and it entails a tighter coupling between parameter sources and runners than I think is intended.

With that said, I just pushed a commit that passes the operation-type to the runner by essentially injecting it into the params returned by the parameter source, but in a way that doesn't require any changes to the parameter source itself. This preserves the contract between parameter sources and runners, at least from the track author's perspective. It's another hack, though, which brings me back to:

I worry there may be some other trappy behavior lurking in here somewhere, but I opted to fix the bug within the current implementation instead of doing a larger refactoring

I think it may be safer and quicker to refactor the Query runner into three separate classes instead of iterating further on workarounds. If this PR is mergeable after this latest commit, we can defer that, but if not, I think the time would be better spent on the more fundamental refactoring.

What do you think @DJRickyB @danielmitterdorfer?

DJRickyB · 2021-11-02T20:18:59Z

It's the end of my workday, I want to look at this deeper tomorrow, but I'll note that the "1:1" question came up in the course of #1190, when at the time it was deemed not necessary. As far as I can tell you've bumped up against the limitations of that idea and I'm curious about @danielmitterdorfer's thoughts as well.

danielmitterdorfer · 2021-11-03T14:03:50Z

I think it may be safer and quicker to refactor the Query runner into three separate classes instead of iterating further on workarounds. If this PR is mergeable after this latest commit, we can defer that, but if not, I think the time would be better spent on the more fundamental refactoring.

@michaelbaamonde and I had a chat on another channel about this. tl;dr as long as none of our tracks break with this change, we can go ahead and merge this and then start working on the refactoring in a separate PR immediately after.

michaelbaamonde · 2021-11-04T13:09:07Z

I ran all of our tracks and found no issues, so I'll merge.

michaelbaamonde requested review from danielmitterdorfer and DJRickyB October 26, 2021 19:51

michaelbaamonde commented Oct 26, 2021

View reviewed changes

esrally/driver/runner.py Outdated Show resolved Hide resolved

Fix linting errors.

79506e1

danielmitterdorfer assigned michaelbaamonde Oct 27, 2021

danielmitterdorfer added :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. bug Something's wrong labels Oct 27, 2021

danielmitterdorfer added this to the 2.3.1 milestone Oct 27, 2021

danielmitterdorfer reviewed Oct 27, 2021

View reviewed changes

esrally/driver/runner.py Outdated Show resolved Hide resolved

esrally/driver/runner.py Show resolved Hide resolved

DJRickyB reviewed Oct 27, 2021

View reviewed changes

esrally/driver/runner.py Outdated Show resolved Hide resolved

Merge branch 'master' into query-runner-dispatch-on-operation-type

52b3846

Make operation-type optional for query-related param sources.

56f2e64

Mike Baamonde and others added 3 commits October 28, 2021 16:57

"search" is now the default operation type.

a234a9e

Fix missing bracket

47a9f29

Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>

Make error message formatting consistent.

4e8f63c

Inject operation-type into params returned by the param source.

182ff84

michaelbaamonde merged commit 7460e31 into elastic:master Nov 4, 2021

michaelbaamonde deleted the query-runner-dispatch-on-operation-type branch November 4, 2021 13:18

This was referenced Nov 23, 2021

Initial approach to refactoring the Query runner. #1388

Closed

Initial approach to refactoring the Query runner. #1389

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Call the correct functions for `scroll-search` and `paginated-search` operations #1368

Call the correct functions for `scroll-search` and `paginated-search` operations #1368

michaelbaamonde commented Oct 26, 2021 •

edited

Loading

danielmitterdorfer left a comment

danielmitterdorfer commented Oct 27, 2021

michaelbaamonde commented Oct 27, 2021

danielmitterdorfer commented Oct 28, 2021

danielmitterdorfer commented Oct 28, 2021

DJRickyB commented Oct 28, 2021

danielmitterdorfer commented Oct 28, 2021

DJRickyB commented Oct 28, 2021

DJRickyB commented Oct 28, 2021

michaelbaamonde commented Oct 28, 2021

michaelbaamonde commented Oct 28, 2021

danielmitterdorfer commented Oct 29, 2021

michaelbaamonde commented Nov 1, 2021

michaelbaamonde commented Nov 1, 2021

DJRickyB commented Nov 1, 2021

michaelbaamonde commented Nov 2, 2021

DJRickyB commented Nov 2, 2021

danielmitterdorfer commented Nov 3, 2021

michaelbaamonde commented Nov 4, 2021

Call the correct functions for scroll-search and paginated-search operations #1368

Call the correct functions for scroll-search and paginated-search operations #1368

Conversation

michaelbaamonde commented Oct 26, 2021 • edited Loading

Footnotes

danielmitterdorfer left a comment

Choose a reason for hiding this comment

danielmitterdorfer commented Oct 27, 2021

michaelbaamonde commented Oct 27, 2021

danielmitterdorfer commented Oct 28, 2021

danielmitterdorfer commented Oct 28, 2021

DJRickyB commented Oct 28, 2021

danielmitterdorfer commented Oct 28, 2021

DJRickyB commented Oct 28, 2021

DJRickyB commented Oct 28, 2021

michaelbaamonde commented Oct 28, 2021

michaelbaamonde commented Oct 28, 2021

danielmitterdorfer commented Oct 29, 2021

michaelbaamonde commented Nov 1, 2021

michaelbaamonde commented Nov 1, 2021

DJRickyB commented Nov 1, 2021

michaelbaamonde commented Nov 2, 2021

DJRickyB commented Nov 2, 2021

danielmitterdorfer commented Nov 3, 2021

michaelbaamonde commented Nov 4, 2021

Call the correct functions for `scroll-search` and `paginated-search` operations #1368

Call the correct functions for `scroll-search` and `paginated-search` operations #1368

michaelbaamonde commented Oct 26, 2021 •

edited

Loading