Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call the correct functions for scroll-search and paginated-search operations #1368

Conversation

michaelbaamonde
Copy link
Contributor

@michaelbaamonde michaelbaamonde commented Oct 26, 2021

In the course of getting started on #1215, I discovered that scroll-search operations don't default to the index (or data stream) defined at the top level of the track under the indices key. The pmc and geonames tracks expose this. But if you explicitly provide an index parameter to the operation (as http_logs does)1 it works...

...by accident, it turns out. After digging into it, the core issue seems to be that there was no parameter source registered for scroll-search or paginated-search operations. So, neither the track object's indices attribute (necessary for choosing a default index parameter for the operation) nor the operation-type parameter (necessary for choosing the right function to call) was getting passed down the chain. But, the way the control flow works for dispatching to the appropriate function in the in the Query.__call__ method, it wound up just issuing a request body query with a pages parameter, which happens to be the legacy implementation of scroll searches.

So, this commit addresses this by doing three things:

  1. Augmenting SearchParamSource to pass the operation-type it receives through to its callers.
  2. Registering SearchParamSource for both scrolls and paginated queries.
  3. Enforcing operation-type as a mandatory paramater when calling the Query runner (necessary for custom parameter sources to avoid unexpected behavior.)

That said, I do wonder if it'd make sense to have separate classes (perhaps subclasses of Query) for each of these three operations. Their logic is distinct, it's possible that they should have their own parameter sources, and dispatching on the operation type within one large __call__ method feels a little indirect given that we already have a mechanism for explicitly registering runners on a per-operation basis. I worry there may be some other trappy behavior lurking in here somewhere, but I opted to fix the bug within the current implementation instead of doing a larger refactoring along those lines. Happy to hear any thoughts on that front.

Footnotes

  1. https://github.com/michaelbaamonde/rally-tracks/commit/b700eac059ab42246768f44d0e656caf144a0269 contains the relevant changes that sparked this investigation.

The `Query` runner services three operation types: `search`,
`paginated-search`, and `scroll-search`. Its `__call__` method dispatches to
the appropriate function for one of these operations based on the
`operation-type` passed in via the `params` dict that it accepts as an argument.

The `operation-type` actually never got passed to this method, however. This is
because the `SearchParamSource` did not include the `operation-type` key in
the dict ultimately returned by its `params` method. The upshot is that all
three operation types wound up ultimately calling the same function
(`_request_body_query`), which was not intentional.

This commit addresses this by:

1. Augmenting `SearchParamSource` to pass the `operation-type` it receives
   through to its callers.
2. Registering `SearchParamSource` for both scrolls and paginated queries.
3. Enforcing `operation-type` as a mandatory paramater when calling the `Query`
   runner (necessary for custom parameter sources to avoid unexpected behavior.)
@danielmitterdorfer danielmitterdorfer added :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. bug Something's wrong labels Oct 27, 2021
@danielmitterdorfer danielmitterdorfer added this to the 2.3.1 milestone Oct 27, 2021
Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great finding and thanks for addressing this. Looks pretty good already but I have a few comments.

esrally/driver/runner.py Outdated Show resolved Hide resolved
esrally/driver/runner.py Show resolved Hide resolved
@danielmitterdorfer
Copy link
Member

I do wonder if it'd make sense to have separate classes (perhaps subclasses of Query) for each of these three operations. Their logic is distinct, it's possible that they should have their own parameter sources, and dispatching on the operation type within one large call method feels a little indirect given that we already have a mechanism for explicitly registering runners on a per-operation basis.

I agree on the refactoring and I also agree that it's good to separate fixing this bug from the refactoring. Personally I'd try to avoid subclasses if we don't absolutely need them.

I worry there may be some other trappy behavior lurking in here somewhere [...]

I don't think there is anything that is prohibiting us from refactoring this (in a separate PR). The current implementation is due to the fact that historically this supported regular search and scrolls felt too similar to put it in a different class. But now that we added more functionality I agree that it makese more sense to separate them.

- Use fstrings in exception messages
- Ensure that a warning is logged but execution still proceeds when searches
  include a `pages` parameter
- Document that the `pages` parameter is deprecated
- Test that unknown operation types passed to the Query runner cause an
  exception to be raised
@michaelbaamonde
Copy link
Contributor Author

Thanks for the feedback @danielmitterdorfer. See 313408f for the relevant changes.

I don't think there is anything that is prohibiting us from refactoring this (in a separate PR).

Sounds good. I'll do that when I have some time.

esrally/driver/runner.py Outdated Show resolved Hide resolved
@danielmitterdorfer
Copy link
Member

The CI (test) failure appears unrelated to me.

@elasticmachine test this please

@danielmitterdorfer
Copy link
Member

The RTD build failure is due to readthedocs/readthedocs.org#8616 and should be addressed separately.

@DJRickyB
Copy link
Contributor

does this work here?

@elasticmachine update branch

@danielmitterdorfer
Copy link
Member

@elasticmachine update branch

@DJRickyB
Copy link
Contributor

There's a hint in the logs of the CI failure that this change breaks the deprecated behavior:

2021-10-26 20:16:15,181 ActorAddr-(T|:41205)/PID:24535 esrally.driver.runner WARNING Invoking a scroll search with the 'search' operation is deprecated and will be removed in a future release. Use 'scroll-search' instead.
2021-10-26 20:16:15,176 ActorAddr-(T|:36999)/PID:24402 esrally.driver.driver INFO Scheduling next task for worker id [2] at their timestamp [395.707632] (master timestamp [395.715152])
2021-10-26 20:16:15,181 ActorAddr-(T|:41205)/PID:24535 esrally.driver.driver ERROR Could not execute schedule
Traceback (most recent call last):

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/driver/driver.py", line 1782, in __call__
    request_start = request_context.request_start

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/client.py", line 47, in request_start
    return self.ctx["request_start"]

KeyError: 'request_start'


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/driver/driver.py", line 1783, in __call__
    request_end = request_context.request_end

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/client.py", line 55, in __aexit__
    request_start = self.request_start

  File "/var/lib/jenkins/workspace/elastic+rally+pull-request/.tox/py38/lib/python3.8/site-packages/esrally/client.py", line 47, in request_start
    return self.ctx["request_start"]

KeyError: 'request_start'

@DJRickyB
Copy link
Contributor

confirmed I can reproduce the failure locally with

esrally race --test-mode --kill-running-processes --track geonames --challenge append-no-conflicts --distribution-version 7.13.3

@michaelbaamonde
Copy link
Contributor Author

There are two failures. The test_sources[in-memory-it] failure reproduces reliably for me on master, so it's unrelated to this change. I'll look into that.

I was also able to reproduce this failure and it is indeed a problem related to this PR.

The issue is that the geonames track defines and registers a few custom parameter sources that do not pass through the operation-type parameter defined for the operation, which this PR introduces as mandatory. The nested track will also surface this problem.

So, while there may be an argument to be made that this leniency was always a bug, this PR as it currently stands introduces a breaking change to the API for parameter sources registered for any operations that ultimately call the Query runner.

@danielmitterdorfer @DJRickyB Let's discuss options offline. I have a few questions that will be easier to talk through.

@michaelbaamonde
Copy link
Contributor Author

I gave this some more thought (and testing) and I think the best option is to simply not make operation-type manadatory in parameter sources, and default to search if another one isn't provided. My rationale is:

Parameter sources shouldn't need to define an operation type. That's simply part of what's required to define an operation, and we enforce that operations have both names and types when the track is loaded and validated against the schema. The only reason we need the operation type to make its way into this specific runner is that we're using the same class for multiple operation types, which we've agreed should be refactored into the class-per-operation model in the future, anyway.

My latest commit should fix the bug that motivated this PR in the first place in a way that preserves backwards compatibility.

Mike Baamonde and others added 3 commits October 28, 2021 16:57
@danielmitterdorfer
Copy link
Member

I gave this some more thought (and testing) and I think the best option is to simply not make operation-type manadatory in parameter sources, and default to search if another one isn't provided.

FWIW, I agree. Thanks for looking into this.

There is one test failure left but as far as I understood your comment above you're looking into this.

@michaelbaamonde
Copy link
Contributor Author

@elasticmachine test this please

@michaelbaamonde
Copy link
Contributor Author

@danielmitterdorfer @DJRickyB The unrelated test failures are resolved now, so this is ready for another review. Thanks!

@DJRickyB
Copy link
Contributor

DJRickyB commented Nov 1, 2021

thanks. now that operation-type is not mandatory, can you undo changes to the tests here?

@michaelbaamonde
Copy link
Contributor Author

now that operation-type is not mandatory, can you undo changes to the tests here?

It's not mandatory in the sense that we don't require custom parameter sources to provide an operation type, but it is required that calls to the Query runner include one, somehow. The runner needs to have some awareness of the operation type in order to avoid the bug that motivated this PR initially.

That said, your comment did make me consider another problem, which is that defaulting to search as the operation type in the absence of a parameter source providing one means that custom parameter sources for scroll-search or paginated-search operations won't work as expected; they'll wind up ultimately calling the wrong function. So, this PR fixes the bug in the default case, but it's still lurking there for more advanced usage.

Ultimately, we really do rely on a 1:1 mapping between operation types and runners, and the registration mechanism provides a straightforward way to look up the exact runner for a given operation. The Query runner breaks this model in a way that forces us to tell it what type of operation it needs to execute. Parameter sources give us a way to do that, but this is a bit of a hack on my part, and it entails a tighter coupling between parameter sources and runners than I think is intended.

With that said, I just pushed a commit that passes the operation-type to the runner by essentially injecting it into the params returned by the parameter source, but in a way that doesn't require any changes to the parameter source itself. This preserves the contract between parameter sources and runners, at least from the track author's perspective. It's another hack, though, which brings me back to:

I worry there may be some other trappy behavior lurking in here somewhere, but I opted to fix the bug within the current implementation instead of doing a larger refactoring

I think it may be safer and quicker to refactor the Query runner into three separate classes instead of iterating further on workarounds. If this PR is mergeable after this latest commit, we can defer that, but if not, I think the time would be better spent on the more fundamental refactoring.

What do you think @DJRickyB @danielmitterdorfer?

@DJRickyB
Copy link
Contributor

DJRickyB commented Nov 2, 2021

It's the end of my workday, I want to look at this deeper tomorrow, but I'll note that the "1:1" question came up in the course of #1190, when at the time it was deemed not necessary. As far as I can tell you've bumped up against the limitations of that idea and I'm curious about @danielmitterdorfer's thoughts as well.

@danielmitterdorfer
Copy link
Member

I think it may be safer and quicker to refactor the Query runner into three separate classes instead of iterating further on workarounds. If this PR is mergeable after this latest commit, we can defer that, but if not, I think the time would be better spent on the more fundamental refactoring.

@michaelbaamonde and I had a chat on another channel about this. tl;dr as long as none of our tracks break with this change, we can go ahead and merge this and then start working on the refactoring in a separate PR immediately after.

@michaelbaamonde
Copy link
Contributor Author

I ran all of our tracks and found no issues, so I'll merge.

@michaelbaamonde michaelbaamonde merged commit 7460e31 into elastic:master Nov 4, 2021
@michaelbaamonde michaelbaamonde deleted the query-runner-dispatch-on-operation-type branch November 4, 2021 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something's wrong :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants