-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Call the correct functions for scroll-search
and paginated-search
operations
#1368
Call the correct functions for scroll-search
and paginated-search
operations
#1368
Conversation
The `Query` runner services three operation types: `search`, `paginated-search`, and `scroll-search`. Its `__call__` method dispatches to the appropriate function for one of these operations based on the `operation-type` passed in via the `params` dict that it accepts as an argument. The `operation-type` actually never got passed to this method, however. This is because the `SearchParamSource` did not include the `operation-type` key in the dict ultimately returned by its `params` method. The upshot is that all three operation types wound up ultimately calling the same function (`_request_body_query`), which was not intentional. This commit addresses this by: 1. Augmenting `SearchParamSource` to pass the `operation-type` it receives through to its callers. 2. Registering `SearchParamSource` for both scrolls and paginated queries. 3. Enforcing `operation-type` as a mandatory paramater when calling the `Query` runner (necessary for custom parameter sources to avoid unexpected behavior.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great finding and thanks for addressing this. Looks pretty good already but I have a few comments.
I agree on the refactoring and I also agree that it's good to separate fixing this bug from the refactoring. Personally I'd try to avoid subclasses if we don't absolutely need them.
I don't think there is anything that is prohibiting us from refactoring this (in a separate PR). The current implementation is due to the fact that historically this supported regular search and scrolls felt too similar to put it in a different class. But now that we added more functionality I agree that it makese more sense to separate them. |
- Use fstrings in exception messages - Ensure that a warning is logged but execution still proceeds when searches include a `pages` parameter - Document that the `pages` parameter is deprecated - Test that unknown operation types passed to the Query runner cause an exception to be raised
Thanks for the feedback @danielmitterdorfer. See 313408f for the relevant changes.
Sounds good. I'll do that when I have some time. |
The CI (test) failure appears unrelated to me. @elasticmachine test this please |
The RTD build failure is due to readthedocs/readthedocs.org#8616 and should be addressed separately. |
does this work here? @elasticmachine update branch |
@elasticmachine update branch |
There's a hint in the logs of the CI failure that this change breaks the deprecated behavior:
|
confirmed I can reproduce the failure locally with esrally race --test-mode --kill-running-processes --track geonames --challenge append-no-conflicts --distribution-version 7.13.3 |
There are two failures. The I was also able to reproduce this failure and it is indeed a problem related to this PR. The issue is that the So, while there may be an argument to be made that this leniency was always a bug, this PR as it currently stands introduces a breaking change to the API for parameter sources registered for any operations that ultimately call the @danielmitterdorfer @DJRickyB Let's discuss options offline. I have a few questions that will be easier to talk through. |
I gave this some more thought (and testing) and I think the best option is to simply not make Parameter sources shouldn't need to define an operation type. That's simply part of what's required to define an operation, and we enforce that operations have both names and types when the track is loaded and validated against the schema. The only reason we need the operation type to make its way into this specific runner is that we're using the same class for multiple operation types, which we've agreed should be refactored into the class-per-operation model in the future, anyway. My latest commit should fix the bug that motivated this PR in the first place in a way that preserves backwards compatibility. |
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
FWIW, I agree. Thanks for looking into this. There is one test failure left but as far as I understood your comment above you're looking into this. |
@elasticmachine test this please |
@danielmitterdorfer @DJRickyB The unrelated test failures are resolved now, so this is ready for another review. Thanks! |
thanks. now that |
It's not mandatory in the sense that we don't require custom parameter sources to provide an operation type, but it is required that calls to the That said, your comment did make me consider another problem, which is that defaulting to Ultimately, we really do rely on a 1:1 mapping between operation types and runners, and the registration mechanism provides a straightforward way to look up the exact runner for a given operation. The With that said, I just pushed a commit that passes the
I think it may be safer and quicker to refactor the What do you think @DJRickyB @danielmitterdorfer? |
It's the end of my workday, I want to look at this deeper tomorrow, but I'll note that the "1:1" question came up in the course of #1190, when at the time it was deemed not necessary. As far as I can tell you've bumped up against the limitations of that idea and I'm curious about @danielmitterdorfer's thoughts as well. |
@michaelbaamonde and I had a chat on another channel about this. tl;dr as long as none of our tracks break with this change, we can go ahead and merge this and then start working on the refactoring in a separate PR immediately after. |
I ran all of our tracks and found no issues, so I'll merge. |
In the course of getting started on #1215, I discovered that
scroll-search
operations don't default to the index (or data stream) defined at the top level of the track under theindices
key. Thepmc
andgeonames
tracks expose this. But if you explicitly provide anindex
parameter to the operation (ashttp_logs
does)1 it works......by accident, it turns out. After digging into it, the core issue seems to be that there was no parameter source registered for
scroll-search
orpaginated-search
operations. So, neither the track object'sindices
attribute (necessary for choosing a defaultindex
parameter for the operation) nor theoperation-type
parameter (necessary for choosing the right function to call) was getting passed down the chain. But, the way the control flow works for dispatching to the appropriate function in the in theQuery.__call__
method, it wound up just issuing a request body query with apages
parameter, which happens to be the legacy implementation of scroll searches.So, this commit addresses this by doing three things:
SearchParamSource
to pass theoperation-type
it receives through to its callers.SearchParamSource
for both scrolls and paginated queries.operation-type
as a mandatory paramater when calling theQuery
runner (necessary for custom parameter sources to avoid unexpected behavior.)That said, I do wonder if it'd make sense to have separate classes (perhaps subclasses of
Query
) for each of these three operations. Their logic is distinct, it's possible that they should have their own parameter sources, and dispatching on the operation type within one large__call__
method feels a little indirect given that we already have a mechanism for explicitly registering runners on a per-operation basis. I worry there may be some other trappy behavior lurking in here somewhere, but I opted to fix the bug within the current implementation instead of doing a larger refactoring along those lines. Happy to hear any thoughts on that front.Footnotes
https://github.com/michaelbaamonde/rally-tracks/commit/b700eac059ab42246768f44d0e656caf144a0269 contains the relevant changes that sparked this investigation. ↩