-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for search_after and point-in-time #1190
Conversation
docs/track.rst
Outdated
@@ -883,8 +883,13 @@ Properties | |||
* ``body`` (mandatory): The query body. | |||
* ``response-compression-enabled`` (optional, defaults to ``true``): Allows to disable HTTP compression of responses. As these responses are sometimes large and decompression may be a bottleneck on the client, it is possible to turn off response compression. | |||
* ``detailed-results`` (optional, defaults to ``false``): Records more detailed meta-data about queries. As it analyzes the corresponding response in more detail, this might incur additional overhead which can skew measurement results. This flag is ineffective for scroll queries. | |||
* ``pages`` (optional): Number of pages to retrieve. If this parameter is present, a scroll query will be executed. If you want to retrieve all result pages, use the value "all". | |||
* ``results-per-page`` (optional): Number of documents to retrieve per page for scroll queries. | |||
* ``results-per-page`` (optional): Number of results to retrieve per page. This maps to the Search API's ``size`` parameter, and can be used for paginated and non-paginated searches. Defaults to ``10`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this just be called size
? there didn't seem to be a good reason to make this only for scroll/pagination given it maps to something consequential to the single request_body search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the name results-per-page
provides less room for interpretation than size
. We could, however, introduce size
as a parameter and allow results-per-page
as an alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can/should i deprecate results-per-page
here?
esrally/track/params.py
Outdated
target_name = params.get("index") | ||
type_name = params.get("type") | ||
if not target_name: | ||
target_name = params.get("data-stream", default_target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is data-stream
intentionally undocumented for search
? also undocumented here, tentatively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I did a first pass and left a couple of thoughts.
docs/track.rst
Outdated
@@ -883,8 +883,13 @@ Properties | |||
* ``body`` (mandatory): The query body. | |||
* ``response-compression-enabled`` (optional, defaults to ``true``): Allows to disable HTTP compression of responses. As these responses are sometimes large and decompression may be a bottleneck on the client, it is possible to turn off response compression. | |||
* ``detailed-results`` (optional, defaults to ``false``): Records more detailed meta-data about queries. As it analyzes the corresponding response in more detail, this might incur additional overhead which can skew measurement results. This flag is ineffective for scroll queries. | |||
* ``pages`` (optional): Number of pages to retrieve. If this parameter is present, a scroll query will be executed. If you want to retrieve all result pages, use the value "all". | |||
* ``results-per-page`` (optional): Number of documents to retrieve per page for scroll queries. | |||
* ``results-per-page`` (optional): Number of results to retrieve per page. This maps to the Search API's ``size`` parameter, and can be used for paginated and non-paginated searches. Defaults to ``10`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me the name results-per-page
provides less room for interpretation than size
. We could, however, introduce size
as a parameter and allow results-per-page
as an alias.
esrally/track/params.py
Outdated
target_name = params.get("index") | ||
type_name = params.get("type") | ||
if not target_name: | ||
target_name = params.get("data-stream", default_target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating. I did another pass through the code and left a couple of suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating! I left some minor suggestions and corrections on the docs and some formatting. SearchAfterExtractor
looks good but I think we can use one shared instance per runner.
docs/track.rst
Outdated
Meta-data | ||
""""""""" | ||
|
||
The following meta data are always returned: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this superfluous?
docs/track.rst
Outdated
|
||
The following meta data are always returned: | ||
|
||
* ``weight``: "weight" of an operation. Always 1 for regular queries and the number of retrieved pages for scroll queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions regular queries but the section is about paginated queries so we can simplify the sentence.
docs/track.rst
Outdated
The following meta data are always returned: | ||
|
||
* ``weight``: "weight" of an operation. Always 1 for regular queries and the number of retrieved pages for scroll queries. | ||
* ``unit``: The unit in which to interpret ``weight``. Always "ops" for regular queries and "pages" for scroll queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions regular queries but the section is about paginated queries so we can simplify the sentence.
docs/track.rst
Outdated
|
||
The following meta data are always returned: | ||
|
||
* ``weight``: "weight" of an operation. Always 1 for regular queries and the number of retrieved pages for scroll queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions regular queries but the section is about scroll queries so we can simplify the sentence.
docs/track.rst
Outdated
The following meta data are always returned: | ||
|
||
* ``weight``: "weight" of an operation. Always 1 for regular queries and the number of retrieved pages for scroll queries. | ||
* ``unit``: The unit in which to interpret ``weight``. Always "ops" for regular queries and "pages" for scroll queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions regular queries but the section is about scroll queries so we can simplify the sentence.
docs/track.rst
Outdated
|
||
**Example** | ||
|
||
In this example, a point-in-time is opened, used by a ``search_after``-based search operation, and closed:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... used by a paginated-search
operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was going for a "this is what we are doing to elasticsearch" description rather than a "this is what we are doing in rally" description, but I take this hesitation as sufficient evidence it should be changed (so it was)
esrally/driver/runner.py
Outdated
@@ -799,111 +821,156 @@ async def request_body_query(self, es, params): | |||
# disable eager response parsing - responses might be huge thus skewing results | |||
es.return_raw_response() | |||
|
|||
r = await self._raw_search(es, doc_type, index, body, request_params, headers=headers) | |||
async def _search_after_query(es, params): | |||
extract = SearchAfterExtractor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This effectively means that the regex is recompiled on every request on the hot code path. As this class is immutable I propose that we instead create one instance of SearchAfterExtractor
in the runner's constructor and reuse it.
esrally/driver/runner.py
Outdated
r = await self._raw_search(es, doc_type, index, body, params, headers=headers) | ||
|
||
props = parse(r, | ||
["_scroll_id", "hits.total", "hits.total.value", "hits.total.relation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatting of parameters looks funny here. Is this intentional?
tests/driver/runner_test.py
Outdated
# make sure pit_id is updated afterward | ||
self.assertEqual("fedcba9876543211", runner.CompositeContext.get(pit_op)) | ||
|
||
es.transport.perform_request.assert_has_calls([mock.call('GET', '/_search', params={}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can you please check that we use double-quotes everywhere consistently?
tests/driver/runner_test.py
Outdated
self.assertEqual("fedcba9876543211", runner.CompositeContext.get(pit_op)) | ||
|
||
es.transport.perform_request.assert_has_calls([mock.call('GET', '/_search', params={}, | ||
body={'query': {'match-all': {}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Personally, I find the following formatting easier to read:
body={
'query': {
'match-all': {}
},
'sort': [{
'timestamp': 'asc',
'tie_breaker_id': 'asc'
}],
'size': 2,
'pit': {
'id': '0123456789abcdef',
'keep_alive': '1m'
}
},
I'm fine if we keep it as is but I want to offer it as a thought.
(and i'm all out of nits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating. Looks good now! :)
This change adds support for
search_after
usage andpoint-in-time
interactions. Specifically:open-point-in-time
andclose-point-in-time
operations, for use in composite contextssearch_after
pagination, using a new operation typepaginated-search
scroll-search
operation type for usabilityBonus:
detailed-results
and the properties required forsearch_after
supportget-async-search
todelete-async-search
)Closes #1141