Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Adds an ability to start an asynchronous EQL search #56147

Closed
wants to merge 5 commits into from

Conversation

imotov
Copy link
Contributor

@imotov imotov commented May 4, 2020

Adds support for async searches to eql search API. This commit is limited to
only submitting search API requests and doesn't provide APIs to get results
nor delete the results. These functions will be added in follow up PRs.

Relates to #49638

Adds support for async searches to eql search API. This commit is limited to
only submitting search API requests and doesn't provide APIs to get results
nor delete the results. These functions will be added in follow up PRs.

Relates to elastic#49638
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (:Query Languages/EQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label May 4, 2020
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main comment is on generalizing the async infrastructure bits and promoting them as utilities inside the xpack.core.async. I would imagine any consumer of said API would have to go through the same process of storing and retrieving the results, having consistent parameters (wait_for_*).
This would also apply to testing which is focused on the async nature.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two questions. My main question is whether the new abstractions that are introduced with this PR will be shared with other async API. I worry that it may be too early to go through the exercise of generalizing as the async behaviour of eql is not consolidated yet and it seems to differ from what e.g. async search does.

import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncTaskManagementService<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this service? It looks to me like a simplified fork of what async search does when it comes to waiting for completion and storing the response. Yet I see only one user of these new abstractions. Are there going to be more users of this? Async search will not be migrated to it, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can migrate async search here in the future. The main difference is that this service relies on cascading cancellation and async search is preforming cancellation internally. So, some pieces of this logic are missing here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancellation is not a problem, we need to migrate async search to rely on cascading cancellation anyways. But the logic around storing partial responses and wait for completion is very different from what you have now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think logic is pretty much the same but implementation is quite a bit different. Removing cancellation concerns allowed me to remove all the thread management and client calls from the task into the service and keep the task lightweight. I don't think a task should be a control structure or use clients. It should be just a simple light weight data transfer structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One point where the behaviour differs at first glance is that in async search, we never want to return nor store a response before the initial onListShards listener callback. Hence we register a completion listener, but that is not invoked until onListShards is called.
What I am saying is that there are subtle differences that are independent from task cancellation, and I am not sure that async search can be migrated to this simplified version of async management service.

public class EqlSearchResponse {

private final Hits hits;
private final long tookInMillis;
private final boolean isTimeout;
private final String asyncExecutionId;
private final boolean isRunning;
private final boolean isPartial;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is is_partial needed in eql given that no partial results are returned? We are regretting to have added this flag in the response of async search, and we will likely remove it, so it would be good to make sure that it's is needed in eql.

@imotov
Copy link
Contributor Author

imotov commented May 12, 2020

My main comment is on generalizing the async infrastructure bits and promoting them as utilities inside the xpack.core.async. I would imagine any consumer of said API would have to go through the same process of storing and retrieving the results, having consistent parameters (wait_for_*).
This would also apply to testing which is focused on the async nature.

My main question is whether the new abstractions that are introduced with this PR will be shared with other async API. I worry that it may be too early to go through the exercise of generalizing as the async behaviour of eql is not consolidated yet and it seems to differ from what e.g. async search does.

I cannot win this game 😄. @javanna , @costin : Should I undo my last changes and merge it back into the EQL transport?

@javanna
Copy link
Member

javanna commented May 12, 2020

I am not against abstractions and sharing code once there is the need, but I don't see what the direction is here. Abstractions add complexity and we have only one user of those at the moment, and I would like to understand what the plan is around having more users of these new abstractions. Ideally, I would delay abstracting things until there is concrete need for it. I am saying this especially looking at how in theory EQL async aspects should share behaviour with async search although in practice they do only partially.

@imotov
Copy link
Contributor Author

imotov commented May 12, 2020

Talked about it offline. We came up with the following plan:

In this PR:

  • since we are not sure about reuse, we are going to move the logic of AsyncTaskManagementService inside EQL branch
  • we will get rid of a separate index and security user for eql and reuse async search one and start reusing the same security user in EQL until we can switch to system indices

In the follow up PR:

  • we will move TransportGetAsyncSearchAction and TransportDeleteAsyncSearchAction to core and make it reusable

@jimczi, @javanna , @costin please add if I missed something.

@imotov
Copy link
Contributor Author

imotov commented May 12, 2020

Closing. Will reopen it against a branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/EQL EQL querying >enhancement Team:QL (Deprecated) Meta label for query languages team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants