Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ClusterManager operations are not called with "task" param #4062

Closed
saikaranam-amazon opened this issue Aug 1, 2022 · 7 comments · Fixed by #4103
Closed

[BUG] ClusterManager operations are not called with "task" param #4062

saikaranam-amazon opened this issue Aug 1, 2022 · 7 comments · Fixed by #4103
Labels

Comments

@saikaranam-amazon
Copy link
Member

saikaranam-amazon commented Aug 1, 2022

Describe the bug
Recent changes on main (and backport of these changes on v2.2 branch) has a bug where corresponding method of clusterManagerOperation with task param is not called.

** Issue/Fix **
clusterManagerOperations with task param needs to call corresponding previous method with task param

protected void clusterManagerOperation(Task task, Request request, ClusterState state, ActionListener<Response> listener)
        throws Exception {
        masterOperation(task, request, state, listener);
    }

Ref: https://github.com/opensearch-project/OpenSearch/blob/2.x/server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java#L141-L144

@saikaranam-amazon
Copy link
Member Author

@Rishikesh1159 - Could you take a look ?

@tlfeng
Copy link
Collaborator

tlfeng commented Aug 1, 2022

Hi @saikaranam-amazon , I'm afraid this is not a bug caused by the inclusive terminology renaming.
This is the code in version 2.1 before the renaming.

/**
* Override this operation if access to the task parameter is needed
*/
protected void masterOperation(Task task, Request request, ClusterState state, ActionListener<Response> listener) throws Exception {
masterOperation(request, state, listener);
}

(https://github.com/opensearch-project/OpenSearch/blob/2.1.0/server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java#L122-L127)

The task parameter in method masterOperation(...) was not called by default, and the comment told to override the method if accessing task parameter is needed.

@tlfeng tlfeng added v2.2.0 and removed untriaged labels Aug 1, 2022
@saikaranam-amazon
Copy link
Member Author

@tlfeng

CCR is using the TransportMasterNodeAction (and there are no other changes apart from the version bump to 2.2) and masterOperation (with task param) method is overriden to get the relevant task details.
Problem is, this method is never called evident from the below relevant stack trace as eventually all methods under TransportClusterManagerNodeAction are wired to call masterOperation (without task params)

at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.clusterManagerOperation(TransportClusterManagerNodeAction.java:129)
        at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.clusterManagerOperation(TransportClusterManagerNodeAction.java:143)
        at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.lambda$doStart$3(TransportClusterManagerNodeAction.java:221)
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88)
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
        at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:301)
        at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction$AsyncSingleAction.doStart(TransportClusterManagerNodeAction.java:221)
        at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.doExecute(TransportClusterManagerNodeAction.java:159)
        at org.opensearch.action.support.clustermanager.TransportClusterManagerNodeAction.doExecute(TransportClusterManagerNodeAction.java:74)
        at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:204)
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:174)
        at org.opensearch.action.support.TransportAction.execute(TransportAction.java:102)
        at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110)
        at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:97)
        at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:423)
        at org.opensearch.replication.util.CoroutinesKt$suspendExecute$2.invokeSuspend(Coroutines.kt:80)
        at org.opensearch.replication.util.CoroutinesKt$suspendExecute$2.invoke(Coroutines.kt)
        at org.opensearch.replication.util.CoroutinesKt$suspendExecute$2.invoke(Coroutines.kt)
        at kotlinx.coroutines.intrinsics.UndispatchedKt.startUndispatchedOrReturn(Undispatched.kt:89)
        at kotlinx.coroutines.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:165)
        at kotlinx.coroutines.BuildersKt.withContext(Unknown Source)
        at org.opensearch.replication.util.CoroutinesKt.suspendExecute(Coroutines.kt:79)
        at org.opensearch.replication.util.CoroutinesKt.suspendExecute$default(Coroutines.kt:75)

@tlfeng
Copy link
Collaborator

tlfeng commented Aug 3, 2022

Thanks for pointing out this problem! 👍

I found that the cause is the method masterOperation(Task task, Request request, ClusterState state, ActionListener<Response> listener) is totally bypassed.

.execute(ActionRunnable.wrap(delegate, l -> clusterManagerOperation(task, request, clusterState, l)));

The above line should be reverted to using masterOperation (with task param). I will make the code change for it.

@tlfeng
Copy link
Collaborator

tlfeng commented Aug 3, 2022

Created PR #4103 to solve the problem. Thank you for reporting this problem.

@saikaranam-amazon
Copy link
Member Author

saikaranam-amazon commented Aug 3, 2022

Created PR #4103 to solve the problem

Thanks @tlfeng

@tlfeng
Copy link
Collaborator

tlfeng commented Aug 3, 2022

The fix is backported to 2.x and 2.2 branch by PR #4114 and #4115.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants