Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Service Layer changes for Recommission API #4320

Merged
Merged
Show file tree
Hide file tree
Changes from 46 commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
8c38c53
Initial commit for recommission service level changes
pranikum Aug 27, 2022
40d739b
Remove commented changes from service class
pranikum Aug 29, 2022
32d1499
Update Service layer to handle removal of recommission zone attribute
pranikum Sep 1, 2022
79561e1
Merge branch 'main' into recommission-service-level-support
pranikum Sep 7, 2022
f3b8dd8
Merge branch 'main' into recommission-service-level-support
pranikum Sep 7, 2022
ace11fe
Add change log and address comment.
pranikum Sep 7, 2022
3988e2d
Fix EOL Errors
pranikum Sep 7, 2022
fe3f01a
Fix spotless check
pranikum Sep 7, 2022
7ffb650
Remove unwanted files
pranikum Sep 7, 2022
bf0af0a
Remove unwanted files
pranikum Sep 7, 2022
1972243
Add unwanted deletion
pranikum Sep 7, 2022
2584d51
Add the correct gradle jar
pranikum Sep 7, 2022
e9d50a6
Fix spotless java check
pranikum Sep 7, 2022
4c6273b
Merge with latest
pranikum Sep 21, 2022
0b56861
Merge branch 'main' into recommission-service-level-support
pranikum Sep 21, 2022
95c41ff
Update Changelog
pranikum Sep 21, 2022
6e2ae8f
Add call to set weights
pranikum Sep 23, 2022
eaeefa6
Fix spotless check
pranikum Sep 23, 2022
8ca49e7
Fix spotless java check
pranikum Sep 23, 2022
97c4d32
Fix logger check
pranikum Sep 23, 2022
2d83698
Remove unused class
pranikum Sep 23, 2022
e7946d8
Add package info
pranikum Sep 24, 2022
98b4ac8
Merge change log to latest
pranikum Sep 24, 2022
fcf36cf
Merge branch 'main' into recommission-service-level-support
pranikum Sep 24, 2022
4f6543b
Add to changelog
pranikum Sep 24, 2022
02a96ba
add new line to package info
pranikum Sep 24, 2022
b2bfb31
Resolve conflict with latest
pranikum Sep 28, 2022
f3582e3
Merge branch 'main' into recommission-service-level-support
pranikum Sep 28, 2022
fe8f3b1
Take latest and add recommission changes
pranikum Sep 28, 2022
d52d48e
Remove unused class
pranikum Sep 28, 2022
93ee423
Merge changelog
pranikum Sep 29, 2022
45b98d3
Merge branch 'main' into recommission-service-level-support
pranikum Sep 29, 2022
56eca8f
Update changelog. Address minor changes
pranikum Sep 29, 2022
4d04395
Remove setting of weights. Just delete the attribute
pranikum Sep 30, 2022
c7c9f35
Update test cases. Call clearVotingConfigExclusion
pranikum Sep 30, 2022
0cf8def
Merge with latest
pranikum Oct 3, 2022
e203a57
Merge branch 'main' into recommission-service-level-support
pranikum Oct 3, 2022
7c14326
Merge with latest
pranikum Oct 3, 2022
f281473
PR comments
pranikum Oct 3, 2022
f22307f
PR comments
pranikum Oct 3, 2022
22a1d5e
Update method name
pranikum Oct 3, 2022
5227538
Update method name
pranikum Oct 3, 2022
b0cfef5
PR comments
pranikum Oct 5, 2022
c16ac3a
Resolve conflict with main
pranikum Oct 5, 2022
1b3471b
Merge branch 'main' into recommission-service-level-support
pranikum Oct 5, 2022
b1cf7a5
Merge changelog with latest
pranikum Oct 5, 2022
22cd77c
Update logger message
pranikum Oct 5, 2022
b3930c5
Fix log messages
pranikum Oct 5, 2022
5228119
Merge with main
pranikum Oct 6, 2022
4f58513
Merge branch 'main' into recommission-service-level-support
pranikum Oct 6, 2022
224dad4
Change log changes
pranikum Oct 6, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Add getter for path field in NestedQueryBuilder ([#4636](https://github.com/opensearch-project/OpenSearch/pull/4636))
- Added precommit support for windows ([#4676](https://github.com/opensearch-project/OpenSearch/pull/4676))
- Added precommit support for MacOS ([#4682](https://github.com/opensearch-project/OpenSearch/pull/4682))
- Recommission API changes for service layer ([#4320](https://github.com/opensearch-project/OpenSearch/pull/4320))
### Dependencies
- Bumps `log4j-core` from 2.18.0 to 2.19.0
- Bumps `reactor-netty-http` from 1.0.18 to 1.0.23
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
import org.opensearch.OpenSearchTimeoutException;
import org.opensearch.action.ActionListener;
import org.opensearch.action.admin.cluster.decommission.awareness.put.DecommissionResponse;
import org.opensearch.action.support.master.AcknowledgedResponse;
import org.opensearch.cluster.ClusterState;
import org.opensearch.cluster.ClusterStateObserver;
import org.opensearch.cluster.ClusterStateUpdateTask;
Expand Down Expand Up @@ -481,4 +482,53 @@ public void onFailure(Exception e) {
}
};
}

public void startRecommissionAction(final ActionListener<AcknowledgedResponse> listener) {
/*
* For abandoned requests, we might not really know if it actually restored the exclusion list.
* And can land up in cases where even after recommission, exclusions are set(which is unexpected).
* And by definition of OpenSearch - Clusters should have no voting configuration exclusions in normal operation.
* Once the excluded nodes have stopped, clear the voting configuration exclusions with DELETE /_cluster/voting_config_exclusions.
* And hence it is safe to remove the exclusion if any. User should make conscious choice before decommissioning awareness attribute.
*/
decommissionController.clearVotingConfigExclusion(new ActionListener<Void>() {
pranikum marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we provide an option to not force clear voting as a flag to the API?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cluster might get into an unexpected state if we don't clear the exclusion and recommissioned the zone. When the cluster decommissioned one zone, it was expected that no further exclusion will be set as we already decommissioned one master. Any exclusion after decommissioning, would just be making things worse for a cluster already under stress. Not sure, if the user would need any exclusion set when the recommissioning the zone back.

Also as per discussion in #4084 PR, the DELETE API should try to restore the system back for multiple failure cases

Let me know if you see a case where recommissioning won't expect the exclusion to be wiped off

@Override
public void onResponse(Void unused) {
logger.info("successfully cleared voting config exclusion for deleting the decommission");
deleteDecommissionState(listener);
}

@Override
public void onFailure(Exception e) {
logger.error(new ParameterizedMessage("failure in clearing voting config during delete_decommission request"), e);
listener.onFailure(e);
}
}, false);
}

void deleteDecommissionState(ActionListener<AcknowledgedResponse> listener) {
clusterService.submitStateUpdateTask("delete_decommission_state", new ClusterStateUpdateTask(Priority.URGENT) {
@Override
public ClusterState execute(ClusterState currentState) {
logger.info("Deleting the decommission attribute from cluster state");
Metadata metadata = currentState.metadata();
Metadata.Builder mdBuilder = Metadata.builder(metadata);
mdBuilder.removeCustom(DecommissionAttributeMetadata.TYPE);
return ClusterState.builder(currentState).metadata(mdBuilder).build();
}

@Override
public void onFailure(String source, Exception e) {
logger.error(() -> new ParameterizedMessage("Failed to clear decommission attribute. Exception: [{}]"), e);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorrect usage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done..

listener.onFailure(e);
}

@Override
public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) {
// Cluster state Processed for deleting the decommission attribute.
assert newState.metadata().decommissionAttributeMetadata() == null;
listener.onResponse(new AcknowledgedResponse(true));
}
});
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,13 @@
import org.hamcrest.Matchers;
import org.junit.After;
import org.junit.Before;
import org.mockito.ArgumentCaptor;
import org.mockito.Mockito;
import org.opensearch.Version;
import org.opensearch.action.ActionListener;
import org.opensearch.action.admin.cluster.decommission.awareness.put.DecommissionResponse;
import org.opensearch.action.admin.cluster.configuration.ClearVotingConfigExclusionsRequest;
import org.opensearch.action.support.master.AcknowledgedResponse;
import org.opensearch.cluster.ClusterName;
import org.opensearch.cluster.ClusterState;
import org.opensearch.cluster.coordination.CoordinationMetadata;
Expand All @@ -30,6 +34,7 @@
import org.opensearch.test.transport.MockTransport;
import org.opensearch.threadpool.TestThreadPool;
import org.opensearch.threadpool.ThreadPool;
import org.opensearch.transport.TransportResponseHandler;
import org.opensearch.transport.TransportService;

import java.util.Collections;
Expand Down Expand Up @@ -201,6 +206,86 @@ public void onFailure(Exception e) {
assertTrue(countDownLatch.await(30, TimeUnit.SECONDS));
}

public void testClearClusterDecommissionState() throws InterruptedException {
final CountDownLatch countDownLatch = new CountDownLatch(1);
DecommissionAttribute decommissionAttribute = new DecommissionAttribute("zone", "zone-2");
DecommissionAttributeMetadata decommissionAttributeMetadata = new DecommissionAttributeMetadata(
decommissionAttribute,
DecommissionStatus.SUCCESSFUL
);
ClusterState state = ClusterState.builder(new ClusterName("test"))
.metadata(Metadata.builder().putCustom(DecommissionAttributeMetadata.TYPE, decommissionAttributeMetadata).build())
.build();

ActionListener<AcknowledgedResponse> listener = new ActionListener<AcknowledgedResponse>() {
@Override
public void onResponse(AcknowledgedResponse decommissionResponse) {
DecommissionAttributeMetadata metadata = clusterService.state().metadata().custom(DecommissionAttributeMetadata.TYPE);
assertNull(metadata);
countDownLatch.countDown();
}

@Override
public void onFailure(Exception e) {
fail("on failure shouldn't have been called");
countDownLatch.countDown();
}
};

this.decommissionService.deleteDecommissionState(listener);

// Decommission Attribute should be removed.
assertTrue(countDownLatch.await(30, TimeUnit.SECONDS));
}

public void testDeleteDecommissionAttributeClearVotingExclusion() {
TransportService mockTransportService = Mockito.mock(TransportService.class);
Mockito.when(mockTransportService.getLocalNode()).thenReturn(Mockito.mock(DiscoveryNode.class));
DecommissionService decommissionService = new DecommissionService(
Settings.EMPTY,
clusterSettings,
clusterService,
mockTransportService,
threadPool,
allocationService
);
decommissionService.startRecommissionAction(Mockito.mock(ActionListener.class));

ArgumentCaptor<ClearVotingConfigExclusionsRequest> clearVotingConfigExclusionsRequestArgumentCaptor = ArgumentCaptor.forClass(
ClearVotingConfigExclusionsRequest.class
);
Mockito.verify(mockTransportService)
.sendRequest(
Mockito.any(DiscoveryNode.class),
Mockito.anyString(),
clearVotingConfigExclusionsRequestArgumentCaptor.capture(),
Mockito.any(TransportResponseHandler.class)
);

ClearVotingConfigExclusionsRequest request = clearVotingConfigExclusionsRequestArgumentCaptor.getValue();
assertFalse(request.getWaitForRemoval());
}

public void testClusterUpdateTaskForDeletingDecommission() throws InterruptedException {
final CountDownLatch countDownLatch = new CountDownLatch(1);
ActionListener<AcknowledgedResponse> listener = new ActionListener<>() {
@Override
public void onResponse(AcknowledgedResponse response) {
assertTrue(response.isAcknowledged());
assertNull(clusterService.state().metadata().decommissionAttributeMetadata());
countDownLatch.countDown();
}

@Override
public void onFailure(Exception e) {
fail("On Failure shouldn't have been called");
countDownLatch.countDown();
}
};
decommissionService.deleteDecommissionState(listener);
assertTrue(countDownLatch.await(30, TimeUnit.SECONDS));
}

private ClusterState addDataNodes(ClusterState clusterState, String zone, String... nodeIds) {
DiscoveryNodes.Builder nodeBuilder = DiscoveryNodes.builder(clusterState.nodes());
org.opensearch.common.collect.List.of(nodeIds).forEach(nodeId -> nodeBuilder.add(newDataNode(nodeId, singletonMap("zone", zone))));
Expand Down