Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove 'cluster_manager' role attachment when using 'node.master' deprecated setting #6331

Merged
merged 12 commits into from
Mar 28, 2023

Conversation

sandeshkr419
Copy link
Contributor

@sandeshkr419 sandeshkr419 commented Feb 15, 2023

Description

Using node.master: true configures both cluster_manager and master role to a node which is not correct, only one of them should be assigned.

Reference in cdk where it uses legacy settings to bootstrap nodes: https://github.com/opensearch-project/opensearch-cluster-cdk/blob/main/lib/opensearch-config/multi-node-base-config.yml

Allowing only one of the both roles to be attached instead of both - 'master' which is deprecated as part of change of 'master' nomenclature: #472 - since both the legacy setting node.master and master role are deprecated - so we need to keep the initial behavior only with deprecated roles and settings.

Issues Resolved

Resolves #6103

The change modifies the role attached in multi-node dedicated cluster-manager setup.

API response before the change:

Multi node with dedicated cluster-manager when bootstrapped with node.master: true:

..
"nodes" : {
    "count" : {
      "total" : 6,
      "cluster_manager" : 6,
      "coordinating_only" : 0,
      "data" : 3,
      "ingest" : 3,
      "master" : 6,
      "remote_cluster_client" : 6
    },
    "versions" : [
      "2.5.1"
    ],
..
curl "<cluster-endpoint>/_cat/nodes?v"
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles                                   cluster_manager name
10.0.3.110           62          70   0    0.00    0.00     0.00 mmr       cluster_manager,master,remote_cluster_client -               manager-node
10.0.3.72            60          69   0    0.00    0.00     0.00 dir       data,ingest,remote_cluster_client            -               data-node
10.0.4.132           20          70   0    0.00    0.00     0.00 dir       data,ingest,remote_cluster_client            -               data-node
10.0.5.42            26          69   0    0.04    0.02     0.00 dir       data,ingest,remote_cluster_client            -               data-node
10.0.4.156           39          69   0    0.00    0.00     0.00 mmr       cluster_manager,master,remote_cluster_client -               manager-node
10.0.5.39            47          70   0    0.00    0.04     0.01 mmr       cluster_manager,master,remote_cluster_client *               seed

API response after the change:

Multi node with dedicated cluster-manager when bootstrapped with node.master: true:

..
"nodes" : {
    "count" : {
      "total" : 6,
      "cluster_manager" : 3,
      "coordinating_only" : 0,
      "data" : 3,
      "ingest" : 3,
      "master" : 3,
      "remote_cluster_client" : 6,
      "search" : 0
    },
..
curl   "<endpoint>/_cat/nodes?v"
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles                            cluster_manager name
10.0.5.192           38          63   0    0.00    0.00     0.00 mr        master,remote_cluster_client -               manager-node
10.0.3.75            30          63   0    0.00    0.00     0.00 mr        master,remote_cluster_client -               manager-node
10.0.5.168           16          63   0    0.65    0.16     0.05 dir       data,ingest,remote_cluster_client     -               data-node
10.0.4.83            46          63   0    0.00    0.00     0.00 dir       data,ingest,remote_cluster_client     -               data-node
10.0.5.169           55          63   0    0.00    0.00     0.00 mr        master,remote_cluster_client *               seed
10.0.3.251           60          63   0    0.00    0.00     0.00 dir       data,ingest,remote_cluster_client     -               data-node

Check List

  • New functionality includes testing.
  • All tests pass
  • New functionality has been documented.
  • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@sandeshkr419 sandeshkr419 changed the title [DRAFT] Remove deprecated 'master' role [DRAFT] Remove deprecated 'master' role attachment from 'cluster_manager` nodes Feb 16, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

Gradle Check (Jenkins) Run Completed with:

@andrross
Copy link
Member

andrross commented Mar 2, 2023

At first glance, adding the cluster manager role when node.master is true seems wrong. node.master is deprecated the same as adding master to node.roles, so it seems like that legacy setting should just add the master role and not the cluster_manager role. Can we just remove the linked code to fix this bug?

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2023

Gradle Check (Jenkins) Run Completed with:

@sandeshkr419
Copy link
Contributor Author

Thanks @andrross for providing feedback. I appreciate the ideation of keeping the deprecated legacy settings to be restricted to deprecated roles, in terms of selection of master and cluster_manager for the deprecated node.master setting. This makes sense with the understanding that in future OS 3.x release, master role and all legacy settings including node.master will be removed in entirely.
I don't think removing this is within the scope of OS 2.x and specially in this PR. I tried removing this legacy setting from here and it breaks a whole lot of different things - which requires a complete code refactoring exercise which IMO is worth with OS 3.x where we deprecate this entirely.

Here is the new response from _cat/nodes when the cluster is bootstrapped using legacy settings:

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles                        cluster_manager name
10.0.5.117           21          61   0    0.00    0.01     0.03 dir       data,ingest,remote_cluster_client -               data-node
10.0.3.38            52          62   0    0.00    0.00     0.01 mr        master,remote_cluster_client      *               seed
10.0.4.97            26          61   0    0.00    0.00     0.00 mr        master,remote_cluster_client      -               manager-node
10.0.5.82            28          61   0    0.00    0.00     0.00 mr        master,remote_cluster_client      -               manager-node
10.0.4.191           27          62   0    0.00    0.00     0.00 dir       data,ingest,remote_cluster_client -               data-node

_cluster/stats:

..
...
 "nodes" : {
    "count" : {
      "total" : 5,
      "cluster_manager" : 3,
      "coordinating_only" : 0,
      "data" : 2,
      "ingest" : 2,
      "master" : 3,
      "remote_cluster_client" : 5,
      "search" : 0
    }
...
..

The tests added validate node roles and cluster stats response

I still want to point that the legacy settings: when you provide a combination of node.master, node.data, node.ingest, it sets up these 3 roles: cluster_manager/master, data, ingest alongwith the 4th remote_cluster_client role for a node. remote_cluster_client doesn't gets setup automatically unless provided explicitly when using node.roles - I have tested this out by spinning up a cluster using cdk and providing these 3 node roles (data, ingest, master) instead of legacy settings - you get only the roles you specify and not the remote_cluster_client.

When setting up explicit roles - master or cluster_manager roles via node.roles, the behavior would continue to exist as before: the role will be provided as specified by the user.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2023

Gradle Check (Jenkins) Run Completed with:

@@ -322,4 +323,81 @@ public void testFieldTypes() {
}
}
}

public void testNodeCountsWithLegacyMasterSetting() throws ExecutionException, InterruptedException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add test with cluster manager role as well

Copy link
Contributor Author

@sandeshkr419 sandeshkr419 Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shwetathareja for checking. Added some tests and refactored some.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2023

Gradle Check (Jenkins) Run Completed with:

@sandeshkr419 sandeshkr419 changed the title [DRAFT] Remove deprecated 'master' role attachment from 'cluster_manager` nodes [DRAFT] Remove 'cluster_manager' role attachment when using 'node.master' deprecated setting Mar 6, 2023
@sandeshkr419 sandeshkr419 marked this pull request as ready for review March 6, 2023 06:05
@tlfeng tlfeng added the backport 2.x Backport to 2.x branch label Mar 27, 2023
@@ -273,6 +274,10 @@ public void validateRole(List<DiscoveryNodeRole> roles) {
}
}

@Override
public boolean isEnabledByDefault(final Settings settings) {
return !Booleans.isBoolean(settings.get("node.master"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, the the prevailing style is Booleans.isBoolean(settings.get("node.master")) == false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing @andrross! Addressed the comment and re-based my changes against main to resolve conflicts in CHANGELOG.md

…ovided

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

Remove 'node.master' legacy setting from 'cluster_manager' role

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

Typo in test

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

Add tests with node.roles for cluster manager

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

minor refactoring with specifying node roles

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

Revert "minor refactoring with specifying node roles"

This reverts commit c9c123e.

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>

minor refactoring with specifying node roles

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@tlfeng tlfeng merged commit 5f89081 into opensearch-project:main Mar 28, 2023
@sandeshkr419 sandeshkr419 deleted the node-roles-fix branch March 28, 2023 05:19
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 28, 2023
…recated setting (#6331)

`cluster_manager` role no longer configures the legacy `node.master` setting.

Using `node.master: true` configures both cluster_manager and master role to a node which is not correct, only one of them should be assigned

Allowing only one of the both roles to be attached instead of both - 'master' which is deprecated - since both the legacy setting `node.master` and `master` role are deprecated - so we need to keep the initial behavior only with deprecated roles and settings.

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
(cherry picked from commit 5f89081)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
tlfeng pushed a commit that referenced this pull request Mar 28, 2023
…recated setting (#6331) (#6853)

`cluster_manager` role no longer configures the legacy `node.master` setting.

Using `node.master: true` configures both cluster_manager and master role to a node which is not correct, only one of them should be assigned

Allowing only one of the both roles to be attached instead of both - 'master' which is deprecated - since both the legacy setting `node.master` and `master` role are deprecated - so we need to keep the initial behavior only with deprecated roles and settings.

(cherry picked from commit 5f89081)

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
mitrofmep pushed a commit to mitrofmep/OpenSearch that referenced this pull request Apr 5, 2023
…recated setting (opensearch-project#6331)

`cluster_manager` role no longer configures the legacy `node.master` setting.

Using `node.master: true` configures both cluster_manager and master role to a node which is not correct, only one of them should be assigned

Allowing only one of the both roles to be attached instead of both - 'master' which is deprecated - since both the legacy setting `node.master` and `master` role are deprecated - so we need to keep the initial behavior only with deprecated roles and settings.

Signed-off-by: Sandesh Kumar <sandeshkr419@gmail.com>
Signed-off-by: Valentin Mitrofanov <mitrofmep@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch bug Something isn't working v2.7.0 v3.0.0 Issues and PRs related to version 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] _cluster/stats API returning incorrect cluster_manager count
5 participants