[BUG] Cluster bootstrap fails due to cluster.initial_master_nodes -> cluster.initial_cluster_manager_nodes renaming #2769

Closed
cliu123 opened this issue Apr 5, 2022 · 9 comments · Fixed by #2779
Labels
bug Something isn't working release Severity-Blocker v2.0.0 Version 2.0.0

@cliu123
Member

cliu123 commented Apr 5, 2022

Describe the bug
opensearch-project/opensearch-build#1624 (comment)

To Reproduce
Steps to reproduce the behavior:

  1. Check out this branch
  2. Run ./gradlew clean assemble to build the security plugin
  3. The security plugin artifact is generated in the build/distributions folder.
  4. Install the security plugin on OpenSearch 2.0.0-alpha1
  5. Start the cluster

Expected behavior
Cluster starts successfully.

Plugins
security plugin

Error log

opensearch-2.0.0-alpha1 % bin/opensearch
[2022-04-05T08:14:46,609][INFO ][o.o.n.Node               ] [smoketestnode] version[2.0.0-alpha1], pid[65264], build[tar/8b4d8797dd99c4cd1d9ebb4d4a189b2603ee9b62/2022-04-04T19:55:10.117456775Z], OS[Mac OS X/10.15.7/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/14.0.2/14.0.2+12-46]
[2022-04-05T08:14:46,611][INFO ][o.o.n.Node               ] [smoketestnode] JVM home [/Library/Java/JavaVirtualMachines/jdk-14.0.2.jdk/Contents/Home], using bundled JDK [false]
[2022-04-05T08:14:46,611][INFO ][o.o.n.Node               ] [smoketestnode] JVM arguments [-Xshare:auto, -Dopensearch.networkaddress.cache.ttl=60, -Dopensearch.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/var/folders/7w/pf_kzhln0f17f4mv1_79nrjhzkss_b/T/opensearch-3856531273804745298, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -XX:MaxDirectMemorySize=536870912, -Dopensearch.path.home=/Users/cgliu/git/opensearch-2.0.0-alpha1, -Dopensearch.path.conf=/Users/cgliu/git/opensearch-2.0.0-alpha1/config, -Dopensearch.distribution.type=tar, -Dopensearch.bundled_jdk=true]
[2022-04-05T08:14:46,612][WARN ][o.o.n.Node               ] [smoketestnode] version [2.0.0-alpha1] is a pre-release version of OpenSearch and is not suitable for production
[2022-04-05T08:14:47,377][WARN ][stderr                   ] [smoketestnode] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[2022-04-05T08:14:47,378][WARN ][stderr                   ] [smoketestnode] SLF4J: Defaulting to no-operation (NOP) logger implementation
[2022-04-05T08:14:47,378][WARN ][stderr                   ] [smoketestnode] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[2022-04-05T08:14:48,205][INFO ][o.o.i.r.ReindexPlugin    ] [smoketestnode] ReindexPlugin reloadSPI called
[2022-04-05T08:14:48,207][INFO ][o.o.i.r.ReindexPlugin    ] [smoketestnode] Unable to find any implementation for RemoteReindexExtension
[2022-04-05T08:14:48,217][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [aggs-matrix-stats]
[2022-04-05T08:14:48,218][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [analysis-common]
[2022-04-05T08:14:48,218][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [geo]
[2022-04-05T08:14:48,218][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [ingest-common]
[2022-04-05T08:14:48,218][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [ingest-geoip]
[2022-04-05T08:14:48,218][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [ingest-user-agent]
[2022-04-05T08:14:48,218][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [lang-expression]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [lang-mustache]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [lang-painless]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [mapper-extras]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [opensearch-dashboards]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [parent-join]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [percolator]
[2022-04-05T08:14:48,219][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [rank-eval]
[2022-04-05T08:14:48,220][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [reindex]
[2022-04-05T08:14:48,220][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [repository-url]
[2022-04-05T08:14:48,220][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded module [transport-netty4]
[2022-04-05T08:14:48,220][INFO ][o.o.p.PluginsService     ] [smoketestnode] loaded plugin [opensearch-security]
[2022-04-05T08:14:48,263][INFO ][o.o.e.NodeEnvironment    ] [smoketestnode] using [1] data paths, mounts [[/System/Volumes/Data (/dev/disk1s1)]], net usable_space [242.8gb], net total_space [465.6gb], types [apfs]
[2022-04-05T08:14:48,263][INFO ][o.o.e.NodeEnvironment    ] [smoketestnode] heap size [1gb], compressed ordinary object pointers [true]
[2022-04-05T08:14:48,306][INFO ][o.o.n.Node               ] [smoketestnode] node name [smoketestnode], node ID [Gva382quT6OyioZIY4Palw], cluster name [opensearch], roles [cluster_manager, remote_cluster_client, data, ingest]
[2022-04-05T08:14:50,865][INFO ][o.o.t.NettyAllocator     ] [smoketestnode] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=1gb}]
[2022-04-05T08:14:50,935][INFO ][o.o.d.DiscoveryModule    ] [smoketestnode] using discovery type [zen] and seed hosts providers [settings]
[2022-04-05T08:14:51,167][WARN ][o.o.g.DanglingIndicesState] [smoketestnode] gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually
[2022-04-05T08:14:51,327][INFO ][o.o.n.Node               ] [smoketestnode] initialized
[2022-04-05T08:14:51,327][INFO ][o.o.n.Node               ] [smoketestnode] starting ...
[2022-04-05T08:14:51,435][INFO ][o.o.t.TransportService   ] [smoketestnode] publish_address {192.168.1.70:9300}, bound_addresses {[::]:9300}
[2022-04-05T08:14:51,739][INFO ][o.o.b.BootstrapChecks    ] [smoketestnode] bound or publishing to a non-loopback address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes] must be configured
ERROR: OpenSearch did not exit normally - check the logs at /Users/cgliu/git/opensearch-2.0.0-alpha1/logs/opensearch.log
[2022-04-05T08:14:51,746][INFO ][o.o.n.Node               ] [smoketestnode] stopping ...
[2022-04-05T08:14:51,758][INFO ][o.o.n.Node               ] [smoketestnode] stopped
[2022-04-05T08:14:51,758][INFO ][o.o.n.Node               ] [smoketestnode] closing ...
[2022-04-05T08:14:51,766][INFO ][o.o.n.Node               ] [smoketestnode] closed
@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

Update the progress:
My Steps:
Reference: https://github.com/opensearch-project/security/blob/1.3.1.0/DEVELOPER_GUIDE.md
Build security plugin:

git clone https://github.com/opensearch-project/security.git
cd security
git remote add cliu123 git@github.com:cliu123/security.git
git switch -c upgrade_to_opensearch_2.0.0_alpha1 cliu123/upgrade_to_opensearch_2.0.0_alpha1
./gradlew clean assemble

Then get security plugin zip file from build/distributions/opensearch-security-2.0.0.0-alpha1-SNAPSHOT.zip

Install the security plugin on OpenSearch 2.0.0-alpha1:

(download from https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.0.0-alpha1/latest/linux/x64/builds/opensearch/dist/opensearch-min-2.0.0-alpha1-linux-x64.tar.gz)
cd opensearch-2.0.0-alpha1
bin/opensearch-plugin install file:///.../opensearch-security-2.0.0.0-alpha1-SNAPSHOT.zip
cd plugins/opensearch-security
chmod +x ./tools/install_demo_configuration.sh 
tools/install_demo_configuration.sh -> press y y n

@cliu123
Member Author

cliu123 commented Apr 5, 2022

Update the progress: My Steps:

git clone https://github.com/opensearch-project/security.git
cd security
git remote add cliu123 git@github.com:cliu123/security.git
git switch -c upgrade_to_opensearch_2.0.0_alpha1 cliu123/upgrade_to_opensearch_2.0.0_alpha1
./gradlew clean assemble

Then get security plugin zip file from build/distributions/opensearch-security-2.0.0.0-alpha1-SNAPSHOT.zip

@tlfeng Were you able to start the cluster after installing plugin?

@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

Hi @cliu123 Please wait a moment, I'm configuring the necessary settings, such as plugins.security.ssl.transport.keystore_filepath or plugins.security.ssl.transport.server.pemcert_filepath and plugins.security.ssl.transport.client.pemcert_filepath.

@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

😁 I completed the security plugin configuration with tools/install_demo_configuration.sh, following the developer guide.

The missing cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes setting didn't fail the bootstrap. On my host it is logged as a warning, not an error, and the node eventually started.
I will still verify the behavior after adding the setting.

...
[2022-04-05T14:51:14,663][INFO ][o.o.n.Node               ] [ip-172-31-15-168] initialized
[2022-04-05T14:51:14,664][INFO ][o.o.n.Node               ] [ip-172-31-15-168] starting ...
[2022-04-05T14:51:14,793][INFO ][o.o.t.TransportService   ] [ip-172-31-15-168] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2022-04-05T14:51:14,929][WARN ][o.o.b.BootstrapChecks    ] [ip-172-31-15-168] the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes] must be configured
[2022-04-05T14:51:14,930][INFO ][o.o.c.c.Coordinator      ] [ip-172-31-15-168] cluster UUID [PgXWcGw-RACaa91QAkZvVw]
[2022-04-05T14:51:14,938][INFO ][o.o.c.c.ClusterBootstrapService] [ip-172-31-15-168] no discovery configuration found, will perform best-effort cluster bootstrapping after [3s] unless existing cluster-manager is discovered
[2022-04-05T14:51:15,074][INFO ][o.o.c.s.MasterService    ] [ip-172-31-15-168] elected-as-cluster-manager ([1] nodes joined)[{ip-172-31-15-168}{LYDZvjEcR6C0yros7T_VEw}{aYaT-K40SFac8aCmy1Dhkg}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 3, version: 21, delta: cluster-manager node changed {previous [], current [{ip-172-31-15-168}{LYDZvjEcR6C0yros7T_VEw}{aYaT-K40SFac8aCmy1Dhkg}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true}]}
[2022-04-05T14:51:15,110][INFO ][o.o.c.s.ClusterApplierService] [ip-172-31-15-168] cluster-manager node changed {previous [], current [{ip-172-31-15-168}{LYDZvjEcR6C0yros7T_VEw}{aYaT-K40SFac8aCmy1Dhkg}{127.0.0.1}{127.0.0.1:9300}{dimr}{shard_indexing_pressure_enabled=true}]}, term: 3, version: 21, reason: Publication{term=3, version=21}
[2022-04-05T14:51:15,128][INFO ][o.o.h.AbstractHttpServerTransport] [ip-172-31-15-168] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2022-04-05T14:51:15,128][INFO ][o.o.n.Node               ] [ip-172-31-15-168] started
[2022-04-05T14:51:15,158][INFO ][o.o.g.GatewayService     ] [ip-172-31-15-168] recovered [2] indices into cluster_state
[2022-04-05T14:51:15,484][INFO ][o.o.c.r.a.AllocationService] [ip-172-31-15-168] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[.opendistro_security][0], [security-auditlog-2022.04.05][0]]]).
[2022-04-05T14:51:15,726][INFO ][stdout                   ] [ip-172-31-15-168] [FINE] No subscribers registered for event class org.opensearch.security.securityconf.DynamicConfigFactory$NodesDnModelImpl
[2022-04-05T14:51:15,727][INFO ][stdout                   ] [ip-172-31-15-168] [FINE] No subscribers registered for event class org.greenrobot.eventbus.NoSubscriberEvent

@cliu123
Member Author

cliu123 commented Apr 5, 2022

@tlfeng Thanks for the update! Didn't tools/install_demo_configuration.sh add cluster.initial_master_nodes: smoketestnode into opensearch.yml? That sounds new to me.

@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

I did notice that adding the cluster setting cluster.initial_master_nodes: localhost to opensearch.yml doesn't work.
The warning still shows:
[2022-04-05T14:58:34,448][WARN ][o.o.b.BootstrapChecks ] [ip-172-31-15-168] the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_cluster_manager_nodes / cluster.initial_master_nodes] must be configured
The warning disappears, however, after adding cluster.initial_cluster_manager_nodes: localhost.

This behavior is not expected, so it is still likely a bug.

@cliu123 cliu123 changed the title [BUG] Cluster bootstrap failes due to cluster.initial_master_nodes -> cluster.initial_cluster_manager_nodes renaming [BUG] Cluster bootstrap fails due to cluster.initial_master_nodes -> cluster.initial_cluster_manager_nodes renaming Apr 5, 2022
@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

Didn't tools/install_demo_configuration.sh add cluster.initial_master_nodes: smoketestnode into opensearch.yml?

It seems that is because I didn't enable "cluster mode":

$ tools/install_demo_configuration.sh
OpenSearch Security Demo Installer
 ** Warning: Do not use on production or public reachable systems **
Install demo certificates? [y/N] y
Initialize Security Modules? [y/N] y
Cluster mode requires maybe additional setup of:
  - Virtual memory (vm.max_map_count)

Enable cluster mode? [y/N] 

Without cluster mode, the initial master node setting is not added (https://github.com/opensearch-project/security/blob/v1.13.1.0/tools/install_demo_configuration.sh#L389).
Let me try enabling cluster mode.

@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

Great, now I can reproduce your error. 😂
It looks like the problem is that the deprecated setting cluster.initial_master_nodes is not recognized during the node bootstrap check.
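
For anyone comparing the two logs in this thread: the discovery message is identical on both hosts, but it is only fatal on the node that logged "bound or publishing to a non-loopback address, enforcing bootstrap checks". A rough Python model of that distinction is below. This is a simplified sketch, not OpenSearch code: the function names are hypothetical, the real logic is in Java, and the real check may take other settings into account.

```python
import ipaddress


def enforce_bootstrap_checks(bound_addresses, publish_address):
    """Hypothetical model: enforce when any address is non-loopback."""
    addresses = list(bound_addresses) + [publish_address]
    return any(not ipaddress.ip_address(a).is_loopback for a in addresses)


def check_severity(bound_addresses, publish_address, discovery_configured):
    """Return how a missing discovery configuration would be reported."""
    if discovery_configured:
        return "OK"
    if enforce_bootstrap_checks(bound_addresses, publish_address):
        return "ERROR"  # bootstrap check fails and the node stops
    return "WARN"  # logged only; the node keeps starting


# Addresses copied from the two transport log lines in this thread.
print(check_severity(["::1", "127.0.0.1"], "127.0.0.1", False))
print(check_severity(["::"], "192.168.1.70", False))
```

With the loopback-only addresses the model reports a warning, and with the wildcard bind plus 192.168.1.70 it reports an error, which matches the two behaviors seen above.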

@tlfeng
Collaborator

tlfeng commented Apr 5, 2022

@cliu123 Thank you so much for pointing out this defect! 👍👍
It also shows that the unit tests added when deprecating the setting cluster.initial_master_nodes (in commit 19eadb4 / PR #2463) are not sufficient.

The bug is caused by INITIAL_MASTER_NODES_SETTING being replaced directly by INITIAL_CLUSTER_MANAGER_NODES_SETTING in the code, whereas both settings should be checked. The method involved is discoveryIsConfigured().

I will make the code change soon.
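
To illustrate the cause described above, here is a minimal Python model of the check. It is only a sketch under stated assumptions: the real discoveryIsConfigured() is Java and works on setting objects rather than a dictionary, the two function names below are hypothetical, and the "fixed" variant reflects the intent stated in this comment (consult both names), not necessarily the exact patch.

```python
NEW_KEY = "cluster.initial_cluster_manager_nodes"
OLD_KEY = "cluster.initial_master_nodes"  # deprecated name, still in use
OTHER_KEYS = ("discovery.seed_hosts", "discovery.seed_providers")


def discovery_is_configured_buggy(settings):
    """Model of the regression: the old key was swapped out for the new one."""
    keys = OTHER_KEYS + (NEW_KEY,)
    return any(key in settings for key in keys)


def discovery_is_configured_fixed(settings):
    """Model of the intended behavior: either key counts as configured."""
    keys = OTHER_KEYS + (NEW_KEY, OLD_KEY)
    return any(key in settings for key in keys)


# Settings as written by the demo installer in cluster mode (per this thread).
demo = {"cluster.initial_master_nodes": ["smoketestnode"]}
print(discovery_is_configured_buggy(demo))
print(discovery_is_configured_fixed(demo))
```

In this model, a configuration that uses only the deprecated name is treated as unconfigured by the first function and as configured by the second.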
