Dev: bootstrap: A scaffold for crmsh to configure cluster with corosync3 #1172

liangxin1300
Collaborator

@liangxin1300 liangxin1300 commented Apr 13, 2023

Major changes:

  • Support corosync3 and knet
hanode1:/ # rpm -q corosync
corosync-3.1.7+20221024.91348f86-lp154.1.1.x86_64
hanode1:/ # crm cluster init -y
hanode2:/ # crm cluster join -c hanode1 -y  
hanode2:/ # corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0 udp
	addr	= 172.18.0.3
	status:
		nodeid:          1:	connected
		nodeid:          2:	localhost
  • Option -i/--interface now accepts a NIC name, an IP address, or a mix of both, on both the cluster init and cluster join sides
hanode1:/ # crm cluster init -i 172.18.0.3 -i 172.19.0.3 -y
# same as "crm cluster init -i eth1 -i eth2 -y" or
# same as "crm cluster init -i eth1 -i 172.19.0.3 -y"
hanode1:/ # cat /etc/corosync/corosync.conf
...
nodelist {
	node {
		ring0_addr: 172.18.0.3
		ring1_addr: 172.19.0.3
		name: hanode1
		nodeid: 1
	}

}
  • Add the -t/--transport option on the cluster init side
-t TRANSPORT, --transport TRANSPORT
    The transport mechanism. Allowed value is knet(kronosnet)/udpu(unicast)/udp(multicast). Default is knet
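
To illustrate how the -i and -t options described above could be declared, here is a minimal argparse sketch. It is not crmsh's actual option code: the function name `build_init_parser` and the `dest` names are assumptions made for this example; only the option letters, the allowed transport values, and the knet default come from the description.

```python
import argparse


def build_init_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -i and -t options described above."""
    parser = argparse.ArgumentParser(prog="crm cluster init")
    # -i may be repeated; each value is a NIC name or an IP address.
    parser.add_argument(
        "-i", "--interface", dest="nic_addr_list", metavar="IF",
        action="append", default=[],
        help="NIC name or IP address; repeat the option to add more links",
    )
    # Only the three transports named in the help text are accepted.
    parser.add_argument(
        "-t", "--transport", dest="transport", metavar="TRANSPORT",
        choices=["knet", "udpu", "udp"], default="knet",
        help="The transport mechanism. Default is knet",
    )
    parser.add_argument("-y", dest="yes_to_all", action="store_true")
    return parser


args = build_init_parser().parse_args(["-i", "eth1", "-i", "172.19.0.3", "-y"])
print(args.nic_addr_list, args.transport)
```

With `action="append"`, mixing NIC names and IP addresses needs no special handling at parse time; telling them apart is left to a later validation step.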

Changes include:

  • Drop the --no-overwrite-sshkey option, which was already deprecated
  • Drop the -u/--unicast option
  • Drop the -M/--multi-heartbeats option
  • Drop the -w/--watchdog option on the join side, which was already deprecated
  • Change the -i option to accept both NIC names and IP addresses, on both the init and join sides
  • Add the -t/--transport option; valid values are knet/udpu/udp, and the default is knet
  • Write a new config parser to parse corosync.conf for corosync3,
    since the original config parser couldn't modify multiple interfaces
  • Gradually replace the related interfaces in corosync with conf_parser
  • The default transport type is now knet
  • Enhance option validation, for example:
    • Only one link is allowed for non-knet transport types
    • The maximum number of interfaces is 8
    • Transport udp (multicast) cannot be used on cloud platforms
    • Detect possibly duplicated -i input, like:
    # crm cluster init -i eth1 -i 172.19.0.3 -y
    ERROR: cluster.init: Invalid input '172.19.0.3': the IP in the same NIC already used
    
  • Refactor and simplify the corosync config process on the init and join sides
  • No longer maintain/update expected_votes (the nodelist section is mandatory)
  • The totem.interface section is not necessary for all transport types
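
The validation rules listed above (one link for non-knet transports, at most 8 interfaces, no two -i values on the same NIC) can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `validate_interfaces`, the `nic_addrs` mapping, and the error wording are all hypothetical, and the cloud-platform check is left out.

```python
import ipaddress

MAX_LINKS = 8  # from "maximum number of interfaces is 8"


def is_ip(value: str) -> bool:
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def validate_interfaces(transport, inputs, nic_addrs):
    """Return inputs unchanged, or raise ValueError on the first violation.

    nic_addrs is a hypothetical mapping {nic_name: [ip, ...]} for the local host.
    """
    if len(inputs) > MAX_LINKS:
        raise ValueError(f"Maximum number of interfaces is {MAX_LINKS}")
    if transport != "knet" and len(inputs) > 1:
        raise ValueError(f"Only one link is allowed for transport '{transport}'")
    used_nics = set()
    for value in inputs:
        if is_ip(value):
            # Map the IP back to the NIC that owns it.
            nic = next((n for n, ips in nic_addrs.items() if value in ips), None)
            if nic is None:
                raise ValueError(f"Invalid input '{value}': not a local IP address")
        else:
            nic = value
            if nic not in nic_addrs:
                raise ValueError(f"Invalid input '{value}': no such NIC")
        if nic in used_nics:
            raise ValueError(f"Invalid input '{value}': the NIC is already used")
        used_nics.add(nic)
    return inputs
```

Resolving every input to its NIC first is what lets `-i eth1 -i 172.19.0.3` be rejected when 172.19.0.3 is configured on eth1, as in the error shown above.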

Feature list:

  • Configure as knet/udpu/udp
  • Configure as knet/udpu/udp in interactive mode
  • Configure qdevice with knet/udpu/udp
  • Configure sbd with knet/udpu/udp
  • Remove cluster node

Todo list:

  • Unit test
  • Behave functional test

How to play with it

server1:/opt/crmsh # ls
AUTHORS     ChangeLog     COPYING      crmsh.egg-info         data-manifest  etc      .gitignore   NEWS           README.md         setup.py   TODO     update-data-manifest.sh  .vscode
autogen.sh  configure.ac  .coveragerc  crmsh.spec.in          doc            .git     .hgignore    .pytest_cache  requirements.txt  templates  .tox     utils
bin         contrib       crmsh        crmsh.tmpfiles.d.conf  Dockerfile     .github  Makefile.am  pytest.ini     scripts           test       tox.ini  version.in

server1:/opt/crmsh # ./test/run-functional-tests -d
INFO: Cleanup container "hanode1"...
INFO: Cleanup container "hanode2"...
INFO: Cleanup container "hanode3"...
INFO: Cleanup ha specific docker network "ha_network_first"...
INFO: Cleanup ha specific docker network "ha_network_second"...

server1:/opt/crmsh # ./test/run-functional-tests -n 2 -x
INFO: Loading docker image liangxin1300/alpha...
INFO: Create ha specific docker network "ha_network_first"...
INFO: Create ha specific docker network "ha_network_second"...
INFO: Deploying "hanode1"...
INFO: Deploying "hanode2"...
INFO: Building crmsh on "hanode1"...
INFO: Building crmsh on "hanode2"...

server1:/opt/crmsh # docker exec -it hanode1 bash

hanode1:/ # rpm -qa|grep corosync
corosync-qdevice-3.0.3+20230322.4331c7d-lp154.1.1.x86_64
corosynclib-3.1.7+20221024.91348f86-lp154.1.1.x86_64
corosync-3.1.7+20221024.91348f86-lp154.1.1.x86_64

hanode1:/ # crm cluster init -y
INFO: Loading "default" profile from /etc/crm/profiles.yml
WARNING: No NTP service found.
INFO: SSH key for hacluster does not exist, hence generate it now
INFO: Configuring csync2
INFO: Starting csync2.socket service on hanode1
INFO: BEGIN csync2 checking files
INFO: END csync2 checking files
INFO: Configuring corosync(knet)
WARNING: Hawk not installed - not configuring web management interface.
WARNING: You should change the hacluster password to something more secure!
INFO: BEGIN Waiting for cluster
...........
INFO: END Waiting for cluster
INFO: Loading initial cluster configuration
INFO: Done (log saved to /var/log/crmsh/crmsh.log)

hanode1:/ # exit
exit

server1:/opt/crmsh # docker exec -it hanode2 bash

hanode2:/ # crm cluster join -c hanode1 -y  
WARNING: No NTP service found.
INFO: SSH key for hacluster does not exist, hence generate it now
INFO: Configuring csync2
INFO: Starting csync2.socket service
INFO: BEGIN csync2 syncing files in cluster
INFO: END csync2 syncing files in cluster
INFO: Merging known_hosts
INFO: BEGIN Probing for new partitions
INFO: END Probing for new partitions

WARNING: Hawk not installed - not configuring web management interface.
WARNING: You should change the hacluster password to something more secure!
INFO: BEGIN Waiting for cluster
..
INFO: END Waiting for cluster
INFO: BEGIN Reloading cluster configuration
INFO: END Reloading cluster configuration
INFO: Done (log saved to /var/log/crmsh/crmsh.log)

hanode2:/ # crm_mon -1
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: hanode1 (version 2.1.5+20230314.692147cd3-150400.385.2-2.1.5+20230314.692147cd3) - partition with quorum
  * Last updated: Tue Apr 18 01:56:47 2023 on hanode2
  * Last change:  Tue Apr 18 01:56:43 2023 by root via cibadmin on hanode2
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ hanode1 hanode2 ]

Active Resources:
  * No active resources
hanode2:/ # corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0 udp
	addr	= 172.18.0.3
	status:
		nodeid:          1:	connected
		nodeid:          2:	localhost

hanode2:/ # cat /etc/corosync/corosync.conf
totem {
	version: 2
	crypto_cipher: aes256
	crypto_hash: sha1
	cluster_name: hacluster
	transport: knet
	token: 5000
	join: 60
	max_messages: 20
	token_retransmits_before_loss_const: 10
}

quorum {
	provider: corosync_votequorum
	two_node: 1
}

logging {
	to_logfile: yes
	logfile: /var/log/cluster/corosync.log
	to_syslog: yes
	timestamp: on
}

nodelist {
	node {
		ring0_addr: 172.18.0.2
		name: hanode1
		nodeid: 1
	}

	node {
		ring0_addr: 172.18.0.3
		name: hanode2
		nodeid: 2
	}

}
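
For readers curious how a file like the one above can be read with repeated `node` sections kept apart, here is a small parser sketch. It is not the `crmsh/conf_parser.py` added by this PR; `parse_corosync_conf` is a hypothetical name, and the sketch only handles `name {`, `}` and `key: value` lines.

```python
def parse_corosync_conf(text):
    """Parse 'section { key: value }' text into nested dicts.

    A section name that repeats under the same parent (such as 'node')
    is collected into a list of dicts.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            child = {}
            parent = stack[-1]
            if name in parent:
                # Second occurrence: turn the single dict into a list.
                if not isinstance(parent[name], list):
                    parent[name] = [parent[name]]
                parent[name].append(child)
            else:
                parent[name] = child
            stack.append(child)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        else:
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"cannot parse line: {raw!r}")
            stack[-1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError("missing '}'")
    return root


sample = """
totem {
    transport: knet
}
nodelist {
    node {
        ring0_addr: 172.18.0.2
        name: hanode1
        nodeid: 1
    }
    node {
        ring0_addr: 172.18.0.3
        name: hanode2
        nodeid: 2
    }
}
"""
conf = parse_corosync_conf(sample)
print(conf["totem"]["transport"])
print([node["name"] for node in conf["nodelist"]["node"]])
```

One limitation of this sketch: a section that appears only once stays a plain dict, so a single-node nodelist would need normalizing before being iterated as a list.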

Test rpm

https://build.opensuse.org/package/show/home:XinLiang:branches:network:ha-clustering:Unstable/crmsh

@liangxin1300 liangxin1300 changed the title Dev: bootstrap: A scaffold for crmsh to configure with corosync3 Dev: bootstrap: A scaffold for crmsh to configure cluster with corosync3 Apr 13, 2023
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch from 50c8c54 to 53ff3b5 Compare April 13, 2023 09:04
@liangxin1300 liangxin1300 changed the title Dev: bootstrap: A scaffold for crmsh to configure cluster with corosync3 [ALP] Dev: bootstrap: A scaffold for crmsh to configure cluster with corosync3 Apr 13, 2023
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 9 times, most recently from 691136e to 9463e39 Compare April 17, 2023 03:11
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 4 times, most recently from 419f1aa to 3024f60 Compare April 24, 2023 09:18
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch from 3024f60 to 6781505 Compare May 4, 2023 05:19
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 2 times, most recently from b9f459e to f8f10de Compare May 15, 2023 07:52
@liangxin1300 liangxin1300 changed the title [ALP] Dev: bootstrap: A scaffold for crmsh to configure cluster with corosync3 Dev: bootstrap: A scaffold for crmsh to configure cluster with corosync3 May 16, 2023
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 5 times, most recently from 2096e99 to 5bb01a4 Compare May 22, 2023 09:01
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 2 times, most recently from 58b8678 to ac316b9 Compare June 1, 2023 03:24
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 3 times, most recently from 2984eb9 to bfefeb7 Compare June 14, 2023 07:04
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch from 344afe2 to 66c8b75 Compare July 1, 2023 08:37
@codecov

codecov bot commented Jul 1, 2023

Codecov Report

Merging #1172 (3570991) into master (fca186e) will increase coverage by 0.34%.
Report is 3 commits behind head on master.
The diff coverage is 93.65%.

❗ Current head 3570991 differs from pull request most recent head c3b987a. Consider uploading reports for the commit c3b987a to get more accurate results

@@            Coverage Diff             @@
##           master    #1172      +/-   ##
==========================================
+ Coverage   52.51%   52.85%   +0.34%     
==========================================
  Files          78       79       +1     
  Lines       24998    25066      +68     
==========================================
+ Hits        13127    13248     +121     
+ Misses      11871    11818      -53     
Files Changed           Coverage Δ
crmsh/sbd.py            28.38% <0.00%> (+2.50%) ⬆️
crmsh/ui_corosync.py    43.15% <50.00%> (+3.83%) ⬆️
crmsh/bootstrap.py      76.16% <92.23%> (+1.95%) ⬆️
crmsh/conf_parser.py    93.29% <93.29%> (ø)
crmsh/corosync.py       64.36% <94.23%> (-14.66%) ⬇️
crmsh/utils.py          59.59% <97.50%> (+1.38%) ⬆️
crmsh/constants.py      100.00% <100.00%> (ø)
crmsh/qdevice.py        93.76% <100.00%> (+0.54%) ⬆️
crmsh/tmpfiles.py       80.00% <100.00%> (+1.95%) ⬆️
crmsh/ui_cluster.py     73.29% <100.00%> (-1.82%) ⬇️

... and 35 files with indirect coverage changes

@zzhou1
Contributor

zzhou1 commented Aug 9, 2023

The first taste of corosync3 bootstrap is good.

One comment: the error message is not helpful when the join side passes fewer or more links than the init side:

  tws-corosync3-1:~ # crm cluster init -ys /dev/disk/by-partlabel/sbd-160 -i enp1s0 -i enp9s0 -i enp10s0
  tws-corosync3-2:~ # crm cluster join -i enp1s0 -i enp9s0 -yc tws-corosync3-1
  ERROR: cluster.join: Please specify the corresponding network interface by '-i' option

  tws-corosync3-1:~ # crm cluster init -ys /dev/disk/by-partlabel/sbd-160 -i enp1s0 -i enp9s0
  tws-corosync3-2:~ # crm cluster join -i enp1s0 -i enp9s0 -i enp10s0 -yc tws-corosync3-1
  ERROR: cluster.join: Please specify the corresponding network interface by '-i' option

It would be better to rephrase the error message, something like:

"ERROR: knet transport of all cluster nodes need 3 links via '-i' options, but provided 2"
"ERROR: knet transport of all cluster nodes need 2 links via '-i' options, but provided 3"

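A check that produces messages in the suggested form could look roughly like the sketch below. It assumes the expected link count is taken from the `ringX_addr` keys of an existing node entry in corosync.conf; `count_links` and `check_join_links` are hypothetical names and not functions from this PR.

```python
import re

RING_ADDR = re.compile(r"^ring(\d+)_addr$")


def count_links(node: dict) -> int:
    """Count the ringX_addr keys of one nodelist entry."""
    return sum(1 for key in node if RING_ADDR.match(key))


def check_join_links(init_node: dict, provided: list) -> None:
    """Raise ValueError if the joining node passes a different number of links."""
    expected = count_links(init_node)
    if len(provided) != expected:
        raise ValueError(
            f"knet transport of all cluster nodes need {expected} links "
            f"via '-i' options, but provided {len(provided)}"
        )


# Node entry as written by "crm cluster init -i 172.18.0.3 -i 172.19.0.3 -y".
init_node = {
    "ring0_addr": "172.18.0.3",
    "ring1_addr": "172.19.0.3",
    "name": "hanode1",
    "nodeid": "1",
}
try:
    check_join_links(init_node, ["enp1s0", "enp9s0", "enp10s0"])
except ValueError as err:
    print(f"ERROR: {err}")
```

Stating both the expected and the provided count tells the user which side to fix without having to inspect corosync.conf on the init node.
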
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch 2 times, most recently from c763283 to 1966e6b Compare August 10, 2023 21:51
Contributor

@zzhou1 zzhou1 left a comment

Nice job!

@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch from 1966e6b to 115e72d Compare August 21, 2023 06:47
@liangxin1300 liangxin1300 force-pushed the 20230406_crmsh_for_corosync3_scaffold branch from 115e72d to c3b987a Compare August 21, 2023 07:46
@liangxin1300 liangxin1300 merged commit 31ab66c into ClusterLabs:master Aug 21, 2023
25 checks passed