May 5
[Original] Fixing corosync 2.3.3 on CentOS 7 failing to form a two-node cluster
We use corosync to build a Pacemaker cluster, but found that starting the corosync service does not automatically start the pacemaker service.
It turns out that with corosync 2.3.3 on CentOS 7, the pacemaker service is disabled by default and must be enabled manually (e.g. with systemctl enable pacemaker).
After starting the corosync service, the two nodes failed to form a cluster; there were no Nodes:
Quote:
[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:43:13 2015
Last change: Mon May 4 14:26:45 2015
Current DC: NONE
0 Nodes configured
0 Resources configured
1. Troubleshooting
Analysis showed that both the corosync and pacemaker services started normally, but the logs reported that quorum was not configured:
Quote:
[root@gz-controller-209100 corosync]# systemctl status pacemaker
pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled)
   Active: active (running) since Mon 2015-05-04 11:59:10 CST; 1s ago
 Main PID: 8378 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─8378 /usr/sbin/pacemakerd -f
           ├─8379 /usr/libexec/pacemaker/cib
           ├─8380 /usr/libexec/pacemaker/stonithd
           ├─8381 /usr/libexec/pacemaker/lrmd
           ├─8382 /usr/libexec/pacemaker/attrd
           ├─8383 /usr/libexec/pacemaker/pengine
           └─8384 /usr/libexec/pacemaker/crmd

May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: error: cluster_connect_quorum: Corosync quorum is not configured
May 04 11:59:11 gz-controller-209100.vclound.com stonith-ng[8380]: notice: setup_cib: Watching for stonith topology changes

Quote:
Attempting connection to the cluster...
Last updated: Mon May 4 12:03:39 2015
Last change: Mon May 4 11:59:10 2015
Current DC: NONE
0 Nodes configured
0 Resources configured
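The "Corosync quorum is not configured" error is the fatal one here: crmd refuses to join until a quorum provider exists. Under votequorum, a partition is quorate only when it holds a strict majority of expected votes. A quick sketch of that arithmetic (the vote counts below are illustrative, not read from this cluster's configuration):

```shell
# Majority quorum per votequorum(5): quorum = expected_votes / 2 + 1
# (integer division). Illustrative values only.
expected_votes=2
quorum=$(( expected_votes / 2 + 1 ))
echo "A ${expected_votes}-vote cluster needs ${quorum} vote(s) for quorum"
# Note: with a plain majority rule a two-node cluster needs both votes,
# so losing either node loses quorum; votequorum's two_node option
# exists to special-case exactly this situation.
```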
2. Solution
Following the votequorum(5) man page, add a quorum section to the configuration.
Modify the configuration file on node one accordingly:
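The exact file the author used is not preserved in this copy. Per votequorum(5), a minimal quorum section for a two-node corosync 2.x cluster typically looks like the following sketch (the provider name is fixed; two_node is the usual choice for exactly two nodes):

```
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
```

With two_node: 1 set, votequorum lets a single surviving node retain quorum after its peer fails; it also enables wait_for_all by default, so both nodes must be seen together once at startup before either may claim quorum alone.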
Restart the corosync and pacemaker services:
Quote:
[root@gz-controller-209100 ~]# systemctl restart corosync
[root@gz-controller-209100 ~]# systemctl restart pacemaker
Check the cluster state again:
Quote:
[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:44:47 2015
Last change: Mon May 4 14:43:33 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
1 Nodes configured
0 Resources configured
Online: [ gz-controller-209100.vclound.com ]
Node one has joined the cluster.
Copy the configuration file to the second node:
Quote:
[root@gz-controller-209100 ~]# scp /etc/corosync/corosync.conf 192.168.209.101:/etc/corosync/
Restart the services on node two:
Quote:
[root@gz-controller-209101 ~]# systemctl restart corosync
[root@gz-controller-209101 ~]# systemctl restart pacemaker
Cluster state:
Quote:
[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:44:55 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured
Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]
Both nodes have joined the cluster; problem solved.
3. Remaining issue
Running pcs status reports an error:
Quote:
[root@gz-controller-209100 ~]# pcs status
Cluster name:
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon May 4 15:09:08 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured
Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]
Full list of resources:
PCSD Status:
Error: no nodes found in corosync.conf
Reference:
Why is the message "Error: no nodes found in corosync.conf" in the output of "pcs cluster status" command?
https://access.redhat.com/solutions/663283
Resolution
The errors need to be ignored as no corosync.conf file is used.
Root cause
The error messages seen are not harmful and are expected due to cman stack is being used.
Therefore, this error can be ignored.
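For what it's worth, the Red Hat article above targets the cman stack, while the outputs here show "Stack: corosync". On the corosync 2 stack, pcs discovers node names from a nodelist section in corosync.conf, so adding one is a common way to make the "no nodes found" error go away. A sketch, with nodeids chosen arbitrarily and ring0_addr set to the hostnames used above (verify these resolve on both nodes):

```
nodelist {
    node {
        ring0_addr: gz-controller-209100.vclound.com
        nodeid: 1
    }
    node {
        ring0_addr: gz-controller-209101.vclound.com
        nodeid: 2
    }
}
```

A nodelist would also quiet the "Unable to get node name for nodeid" notices seen in the pacemaker logs earlier, since pacemaker could then map nodeids to names instead of falling back to uname -n.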