[TriLUG] Configuring a simple 1:1 Linux HA cluster on CentOS 5.3
Ronald Kelley
rkelleyrtp at gmail.com
Tue Oct 27 13:58:22 EDT 2009
Since I missed the last TriLUG meeting (specifically on Linux HA), I
spent the better part of two days getting a couple of servers
configured for a 1:1 failover solution. So far, my cluster has run
well with no hiccups. Although many articles describe how to
configure HA, they often leave out specific details about resource
stickiness, STONITH, etc.
Here are my notes - I thought the community could benefit as well...
========================================================================
Configuring a simple 1:1 Linux HA cluster on CentOS Servers
                                                            -Ron Kelley
========================================================================
This article describes how to install and configure a simple 1:1 Linux HA
cluster on CentOS using the latest heartbeat (v3.0.0) and pacemaker (v1.0.5)
packages. Taken from the excellent article here:
http://clusterlabs.org/mediawiki/images/9/9d/Clusters_from_Scratch_-_Apache_on_Fedora11.pdf
#=======================================================================#
# I. Install Operating System and HA packages                           #
#=======================================================================#
(1) Install CentOS 5.3 or 5.4 (i386 or x86-64)
      --> Don't include any clustering packages during the install

(2) Identify a secondary network interface (eth1) and associated IP address
    that will be used for cluster heartbeat. Make sure this address does
    not collide with something else on the network! If a secondary NIC is
    not available, you can use the primary NIC with a subinterface.
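    If you need the subinterface approach, a minimal sketch of the
    interface definition looks like the following (assumptions: eth0 is
    the primary NIC, and the 10.1.1.1/30 heartbeat address matches the
    one shown later in this article; on CentOS 5 this file would be
    /etc/sysconfig/network-scripts/ifcfg-eth0:0):
    -----------------------------------------------------
    # Heartbeat subinterface on the primary NIC (example addresses only)
    DEVICE=eth0:0
    BOOTPROTO=static
    IPADDR=10.1.1.1
    NETMASK=255.255.255.252
    ONBOOT=yes
    -----------------------------------------------------
    Bring it up with "ifup eth0:0" and adjust ha.cf to use eth0.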
(3) On all servers, remove any existing heartbeat, cluster, or pacemaker
packages via the “yum erase” command
-----------------------------------------------------
#shell> yum erase heartbeat* cluster* pacemaker*
-----------------------------------------------------
(4) On all servers, exclude any HA software from the existing CentOS
repo files
-----------------------------------------------------
<foreach section in /etc/yum.repos.d/*repo>
exclude=heartbeat* pacemaker* cluster*
-----------------------------------------------------
(5) On all servers, create a new CentOS-HA repo file using the info
below
-----------------------------------------------------
[server_ha-clustering]
name=High Availability/Clustering server technologies (CentOS_5)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/server:/ha-clustering/CentOS_5/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/server:/ha-clustering/CentOS_5/repodata/repomd.xml.key
enabled=1
-----------------------------------------------------
(6) On all servers, install the latest HA software
---------------------------------------------------------
#shell> yum install heartbeat*64 cluster*64 pacemaker*64
---------------------------------------------------------
    Note: Make sure the packages are from the HA repository and not the
          standard CentOS repos. Also, choose either the 32-bit or 64-bit
          package versions; installing both platform packages on the same
          machine could lead to problems.
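    One way to confirm where a package came from (a sketch, not the only
    way; "yum list" prints the repository id in its right-hand column,
    which should show the [server_ha-clustering] repo defined above):
    -----------------------------------------------------
    # Show the source repository for the HA packages
    yum list heartbeat pacemaker
    # The Vendor field of an installed package also hints at its origin
    rpm -qi pacemaker | grep -i vendor
    -----------------------------------------------------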
#=======================================================================#
# II. Configure the HA Software                                         #
#=======================================================================#
(1) On primary HA node, create /etc/ha.d/authkeys with the following info
    -----------------------------------------------------
    auth 2
    2 sha1 test-ha
    -----------------------------------------------------
Note: The "test-ha" value is the secret key shared between nodes
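    For anything beyond a lab setup, you probably want a random secret
    instead of a guessable string. A minimal sketch (the file format is
    the one shown above; the key is just 40 hex characters derived from
    /dev/urandom):
    -----------------------------------------------------
    # Generate a random shared key and print an authkeys file body
    KEY=$(dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}')
    printf 'auth 2\n2 sha1 %s\n' "$KEY"
    -----------------------------------------------------
    Write the output to /etc/ha.d/authkeys on the primary node; step (4)
    below copies it to the other nodes.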
(2) On primary HA node, change the file permissions to 600 on
    /etc/ha.d/authkeys
-----------------------------------------------------
#shell> chmod 600 /etc/ha.d/authkeys
-----------------------------------------------------
(3) On primary HA node, create /etc/ha.d/ha.cf (HA config file)
with the following info:
-----------------------------------------------------
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 15
initdead 30
bcast eth1
udpport 694
auto_failback off
node pacemaker-svr1.home.local
node pacemaker-svr2.home.local
crm on
apiauth mgmtd uid=root
respawn root /usr/lib64/heartbeat/mgmtd -v ha
-----------------------------------------------------
    Notes:
      --> "bcast eth1" defines the HA NIC. In this case, we are using
          eth1 via broadcasting. We can also use "eth0" or
          "ucast <IP_ADDR_OF_NIC>"
      --> The "node" lines must include hostnames from all nodes. The
          hostnames MUST match the "uname -n" output from each host!
      --> The lines "crm on", "apiauth...", and "respawn..." are required
          for the hb_gui tool to work properly
      --> In order to use the hb_gui tool, you need to give the
          "hacluster" user a valid password
      --> If you are using the 32-bit version of heartbeat and pacemaker,
          make sure the "respawn" directive points to the correct
          location (/usr/lib/heartbeat) for "mgmtd".
(4) On primary HA node, copy files from /etc/ha.d to all other nodes
-----------------------------------------------------
#shell> scp -r /etc/ha.d root@<node_2>:/etc/
-----------------------------------------------------
(5) On all nodes, make sure the ha software is set to autostart
-----------------------------------------------------
#shell> /sbin/chkconfig heartbeat on
#shell> /sbin/service heartbeat start
-----------------------------------------------------
#=======================================================================#
# III. Creating the Cluster                                             #
#=======================================================================#
(1) On the primary node, run the following commands:
-----------------------------------------------------
#shell> crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip=192.168.1.50 cidr_netmask=32 op monitor interval=30s
#shell> crm configure property stonith-enabled=false
#shell> crm configure property no-quorum-policy=ignore
#shell> crm configure rsc_defaults resource-stickiness=100
-----------------------------------------------------
    Notes:
      --> The first command creates an entity called "ClusterIP" with a
          cluster IP address of 192.168.1.50 and a /32 mask. The HA
          software will monitor this address every 30 secs for failure.
      --> The second command disables the STONITH
          (shoot-the-other-node-in-the-head) feature.
      --> The third command tells the cluster to ignore loss of quorum.
          In a two-node cluster, losing one node also loses quorum;
          without this setting, the surviving node would stop all
          resources (thereby negating a 1:1 failover solution). Leave the
          default policy only if you can guarantee both nodes will be up
          all the time.
      --> The fourth command applies a resource-stickiness value of 100
          as a cluster default. This option controls how much a service
          prefers to stay running where it is. When enabled, it ensures
          the ClusterIP will remain on the active host instead of failing
          back to a designated primary host. In some cases, a specific
          server may be the preferred server for a given entity.
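(2) To confirm the configuration took effect, the following commands can
    be run on a cluster node (a sketch; exact output varies by pacemaker
    version, and all three require the heartbeat/pacemaker stack to be
    running):
    -----------------------------------------------------
    #shell> crm configure show    # dump the configuration; should list ClusterIP
    #shell> crm_mon -1            # one-shot status: nodes online, ClusterIP started
    #shell> crm_verify -L         # sanity-check the live configuration for errors
    -----------------------------------------------------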
#=======================================================================#
# IV. Monitoring the Cluster and VIP                                    #
#=======================================================================#
(1) To view the virtual IP address, use the command "ip addr list" on
    the nodes
    -----------------------------------------------------
    #shell> ip addr list
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:50:56:b1:4f:9e brd ff:ff:ff:ff:ff:ff
        inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
        inet 192.168.1.50/32 brd 192.168.1.255 scope global eth0
        inet6 fe80::250:56ff:feb1:4f9e/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:50:56:b1:21:e8 brd ff:ff:ff:ff:ff:ff
        inet 10.1.1.1/30 brd 10.1.1.3 scope global eth1
        inet6 fe80::250:56ff:feb1:21e8/64 scope link
           valid_lft forever preferred_lft forever
    4: sit0: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0
    -----------------------------------------------------
(2) To view the heartbeat log files, simply "tail -100f /var/log/ha-log"
(3) To check the reliability of the cluster, reboot the nodes and verify
    the virtual IP comes back up. Here are some basic failover tests:
    Notes:
      --> N1=node-1 and N2=node-2
      --> On each node, install and start apache, and create a unique
          /var/www/html/index.html file on each node. Using a 3rd
          machine, run a "while" loop using wget to constantly get the
          index.html file from the cluster VIP.
      --> If you intend to run Apache with a VIP, you must add the
          following entry to the /etc/sysctl.conf file on all HA nodes:
          ---------------------------------------------
          net.ipv4.ip_nonlocal_bind = 1
          ---------------------------------------------
          You can either reboot the host or run the "sysctl -p" command
          for the new setting to take effect.
    -----------------------------------------------------
      --> Test-1: N1 and N2 active, reboot N2, IP remains active on N1
      --> Test-2: N1 and N2 active, reboot N1, IP fails over to N2
      --> Test-3: N1 active, N2 powered off, power on N2, IP remains on N1
      --> Test-4: N1 and N2 powered off, power on N1, IP starts on N1
      --> Test-5: N1 and N2 powered off, power on N2, IP starts on N2
      --> Test-6: N1 and N2 powered off, power on N1, IP starts on N1,
                  power on N2, IP remains on N1
      --> Test-7: N1 and N2 powered off, power on N2, IP starts on N2,
                  power on N1, IP remains on N2
      --> Test-8: N1 and N2 powered off, power on N1 and N2, IP starts
                  on one node, other node is ready to take over IP
    -----------------------------------------------------
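    The "while" loop mentioned above can be sketched as follows (run on
    the 3rd machine; the VIP 192.168.1.50 is the ClusterIP created in
    section III, and the one-second interval is arbitrary):
    -----------------------------------------------------
    #!/bin/sh
    # Poll the cluster VIP once per second and print which node's
    # index.html answers; during a failover you will see the page
    # contents change (or a FAILED line while the IP moves).
    VIP=192.168.1.50
    while true; do
        page=$(wget -q -O - "http://${VIP}/index.html" || echo FAILED)
        echo "$(date '+%H:%M:%S')  ${page}"
        sleep 1
    done
    -----------------------------------------------------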
(4) To reset the cluster configuration to scratch, stop the heartbeat
    software, remove all files from /var/lib/heartbeat/crm on all nodes,
    then restart the heartbeat software
-----------------------------------------------------
#shell> /sbin/service heartbeat stop
#shell> \rm -r /var/lib/heartbeat/crm/*
#shell> /sbin/service heartbeat start
-----------------------------------------------------
#=======================================================================#
# V. Learning more about Linux HA                                       #
#=======================================================================#
The Linux HA pages are somewhat outdated and lack detailed configuration
data. The best links to date include:
  --> http://clusterlabs.org/mediawiki/images/9/9d/Clusters_from_Scratch_-_Apache_on_Fedora11.pdf
  --> http://clusterlabs.org/wiki/Main_Page
    Note: Download the above PDF and look thru the sections. It contains
          a wealth of information detailing how to combine resource
          stickiness and program/process dependencies for a given
          cluster.