***** infoCopter.com/perl *****
CPMS Software-based Failover
CPMS High-Availability Strategy
Desired Goal
A system failure of a CPMS configuration server (CFG) must not lead to an outage of the E-Mail service.
Desired objectives
- Outage time of relevant system processes as short as possible (single-digit minutes)
- Guaranteed storage of, and access to, a consistent user database for its applications
- Minimal potential data loss (less than 60 minutes of user configuration data lost in the worst case, no data lost on a system failure in the best case)
- Full-featured E-Mail operations for the customer in case of a system failure
Solution
Brief Failover Description
In case of any system failure of the Master Server, a «Remote Health Check Daemon» on the Slave Server sends a «SHUTDOWN_YOUR_SERVICE» command to the Master Server. Once the Master Server has confirmed the shutdown command, or a timeout has occurred, the Slave Server starts up its own services, based on its configuration file.
The IP-based failover to the Slave Server is performed by a load balancer (Alteon).
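The takeover sequence described above can be sketched in shell as follows. This is only an illustration under stated assumptions, not the code shipped in the tarballs: send_shutdown, shutdown_confirmed and start_local_services are hypothetical placeholders (defined here as stubs) for whatever the Remote Health Check Daemon really does, and the one-second polling interval is an assumption as well.

```shell
#!/bin/sh
# Sketch of the Slave Server's takeover decision (NOT the shipped daemon).
# The three helpers below are hypothetical stubs so the sketch runs on its own.
send_shutdown()        { echo "SHUTDOWN_YOUR_SERVICE sent"; }
shutdown_confirmed()   { return 1; }   # stub: the Master never confirms
start_local_services() { echo "starting local services"; }

# take_over TIMEOUT_SECONDS [POLL_INTERVAL_SECONDS]
# Sends the shutdown command once, then polls for the confirmation until
# it arrives or the timeout expires. The local services are started in
# either case, as in the description above.
take_over() {
    to_timeout=$1
    to_interval=${2:-1}
    to_waited=0
    to_reason=timeout
    send_shutdown
    while [ "$to_waited" -lt "$to_timeout" ]; do
        if shutdown_confirmed; then
            to_reason=confirmed
            break
        fi
        sleep "$to_interval"
        to_waited=$((to_waited + to_interval))
    done
    echo "master shutdown: $to_reason"
    start_local_services
}
```

With the stubs as written, take_over 2 gives up after about two seconds and starts the local services anyway, which mirrors the "confirmed or timed out" rule.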
Overview of the Main Services being monitored
The user database of the CPMS system resides on a NetApp NFS Filer. Only one production CFG is running at any time; a second one is configured as a stopped failover CFG.
Hardware Inventory
- 1 running production CFG Server
- 1 hot stand-by CFG Server (stopped)
Download
Index of /failover
Name Last modified Size Description
failover_0_27_master.tar 05-Sep-2003 00:21 30K
failover_0_27_slave.tar 05-Sep-2003 00:21 30K
readme.txt 05-Sep-2003 00:21 43
Installation
Master Server (CFG 1)
cd /opt/criticalpath/global/
tar -xf failover_0_27_master.tar
This will extract all necessary files into a newly created directory failover under /opt/criticalpath/global/.
- failover_config.txt
- lhcd.pl
- My/
- rcmd_leg.pl
- scripts/
- socketserver.pl
Start Failover Monitoring on CFG 1
start.sh (as root)
#!/usr/local/bin/bash
##################################
# FAILOVER Startup
##################################
echo "Make sure Daemon SocketServer2 (on Slave Host) is already running!!"
sleep 1
echo "Starting SocketServer..."
./socketserver.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &
# Give SocketServer a short time to get ready for the lhcd Daemon
sleep 2
echo "Starting Local Heart-beat Daemon..."
./lhcd.pl >>/tmp/lhcd.log 2>>/tmp/lhcd.err &
ps -A|grep sock
ps -A|grep lhcd
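start.sh waits a fixed two seconds for SocketServer before launching lhcd.pl. A polling loop is one possible alternative. The sketch below is an assumption of how that could look and is not part of the distributed script: wait_until is a hypothetical helper, and the commented usage assumes that SocketServer listens on MONITOR_PORT (7890 in failover_config.txt) and that a netcat supporting the -z flag is installed.

```shell
#!/bin/sh
# wait_until MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND up to MAX_TRIES times, pausing one second after each failed
# attempt. Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    wu_tries=$1
    shift
    wu_count=0
    while [ "$wu_count" -lt "$wu_tries" ]; do
        if "$@"; then
            return 0
        fi
        wu_count=$((wu_count + 1))
        sleep 1
    done
    return 1
}

# Possible use in start.sh (port and nc availability are assumptions):
#   ./socketserver.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &
#   wait_until 10 nc -z localhost 7890 || { echo "SocketServer not ready" >&2; exit 1; }
#   ./lhcd.pl >> /tmp/lhcd.log 2>> /tmp/lhcd.err &
```

Compared with a fixed sleep, this starts lhcd.pl as soon as the check passes and aborts with a clear message if SocketServer never comes up.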
Slave Server (CFG 2)
cd /opt/criticalpath/global/
tar -xf failover_0_27_slave.tar
This will extract all necessary files into a newly created directory failover under /opt/criticalpath/global/.
- failover_config.txt
- My/
- rcmd.pl
- rhcd.pl
- scripts/foo_start_1.sh
- scripts/foo_start_2.sh ...
- socketserver2.pl
Start Failover Service on CFG 2
start.sh (as root)
#!/bin/sh
./socketserver2.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &
ps -axwww|grep sock
How it works
How to configure?
$ vi failover_config.txt
Tip: Run perl -cw failover_config.txt to check the configuration file for consistency.
# failover config (Perl style)
# check with perl -cw failover_config.txt for consistency
# if any changes were made
sub CONFIG {
(
VERSION => '0.80.01' ,
REMOTE_POP3_HOST => 'cp-isp.access.ch' ,
REMOTE_POP3_TIMEOUT => 60 , # timeout in seconds
MONITOR_PORT => 7890 ,
MONITOR_PORT_SLAVE => 7891 ,
SLAVE_HOST => 'talisker.access.ch' ,
RLOGIN_USER => 'retoh' ,
RLOGIN_PASS => 'Xfoo' ,
# Multiple scripts to invoke must be separated by commas
SHUTDOWN_SCRIPTS => './scripts/shutdown_1.sh >/tmp/foo.log 2>/tmp/foo.err,' .
'./scripts/shutdown_2.sh' ,
STARTUP_SCRIPTS => './scripts/foo2.sh,' .
'./scripts/bar.sh'
)
}
1;
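SHUTDOWN_SCRIPTS and STARTUP_SCRIPTS each hold several commands in a single comma-separated string. The following shell sketch shows one way such a string could be split and executed in order. It is an illustration only: how the Perl daemons actually parse these values is not shown here, and run_script_list is a hypothetical name.

```shell
#!/bin/sh
# run_script_list "cmd1,cmd2,..."
# Splits the argument on commas and runs each entry through "sh -c", so
# redirections inside an entry (e.g. ">/tmp/foo.log") still take effect.
# Prints each command before running it; returns the number of failures.
run_script_list() {
    rsl_failed=0
    rsl_old_ifs=$IFS
    IFS=,
    set -f                       # disable globbing while splitting
    for rsl_cmd in $1; do
        IFS=$rsl_old_ifs
        echo "running: $rsl_cmd"
        sh -c "$rsl_cmd" || rsl_failed=$((rsl_failed + 1))
    done
    set +f
    IFS=$rsl_old_ifs
    return "$rsl_failed"
}
```

A call such as run_script_list './scripts/foo2.sh,./scripts/bar.sh' would run the two startup scripts one after the other. Note the limitation of this simple approach: a comma inside a command would be treated as a separator.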
What To Do in Case of a Real Emergency Failover?
OPERATION SHEET IN CASE OF A FAILOVER
- Fix CFG 1 ;-)
- Stop heart-beatd on CFG 2
- Stop CPMS on CFG 2
- Start CPMS on CFG 1
- Start heart-beatd on CFG 2
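The order of the steps on the operation sheet matters: the heart-beat daemon on CFG 2 is stopped first, presumably so that it does not react while CPMS is moved back to CFG 1. As a hedged illustration, the sketch below walks through the automated steps in that order and stops at the first failure. Every step command is a placeholder (true), because the real stop and start commands for heart-beatd and CPMS are not given in this document; run_step and fail_back are hypothetical names.

```shell
#!/bin/sh
# run_step DESCRIPTION COMMAND [ARGS...]
# Announces a step, runs it, and reports a failure on stderr.
run_step() {
    rs_desc=$1
    shift
    echo "==> $rs_desc"
    if ! "$@"; then
        echo "FAILED: $rs_desc" >&2
        return 1
    fi
}

# fail_back: the operation sheet in order, after CFG 1 has been fixed.
# Each "true" is a placeholder; substitute the site's real commands.
fail_back() {
    run_step "Stop heart-beatd on CFG 2"  true &&
    run_step "Stop CPMS on CFG 2"         true &&
    run_step "Start CPMS on CFG 1"        true &&
    run_step "Start heart-beatd on CFG 2" true
}
```

Chaining the steps with && means a failed stop on CFG 2 prevents CPMS from being started on CFG 1, so both servers are never brought up by this script at the same time.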