***** infoCopter.com/perl *****

CPMS Software-based Failover

CPMS High-Availability Strategy

Desired Goal

A system failure of a CPMS configuration server (CFG) must not lead to an outage of the E-Mail service.

Desired objectives

  1. Outage time of the relevant system processes as short as possible (within single-digit minutes)
  2. Guaranteed storage of, and access to, a consistent user database for its applications
  3. Minimal potential data loss (worst case: less than 60 minutes of user configuration data lost; best case: no data loss on a system failure)
  4. Full-featured E-Mail operations for the customer in case of a system failure

Solution

Brief Failover Description

In case of any system failure of the Master Server, a «Remote Health Check Daemon» on the Slave Server sends a «SHUTDOWN_YOUR_SERVICE» command to the Master Server. After the Master Server has confirmed the «shutdown command» or a timeout has occurred, the Slave Server starts up its own services, based on its configuration file.

The IP-based failover to the Slave Server will be established by a load-balancer (Alteon).
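
The sequence described above can be sketched from the Slave Server's point of view. This is a minimal illustration, not the logic shipped in rhcd.pl: the function names check_master, request_master_shutdown and start_local_services are hypothetical placeholders, stubbed here so that the control flow can be run on its own.

```shell
#!/bin/sh
# Sketch of the Slave-side failover sequence (assumed control flow).
# All three helper functions are hypothetical stand-ins for the real checks.

check_master() {
    # Stub: a real check would probe the Master's services.
    return 1    # non-zero = Master considered failed
}

request_master_shutdown() {
    # Stub: a real version would send SHUTDOWN_YOUR_SERVICE and wait
    # for a confirmation until a timeout expires.
    return 1    # non-zero = no confirmation arrived in time
}

start_local_services() {
    echo "starting local services"
}

failover_if_needed() {
    if check_master; then
        echo "master healthy, nothing to do"
        return 0
    fi
    if request_master_shutdown; then
        echo "master confirmed shutdown"
    else
        echo "no confirmation, timeout reached"
    fi
    # Either way the Slave takes over.
    start_local_services
}

failover_if_needed
```

The point of the sketch is that a missing confirmation does not block the takeover: the Slave starts its services after either a confirmation or a timeout.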

Overview of the Main Services being monitored

The user database of the CPMS system resides on a NetApp NFS Filer. Only one productive CFG is running at any time; a second one is configured as a stopped failover CFG.

[Figure: failover overview]

Hardware Inventory

  • 1 productive and running CFG Server
  • 1 Hot stand-by CFG Server (stopped)


Download

Index of /failover

Icon  Name                         Last modified      Size  Description
[DIR] Parent Directory                                   -
[   ] failover_0_27_master.tar     05-Sep-2003 00:21   30K
[   ] failover_0_27_slave.tar      05-Sep-2003 00:21   30K
[TXT] readme.txt                   05-Sep-2003 00:21    43

Installation

Master Server (CFG 1)

cd /opt/criticalpath/global/
tar -xf failover_0_27_master.tar

This will extract all necessary files into a newly created directory failover under /opt/criticalpath/global/.

  • failover_config.txt
  • lhcd.pl
  • My/
  • rcmd_leg.pl
  • scripts/
  • socketserver.pl

Start Failover Monitoring on CFG 1

start.sh (as root)
#!/usr/local/bin/bash

##################################
# FAILOVER  Startup
##################################

echo "Make sure Daemon SocketServer2 (on Slave Host) is already running!!"

sleep 1

echo "Starting SocketServer..."
./socketserver.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &

# Give SocketServer a short time to get ready for the lhcd Daemon
sleep 2

echo "Starting Local Heart-beat Daemon..."
./lhcd.pl >>/tmp/lhcd.log 2>>/tmp/lhcd.err &

ps -A|grep sock
ps -A|grep lhcd
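
The fixed sleep 2 in the script above could be replaced by a bounded polling loop. The following is a sketch only; wait_until is a hypothetical helper and is not part of the failover package.

```shell
#!/bin/sh
# wait_until TRIES COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, pausing one second after each failed
# attempt. Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, wait_until 5 pgrep -f socketserver.pl would poll for the process for up to five attempts before lhcd.pl is started, assuming pgrep is available on the host.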

Slave Server (CFG 2)

cd /opt/criticalpath/global/
tar -xf failover_0_27_slave.tar

This will extract all necessary files into a newly created directory failover under /opt/criticalpath/global/.

  • failover_config.txt
  • My/
  • rcmd.pl
  • rhcd.pl
  • scripts/foo_start_1.sh
  • scripts/foo_start_2.sh ...
  • socketserver2.pl

Start Failover Service on CFG 2

start.sh (as root)
#!/bin/sh

./socketserver2.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &

ps -axwww|grep sock

How it works

[Figure: how it works]

How to configure?

$ vi failover_config.txt

Tip: Run perl -cw failover_config.txt to check the consistency of the configuration file.

# failover config (Perl style)

# check with perl -cw failover_config.txt for consistency
# if any changes were made

sub CONFIG {

    (
        VERSION                 => '0.80.01' ,

        REMOTE_POP3_HOST        => 'cp-isp.access.ch' ,
        REMOTE_POP3_TIMEOUT     => 60 , # timeout in seconds
        MONITOR_PORT            => 7890 ,
        MONITOR_PORT_SLAVE      => 7891 ,
        SLAVE_HOST              => 'talisker.access.ch' ,

        RLOGIN_USER             => 'retoh' ,
        RLOGIN_PASS             => 'Xfoo' ,

        # Multiple scripts to invoke must be separated by commas
        SHUTDOWN_SCRIPTS        => './scripts/shutdown_1.sh   >/tmp/foo.log   2>/tmp/foo.err,' .
                                   './scripts/shutdown_2.sh' ,

        STARTUP_SCRIPTS         => './scripts/foo2.sh,' .
                                   './scripts/bar.sh'
    )
}

1;
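
Since SHUTDOWN_SCRIPTS and STARTUP_SCRIPTS each hold several commands in one comma-separated string, it can help to see how such a string breaks apart. The sketch below does the split in shell for illustration; split_commands is a hypothetical helper, and how the daemons actually parse the string is not shown in this document.

```shell
#!/bin/sh
# split_commands STRING
# Prints each comma-separated command on its own line, keeping inner spaces.
split_commands() {
    old_ifs=$IFS
    IFS=,
    set -f                      # no filename expansion while splitting
    for cmd in $1; do
        printf '%s\n' "$cmd"
    done
    set +f
    IFS=$old_ifs
}

split_commands './scripts/shutdown_1.sh >/tmp/foo.log 2>/tmp/foo.err,./scripts/shutdown_2.sh'
```

Because the comma is the only separator, redirections such as >/tmp/foo.log stay attached to their command, but a command that itself contains a comma cannot be expressed this way.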


What To Do in Case of a Real Emergency Failover?

OPERATION SHEET IN CASE OF A FAILOVER

  1. Fix CFG 1 ;-)
  2. Stop heart-beatd on CFG 2
  3. Stop CPMS on CFG 2
  4. Start CPMS on CFG 1
  5. Start heart-beatd on CFG 2
© reto :)