CPMS Software-based Failover

CPMS High-Availability Strategy

Desired Goal

A system failure of a CPMS configuration server (CFG) must not lead to an outage of the E-Mail service.

Desired objectives

  1. The shortest possible outage of the relevant system processes (in the single-digit minute range)
  2. Guaranteed storage of, and access to, a consistent user database for its applications
  3. The least possible data loss (worst case: less than 60 minutes of user configuration data lost; best case: no data loss on a system failure)
  4. Full-featured E-Mail operation for the customer in case of a system failure

Solution

Brief Failover Description

In case of any system failure of the Master Server, a «Remote Health Check Daemon» on the Slave Server sends a «SHUTDOWN_YOUR_SERVICE» command to the Master Server. As soon as the Master Server has confirmed the «shutdown command», or a timeout has occurred, the Slave Server starts its own services as defined in its configuration file.
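The sequence above can be sketched in Perl as follows. This is not the code of the actual Remote Health Check Daemon, whose source is not reproduced on this page; the subroutine failover_sequence and its argument names are hypothetical. The health check, the delivery of the SHUTDOWN_YOUR_SERVICE command and the startup step are passed in as code references, so only the control flow ("confirmation or timeout, then take over") is shown.

```perl
use strict;
use warnings;

# Hypothetical slave-side failover step. Expects:
#   master_healthy => sub returning true while the master is fine
#   send_shutdown  => sub($command, $timeout) returning true if the master
#                     confirmed the command within $timeout seconds
#   start_services => sub starting the slave's own services
#   timeout        => seconds to wait for the confirmation
# Returns 'ok' (nothing done) or 'failover' (slave services started).
sub failover_sequence {
    my (%step) = @_;

    # 1. Master answers the health check: nothing to do.
    return 'ok' if $step{master_healthy}->();

    # 2. Ask the master to stop its services.
    my $confirmed = $step{send_shutdown}->('SHUTDOWN_YOUR_SERVICE', $step{timeout});
    warn "no confirmation from master within $step{timeout}s, taking over anyway\n"
        unless $confirmed;

    # 3. Confirmed or timed out: the slave starts its own services.
    $step{start_services}->();
    return 'failover';
}
```

A real daemon would call this periodically, with master_healthy doing an actual network check and send_shutdown returning false when no confirmation arrives in time; both outcomes of step 2 lead to step 3, which is the "confirmed or timeout" rule described above.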

The IP-based failover to the Slave Server is performed by a load balancer (Alteon).

Overview of the Main Services being monitored

The user database of the CPMS system resides on a NetApp NFS Filer. Only one production CFG is running; the other is configured as a stopped failover CFG.

[Figure: failover overview]
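The monitored services are only shown in the figure. The configuration keys REMOTE_POP3_HOST and REMOTE_POP3_TIMEOUT suggest that a POP3 check is one of them, so here is a minimal sketch of such a probe, assuming that "healthy" means the server sends a positive POP3 greeting (a line starting with +OK, as defined in RFC 1939) within the timeout. The subroutine name pop3_alive is hypothetical and not taken from the distributed scripts.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical POP3 health probe: returns 1 if the server at $host:$port
# answers with a positive greeting ("+OK ...", RFC 1939) within $timeout
# seconds, 0 otherwise (connect refused, no greeting, or "-ERR").
sub pop3_alive {
    my ($host, $port, $timeout) = @_;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,            # bounds the TCP connect
    ) or return 0;

    my $greeting;
    my $read_ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;                  # bounds the wait for the greeting
        $greeting = <$sock>;
        alarm 0;
        1;
    };
    alarm 0;                             # make sure no alarm is left pending
    close $sock;

    return ($read_ok && defined $greeting && $greeting =~ /^\+OK/) ? 1 : 0;
}
```

For example, pop3_alive('cp-isp.access.ch', 110, 60) would use the host and timeout from the sample configuration; port 110 is the standard POP3 port and is an assumption, since the configuration does not name one.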

Hardware Inventory

  • 1 running production CFG Server
  • 1 hot stand-by CFG Server (stopped)


Download

Index of /failover

Name                       Last modified       Size
failover_0_27_master.tar   05-Sep-2003 00:21   30 KB
failover_0_27_slave.tar    05-Sep-2003 00:21   30 KB
readme.txt                 05-Sep-2003 00:21   43 bytes

Installation

Master Server (CFG 1)

cd /opt/criticalpath/global/
tar -xf failover_0_27_master.tar

This extracts all necessary files into a newly created directory, failover, under /opt/criticalpath/global/:

  • failover_config.txt
  • lhcd.pl
  • My/
  • rcmd_leg.pl
  • scripts/
  • socketserver.pl
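On the master, some process has to receive the SHUTDOWN_YOUR_SERVICE command from the slave and confirm it. This page does not say which of the files above does that, or what the wire format is, so the following is a hypothetical sketch: one newline-terminated command per connection, and invented reply strings SHUTDOWN_DONE and UNKNOWN_COMMAND. The names handle_command and serve_once are illustrative only.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Turn one received command line into a reply. Only the command name
# SHUTDOWN_YOUR_SERVICE is taken from this page; the reply strings and the
# line-based format are assumptions made for this sketch.
sub handle_command {
    my ($line, $run_shutdown) = @_;
    $line = '' unless defined $line;     # peer closed without sending
    $line =~ s/\r?\n\z//;                # strip the line terminator

    if ($line eq 'SHUTDOWN_YOUR_SERVICE') {
        $run_shutdown->();               # e.g. run the SHUTDOWN_SCRIPTS
        return "SHUTDOWN_DONE\n";        # sent only after the shutdown ran
    }
    return "UNKNOWN_COMMAND\n";
}

# Accept one connection on $listener, answer a single command, close it.
sub serve_once {
    my ($listener, $run_shutdown) = @_;
    my $client = $listener->accept or return;
    my $line = <$client>;
    print {$client} handle_command($line, $run_shutdown);
    close $client;
    return 1;
}
```

A server loop would create a listener, for example IO::Socket::INET->new(LocalPort => 7890, Listen => 5, ReuseAddr => 1) if the command arrives on MONITOR_PORT (an assumption), and call serve_once repeatedly. Replying only after the shutdown step has finished is what lets the slave treat the reply as a confirmation.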

Start Failover Monitoring on CFG 1

start.sh (as root)
#!/usr/local/bin/bash

##################################
# FAILOVER  Startup
##################################

# Run from the failover directory so that the relative paths below resolve
cd "$(dirname "$0")" || exit 1

echo "Make sure daemon SocketServer2 (on the slave host) is already running!"

sleep 1

echo "Starting SocketServer..."
./socketserver.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &

# Give SocketServer a short time to get ready for the lhcd daemon
sleep 2

echo "Starting Local Heart-beat Daemon..."
./lhcd.pl >> /tmp/lhcd.log 2>> /tmp/lhcd.err &

# Show that both daemons are running
ps -A | grep sock
ps -A | grep lhcd
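start.sh relies on a reminder and fixed sleep calls to get the daemons started in the right order. A hypothetical alternative is to poll a TCP port until it accepts connections, with an upper time limit, and abort if it never does. The sketch below assumes that SocketServer2 listens on MONITOR_PORT_SLAVE (7891) of SLAVE_HOST and the local SocketServer on MONITOR_PORT (7890); neither is stated on this page, and wait_for_port is not part of the distributed scripts.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Poll $host:$port until a TCP connect succeeds or $max_wait seconds have
# passed. Returns 1 on success, 0 on timeout. Always tries at least once.
sub wait_for_port {
    my ($host, $port, $max_wait) = @_;
    my $deadline = time + $max_wait;
    while (1) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => 2,               # per-attempt connect timeout
        );
        if ($sock) {
            close $sock;
            return 1;
        }
        return 0 if time >= $deadline;
        sleep 1;                         # back off before the next attempt
    }
}

# Example (ports assumed, see above):
#   wait_for_port('talisker.access.ch', 7891, 30)
#       or die "SocketServer2 on the slave is not reachable\n";
```

Polling with a deadline waits only as long as needed and turns a forgotten slave daemon into an explicit error instead of a silently incomplete setup.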

Slave Server (CFG 2)

cd /opt/criticalpath/global/
tar -xf failover_0_27_slave.tar

This extracts all necessary files into a newly created directory, failover, under /opt/criticalpath/global/:

  • failover_config.txt
  • My/
  • rcmd.pl
  • rhcd.pl
  • scripts/foo_start_1.sh
  • scripts/foo_start_2.sh ...
  • socketserver2.pl

Start Failover Service on CFG 2

start.sh (as root)
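#!/bin/sh

# Run from the failover directory so that the relative path below resolves
cd `dirname "$0"` || exit 1

./socketserver2.pl >> /tmp/socketserver.log 2>> /tmp/socketserver.err &

# Show that SocketServer2 is running ([s] keeps grep from listing itself)
ps -axwww | grep '[s]ocketserver2'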

How it works

[Figure: how it works]

How to configure?

$ vi failover_config.txt

Tip: After any change, check the configuration file for syntax errors with perl -cw failover_config.txt.

# failover config (Perl style)

# if any changes were made, check the syntax with
# perl -cw failover_config.txt

sub CONFIG {

    (
        VERSION                 => '0.80.01' ,

        REMOTE_POP3_HOST        => 'cp-isp.access.ch' ,
        REMOTE_POP3_TIMEOUT     => 60 , # timeout in seconds
        MONITOR_PORT            => 7890 ,
        MONITOR_PORT_SLAVE      => 7891 ,
        SLAVE_HOST              => 'talisker.access.ch' ,

        RLOGIN_USER             => 'retoh' ,
        RLOGIN_PASS             => 'Xfoo' ,

        # Multiple scripts to invoke must be separated by commas
        SHUTDOWN_SCRIPTS        => './scripts/shutdown_1.sh   >/tmp/foo.log   2>/tmp/foo.err,' .
                                   './scripts/shutdown_2.sh' ,

        STARTUP_SCRIPTS         => './scripts/foo2.sh,' .
                                   './scripts/bar.sh'
    )
}

1;
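Because failover_config.txt is plain Perl that defines a CONFIG subroutine and ends with 1;, a daemon can load it with do and split the comma-separated script lists. The sketch below is one way to do that, not the code shipped in the tar files; load_config, script_list and run_scripts are hypothetical names. Each entry is passed to system as a single string, so entries containing redirections such as >/tmp/foo.log are run by /bin/sh and the redirections keep working.

```perl
use strict;
use warnings;

# Load the Perl-style failover configuration. The file defines sub CONFIG
# (returning a key/value list) and ends with "1;", so "do" returns true.
# Note: since Perl 5.26 "do" no longer looks in the current directory for
# a bare relative name, so pass './failover_config.txt' or an absolute path.
sub load_config {
    my ($file) = @_;
    my $rc = do $file;
    die "cannot parse $file: $@" if $@;
    die "cannot read $file: $!\n" unless defined $rc;
    die "$file did not return a true value\n" unless $rc;
    my %cfg = CONFIG();                  # defined by the file just loaded
    return \%cfg;
}

# Split a comma-separated list such as SHUTDOWN_SCRIPTS or STARTUP_SCRIPTS
# into single command lines, trimming surrounding whitespace.
sub script_list {
    my ($value) = @_;
    my @cmds = split /,/, $value;
    s/^\s+|\s+$//g for @cmds;
    return grep { length } @cmds;
}

# Run each command with system and a single string argument; if it contains
# shell metacharacters (e.g. ">/tmp/foo.log"), Perl hands it to /bin/sh.
# Returns the number of commands that exited with a non-zero status.
sub run_scripts {
    my (@cmds) = @_;
    my $failed = 0;
    for my $cmd (@cmds) {
        next if system($cmd) == 0;
        warn "'$cmd' failed (status $?)\n";
        $failed++;
    }
    return $failed;
}

# Example:
#   my $cfg = load_config('./failover_config.txt');
#   run_scripts(script_list($cfg->{STARTUP_SCRIPTS}));
```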


What To Do in Case of a Real Emergency Failover?

OPERATION SHEET IN CASE OF A FAILOVER

  1. Fix CFG 1 ;-)
  2. Stop the heart-beat daemon (heart-beatd) on CFG 2
  3. Stop CPMS on CFG 2
  4. Start CPMS on CFG 1
  5. Start the heart-beat daemon (heart-beatd) on CFG 2

$Id: cms_doc_failover.htm,v 1.2 2004/03/11 22:09:51 reto Exp $
© 1998-2004 reto :)