Time Navigator HA Cluster Agent Configuration

I’ve been wanting to post about a configuration that allows seamless, uninterrupted file-level backup of storage attached to an active/passive high availability cluster using Atempo’s Time Navigator, and I’m finally going to do it.

The Problem

The initial difficulty lies in the requirement that the data must be backed up consistently at every interval, no matter which cluster node is currently the active node with the backend storage mounted. To do this, an agent must be configured as a cluster resource so that it “follows” the storage as it is mounted/exported on whichever node is active. Accomplishing this requires N + 1 tina agents. That is, if you have two cluster nodes, you need three agents: a local agent on each node to back up the node itself, plus a cluster-aware agent to back up the storage as it floats about the cluster nodes during failure or migration events.

Luckily for me, the good people at Atempo have engineered the agent in such a way that multiple agents can be run on a single node, each binding to its own IP address and each individually controlled via its own init script. Of course, we need to make some file edits to make all this happen, and that’s what I’m going to share!

System Configuration

This configuration is based on CentOS 5.x and Time Navigator 4.2, but the concepts should be mostly portable to other popular Linux or UNIX distributions. The underlying cluster software used for the majority of my experience with this configuration is Heartbeat 2.1.3 (right before the Pacemaker split), but it has also been more recently tested on Pacemaker 1.0 / Heartbeat 3.0.x. DRBD is used to carry the cluster-aware state and configuration between the active/passive nodes, and it is where I’ve installed the Atempo Time Navigator agent. It is possible instead to install a second agent on each cluster node and configure each one identically, but that just seems like more work. DRBD does a great job of making sure the latest cluster-aware tina agent is consistently configured and available on the active cluster node, no matter which node that actually is.

For the purpose of this post, I’m going to assume you already have a working Heartbeat/Pacemaker/DRBD configuration up and running with proper STONITH and all that good jazz. Maybe some other time.

Installing and Configuring the Agent on DRBD

The first thing that needs to be done is to install the tina agent onto a filesystem hosted on DRBD. I generally just copy the Linux-X64.tar or Linux-X86.tar Time Navigator installation archive over with scp, then decompress it and run the install script.

Assuming the dedicated (to this agent resource) DRBD filesystem is mounted as /cluster/tina on the active cluster node:

$ cd /cluster/tina
$ scp user@remote.fqdn:/path/to/Linux-X86.tar ./
$ tar -xf Linux-X86.tar
$ cd Linux-X86
$ ./install.sh

This will bring up the GUI installer. Alternatively, use the batch install method, whatever works for you.

Set /cluster/tina as the installation directory and otherwise proceed normally per your site configuration. Unique ports do not need to be used for the second (cluster) agent, because this agent binds to a floating cluster resource IP address while the local agent binds to (one of) the server’s “real” IP address(es).

Once installed, there is one important edit to make in the tina agent environment configuration scripts named .tina.sh (sh/bash) and .tina.csh (csh/tcsh), located in the installation directory (/cluster/tina). The key change is to modify the value assigned to the $TINA environment variable. In .tina.sh, that means changing the line:

TINA=tina

to instead read something like this:

TINA=tina_ha

where tina_ha is a unique identifier for this instance of the agent. Basically, it needs to be anything BUT tina. This is one of two key components that had me tricked for a while. I had first tried modifying the $TINA_SERVICE_NAME environment variable, but that was a giant red herring: uniquely setting that variable to something other than tina does not produce the desired effect, despite what looking through the tina environment scripts and init scripts might have you believe.
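If you prefer to make the edit non-interactively, something like the following should do it. The .tina.csh line is an assumption on my part that the csh script uses the usual setenv form, so check the file first:

$ sed -i 's/^TINA=tina$/TINA=tina_ha/' /cluster/tina/.tina.sh
$ sed -i 's/^setenv TINA tina$/setenv TINA tina_ha/' /cluster/tina/.tina.csh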

The second thing we must do is create an LSB-compliant init script for the cluster-aware tina agent. LSB compliance is very important to ensure the cluster can manage the resource properly. If any return codes are outside the LSB spec, the cluster will behave erratically and unpredictably when starting, stopping and monitoring the tina agent.

Since the installation creates a good init script for us, we can copy that script with a new name and edit it.

$ cp /etc/init.d/tina.tina /etc/init.d/tina.tina_ha
$ nano /etc/init.d/tina.tina_ha

First, replace every instance of the local agent’s tina install path with the cluster agent’s installation path. A simple search-and-replace in nano should suffice, or see the one-liner below.
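For example, assuming the local agent lives at /usr/local/tina (substitute your actual local install path):

$ sed -i 's|/usr/local/tina|/cluster/tina|g' /etc/init.d/tina.tina_ha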

Additionally, we need a small section at the top that exits the script sanely in case the DRBD filesystem is not mounted. The HA cluster will run resource status checks on all nodes in the cluster, and we need the init script to return a sane exit code even when the DRBD filesystem is not accessible (as is the case on all passive nodes). Something like this:

if [ -f /cluster/tina/.tina.sh ] ; then
  . /cluster/tina/.tina.sh > /dev/null 2>&1
else
  echo "Unable to start Time Navigator daemon"
  echo "because the \"/cluster/tina/.tina.sh\" file does not exist"
  retval=3
fi

In order to make the script LSB compliant, we need to ensure the correct exit status is returned for each operation. Instead of pointing out each specific place I had to edit in order for this to happen, I will simply post my entire “/etc/init.d/tina.tina_ha” init script:

#!/bin/sh
# UPDATED BY SETUP - BEGIN
########################################################
#WARNING :
#THIS FILE IS GENERATED AUTOMATICALLY
#AND WILL BE OVERWRITTEN WHEN UPGRADING
#YOUR VERSION OF Time Navigator PRODUCT
########################################################
PATH="$PATH:/bin:/usr/bin:/sbin:/usr/sbin:/etc:/usr/etc"
export PATH
if [ "${TINA_HOME:+$TINA_HOME}" != "" ] ; then
    if [ "/cluster/tina" != "$TINA_HOME" ] ; then
        echo "Unable to start Time Navigator daemon for \"/cluster/tina\""
        echo "because the Time Navigator environment is already set by \"$TINA_HOME\""
        retval=3
    fi
fi
if [ -f /cluster/tina/.tina.sh ] ; then
    . /cluster/tina/.tina.sh > /dev/null 2>&1
else
    echo "Unable to start Time Navigator daemon"
    echo "because the \"/cluster/tina/.tina.sh\" file does not exist"
    retval=3
fi
# UPDATED BY SETUP - END
# @(#) $Id: rc.tina.orig,v 1.1.6.10.4.4.2.4 2007/09/20 16:26:50 dle Exp $
#
# Time Navigator startup script
# (C) 1999-2005 - Atempo
# tina_daemon starting...
#
 
OS_TYPE=`uname -s`
 
if echo "\c" | grep "c">/dev/null ; then
    ECHOMODE=Bsd
else
    ECHOMODE=Sys5
fi
 
ECHONOCR() {
    if [ "$ECHOMODE" = Bsd ] ; then
        echo -n "$*"
    else
        echo "$*\c"
    fi
}
 
PING() {
    os_type=`uname -s`
    case $os_type in
        HP-UX) result=`ping $1 -n 2 2>/dev/null`; return $?;;
        *) result=`ping -c 2 $1 2>/dev/null`; return $?;;
    esac
}
 
ISREDHATLIKE=1
# Source function library.
if [ -f /etc/init.d/functions ] ; then
    . /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
    . /etc/rc.d/init.d/functions
else
    ISREDHATLIKE=0
fi
 
ISSUSE=1
if [ -f /etc/rc.status ] ; then
    . /etc/rc.status
else
    ISSUSE=0
fi
 
RCStart()
{
    if [ -x ${TINA_HOME}/Bin/ndmpd ] ; then
        echo "Starting NDMP Data Server..."
        ${TINA_HOME}/Bin/ndmpd
    elif [ -x ${TINA_HOME}/Bin/tina_nts ] ; then
        echo "Starting NDMP Tape Server..."
        ${TINA_HOME}/Bin/tina_nts
    fi
 
    TINA_DAEMON=$TINA_HOME/Bin/tina_daemon
    if [ -x "$TINA_DAEMON" ]; then
        ECHONOCR "Starting Time Navigator ($TINA_SERVICE_NAME)..."
        if [ -d /var/lock/subsys ] ; then
            touch /var/lock/subsys/tina.$TINA_SERVICE_NAME
        fi
        i=1
        while [ $i -le 60 ] ; do
            if [ $OS_TYPE = "Darwin" ] ; then
                echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
            fi
            echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon $i" >> ${TINA_HOME}/Adm/auto_start.log
            hostname=`hostname 2>/dev/null`
            if [ ! -z "$hostname" ] ; then
                echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: hostname $hostname is defined" >> ${TINA_HOME}/Adm/auto_start.log
                PING $hostname
                status=$?
                if [ $status -eq 0 ] ; then
                    echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: ping $hostname is ok" >> ${TINA_HOME}/Adm/auto_start.log
                    $TINA_DAEMON
                    sleep 2
                    RCStatus no_mess
                    if [ ! -z "$is_running" ] ; then
                        if [ $OS_TYPE = "Darwin" ] ; then
                            echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is started" >> /var/log/system.log
                        fi
                        echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is started" >> ${TINA_HOME}/Adm/auto_start.log
                        break
                    else
                        echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is not started" >> ${TINA_HOME}/Adm/auto_start.log
                    fi
                else
                    echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: ping $hostname is ko" >> ${TINA_HOME}/Adm/auto_start.log
                fi
            else
                echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: hostname is not defined" >> ${TINA_HOME}/Adm/auto_start.log
            fi
            sleep 5
            i=`expr $i + 1`
        done
 
        if [ $ISREDHATLIKE -eq 1 ]; then
            echo_success
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_status -v
        else
            echo
        fi
 
        # Start ACSLS daemons (mini_el and ssi)
        if [ -d "$TINA_HOME/Vtl" ] ; then
            for VL_path in $TINA_HOME/Vtl/*
            do
                [ ! -d $VL_path ] && continue
                VL_name=`basename $VL_path`
                if [ $VL_name = "Install" -o $VL_name = "Bin" -o $VL_name = "Log" -o $VL_name = "Tmp" ] ; then
                    continue
                fi
 
                # If there is no tina_stk.conf, give up
                [ ! -f "$VL_path/tina_stk.conf" ] && continue
 
                [ ! -x "$TINA_HOME/Vtl/Bin/ACSLS/start.sh" ] && continue
 
                ECHONOCR "Starting ACSLS client daemon for $VL_name virtual library ..."
                $TINA_HOME/Vtl/Bin/ACSLS/start.sh $VL_name
                echo
            done
        fi
    elif [ ! -f ${TINA_HOME}/.ndmp.sh ] ; then
        if [ $ISREDHATLIKE -eq 1 ]; then
            ECHONOCR "Starting Time Navigator (${TINA_SERVICE_NAME})..."
            echo_failure
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_failed 1
        else
            echo
        fi
    fi
}
 
RCStop()
{
    #Stop ndmp daemon
    NDMPDAEMON=""
    if [ -x ${TINA_HOME}/Bin/ndmpd ] ; then
        NDMPDAEMON="ndmpd"
    elif [ -x ${TINA_HOME}/Bin/tina_nts ] ; then
        NDMPDAEMON="tina_nts"
    fi
    if [ ! -z "$NDMPDAEMON" ] ; then
        file="/var/tmp/$NDMPDAEMON.pid"
        if [ -f $file ] ; then
            if [ "$NDMPDAEMON" = ndmpd ] ; then
                echo "Shutting down NDMP Data Server..."
            elif [ "$NDMPDAEMON" = tina_nts ] ; then
                echo "Shutting down NDMP Tape Server..."
            fi
            kill `cat $file`
        fi
    fi
 
    #Stop Time Navigator daemon
    if [ -x ${TINA_HOME}/Bin/tina_stop ]; then
        if [ -d /var/lock/subsys ] ; then
            rm -f /var/lock/subsys/tina.$TINA_SERVICE_NAME
        fi
        ECHONOCR "Shutting down Time Navigator ($TINA_SERVICE_NAME)..."
        if [ $OS_TYPE = "Darwin" ] ; then
            echo `date` "Stopping tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
        fi
        echo `date` "Stopping tina_daemon ($TINA_SERVICE_NAME) daemon" >> ${TINA_HOME}/Adm/auto_start.log
        $TINA_HOME/Bin/tina_stop > /dev/null
        retval=0
        if [ $ISREDHATLIKE -eq 1 ]; then
            echo_success
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_status -v
        else
            echo
        fi
    elif [ ! -f ${TINA_HOME}/.ndmp.sh ] ; then
        if [ $ISREDHATLIKE -eq 1 ]; then
            echo "Shutting down Time Navigator ($TINA_SERVICE_NAME)..."
            echo_failure
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_failed 1
        else
            echo
        fi
    fi
}
 
RCStatus()
{
    ## Check status with checkproc(8), if process is running
    ## checkproc will return with exit status 0.
 
    # Return value is slightly different for the status command:
    # 0 - service running
    # 1 - service dead, but /var/run/ pid file exists
    # 2 - service dead, but /var/lock/ lock file exists
    # 3 - service not running
 
    if [ -f $TINA_HOME/Conf/hosts ] ; then
        host_to_ping=`cat $TINA_HOME/Conf/hosts | grep ^localhostname | awk '{print $2}' 2>/dev/null`
        if [ $? != 0 -o -z "$host_to_ping" ] ; then
            host_to_ping="127.0.0.1"
        fi
    else
        host_to_ping="127.0.0.1"
    fi
 
    is_running=`$TINA_HOME/Bin/tina_ping -host $host_to_ping -language English | grep "is running"`
    if [ $# -eq 0 ] ; then
        ECHONOCR "Checking for Time Navigator ($TINA_SERVICE_NAME): "
        if [ $OS_TYPE = "Darwin" ] ; then
            echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
        fi
        echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon" >> ${TINA_HOME}/Adm/auto_start.log
        if [ ! -z "$is_running" ] ; then
            echo "tina_daemon is running"
            echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon: tina_daemon is running" >> ${TINA_HOME}/Adm/auto_start.log
            retval=0
        else
            echo "tina_daemon is stopped"
            echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon: tina_daemon is stopped" >> ${TINA_HOME}/Adm/auto_start.log
            retval=3
        fi
    fi
}
 
test "$ISSUSE" -eq 1 && rc_reset
 
case "$1" in
start)
    RCStart
    retval=0
    ;;
 
stop)
    RCStop
    retval=0
    ;;
 
start_msg)
    echo "Starting Time Navigator ($TINA_SERVICE_NAME)" ;;
 
stop_msg)
    echo "Shutting down Time Navigator ($TINA_SERVICE_NAME)" ;;
 
restart)
    RCStop
    sleep 3
    RCStart ;;
 
status)
    RCStatus ;;
 
*)
    echo "usage: /etc/init.d/tina {start|stop|restart|status}" ;;
esac
 
exit $retval
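Before handing the script over to the cluster, it is worth sanity-checking the exit codes by hand on both the active node (where the DRBD filesystem is mounted) and a passive node (where it is not):

$ /etc/init.d/tina.tina_ha status ; echo $?

Expect 0 on the active node while the agent is running, and 3 on any node where the agent is stopped or the DRBD filesystem is unavailable.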

One final Time Navigator configuration change must be made. The tina agent “hosts” file must be configured to set the “localhostname” of our agent to the FQDN of the floating or virtual IP address so that the agent binds only to that IP address, instead of to all IP addresses on the system.

$ cd /cluster/tina/Conf
$ cp hosts.sample hosts
$ nano hosts

Add a line to the file specifying the “localhostname” like so:

localhostname myserver.company.com

For this to work properly, any other tina agents running on the cluster nodes must also have a “localhostname” set in their respective “hosts” files; otherwise those host-based agents will bind to all IP addresses on the host, including the virtual IP address.
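For example, the local agent on the first node (installed under /usr/local/tina here purely for illustration) would get a line pinning it to that node’s real hostname:

$ grep localhostname /usr/local/tina/Conf/hosts
localhostname node1.company.com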

That’s it! The tina service can be added to the HA cluster as an LSB resource, grouped with your storage resources so it will always be running on the same node as your storage.
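As a rough sketch of what that might look like from within the Pacemaker 1.0 crm configure shell (the resource names, DRBD device, filesystem type and IP address below are all placeholders, and the DRBD master/slave resource plus its ordering/colocation constraints are assumed to already be in place):

primitive p_fs_tina ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/cluster/tina" fstype="ext3"
primitive p_ip_tina ocf:heartbeat:IPaddr2 \
    params ip="192.168.100.50" cidr_netmask="24"
primitive p_tina lsb:tina.tina_ha \
    op monitor interval="30s"
group g_tina p_fs_tina p_ip_tina p_tina

The group enforces both colocation and start order, so the filesystem is mounted and the virtual IP is up before the tina agent starts, and everything moves together on failover.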

Conclusion

Ok, so I rushed the end. Big deal. Sue me. I doubt anyone cares anyways!
