I’ve been wanting to post about a configuration that allows seamless, uninterrupted file-level backup of storage attached to an active/passive high availability cluster using Atempo’s Time Navigator, and I’m finally going to do it.
The Problem
The initial difficulty lies in the requirement that the data must be consistently backed up at every interval, no matter which cluster node is currently the active node with the backend storage mounted. To do this, an agent must be configured as a cluster resource so that it “follows” the mounting/exporting of the storage to any cluster node. Accomplishing this requires N + 1 tina agents. That is, if you have two cluster nodes, you need three agents: one local agent to back up each node, plus one agent for the storage as it floats about the cluster nodes during failure or migration events.
Luckily for me, the good people at Atempo have engineered the agent in such a way that multiple agents can be run on a single node, each binding to its own IP address and each individually controlled via its own init script. Of course, we need to make some file edits to make all this happen, and that’s what I’m going to share!
System Configuration
This configuration is based on CentOS 5.x and Time Navigator 4.2, but the concepts should be mostly portable to other popular Linux or UNIX distributions. The underlying cluster software used for the majority of my experience with this configuration is Heartbeat 2.1.3, from right before the Pacemaker split, but it has also been more recently tested on Pacemaker 1.0 / Heartbeat 3.0.x. DRBD provides the active/passive cluster-aware state and configuration information for the location where I’ve installed the Atempo Time Navigator agent. It is possible to instead install a second agent on each cluster node and configure them identically, but that just seems like more work. DRBD does a great job of making sure the latest cluster-aware tina agent is consistently configured and available on the active cluster node, no matter which node that actually is.
For the purpose of this post, I’m going to assume you already have a working Heartbeat/Pacemaker/DRBD configuration up and running with proper STONITH and all that good jazz. Maybe some other time.
Installing and Configuring the Agent on DRBD
The first thing that needs to be done is to install the tina agent onto a filesystem hosted on DRBD. I generally just scp around the Linux-X64.tar or Linux-X86.tar Time Navigator installation archive, then decompress it and run the install script.
Assuming the dedicated (to this agent resource) DRBD filesystem is mounted as /cluster/tina on the active cluster node:
$ cd /cluster/tina
$ scp user@remote.fqdn:/path/to/Linux-X86.tar ./
$ tar -xf Linux-X86.tar
$ cd Linux-X86
$ ./install.sh
This will bring up the GUI installer. Alternatively use the batch install method, whatever works for you.
Set /cluster/tina as the installation directory and otherwise proceed normally as per site configuration. Unique ports do not need to be used for the second cluster agent, as this agent binds to a floating cluster resource IP address while the local agent binds to (one of) the server’s “real” IP address(es).
Once installed, there is one important edit to make in the tina agent environment configuration scripts named .tina.sh (sh/bash) and .tina.csh (csh/tcsh) located in the installation directory (/cluster/tina). The key change to make in the relevant script is to modify the value where the $TINA environment variable is being set. In .tina.sh that would be changing the line:
TINA=tina
to instead read something like this:
TINA=tina_ha
where tina_ha is a unique identifier for this instance of the agent. Basically, it needs to be anything BUT tina. This is one of two key components that had me tricked for a while. I had first tried modifying the $TINA_SERVICE_NAME environment variable, but that was a giant red herring: uniquely setting that variable to something other than tina does not produce the desired effect, despite what looking through the tina environment scripts and init scripts might have you believe.
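A quick sanity check after the edit is to source the environment script in a throwaway subshell (so your login environment doesn’t get polluted with the cluster agent’s settings) and confirm the variable took:

$ sh -c '. /cluster/tina/.tina.sh; echo $TINA'
tina_ha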
The second thing we must do is to create an LSB-compliant init script for the cluster-aware tina agent. The LSB compliance is very important to ensure the cluster can manage the resource properly. If any return codes are out of the LSB spec, the cluster will behave erratically and unpredictably when dealing with starting, stopping and monitoring the tina agent.
Since the installation creates a good init script for us, we can copy that script with a new name and edit it.
$ cp /etc/init.d/tina.tina /etc/init.d/tina.tina_ha
$ nano /etc/init.d/tina.tina_ha
First, replace every instance of the local agent’s tina install path with the cluster agent’s installation path. A simple search (Ctrl-W) then replace (Ctrl-R) in nano should suffice.
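If you’d rather script it, a one-line sed does the same job. The /usr/Atempo/tina path here is just a made-up example of a local agent install path; substitute whatever your local agent actually uses:

$ sed -i 's|/usr/Atempo/tina|/cluster/tina|g' /etc/init.d/tina.tina_ha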
Additionally, we need a small section at the top that will exit the script in case the DRBD filesystem is not mounted. The HA cluster will do resource status checks on all nodes in the cluster and we need the init script to be able to exit with a sane exit code, even if the DRBD filesystem is not accessible (as it is on all passive nodes). Something like this:
if [ -f /cluster/tina/.tina.sh ] ; then
    . /cluster/tina/.tina.sh > /dev/null 2>&1
else
    echo "Unable to start Time Navigator daemon"
    echo "because the \"/cluster/tina/.tina.sh\" file does not exist"
    retval=3
fi
In order to make the script LSB compliant, we need to ensure the correct exit status is returned during the correct operations. Instead of pointing out each specific place I had to edit in order for this to happen, I will simply post my entire “/etc/init.d/tina.tina_ha” init script:
#!/bin/sh
# UPDATED BY SETUP - BEGIN
########################################################
#WARNING :
#THIS FILE IS GENERATED AUTOMATICALLY
#AND WILL BE OVERWRITTEN WHEN UPGRADING
#YOUR VERSION OF Time Navigator PRODUCT
########################################################
PATH="$PATH:/bin:/usr/bin:/sbin:/usr/sbin:/etc:/usr/etc"
export PATH

if [ "${TINA_HOME:+$TINA_HOME}" != "" ] ; then
    if [ "/cluster/tina" != "$TINA_HOME" ] ; then
        echo "Unable to start Time Navigator daemon for \"/cluster/tina\""
        echo "because the Time Navigator environment is already set by \"$TINA_HOME\""
        retval=3
    fi
fi

if [ -f /cluster/tina/.tina.sh ] ; then
    . /cluster/tina/.tina.sh > /dev/null 2>&1
else
    echo "Unable to start Time Navigator daemon"
    echo "because the \"/cluster/tina/.tina.sh\" file does not exist"
    retval=3
fi
# UPDATED BY SETUP - END

# @(#) $Id: rc.tina.orig,v 1.1.6.10.4.4.2.4 2007/09/20 16:26:50 dle Exp $
#
# Time Navigator startup script
# (C) 1999-2005 - Atempo
# tina_daemon starting...
#

OS_TYPE=`uname -s`

if echo "\c" | grep "c">/dev/null ; then
    ECHOMODE=Bsd
else
    ECHOMODE=Sys5
fi

ECHONOCR()
{
    if [ "$ECHOMODE" = Bsd ] ; then
        echo -n "$*"
    else
        echo "$*\c"
    fi
}

PING()
{
    os_type=`uname -s`
    case $os_type in
        HP-UX) result=`ping $1 -n 2 2>/dev/null`; return $?;;
        *)     result=`ping -c 2 $1 2>/dev/null`; return $?;;
    esac
}

ISREDHATLIKE=1
# Source function library.
if [ -f /etc/init.d/functions ] ; then
    . /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
    . /etc/rc.d/init.d/functions
else
    ISREDHATLIKE=0
fi

ISSUSE=1
if [ -f /etc/rc.status ] ; then
    . /etc/rc.status
else
    ISSUSE=0
fi

RCStart()
{
    if [ -x ${TINA_HOME}/Bin/ndmpd ] ; then
        echo "Starting NDMP Data Server..."
        ${TINA_HOME}/Bin/ndmpd
    elif [ -x ${TINA_HOME}/Bin/tina_nts ] ; then
        echo "Starting NDMP Tape Server..."
        ${TINA_HOME}/Bin/tina_nts
    fi

    TINA_DAEMON=$TINA_HOME/Bin/tina_daemon
    if [ -x "$TINA_DAEMON" ]; then
        ECHONOCR "Starting Time Navigator ($TINA_SERVICE_NAME)..."
        if [ -d /var/lock/subsys ] ; then
            touch /var/lock/subsys/tina.$TINA_SERVICE_NAME
        fi
        i=1
        while [ $i -le 60 ] ; do
            if [ $OS_TYPE = "Darwin" ] ; then
                echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
            fi
            echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon $i" >> ${TINA_HOME}/Adm/auto_start.log
            hostname=`hostname 2>/dev/null`
            if [ ! -z "$hostname" ] ; then
                echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: hostname $hostname is defined" >> ${TINA_HOME}/Adm/auto_start.log
                PING $hostname
                status=$?
                if [ $status -eq 0 ] ; then
                    echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: ping $hostname is ok" >> ${TINA_HOME}/Adm/auto_start.log
                    $TINA_DAEMON
                    sleep 2
                    RCStatus no_mess
                    if [ ! -z "$is_running" ] ; then
                        if [ $OS_TYPE = "Darwin" ] ; then
                            echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is started" >> /var/log/system.log
                        fi
                        echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is started" >> ${TINA_HOME}/Adm/auto_start.log
                        break
                    else
                        echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is not started" >> ${TINA_HOME}/Adm/auto_start.log
                    fi
                else
                    echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: ping $hostname is ko" >> ${TINA_HOME}/Adm/auto_start.log
                fi
            else
                echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: hostname is not defined" >> ${TINA_HOME}/Adm/auto_start.log
            fi
            sleep 5
            i=`expr $i + 1`
        done
        if [ $ISREDHATLIKE -eq 1 ]; then
            echo_success
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_status -v
        else
            echo
        fi
        # Start ACSLS daemons (mini_el and ssi)
        if [ -d "$TINA_HOME/Vtl" ] ; then
            for VL_path in $TINA_HOME/Vtl/*
            do
                [ ! -d $VL_path ] && continue
                VL_name=`basename $VL_path`
                if [ $VL_name = "Install" -o $VL_name = "Bin" -o $VL_name = "Log" -o $VL_name = "Tmp" ] ; then
                    continue
                fi
                # If there is no tina_stk.conf, give up
                [ ! -f "$VL_path/tina_stk.conf" ] && continue
                [ ! -x "$TINA_HOME/Vtl/Bin/ACSLS/start.sh" ] && continue
                ECHONOCR "Starting ACSLS client daemon for $VL_name virtual library ..."
                $TINA_HOME/Vtl/Bin/ACSLS/start.sh $VL_name
                echo
            done
        fi
    elif [ ! -f ${TINA_HOME}/.ndmp.sh ] ; then
        if [ $ISREDHATLIKE -eq 1 ]; then
            ECHONOCR "Starting Time Navigator (${TINA_SERVICE_NAME})..."
            echo_failure
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_failed 1
        else
            echo
        fi
    fi
}

RCStop()
{
    #Stop ndmp daemon
    NDMPDAEMON=""
    if [ -x ${TINA_HOME}/Bin/ndmpd ] ; then
        NDMPDAEMON="ndmpd"
    elif [ -x ${TINA_HOME}/Bin/tina_nts ] ; then
        NDMPDAEMON="tina_nts"
    fi
    if [ ! -z "$NDMPDAEMON" ] ; then
        file="/var/tmp/$NDMPDAEMON.pid"
        if [ -f $file ] ; then
            if [ "$NDMPDAEMON" = ndmpd ] ; then
                echo "Shutting down NDMP Data Server..."
            elif [ "$NDMPDAEMON" = tina_nts ] ; then
                echo "Shutting down NDMP Tape Server..."
            fi
            kill `cat $file`
        fi
    fi

    #Stop Time Navigator daemon
    if [ -x ${TINA_HOME}/Bin/tina_stop ]; then
        if [ -d /var/lock/subsys ] ; then
            rm -f /var/lock/subsys/tina.$TINA_SERVICE_NAME
        fi
        ECHONOCR "Shutting down Time Navigator ($TINA_SERVICE_NAME)..."
        if [ $OS_TYPE = "Darwin" ] ; then
            echo `date` "Stopping tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
        fi
        echo `date` "Stopping tina_daemon ($TINA_SERVICE_NAME) daemon" >> ${TINA_HOME}/Adm/auto_start.log
        $TINA_HOME/Bin/tina_stop > /dev/null
        retval=0
        if [ $ISREDHATLIKE -eq 1 ]; then
            echo_success
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_status -v
        else
            echo
        fi
    elif [ ! -f ${TINA_HOME}/.ndmp.sh ] ; then
        if [ $ISREDHATLIKE -eq 1 ]; then
            echo "Shutting down Time Navigator ($TINA_SERVICE_NAME)..."
            echo_failure
            echo
        elif [ $ISSUSE -eq 1 ]; then
            rc_failed 1
        else
            echo
        fi
    fi
}

RCStatus()
{
    ## Check status with checkproc(8), if process is running
    ## checkproc will return with exit status 0.

    # Status has a slightly different for the status command:
    # 0 - service running
    # 1 - service dead, but /var/run/ pid file exists
    # 2 - service dead, but /var/lock/ lock file exists
    # 3 - service not running

    if [ -f $TINA_HOME/Conf/hosts ] ; then
        host_to_ping=`cat $TINA_HOME/Conf/hosts | grep ^localhostname | awk '{print $2}' 2>/dev/null`
        if [ $? != 0 -o -z "$host_to_ping" ] ; then
            host_to_ping="127.0.0.1"
        fi
    else
        host_to_ping="127.0.0.1"
    fi
    is_running=`$TINA_HOME/Bin/tina_ping -host $host_to_ping -language English | grep "is running"`
    if [ $# -eq 0 ] ; then
        ECHONOCR "Checking for Time Navigator ($TINA_SERVICE_NAME): "
        if [ $OS_TYPE = "Darwin" ] ; then
            echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
        fi
        echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon" >> ${TINA_HOME}/Adm/auto_start.log
        if [ ! -z "$is_running" ] ; then
            echo "tina_daemon is running"
            echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon: tina_daemon is running" >> ${TINA_HOME}/Adm/auto_start.log
            retval=0
        else
            echo "tina_daemon is stopped"
            echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon: tina_daemon is stopped" >> ${TINA_HOME}/Adm/auto_start.log
            retval=3
        fi
    fi
}

test "$ISSUSE" -eq 1 && rc_reset

case "$1" in
    start)
        RCStart
        retval=0
        ;;
    stop)
        RCStop
        retval=0
        ;;
    start_msg)
        echo "Starting Time Navigator ($TINA_SERVICE_NAME)"
        ;;
    stop_msg)
        echo "Shutting down Time Navigator ($TINA_SERVICE_NAME)"
        ;;
    restart)
        RCStop
        sleep 3
        RCStart
        ;;
    status)
        RCStatus
        ;;
    *)
        echo "usage: /etc/init.d/tina {start|stop|restart|status}"
        ;;
esac
exit $retval
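Before handing the script over to the cluster, it’s worth checking the exit codes by hand on both nodes. For example, a status check on a passive node (where the DRBD filesystem is not mounted) should finish with exit code 3, the LSB code for “service not running”:

$ /etc/init.d/tina.tina_ha status; echo $?
[output trimmed]
3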
One final Time Navigator configuration change must be made. The tina agent “hosts” file must be configured to set the “localhostname” of our agent to the FQDN of the floating/virtual IP address resource, so that the agent binds only to that IP address instead of all IP addresses on the system.
$ cd /cluster/tina/Conf
$ cp hosts.sample hosts
$ nano hosts
Add a line to the file specifying the “localhostname” like so:
localhostname myserver.company.com
For this to work properly, you must also give any other tina agents running on the cluster nodes a “localhostname” entry in their respective “hosts” files, to prevent those host-based agents from binding to all IP addresses on the host, including the virtual IP address.
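For example, the local agent on each node gets its own entry in its own Conf/hosts, in the same format as above (the node names here are placeholders):

localhostname node1.company.com

on the first node, and likewise node2.company.com on the second.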
That’s it! The tina service can be added to the HA cluster as an LSB resource agent, grouped with your storage resource agents so it will always be running on the same node as your storage.
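For reference, here’s a rough sketch of what that could look like in the Pacemaker 1.0 crm shell. Everything here is an example only: the DRBD resource and device, the virtual IP, the filesystem type and all the resource IDs need to match your actual cluster configuration.

# Example only -- adapt resource names, device, IP and fstype to your site
primitive p_drbd_tina ocf:linbit:drbd \
    params drbd_resource="tina"
ms ms_drbd_tina p_drbd_tina \
    meta master-max="1" clone-max="2" notify="true"
primitive p_fs_tina ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/cluster/tina" fstype="ext3"
primitive p_ip_tina ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.50" cidr_netmask="24"
primitive p_tina_ha lsb:tina.tina_ha \
    op monitor interval="30s"
group g_tina p_fs_tina p_ip_tina p_tina_ha
colocation col_tina_on_drbd inf: g_tina ms_drbd_tina:Master
order ord_tina_after_drbd inf: ms_drbd_tina:promote g_tina:start

The group keeps the filesystem, the virtual IP, and the cluster-aware tina agent together and starts them in that order, while the colocation and order constraints tie the whole group to whichever node currently holds the DRBD Master role.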
Conclusion
Ok, so I rushed the end. Big deal. Sue me. I doubt anyone cares anyways!