"Robots march to their own beat"

Time Navigator HA Cluster Agent Configuration

Posted: August 5th, 2010 | Filed under: Sysadmin

I’ve been wanting to post about a configuration that allows seamless, uninterrupted file-level backup of storage attached to an active/passive high availability cluster using Atempo’s Time Navigator, and I’m finally going to do it.

The Problem

The initial difficulty lies in the requirement that the data must be backed up consistently at every interval, no matter which cluster node is currently the active node with the backend storage mounted. To do this, an agent must be configured as a cluster resource so that it “follows” the mounting/exporting of the storage to any cluster node. Accomplishing this requires N + 1 tina agents. That is, if you have two cluster nodes, you need three agents: one local agent per node to back up the node itself, plus one agent for the storage, which floats between the cluster nodes on failure or migration events.

Luckily for me, the good people at Atempo have engineered the agent in such a way that multiple agents can be run on a single node, each binding to its own IP address and each individually controlled via its own init script. Of course, we need to make some file edits to make all this happen, and that’s what I’m going to share!

System Configuration

This configuration is based on CentOS 5.x and Time Navigator 4.2, but the concepts should be mostly portable to other popular Linux or UNIX distributions. The underlying cluster software used for the majority of my experience with this configuration is Heartbeat 2.1.3, the release right before the Pacemaker split, but it has also been tested more recently on Pacemaker 1.0 / Heartbeat 3.0.x. DRBD provides the replicated filesystem where I’ve installed the cluster-aware Atempo Time Navigator agent, so the agent’s state and configuration information follow the active node. It is possible to install a second agent on each cluster node instead and configure each copy identically, but that just seems like more work. DRBD does a great job of making sure the latest cluster-aware tina agent is consistently configured and available on the active cluster node, no matter which node that actually is.

For the purpose of this post, I’m going to assume you already have a working Heartbeat/Pacemaker/DRBD configuration up and running with proper STONITH and all that good jazz. Maybe I’ll cover that some other time.

Installing and Configuring the Agent on DRBD

The first thing to do is install the tina agent to a filesystem hosted on DRBD. I generally just copy the Linux-X64.tar or Linux-X86.tar Time Navigator installation archive over SSH, then extract it and run the install script.

Assuming the dedicated (to this agent resource) DRBD filesystem is mounted as /cluster/tina on the active cluster node:

$ cd /cluster/tina
$ scp user@remote.fqdn:/path/to/Linux-X86.tar ./
$ tar -xf Linux-X86.tar
$ cd Linux-X86
$ ./install.sh

This will bring up the GUI installer. Alternatively use the batch install method, whatever works for you.

Set /cluster/tina as the installation directory and otherwise proceed normally as per site configuration. The second (cluster) agent does not need unique ports, because in this configuration it binds to a floating cluster resource IP address while the local agent binds to (one of) the server’s “real” IP address(es).

Once installed, there is one important edit to make in the tina agent environment configuration scripts named .tina.sh (sh/bash) and .tina.csh (csh/tcsh) located in the installation directory (/cluster/tina). The key change to make in the relevant script is to modify the value where the $TINA environment variable is being set. In .tina.sh that would be changing the line:

TINA=tina

to instead read something like this:

TINA=tina_ha

where tina_ha is a unique identifier for this instance of the agent. Basically, it needs to be anything BUT tina. This is one of two key components that had me tricked for a while. I had first tried modifying the $TINA_SERVICE_NAME environment variable, but that was a giant red herring: setting that variable to something other than tina does not produce the desired effect, despite what looking through the tina environment scripts and init scripts might have you believe.
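If you would rather script the edit than make it by hand, something along these lines should work. This is only a sketch: the function name set_tina_instance is made up for illustration, and it assumes the assignment sits on a line of its own exactly as shown above (TINA=tina). It deliberately leaves .tina.csh alone, since the csh syntax for the same setting is different.

```shell
#!/bin/sh
# Sketch: give a tina agent instance a unique TINA identifier.
# set_tina_instance FILE NAME
#   FILE  path to the agent's .tina.sh (for example /cluster/tina/.tina.sh)
#   NAME  new identifier; "tina_ha" is just an example name
set_tina_instance() {
    env_file=$1
    instance=$2

    # Refuse the default name; the whole point is to differ from "tina".
    if [ -z "$instance" ] || [ "$instance" = "tina" ]; then
        echo "instance name must be set and must not be 'tina'" >&2
        return 1
    fi

    # Keep a copy of the original so the edit is easy to undo.
    cp -p "$env_file" "$env_file.orig" || return 1

    # Rewrite only a line that is exactly TINA=tina (optionally indented).
    # TINA_HOME, TINA_SERVICE_NAME and similar variables are left untouched.
    sed "s/^\([[:space:]]*\)TINA=tina[[:space:]]*\$/\1TINA=$instance/" \
        "$env_file.orig" > "$env_file"
}
```

Called as set_tina_instance /cluster/tina/.tina.sh tina_ha, it leaves the untouched original next to the edited file as .tina.sh.orig.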

The second thing we must do is to create an LSB-compliant init script for the cluster-aware tina agent. The LSB compliance is very important to ensure the cluster can manage the resource properly. If any return codes are out of the LSB spec, the cluster will behave erratically and unpredictably when dealing with starting, stopping and monitoring the tina agent.

Since the installation creates a good init script for us, we can copy that script with a new name and edit it.

$ cp /etc/init.d/tina.tina /etc/init.d/tina.tina_ha
$ nano /etc/init.d/tina.tina_ha

First, replace every instance of the local agent’s install path with the cluster agent’s installation path. A simple search (Ctrl-W) then replace (Ctrl-R) in nano should suffice.
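The copy and the path replacement can also be done in one pass with sed. The sketch below assumes the local agent lives in /usr/Atempo/tina, which is purely a placeholder; substitute whatever path your local agent’s init script actually references. The function name retarget_init_script is likewise made up for this example.

```shell
#!/bin/sh
# Sketch: copy an init script while swapping one install path for another.
# retarget_init_script SRC DST OLD_HOME NEW_HOME
retarget_init_script() {
    src=$1
    dst=$2
    old_home=$3
    new_home=$4

    # Paths contain "/", so use "|" as the sed delimiter instead.
    sed "s|$old_home|$new_home|g" "$src" > "$dst" || return 1
    chmod 755 "$dst"

    # Any remaining mention of the old path means the edit is incomplete.
    # (This check assumes the new path does not itself contain the old one.)
    if grep -qF "$old_home" "$dst"; then
        echo "warning: $dst still references $old_home" >&2
        return 1
    fi
}
```

For example: retarget_init_script /etc/init.d/tina.tina /etc/init.d/tina.tina_ha /usr/Atempo/tina /cluster/tina. The source script is only read, never modified.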

Additionally, we need a small section at the top that handles the case where the DRBD filesystem is not mounted. The HA cluster will do resource status checks on all nodes in the cluster, and we need the init script to finish with a sane exit code even if the DRBD filesystem is not accessible (as is the case on all passive nodes). Something like this:

if [ -f /cluster/tina/.tina.sh ] ; then
  . /cluster/tina/.tina.sh > /dev/null 2>&1
else
  echo "Unable to start Time Navigator daemon"
  echo "because the \"/cluster/tina/.tina.sh\" file does not exist"
  retval=3
fi

In order to make the script LSB compliant, we need to ensure the correct exit status is returned during the correct operations. Instead of pointing out each specific place I had to edit in order for this to happen, I will simply post my entire “/etc/init.d/tina.tina_ha” init script:

#!/bin/sh
# UPDATED BY SETUP - BEGIN
########################################################
#WARNING :
#THIS FILE IS GENERATED AUTOMATICALLY
#AND WILL BE OVERWRITTEN WHEN UPGRADING
#YOUR VERSION OF Time Navigator PRODUCT
########################################################
PATH="$PATH:/bin:/usr/bin:/sbin:/usr/sbin:/etc:/usr/etc"
export PATH
if [ "${TINA_HOME:+$TINA_HOME}" != "" ] ; then
	if [ "/cluster/tina" != "$TINA_HOME" ] ; then
		echo "Unable to start Time Navigator daemon for \"/cluster/tina\""
		echo "because the Time Navigator environment is already set by \"$TINA_HOME\""
		retval=3
	fi
fi
if [ -f /cluster/tina/.tina.sh ] ; then
	. /cluster/tina/.tina.sh > /dev/null 2>&1
else
	echo "Unable to start Time Navigator daemon"
	echo "because the \"/cluster/tina/.tina.sh\" file does not exist"
	retval=3
fi
# UPDATED BY SETUP - END
# @(#) $Id: rc.tina.orig,v 1.1.6.10.4.4.2.4 2007/09/20 16:26:50 dle Exp $
#
# Time Navigator startup script
# (C) 1999-2005 - Atempo
# tina_daemon starting...
#

OS_TYPE=`uname -s`

if echo "\c" | grep "c">/dev/null ; then
	ECHOMODE=Bsd
else
	ECHOMODE=Sys5
fi

ECHONOCR() {
	if [ "$ECHOMODE" = Bsd ] ; then
		echo -n "$*"
	else
		echo "$*\c"
	fi
}

PING() {
    os_type=`uname -s`
    case $os_type in
        HP-UX) result=`ping $1 -n 2 2>/dev/null`; return $?;;
        *) result=`ping -c 2 $1 2>/dev/null`; return $?;;
    esac
}

ISREDHATLIKE=1
# Source function library.
if [ -f /etc/init.d/functions ] ; then
	. /etc/init.d/functions
elif [ -f /etc/rc.d/init.d/functions ] ; then
	. /etc/rc.d/init.d/functions
else
	ISREDHATLIKE=0
fi

ISSUSE=1
if [ -f /etc/rc.status ] ; then
	. /etc/rc.status
else
	ISSUSE=0
fi

RCStart()
{
	if [ -x ${TINA_HOME}/Bin/ndmpd ] ; then
		echo "Starting NDMP Data Server..."
		${TINA_HOME}/Bin/ndmpd
	elif [ -x ${TINA_HOME}/Bin/tina_nts ] ; then
		echo "Starting NDMP Tape Server..."
		${TINA_HOME}/Bin/tina_nts
	fi

	TINA_DAEMON=$TINA_HOME/Bin/tina_daemon
	if [ -x "$TINA_DAEMON" ]; then
		ECHONOCR "Starting Time Navigator ($TINA_SERVICE_NAME)..."
		if [ -d /var/lock/subsys ] ; then
			touch /var/lock/subsys/tina.$TINA_SERVICE_NAME
		fi
		i=1
		while [ $i -le 60 ] ; do
			if [ $OS_TYPE = "Darwin" ] ; then
				echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
			fi
			echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon $i" >> ${TINA_HOME}/Adm/auto_start.log
			hostname=`hostname 2>/dev/null`
			if [ ! -z "$hostname" ] ; then
				echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: hostname $hostname is defined" >> ${TINA_HOME}/Adm/auto_start.log
				PING $hostname
				status=$?
				if [ $status -eq 0 ] ; then
					echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: ping $hostname is ok" >> ${TINA_HOME}/Adm/auto_start.log
					$TINA_DAEMON
					sleep 2
					RCStatus no_mess
					if [ ! -z "$is_running" ] ; then
						if [ $OS_TYPE = "Darwin" ] ; then
							echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is started" >> /var/log/system.log
						fi
						echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is started" >> ${TINA_HOME}/Adm/auto_start.log
						break
					else
						echo `date` "tina_daemon ($TINA_SERVICE_NAME) daemon is not started" >> ${TINA_HOME}/Adm/auto_start.log
					fi
				else
					echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: ping $hostname is ko" >> ${TINA_HOME}/Adm/auto_start.log
				fi
			else
				echo `date` "Trying to start tina_daemon ($TINA_SERVICE_NAME) daemon: hostname is not defined" >> ${TINA_HOME}/Adm/auto_start.log
			fi
			sleep 5
			i=`expr $i + 1`
		done

		if [ $ISREDHATLIKE -eq 1 ]; then
			echo_success
			echo
		elif [ $ISSUSE -eq 1 ]; then
			rc_status -v
		else
			echo
		fi

		# Start ACSLS daemons (mini_el and ssi)
		if [ -d "$TINA_HOME/Vtl" ] ; then
			for VL_path in $TINA_HOME/Vtl/*
			do
				[ ! -d $VL_path ] && continue
				VL_name=`basename $VL_path`
				if [ $VL_name = "Install" -o $VL_name = "Bin" -o $VL_name = "Log" -o $VL_name = "Tmp" ] ; then
					continue
				fi

				# If there is no tina_stk.conf, give up
				[ ! -f "$VL_path/tina_stk.conf" ] && continue

				[ ! -x "$TINA_HOME/Vtl/Bin/ACSLS/start.sh" ] && continue

				ECHONOCR "Starting ACSLS client daemon for $VL_name virtual library ..."
				$TINA_HOME/Vtl/Bin/ACSLS/start.sh $VL_name
				echo
			done
		fi
	elif [ ! -f ${TINA_HOME}/.ndmp.sh ] ; then
		if [ $ISREDHATLIKE -eq 1 ]; then
			ECHONOCR "Starting Time Navigator (${TINA_SERVICE_NAME})..."
			echo_failure
			echo
		elif [ $ISSUSE -eq 1 ]; then
			rc_failed 1
		else
			echo
		fi
	fi
}

RCStop()
{
	#Stop ndmp daemon
	NDMPDAEMON=""
	if [ -x ${TINA_HOME}/Bin/ndmpd ] ; then
		NDMPDAEMON="ndmpd"
	elif [ -x ${TINA_HOME}/Bin/tina_nts ] ; then
		NDMPDAEMON="tina_nts"
	fi
	if [ ! -z "$NDMPDAEMON" ] ; then
		file="/var/tmp/$NDMPDAEMON.pid"
		if [ -f $file ] ; then
			if [ "$NDMPDAEMON" = ndmpd ] ; then
				echo "Shutting down NDMP Data Server..."
			elif [ "$NDMPDAEMON" = tina_nts ] ; then
				echo "Shutting down NDMP Tape Server..."
			fi
			kill `cat $file`
		fi
	fi

	#Stop Time Navigator daemon
	if [ -x ${TINA_HOME}/Bin/tina_stop ]; then
		if [ -d /var/lock/subsys ] ; then
			rm -f /var/lock/subsys/tina.$TINA_SERVICE_NAME
		fi
		ECHONOCR "Shutting down Time Navigator ($TINA_SERVICE_NAME)..."
		if [ $OS_TYPE = "Darwin" ] ; then
			echo `date` "Stopping tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
		fi
		echo `date` "Stopping tina_daemon ($TINA_SERVICE_NAME) daemon" >> ${TINA_HOME}/Adm/auto_start.log
		$TINA_HOME/Bin/tina_stop > /dev/null
		retval=0
		if [ $ISREDHATLIKE -eq 1 ]; then
			echo_success
			echo
		elif [ $ISSUSE -eq 1 ]; then
			rc_status -v
		else
			echo
		fi
	elif [ ! -f ${TINA_HOME}/.ndmp.sh ] ; then
		if [ $ISREDHATLIKE -eq 1 ]; then
			echo "Shutting down Time Navigator ($TINA_SERVICE_NAME)..."
			echo_failure
			echo
		elif [ $ISSUSE -eq 1 ]; then
			rc_failed 1
		else
			echo
		fi
	fi
}

RCStatus()
{
	## Check status with checkproc(8), if process is running
	## checkproc will return with exit status 0.

	# Exit codes have slightly different meanings for the status command:
	# 0 - service running
	# 1 - service dead, but /var/run/ pid file exists
	# 2 - service dead, but /var/lock/ lock file exists
	# 3 - service not running

	if [ -f $TINA_HOME/Conf/hosts ] ; then
		host_to_ping=`cat $TINA_HOME/Conf/hosts | grep ^localhostname | awk '{print $2}' 2>/dev/null`
		if [ $? != 0 -o -z "$host_to_ping" ] ; then
			host_to_ping="127.0.0.1"
		fi
	else
		host_to_ping="127.0.0.1"
	fi

	is_running=`$TINA_HOME/Bin/tina_ping -host $host_to_ping -language English | grep "is running"`
	if [ $# -eq 0 ] ; then
		ECHONOCR "Checking for Time Navigator ($TINA_SERVICE_NAME): "
		if [ $OS_TYPE = "Darwin" ] ; then
			echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon" >> /var/log/system.log
		fi
		echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon" >> ${TINA_HOME}/Adm/auto_start.log
		if [ ! -z "$is_running" ] ; then
			echo "tina_daemon is running"
			echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon: tina_daemon is running" >> ${TINA_HOME}/Adm/auto_start.log
			retval=0
		else
			echo "tina_daemon is stopped"
			echo `date` "Checking tina_daemon ($TINA_SERVICE_NAME) daemon: tina_daemon is stopped" >> ${TINA_HOME}/Adm/auto_start.log
			retval=3
		fi
	fi
}

test "$ISSUSE" -eq 1 && rc_reset

case "$1" in
start)
	RCStart
	retval=0
	;;

stop)
	RCStop
	retval=0
	;;

start_msg)
	echo "Starting Time Navigator ($TINA_SERVICE_NAME)" ;;

stop_msg)
	echo "Shutting down Time Navigator ($TINA_SERVICE_NAME)" ;;

restart)
	RCStop
	sleep 3
	RCStart
	retval=0
	;;

status)
	RCStatus ;;

*)
	# LSB: exit status 2 means invalid or excess arguments
	echo "usage: $0 {start|stop|restart|status}"
	retval=2
	;;
esac

exit $retval
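Before handing the script to the cluster, it is worth checking that it returns the codes the LSB spec calls for: 0 for start and stop (even when repeated), 0 for status while running, and 3 for status while stopped. Below is a minimal sketch of such a check. The helper names expect_rc and check_lsb are made up for this example. Be aware that it really starts and stops the agent, so run it only on the active node and only while the cluster is not managing the resource.

```shell
#!/bin/sh
# Sketch: walk an init script through the LSB start/status/stop sequence.

# expect_rc WANT DESCRIPTION CMD...: run CMD and compare its exit status.
expect_rc() {
    want=$1
    what=$2
    shift 2
    if "$@" >/dev/null 2>&1; then
        got=0
    else
        got=$?
    fi
    if [ "$got" -eq "$want" ]; then
        echo "ok:   $what (exit $got)"
    else
        echo "FAIL: $what (exit $got, expected $want)"
        lsb_failures=$((lsb_failures + 1))
    fi
}

# check_lsb CMD...: CMD is the init script, e.g. /etc/init.d/tina.tina_ha
# Returns 0 only if every step produced the expected exit status.
check_lsb() {
    lsb_failures=0
    expect_rc 0 "start"                "$@" start
    expect_rc 0 "status while running" "$@" status
    expect_rc 0 "start while running"  "$@" start
    expect_rc 0 "stop"                 "$@" stop
    expect_rc 3 "status while stopped" "$@" status
    expect_rc 0 "stop while stopped"   "$@" stop
    [ "$lsb_failures" -eq 0 ]
}
```

Run it as check_lsb /etc/init.d/tina.tina_ha and look for any FAIL lines. On a passive node the only meaningful check is that status exits with 3.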

One final Time Navigator configuration change must be made. The tina agent “hosts” file must be configured to set the “localhostname” of our agent to the FQDN of the floating or virtual IP address service so that the agent will only try to bind to that IP address instead of all IP addresses on the system.

$ cd /cluster/tina/Conf
$ cp hosts.sample hosts
$ nano hosts

Add a line to the file specifying the “localhostname” like so:

localhostname myserver.company.com

For this to work properly, every other tina agent running on the cluster nodes must also have a “localhostname” set in its own “hosts” file. Otherwise those host-based agents will bind to all IP addresses on the host, including the virtual IP address.
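Since the same edit has to be repeated for every agent, a small helper can save some typing. This is a sketch under the assumption that the file uses the whitespace-separated “localhostname name” format shown above; the function names are invented for this example. The lookup mirrors the way the init script’s status check reads the file.

```shell
#!/bin/sh
# Sketch: manage the localhostname entry in a tina agent's Conf/hosts file.

# set_localhostname FILE FQDN: make FQDN the only localhostname entry.
set_localhostname() {
    hosts_file=$1
    fqdn=$2
    tmp_file="$hosts_file.tmp.$$"

    [ -f "$hosts_file" ] || return 1

    # Copy every line except an existing localhostname entry.
    # grep exits 1 when nothing is left to print, which is fine here.
    grep -v '^localhostname[[:space:]]' "$hosts_file" > "$tmp_file" \
        || [ $? -eq 1 ] || { rm -f "$tmp_file"; return 1; }

    # Append the single entry this agent should use.
    echo "localhostname $fqdn" >> "$tmp_file"

    # Write back in place so ownership and permissions are preserved.
    cat "$tmp_file" > "$hosts_file" && rm -f "$tmp_file"
}

# get_localhostname FILE: print the configured name, if any.
get_localhostname() {
    awk '$1 == "localhostname" { print $2 }' "$1"
}
```

For the cluster agent that would be set_localhostname /cluster/tina/Conf/hosts myserver.company.com, and for each local agent the same call against its own Conf/hosts with the node’s real FQDN.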

That’s it! The tina service can be added to the HA cluster as an LSB resource agent, grouped with your storage resource agents so it will always be running on the same node as your storage.
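For Pacemaker 1.0 with the crm shell, the resource definition might look roughly like what the sketch below prints. Everything here is illustrative: the resource names (p_tina_ha and the filesystem and IP resources you pass in) and the monitor interval and timeout are placeholders, and it assumes your filesystem and virtual IP resources are not already in a group. If they are, add the agent to the end of the existing group instead. Heartbeat 2.1.3 predates the crm shell, so there the equivalent has to be expressed in the CIB XML.

```shell
#!/bin/sh
# Sketch: print a crm shell fragment for the cluster-aware tina agent.
# print_tina_crm_config GROUP FS_RSC IP_RSC
#   GROUP   name for the resource group (hypothetical)
#   FS_RSC  existing filesystem resource for /cluster/tina (hypothetical)
#   IP_RSC  existing virtual IP resource (hypothetical)
print_tina_crm_config() {
    group=$1
    fs_rsc=$2
    ip_rsc=$3
    # Group members start in the order listed, so the agent comes last:
    # it needs both its filesystem and its IP address to be up first.
    cat <<EOF
primitive p_tina_ha lsb:tina.tina_ha op monitor interval="60s" timeout="30s"
group $group $fs_rsc $ip_rsc p_tina_ha
EOF
}
```

Running print_tina_crm_config g_tina p_fs_tina p_ip_tina prints two lines that can be reviewed and then entered at the crm configure prompt.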

Conclusion

Ok, so I rushed the end. Big deal. Sue me. I doubt anyone cares anyways!
