Migration Weekend: Success

It was a long weekend of watching tape restores and restarting them as necessary but it’s finally over and everything appears to be mostly hunky dory!

I did discovery yet more small misconfigurations and strange behaviour along the way:

  1. OpenLDAP’s syncrepl using “refereshAndPersist” wasn’t working how I expected it to, no new changes were replicating to the slave LDAP server! I changed the directive to “refreshOnly” and set a 10 minute interval. I made several changes and monitored the slave LDAP server. Changes propagated in about 10 minutes, every time.
  2. Despite iSCSI’s maturity and the maturity of QLogic’s HBAs I still noticed strange, unexplained target drop outs. Two HBAs per server, two controllers in the IBM DS3300 and just one target out of four was dropping. At first, I couldn’t figure out how to properly reconnect the target on a live system so I rebooted. Later, I discovered you can “disable” and then “enable” the specific target in SANsurfer or iscli, which worked to bring back the dropped target on a live system. Multipath picked up the “new” path right away, as expected.
  3. Always remember to leave free physical extents in any LVM Volume Group in which you are taking snapshots of the Logical Volumes. It’s freakin’ obvious but I forgot and when I went to do snapshot backups, the snapshots were failing. Now I’m growing some LUNs on the DS3300 so that my VGs have room for snapshots.

All in all, a good weekend that was mostly filled with success.

IT Watchdogs SuperGoose (WxGoos-2) Review

Some time ago it became apparent that we would require environmental monitoring in our server room. The primary reason being that our server room was never initially intended to be a server room and the after-the-fact A/C unit installation (size, vent placement, etc.) is definitely less than optimal. Not to mention the A/C unit is likely overloaded as well, judging by some of the data we gathered after installing the environmental monitoring equipment and software. Basically, I needed to be made aware of any potential problems with the environment in that room so that should anything go wrong, I can act quickly. A secondary use of the data is to trend the environment changes in order to reveal specific patterns that may help with long-term planning.

Time Navigator HA Cluster Agent Configuration

I’ve been wanting to post about a configuration that allows for seamless file-level backup of storage attached to an active/passive high availability cluster in an uninterrupted fashion using Atempo’s Time Navigator and I’m finally going to do it.

The Problem

The initial difficulty lies in the requirement that the data must be consistently backed up at every interval, no matter which cluster node is currently the active node with the backend storage mounted. To do this, an agent is required to be configured as a cluster resource in order to “follow” the mounting/exporting of the storage to any cluster node. So in order to accomplish this,  N + 1 tina agents are required. That is, if you have two cluster nodes, you need three agents to successfully backup each node with the local agent and the storage, as it floats about the cluster nodes depending on failure or migration events.

Luckily for me, the good people at Atempo have engineered the agent in such a way that multiple agents can be ran on a single node, each binding to it’s own IP address and each individually controlled via it’s own init script. Of course, we need to make some file edits to make all this happen and that’s what I’m going share!

Read More