"Robots march to their own beat"
avatar

Migration Weekend: Success

Posted: September 8th, 2010 | Author: | Filed under: Sysadmin | Tags: , , , , , , , , , , | No Comments »

It was a long weekend of watching tape restores and restarting them as necessary but it’s finally over and everything appears to be mostly hunky dory!

I did discovery yet more small misconfigurations and strange behaviour along the way:

  1. OpenLDAP’s syncrepl using “refereshAndPersist” wasn’t working how I expected it to, no new changes were replicating to the slave LDAP server! I changed the directive to “refreshOnly” and set a 10 minute interval. I made several changes and monitored the slave LDAP server. Changes propagated in about 10 minutes, every time.
  2. Despite iSCSI’s maturity and the maturity of QLogic’s HBAs I still noticed strange, unexplained target drop outs. Two HBAs per server, two controllers in the IBM DS3300 and just one target out of four was dropping. At first, I couldn’t figure out how to properly reconnect the target on a live system so I rebooted. Later, I discovered you can “disable” and then “enable” the specific target in SANsurfer or iscli, which worked to bring back the dropped target on a live system. Multipath picked up the “new” path right away, as expected.
  3. Always remember to leave free physical extents in any LVM Volume Group in which you are taking snapshots of the Logical Volumes. It’s freakin’ obvious but I forgot and when I went to do snapshot backups, the snapshots were failing. Now I’m growing some LUNs on the DS3300 so that my VGs have room for snapshots.

All in all, a good weekend that was mostly filled with success.


avatar

Migration Weekend

Posted: September 4th, 2010 | Author: | Filed under: Sysadmin | Tags: , , , , , , , , | No Comments »

The big data migration is upon me!

This weekend we will be migrating about 3TB of data from aging 5 year old servers with internal DAS RAID over to the new infrastructure I’ve been building over the last two months. Part of my genius plan is to migrate the data using our backup & restore software. The genius I believe comes from the fact that doing the migration via tape restore will provide a long overdue full test of our capability to restore in the event of catastrophic storage failure. Data migration and restore testing, two birds with one stone.

Wish me luck!


avatar

Time Navigator HA Cluster Agent Configuration

Posted: August 5th, 2010 | Author: | Filed under: Sysadmin | Tags: , , , , , , | No Comments »

I’ve been wanting to post about a configuration that allows for seamless file-level backup of storage attached to an active/passive high availability cluster in an uninterrupted fashion using Atempo’s Time Navigator and I’m finally going to do it.

The Problem

The initial difficulty lies in the requirement that the data must be consistently backed up at every interval, no matter which cluster node is currently the active node with the backend storage mounted. To do this, an agent is required to be configured as a cluster resource in order to “follow” the mounting/exporting of the storage to any cluster node. So in order to accomplish this,  N + 1 tina agents are required. That is, if you have two cluster nodes, you need three agents to successfully backup each node with the local agent and the storage, as it floats about the cluster nodes depending on failure or migration events.

Luckily for me, the good people at Atempo have engineered the agent in such a way that multiple agents can be ran on a single node, each binding to it’s own IP address and each individually controlled via it’s own init script. Of course, we need to make some file edits to make all this happen and that’s what I’m going share!

[ Read More ]»


avatar

Atempo Time Navigator 4.2 Archive Media Selection Tunable

Posted: May 5th, 2010 | Author: | Filed under: Sysadmin, Tips & Tricks | Tags: , , , , , , , | No Comments »

Just a quick post here to share a non-obvious tunable for Atempo’s Time Navigator 4.2 regarding archiving and media selection.

Before upgrading from 4.1 to 4.2 Time Navigator’s media selection for archive jobs with standalone drives behaved as expected: If existing partly filled and open cartridges in the associated media pool existed, Time Navigator would request those media be placed in the drives upon the start a new archive operation, effectively only asking for new, unlabeled media to be inserted once the existing media was full.

However, with the upgrade to 4.2 we found that Time Navigator was no longer requesting the existing, partly filled, open cartridges and was instead requesting new, unlabeled media to be inserted into the drives instead! The result of this new behavior was that Time Navigator would use new tapes for every new archive operation, no matter if existing, partly filled and open media was available in the media pool. Basically 4.2′s default behavior was preventing us from filling any archive media unless the particular archive job would happen to be larger than a single tape.

While I don’t know why the functionality changed, I do know what tunable to modify in order to make 4.2 behave like 4.1. The tunable is “check_external_cart_when_recycling“. Setting this tunable to “Yes” has restored the 4.1 behavior, allowing us to make full use of all archive media capacity by only requesting new media when all the existing media in the media pool has been filled.

I believe we only faced this problem because we use standalone archive tape drives that do not have an autoloader or robot nor an “inventory” of online tape. Each tape must be manually loaded. I suspect that if we had an autoloader for our tape drives, that 4.2 would have made the correct/expected selection of media.

I doubt that anyone else is going to face this problem but it took about 3 weeks with Atempo’s R&D department to figure out the problem so I figure if posting here can save anyone that amount of time, then I’ll have done my part!