system administrator
noun Computing
"a person who manages the operation of a computer system, such as an electronic bulletin board."

Migration Weekend: Success

Posted: September 8th, 2010 | Author: cense | Filed under: Sysadmin

It was a long weekend of watching tape restores and restarting them as necessary, but it’s finally over and everything appears to be mostly hunky dory!

Along the way, I discovered yet more small misconfigurations and strange behaviour:

  1. OpenLDAP’s syncrepl using “refreshAndPersist” wasn’t working the way I expected: no new changes were replicating to the slave LDAP server! I changed the syncrepl type to “refreshOnly” and set a 10-minute interval. I then made several changes and monitored the slave LDAP server. Every time, the changes propagated in about 10 minutes.
  2. Despite the maturity of both iSCSI and QLogic’s HBAs, I still noticed strange, unexplained target dropouts. With two HBAs per server and two controllers in the IBM DS3300, just one target out of four was dropping. At first I couldn’t figure out how to properly reconnect the target on a live system, so I rebooted. Later, I discovered you can “disable” and then “enable” the specific target in SANsurfer or iscli, which brought the dropped target back on a live system. Multipath picked up the “new” path right away, as expected.
  3. Always remember to leave free physical extents in any LVM Volume Group where you take snapshots of the Logical Volumes. It’s freakin’ obvious, but I forgot, and when I went to do snapshot backups, the snapshots failed. Now I’m growing some LUNs on the DS3300 so that my VGs have room for snapshots.
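For the syncrepl change in item 1, here is a small Python sketch that renders a refreshOnly consumer stanza for slapd.conf. It assumes the dd:hh:mm:ss format that slapd.conf(5) documents for syncrepl’s interval= parameter. The provider URI, base DN, bind DN and credentials are hypothetical placeholders and do not come from my actual config.

```python
# All values passed in below are hypothetical placeholders, not a real config.

def syncrepl_interval(seconds: int) -> str:
    """Format a duration as the dd:hh:mm:ss string used by syncrepl's interval=."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days:02d}:{hours:02d}:{minutes:02d}:{secs:02d}"


def refresh_only_stanza(rid: int, provider: str, searchbase: str,
                        binddn: str, credentials: str, interval_seconds: int) -> str:
    """Render a slapd.conf syncrepl directive that polls with type=refreshOnly."""
    lines = [
        f"syncrepl rid={rid:03d}",
        f"  provider={provider}",
        "  type=refreshOnly",
        f"  interval={syncrepl_interval(interval_seconds)}",
        f'  searchbase="{searchbase}"',
        "  bindmethod=simple",
        f'  binddn="{binddn}"',
        f"  credentials={credentials}",
    ]
    # In slapd.conf, indented lines continue the directive on the line above.
    return "\n".join(lines)


if __name__ == "__main__":
    print(refresh_only_stanza(
        rid=1,
        provider="ldap://ldap-master.example.com",
        searchbase="dc=example,dc=com",
        binddn="cn=replicator,dc=example,dc=com",
        credentials="secret",
        interval_seconds=10 * 60,  # the 10-minute polling interval
    ))
```

With refreshOnly, the consumer simply polls the provider on that schedule, so the worst-case replication lag is roughly one interval. That matches the ~10 minutes I saw.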
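For the LVM snapshot point in item 3, the check is plain arithmetic. A snapshot’s copy-on-write space is allocated in whole physical extents from the VG’s free pool, so the VG needs at least ceil(snapshot size / extent size) free extents. The sketch below assumes LVM2’s default 4 MiB extent size. The snapshot size and free-extent counts are made-up examples, and in practice you would read the real values from vgdisplay.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def extents_needed(size_bytes: int, pe_size_bytes: int) -> int:
    """Physical extents LVM must allocate, rounded up to whole extents."""
    return -(-size_bytes // pe_size_bytes)


def snapshot_fits(free_pe: int, snapshot_bytes: int, pe_size_bytes: int) -> bool:
    """True if the VG's free extents can hold a snapshot of the requested size."""
    return extents_needed(snapshot_bytes, pe_size_bytes) <= free_pe


if __name__ == "__main__":
    pe_size = 4 * MiB  # assumed: LVM2's default extent size
    print(extents_needed(10 * GiB, pe_size))       # 2560
    print(snapshot_fits(0, 10 * GiB, pe_size))     # fully allocated VG: False
    print(snapshot_fits(5000, 10 * GiB, pe_size))  # room to spare: True
```

A VG with zero free extents, which is exactly what I had, fails the check for any snapshot size.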

All in all, a good weekend that was mostly filled with success.


Migration Weekend

Posted: September 4th, 2010 | Author: cense | Filed under: Sysadmin

The big data migration is upon me!

This weekend we will be migrating about 3TB of data from aging, five-year-old servers with internal DAS RAID over to the new infrastructure I’ve been building over the last two months. Part of my genius plan is to migrate the data using our backup & restore software. The genius, I believe, lies in the fact that doing the migration via tape restore will provide a long-overdue full test of our ability to restore in the event of a catastrophic storage failure. Data migration and restore testing: two birds with one stone.

Wish me luck!


LVM filters and initrd

Posted: September 1st, 2010 | Author: cense | Filed under: Sysadmin

Another “don’t make the same mistake I did” post, you say? Yippee! I seem to be running into quite a few of these (semi-)complex gotchas lately, but I suppose they at least fuel techslaves with a bit of content, which I can’t be too angry about these days.

Today’s gotcha is all about LVM filters and initrd. Really, this wouldn’t have been a big problem, but because I’d never sat down to appreciate the initrd process in any great depth, it took me two days to figure out exactly what went wrong (recovery, however, was much faster).
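For anyone who hasn’t looked at LVM filters closely, here is a rough Python model of how an lvm.conf filter list is evaluated. Each entry is "a|regex|" (accept) or "r|regex|" (reject), the first entry whose regex matches the device path decides, and a device that matches nothing is accepted. This only models the filter semantics, not the initrd side. The device paths are hypothetical, and Python’s re module stands in for LVM’s own regex engine, which behaves the same for simple anchored paths like these.

```python
import re
from typing import List


def lvm_filter_accepts(device: str, patterns: List[str]) -> bool:
    """Return True if an lvm.conf-style filter would let LVM scan `device`.

    Each pattern is "a<d>regex<d>" or "r<d>regex<d>", where <d> is whatever
    delimiter character follows the a/r. The first matching pattern wins;
    if nothing matches, the device is accepted.
    """
    for pattern in patterns:
        action, delim = pattern[0], pattern[1]
        regex = pattern[2:pattern.rindex(delim)]
        if re.search(regex, device):
            return action == "a"
    return True  # no pattern matched: LVM accepts the device


if __name__ == "__main__":
    # Hypothetical filter: only scan multipath devices, reject everything else.
    filt = ["a|^/dev/mapper/mpath|", "r|.*|"]
    print(lvm_filter_accepts("/dev/mapper/mpatha", filt))  # True
    print(lvm_filter_accepts("/dev/sdb", filt))            # False
```

Because order matters, a catch-all reject placed before the accept you need silently hides devices, which is the kind of thing that bites at boot time.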


When using Syncrepl…

Posted: September 1st, 2010 | Author: cense | Filed under: Sysadmin, Tips & Tricks

Quick OpenLDAP tip, boys & girls…

When using syncrepl to replicate from a master LDAP server to a slave LDAP server, always remember to configure the ACLs on the master LDAP server to allow the “sync dn” to read everything.

I know it sounds entirely obvious, but today I realized that the order in which I had defined the ACLs on the master LDAP server was preventing the sync dn from reading the “userPassword” attribute, and therefore from syncing it to the slave. The consequence: users would not be able to authenticate against the slave server! Shit!
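To see why ordering matters, here is a heavily simplified Python model of slapd’s ACL evaluation. The first access directive whose target matches the attribute is used, then the first matching "by" clause within it, and each directive ends with an implicit "by * none". This is not my actual ACL set: the rules and the sync dn are hypothetical, and real slapd also supports "self", "users", dn styles and stop/continue/break controls, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical sync dn (the binddn the slave's syncrepl uses).
SYNC_DN = "cn=replicator,dc=example,dc=com"


@dataclass
class Access:
    """One simplified 'access to <what> by <who> <level> ...' directive."""
    attrs: Optional[FrozenSet[str]]  # None stands for "to *"
    clauses: List[Tuple[str, str]]   # (who, level); who is "*", "anonymous" or an exact DN


def who_matches(who: str, requester: Optional[str]) -> bool:
    if who == "*":
        return True
    if who == "anonymous":
        return requester is None
    return who == requester


def effective_access(rules: List[Access], attr: str, requester: Optional[str]) -> str:
    """First directive whose target matches wins, then its first matching 'by' clause."""
    for rule in rules:
        if rule.attrs is None or attr in rule.attrs:
            for who, level in rule.clauses:
                if who_matches(who, requester):
                    return level
            return "none"  # implicit "by * none" at the end of every directive
    return "none"  # implicit trailing "access to * by * none"


# userPassword rule comes first and never mentions the sync dn.
broken = [
    Access(frozenset({"userPassword"}), [("anonymous", "auth"), ("*", "none")]),
    Access(None, [(SYNC_DN, "read"), ("*", "read")]),
]
# Fix: grant the sync dn read access inside the rule that actually matches.
fixed = [
    Access(frozenset({"userPassword"}), [(SYNC_DN, "read"), ("anonymous", "auth"), ("*", "none")]),
    Access(None, [(SYNC_DN, "read"), ("*", "read")]),
]

if __name__ == "__main__":
    print(effective_access(broken, "userPassword", SYNC_DN))  # none
    print(effective_access(broken, "cn", SYNC_DN))            # read
    print(effective_access(fixed, "userPassword", SYNC_DN))   # read
```

The sync dn’s read grant in the "to *" rule never helps for userPassword, because the earlier userPassword rule already matched and stopped evaluation.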

Of course, since everything else was syncing properly, all the NSS (lookup) stuff worked fine, but anything authentication-related, like PAM, failed because the user bind was rejected with “Invalid credentials” in /var/log/secure. It had been some time since I’d tested authentication, and I must never have actually tested it against the slave (whoops!), so I didn’t notice until now. I know I tested lookups, but testing authentication must have slipped by somehow. Grrr, testing.

Good thing I caught it early, before it escalated into a real problem; that really could have sucked down the line.

Don’t make the same mistake I did.


IT Watchdogs SuperGoose (WxGoos-2) Review

Posted: August 25th, 2010 | Author: cense | Filed under: Reviews

Some time ago it became apparent that we would require environmental monitoring in our server room. The primary reason is that our server room was never intended to be a server room, and the after-the-fact A/C unit installation (size, vent placement, etc.) is definitely less than optimal. Not to mention that the A/C unit is likely overloaded as well, judging by some of the data we gathered after installing the environmental monitoring equipment and software. Basically, I needed to be made aware of any potential problems with the environment in that room so that, should anything go wrong, I could act quickly. A secondary use of the data is to trend environmental changes in order to reveal patterns that may help with long-term planning.