system administrator
noun Computing
"a person who manages the operation of a computer system, such as an electronic bulletin board."

Migration Weekend

Posted: September 4th, 2010 | Author: cense | Filed under: Sysadmin

The big data migration is upon me!

This weekend we will be migrating about 3TB of data from aging five-year-old servers with internal DAS RAID over to the new infrastructure I’ve been building over the last two months. Part of my genius plan is to migrate the data using our backup & restore software. The genius, I believe, comes from the fact that doing the migration via tape restore will provide a long-overdue full test of our capability to restore in the event of catastrophic storage failure. Data migration and restore testing: two birds with one stone.

Wish me luck!


LVM filters and initrd

Posted: September 1st, 2010 | Author: cense | Filed under: Sysadmin

Another “don’t make the same mistake I did” post, you say? Yippee! I seem to be running into quite a few of these (semi) complex gotchas lately, but I suppose they at least fuel techslaves with a bit of content, which I can’t be too angry about these days.

Today’s gotcha is all about LVM filters and initrd. It wouldn’t really be a big problem, but because I’ve never sat down to appreciate the initrd process in any great depth, it took me two days to figure out exactly what went wrong (recovery, however, was much faster). Read the rest of this entry »


When using Syncrepl…

Posted: September 1st, 2010 | Author: cense | Filed under: Sysadmin, Tips & Tricks

Quick OpenLDAP tip, boys & girls…

When using syncrepl to replicate from a master LDAP server to a slave LDAP server, always remember to configure the ACLs on the master LDAP server to allow the “sync dn” to read everything.

I know it sounds entirely obvious, but today I realized that the order in which I had defined the ACLs on the master LDAP server was preventing the sync dn from reading the “userPassword” attribute, and thus from syncing it to the slave. The consequence was that users could not authenticate against the slave server! Shit!

Of course, since everything else was syncing properly, all the NSS (lookup) stuff worked fine, but anything authentication related, like PAM, wouldn’t work because the user bind would fail with “Invalid credentials” in /var/log/secure. It had been some time since I tested authentication, so I must never have actually tested it against the slave (whoops!) and thus didn’t notice until now. I know I tested lookups, but testing authentication must have slipped by somehow. Grrr, testing.
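
To make the ordering problem concrete, here is a rough Python sketch of how slapd picks an ACL: the first “access to” clause whose target matches wins, then the first matching “by” inside it wins, and if no “by” matches the result is an implicit “by * none”. This is only a toy model, not my real configuration: the sync dn, the rule lists and the effective_access helper are all made up for illustration, and “self”/“anonymous” are treated as plain strings that simply never match the sync dn.

```python
from typing import List, Optional, Set, Tuple

# One rule = (attributes it targets, or None for "*", ordered "by" clauses).
Rule = Tuple[Optional[Set[str]], List[Tuple[str, str]]]

# Hypothetical sync dn, for illustration only.
SYNC_DN = "cn=syncuser,dc=example,dc=com"


def effective_access(rules: List[Rule], who: str, attr: str) -> str:
    """Return the access level `who` gets on `attr` (simplified first-match model)."""
    for attrs, by_clauses in rules:
        if attrs is None or attr in attrs:
            # First matching "access to" clause: only its "by" list is consulted.
            for subject, level in by_clauses:
                if subject == "*" or subject == who:
                    return level
            return "none"  # implicit trailing "by * none"
    return "none"  # no clause matched at all: deny


# Assumed broken ordering: the userPassword rule comes first and never
# mentions the sync dn, so the sync dn falls through to "by * none".
BROKEN: List[Rule] = [
    ({"userPassword"}, [("self", "write"), ("anonymous", "auth"), ("*", "none")]),
    (None, [(SYNC_DN, "read"), ("*", "read")]),
]

# One possible fix: grant the sync dn read inside the userPassword rule itself.
FIXED: List[Rule] = [
    ({"userPassword"}, [(SYNC_DN, "read"), ("self", "write"),
                        ("anonymous", "auth"), ("*", "none")]),
    (None, [(SYNC_DN, "read"), ("*", "read")]),
]

if __name__ == "__main__":
    for name, rules in (("broken", BROKEN), ("fixed", FIXED)):
        for attr in ("uid", "userPassword"):
            print(name, attr, effective_access(rules, SYNC_DN, attr))
```

With the broken ordering the sync dn can read “uid” but gets “none” on “userPassword”, which matches what I saw: lookups fine, authentication dead. With the fixed ordering it gets “read” on both.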

Good thing I caught it early and it never escalated into a real problem; that really could have sucked down the line.

Don’t make the same mistake I did.


Time Navigator HA Cluster Agent Configuration

Posted: August 5th, 2010 | Author: cense | Filed under: Sysadmin

I’ve been wanting to post about a configuration that allows seamless, uninterrupted file-level backup of storage attached to an active/passive high-availability cluster using Atempo’s Time Navigator, and I’m finally going to do it.

The Problem

The initial difficulty lies in the requirement that the data must be consistently backed up at every interval, no matter which cluster node is currently the active node with the backend storage mounted. To do this, an agent must be configured as a cluster resource so that it “follows” the mounting/exporting of the storage to any cluster node. To accomplish this, N + 1 tina agents are required. That is, if you have two cluster nodes, you need three agents: a local agent to back up each node, plus one more for the storage as it floats about the cluster nodes depending on failure or migration events.

Luckily for me, the good people at Atempo have engineered the agent in such a way that multiple agents can be run on a single node, each binding to its own IP address and each individually controlled via its own init script. Of course, we need to make some file edits to make all this happen, and that’s what I’m going to share!

Read the rest of this entry »


Cfengine 3 Snippets Part 1: DenyHosts

Posted: May 18th, 2010 | Author: cense | Filed under: Sysadmin

I’ve recently begun looking into configuration management with cfengine 3. I’ve ignored this growing sub-field of system administration for too long and I just can’t ignore it anymore. After spending quite some time researching the philosophies, methods and different tools out there, I settled on starting out with cfengine 3. There’s no special reason that I chose cfengine instead of puppet, bcfg2, chef or AutomateIT. I haven’t used any of these tools and thus I cannot pass judgement on them or their methods. All these projects seem to have intelligent and highly motivated people behind them. I simply gravitated towards cfengine because of its strong academic background and the fact that version 3 now represents the most recent and modern research in the field by Mark Burgess et al.

As part of my learning experience with cfengine, I’ve decided to start posting some of the code that I’ve begun developing, in the hope that by writing about it I can learn better and faster, and maybe even receive some helpful comments from readers along the way. Beware, I’m a cfengine newbie, so what I post here should NOT be copied and pasted into your environment unless you’re OK with the potential of wildly breaking things!

The first snippet of code I want to discuss is related to managing our DenyHosts configuration. As part of our “security policy”, I would like to ensure that every RedHat/CentOS system is running a properly configured DenyHosts instance. Here is what I’ve come up with so far.

Read the rest of this entry »