system administrator
noun Computing
"a person who nurtures a computer system, also known as the computer mommy"

Browsing Automounted NFS with Nautilus

Posted: May 13th, 2011 | Filed under: Sysadmin, Tips & Tricks

Has browsing automounted NFS shares with nautilus got you pulling out hair in frustration?

Ever since we transitioned from the RHEL4 environment to Fedora 14, people have been reporting terrible slowness and delays in nautilus when browsing our NFS shares. Users reported waiting over a minute for nautilus to display the contents of a root-level NFS automount directory with fewer than 100 subdirectories, which is just not acceptable.

This wasn’t a problem on our old RHEL4 terminal server, and I couldn’t for the life of me understand how nautilus could have become so slow in the years since RHEL4 was released. It just didn’t make sense. I started to suspect that something was wrong and that this wasn’t just the new normal, but I had nothing to go on.

I tried the basic recommendations: disable thumbnails, disable previews, disable directory item counts. None of that improved the user experience in any dramatic way. At this point, I started recommending pcmanfm and thunar as a way to work around nautilus’ terrible performance. I even wrote a fairly concise script for changing the default file manager and desktop-drawing application so that using a different file manager wouldn’t feel so foreign in GNOME.

Then one day I started looking at automount’s verbose log output while browsing the NFS mounts with nautilus and found a substantial amount of this in the logs:

Apr 28 11:19:10 hostname automount[18959]: attempting to mount entry /home/.svn
Apr 28 11:19:10 hostname automount[18959]: key ".svn" not found in map source(s).
Apr 28 11:19:10 hostname automount[18959]: failed to mount /home/.svn
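
To see which keys are being probed and how often, a short script can tally the failed lookups in the log. This is a minimal sketch that assumes the verbose syslog format shown above; the function name is hypothetical, and the log path in the example comment is an assumption that depends on your syslog setup.

```python
import re
from collections import Counter

# Matches automount's verbose message for a failed map lookup, e.g.
#   ... automount[18959]: key ".svn" not found in map source(s).
MISSING_KEY = re.compile(r'automount\[\d+\]: key "([^"]+)" not found in map source')


def count_missing_keys(lines):
    """Return a Counter of map keys that automount failed to find."""
    counts = Counter()
    for line in lines:
        match = MISSING_KEY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example (the log path is an assumption; adjust for your system):
#   with open("/var/log/messages", errors="replace") as log:
#       for key, n in count_missing_keys(log).most_common(10):
#           print(n, key)
```

A key that shows up far more often than any real user or share name, as “.svn” did here, points at some program probing for it.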

Oh my! Why are there repeated access attempts for “.svn”? What is causing automount to perform map lookups for “.svn” in the automount-controlled directories? Could it be nautilus?

Why yes!

As it turns out, the GNOME SVN integration package “gnubversion” includes a nautilus extension, and this extension was causing nautilus to look for “.svn” directories everywhere. It just so happens that looking for “.svn” in a root-level automount directory triggers a map lookup that fails slowly, which (presumably) kills the perceived performance of browsing automounted NFS shares.
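
To put a rough number on how slow a single miss is, one could time the same kind of probe directly. This is a minimal sketch, assuming /home is an automount root as in the log above; the function name is hypothetical, and a single stat() only approximates what the nautilus extension actually does.

```python
import os
import time


def time_missing_lookup(directory, name=".svn"):
    """Time one stat() of directory/name and report whether it exists.

    Under an automount root, a miss here makes automount look up `name`
    as a map key, which is the failure seen in the log above.
    Returns (exists, seconds).
    """
    path = os.path.join(directory, name)
    start = time.perf_counter()
    try:
        os.stat(path)
        exists = True
    except FileNotFoundError:
        exists = False
    return exists, time.perf_counter() - start


# Example (assumes /home is automount-controlled):
#   print(time_missing_lookup("/home"))
```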

I removed gnubversion (no one was using it) and the nautilus user experience has returned to normal. While nautilus still isn’t as speedy as pcmanfm or thunar, it’s no longer a cause of forceful hair removal incidents… and all is well in the world.


Plotting for Sysadmins by Example: Part 1

Posted: April 15th, 2011 | Filed under: Sysadmin, Tips & Tricks

As my entire career as a sysadmin (~7 years) has been within academia, you’d think that by now I’d be a master of collecting, plotting and analyzing data. However, I wasn’t bred in academia, and the fact that I work where I do is more circumstance than anything else. I was never properly taught much about data collection, plotting and analysis beyond high school, and anything I can practically use today I learned because I needed it to get the job done or to prove a point. I’ve always been able to whip out xmgrace or generate simple plots with gnuplot, but it’s never been something I’m super confident with, especially surrounded by people who live and breathe this stuff day in, day out.

So why bother with knowing anything about this whole plotting thing? It’s clear how it can be useful in monitoring-style applications where data points are collected over time and then visualized via a plot or graph. Such plotting exposes trends in our environments and that’s usually a helpful tool to have around. Of course, there are other more specific problems and/or questions where collecting, plotting and analyzing data is very helpful as well. I will do my best to describe one such example.

Over the last few days I’ve been trying to find an answer to the question:

“Does the VPN add latency to our remote NX connections and if so, is it significant?”

This is a question where I believe plotting data will prove useful. There are some other sub-questions I’d like answered as well but that is the overarching issue at hand. I realized that this would be a great opportunity to re-learn some of the basics and maybe try out a few new tools at my disposal so I decided to document my journey through this foreign land for all to criticize and enjoy.

! Scientific Method

Of course, I’m not following a strict scientific method with this endeavor. The question simply doesn’t warrant a drawn-out study with highly statistically significant results, despite my best intentions of delivering exactly that. What I’m after is an accurate sense of the answer rather than an exact measurement, flawed as that might be; it’s all I can justify in terms of time and effort for this project. From a strictly academic point of view, I’m sure to fail. My hope is that the results will be pseudo-science’d enough to provide confidence in my answer and that I’ll improve my skills along the way.

What Tests?

In order to determine if the VPN is affecting our latency I need at least two tests:

  1. NX connection without VPN
  2. NX connection with VPN

But while I’m at it, I figured I would gather additional data in order to attempt answers to other RTT-related questions. Adding tests based on client system “location” (local LAN, local wireless, various locations on campus wireless, home internet connection, etc.) and on NX compression settings (MODEM, ISDN, ADSL, WAN and LAN) greatly increases the amount of testing required but will provide richer data to visualize.

On top of that, each of these additional variables is also to be tested with and without VPN. To add even more tests, each combination needs to be performed multiple times in order to even out run-to-run variation and to increase the statistical relevance. More samples = better data = more accurate results (at least this is the hope).
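
To see how quickly the combinations multiply, the full test plan can be enumerated as a cartesian product. This is a sketch: the NX compression settings come from the list above, while the four locations and the three repetitions per combination are hypothetical placeholders, as is the function name.

```python
from itertools import product

# NX compression settings are the ones named above; the locations and the
# repetition count are hypothetical placeholders for illustration.
LOCATIONS = ["local LAN", "local wireless", "campus wireless", "home internet"]
NX_SETTINGS = ["MODEM", "ISDN", "ADSL", "WAN", "LAN"]
VPN_STATES = [False, True]
REPETITIONS = 3


def build_test_plan(locations=LOCATIONS, settings=NX_SETTINGS,
                    vpn_states=VPN_STATES, repetitions=REPETITIONS):
    """Yield one (location, setting, vpn, run) tuple per test run."""
    for location, setting, vpn in product(locations, settings, vpn_states):
        for run in range(1, repetitions + 1):
            yield location, setting, vpn, run


runs = list(build_test_plan())
print(len(runs))  # 4 locations x 5 settings x 2 VPN states x 3 runs = 120
```

Even with these modest placeholder numbers the plan reaches 120 captures, which is why a reproducible procedure for each run matters.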

Data Collection

In order to start analyzing data, I need data. And that data needs to be of good quality. And to have quality data, I need multiple samples. And to make useful comparisons, I need multiple variable data sets and at least one control data set. For all that to work, I needed a reproducible set of actions to generate traffic, collect data and extract the relevant parts.

My basic method is as follows:

  1. Configure wireshark or tcpdump on the remote host to capture packets related to the NX/SSH connection that we are testing. Capture filters are used to prevent capture of any other packets.
  2. Initiate NX connection to remote host (login)
  3. Perform predefined action X on remote host via NX
  4. Log out of the NX connection to the remote host
  5. Stop and save packet capture
  6. Export RTT statistics from capture file with tcptrace
  7. Extract only the RTT data from `tcptrace` output (discard the TCP sequence # column because the absolute value doesn’t matter, we’ll use the index for the x-axis)
  8. Label and save extracted RTT data as txt format for input to plotting function
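
Steps 7 and 8 can be sketched in a few lines. This assumes the RTT samples have already been pulled out of tcptrace’s output into two whitespace-separated columns (TCP sequence number, RTT in ms); that intermediate format and both function names are hypothetical, and tcptrace’s own output format is not reproduced here.

```python
def extract_rtt(lines):
    """Drop the sequence-number column and return RTT values (ms) in order.

    Assumes each data line holds "<tcp_seq> <rtt_ms>"; blank lines and
    lines starting with '#' are skipped.
    """
    rtts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _seq, rtt = line.split()[:2]
        rtts.append(float(rtt))
    return rtts


def write_rtt_file(path, label, rtts):
    """Save "<index> <rtt_ms>" pairs as plain text under a '#' label line."""
    with open(path, "w") as out:
        out.write(f"# {label}\n")
        for index, rtt in enumerate(rtts):
            out.write(f"{index} {rtt}\n")
```

Writing the label as a `#` line keeps the file directly usable by gnuplot, which treats such lines as comments, while the sample index replaces the discarded sequence number on the x-axis.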

Plot Types

There are two primary plot types that are going to help me answer the question at hand: scatter plots and histograms.

Scatter plots display one or more data sets as points positioned by two values. In this case, that means plotting the round trip time (RTT) in milliseconds against the corresponding TCP sequence (by its index) for various data sets. What’s more interesting, though, is juxtaposing combinations of data sets against each other in order to quickly observe qualitative differences.

Histograms are a way of visualizing the distribution of a data set. In this case, a histogram will plot the number of TCP sequences (RTT samples) at each millisecond increment in the data set. Visualizing the distribution will help clarify which round trip times are the least and most frequent, something that can’t be quickly seen in a dense scatter plot.
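
The binning behind such a histogram is simple enough to sketch directly. This minimal version counts RTT samples in 1 ms bins by default; the function name is hypothetical, and the output pairs are just one convenient shape to hand to a plotting tool.

```python
import math
from collections import Counter


def rtt_histogram(rtts, bin_ms=1.0):
    """Count RTT samples per bin of width bin_ms (default 1 ms).

    Each sample lands in the bin starting at floor(rtt / bin_ms) * bin_ms.
    Returns sorted (bin_start_ms, count) pairs, omitting empty bins.
    """
    counts = Counter(math.floor(rtt / bin_ms) for rtt in rtts)
    return [(b * bin_ms, counts[b]) for b in sorted(counts)]


samples = [12.5, 12.9, 13.0, 13.4, 40.25]
print(rtt_histogram(samples))  # [(12.0, 2), (13.0, 2), (40.0, 1)]
```

Widening `bin_ms` trades resolution for smoother shapes, which can help when a data set has only a few hundred samples.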

Looking Forward to Part 2

Now that you’ve made it through the snooze-fest that was part 1, I hope you’re eager for part 2! Oh boy! More blabbering, right? Hopefully not. Part 2 is where I’ll share some scripts, tips, techniques and finally, some finished plots for all to behold. You know, the technical stuff that we all love.

It shall be grand, now I just need to write it…

Comments are highly welcome.


Fresh Win2k Install and Windows Update Error

Posted: January 7th, 2011 | Filed under: Sysadmin, Tips & Tricks

I needed to re-install a Windows 2000 Pro system today because the HDD was failing and we wanted to convert from ATA to SATA at some point anyway. We have nice gzipped dd images of the system, but those were taken with the ATA drive and a different SATA controller, and the install is also old and crufty. We need a system in a better-known state, so a fresh re-install it is.

As for why I’m installing Windows 2000 in 2011: this system is an instrument controller, and its proprietary control software requires Windows 2000 Pro. The software is finicky, and we only receive support from the manufacturer with the mandated OS and software stack version(s). There are a few other reasons why we need to keep Windows 2000 Pro at this point, but they aren’t relevant or interesting.

Now on to the problem.

Windows Update no longer works from a fresh install of Win2k Pro! The issue is Internet Explorer 5, the version of IE bundled with Windows 2000 Pro. Windows Update now requires at least IE6 in order to function properly. I don’t know when that changed, but presumably some time ago, as I haven’t run Windows Update on a fresh 2000 install in years. Luckily, the solution is simple: download IE6 from microsoft.com and get rockin’.

It struck me as strange at first but quite understandable after a few moments of reflecting on it, especially considering Windows 2000 reached end-of-life in July 2010.

Yeah, that post pretty much sucked. Sorry, folks.


LTSP 5 and AIGLX

Posted: November 23rd, 2010 | Filed under: Sysadmin, Tips & Tricks

Woot! LTSP 5 + LDM over SSH (LDM_DIRECTX=False in lts.conf) + Open source radeon driver with AIGLX is working!

Nothing like running compiz smoothly on a dual monitor thin client :D

The problem I was having was that, although the X server on the thin client was fully configured and tested to use hardware acceleration locally, when connected to the terminal server over the secure LDM tunnel I was getting direct rendering with the software renderer, which is a big fail for compiz.

The key to keeping the software renderer from being used for direct rendering was setting LIBGL_ALWAYS_INDIRECT=1 as an environment variable. I don’t know why, with everything configured correctly, the system defaults to the software renderer instead of indirect rendering with the hardware renderer, but at least forcing this environment variable in a global profile script allows for sexy hardware-accelerated compiz goodness on securely connected thin clients.

Without the environment variable to force indirect rendering, glxinfo output with the LIBGL_DEBUG=verbose env variable set was complaining that the “drm device” didn’t exist. I suspect this is because glxinfo was expecting to somehow find the /dev/dri/card0 device on the terminal server itself instead of on the thin client and of course it doesn’t exist on the server… the OpenGL card is installed on the thin client!
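
To check whether a session ended up on the software renderer, the two relevant lines of glxinfo output can be parsed. This is a sketch: the function names are hypothetical, and the renderer-name fragments treated as “software” are assumptions about Mesa’s software renderers that may need adjusting for your Mesa version. The workaround itself is exporting LIBGL_ALWAYS_INDIRECT=1 from a global profile script, such as a file under /etc/profile.d/ (the exact location is an assumption).

```python
import re

# Renderer-string fragments assumed to indicate Mesa software rendering.
SOFTWARE_RENDERERS = ("software rasterizer", "llvmpipe", "softpipe")


def parse_glxinfo(text):
    """Extract (direct_rendering, renderer) from glxinfo output text."""
    direct = re.search(r"^direct rendering:\s*(\w+)", text, re.MULTILINE)
    renderer = re.search(r"^OpenGL renderer string:\s*(.+)$", text, re.MULTILINE)
    return (
        direct is not None and direct.group(1).lower() == "yes",
        renderer.group(1).strip() if renderer else None,
    )


def uses_software_renderer(text):
    """True if the reported renderer matches one of SOFTWARE_RENDERERS."""
    _direct, renderer = parse_glxinfo(text)
    return renderer is not None and any(
        name in renderer.lower() for name in SOFTWARE_RENDERERS
    )


# Example (needs glxinfo and an X display):
#   import os, subprocess
#   env = dict(os.environ, LIBGL_ALWAYS_INDIRECT="1")
#   out = subprocess.run(["glxinfo"], env=env,
#                        capture_output=True, text=True).stdout
#   print(parse_glxinfo(out), uses_software_renderer(out))
```

Running the example with and without the variable set makes the difference visible: the failing case reports direct rendering with a software renderer string.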

There must be a way to get this working without the LIBGL_ALWAYS_INDIRECT environment variable, but I couldn’t figure it out… this really smells of a hack. Since it’s very easy to apply globally and it works just how I expect, I’ll leave it in place until I can find a non-hacky way of getting the results I want with this configuration.


When using Syncrepl…

Posted: September 1st, 2010 | Filed under: Sysadmin, Tips & Tricks

Quick OpenLDAP tip boys & girls…

When using syncrepl to replicate from a master LDAP server to a slave LDAP server, always remember to configure the ACLs on the master LDAP server to allow the “sync dn” to read everything.

I know it sounds entirely obvious, but today I realized that the order in which I had defined the ACLs on the master LDAP server was preventing the sync dn from reading the “userPassword” attribute, and thus also preventing it from syncing that attribute to the slave. The consequence: users would not be able to authenticate against the slave server! Shit!

Of course, since everything else was syncing properly, all the NSS (lookup) stuff worked fine, but anything authentication-related like PAM wouldn’t work because the user bind would fail with “Invalid credentials” in /var/log/secure. It had been a while since I last tested authentication, and I must never have actually tested it against the slave (whoops!), so I didn’t notice until now. I know I tested lookups, but testing authentication must have slipped by somehow. Grrr, testing.
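
Why the order matters can be shown with a tiny model of slapd’s first-match evaluation: the first `access to` directive whose target covers the attribute is the one used, the first matching `by` clause in it decides, and if none matches, access is denied. This is a simplified sketch, not slapd itself; the sync DN and rule contents are hypothetical, and special `who` values like `self` and `anonymous` are matched literally here. In slapd.conf terms, the fix is a clause like `by dn.exact="<sync dn>" read` inside the userPassword rule.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A simplified `access to attrs=... by <who> <level> ...` directive."""
    attrs: set  # attribute names covered; {"*"} covers everything
    by: list    # ordered (who, level) pairs; who "*" matches anyone


def access_level(rules, requester, attr):
    """First rule covering `attr` wins; first matching `by` clause decides.
    'self'/'anonymous' are matched literally here, a simplification."""
    for rule in rules:
        if "*" in rule.attrs or attr in rule.attrs:
            for who, level in rule.by:
                if who == "*" or who == requester:
                    return level
            return "none"  # no `by` clause matched: access denied
    return "none"


SYNC_DN = "cn=replicator,dc=example,dc=com"  # hypothetical sync dn
password_by = [("self", "write"), ("anonymous", "auth"), ("*", "none")]
everything = Rule({"*"}, [(SYNC_DN, "read"), ("*", "read")])

broken = [Rule({"userPassword"}, password_by), everything]
fixed = [Rule({"userPassword"}, [(SYNC_DN, "read")] + password_by), everything]

print(access_level(broken, SYNC_DN, "userPassword"))  # none
print(access_level(fixed, SYNC_DN, "userPassword"))   # read
```

In the broken ordering the sync dn’s read grant in the catch-all rule is never reached for userPassword, because the earlier rule already matched and ended in `by * none`.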

Good thing I caught it early, before it escalated into a real problem; that really could have sucked down the line.

Don’t make the same mistake I did.