Posted: May 13th, 2011 | Author: rthomson | Filed under: Sysadmin, Tips & Tricks | Tags: autofs, automount, export, gnome, linux, nautilus, nfs, share, slow, software | No Comments »
Has browsing automounted NFS shares with nautilus got you pulling out hair in frustration?
Ever since we transitioned from the RHEL4 environment to Fedora 14, people have been reporting terrible slowness and delays in nautilus when browsing our NFS shares. Reports of waiting over a minute for an NFS automount root-level directory with fewer than 100 subdirectories to display its contents are not good.
This wasn’t a problem on our old RHEL4 terminal server and I couldn’t for the life of me understand how nautilus could have become so slow in the years since RHEL4 was released. It just didn’t make sense. I started to think something had to be wrong and that this wasn’t just the new normal expected behaviour but I had nothing to go on.
I tried the basic recommendations: disable thumbnails, disable preview, disable directory item counts. That didn’t help the user experience in any dramatic way. At this point, I started recommending pcmanfm and thunar as a way to work around nautilus’ terrible performance. I even wrote a fairly concise script for modifying the default file manager and desktop-drawing application so that using a different file manager wouldn’t be so foreign in GNOME.
Then one day I started looking at the verbose level output from automount while browsing the NFS mounts with nautilus and found a substantial amount of this in the logs:
Apr 28 11:19:10 hostname automount[18959]: attempting to mount entry /home/.svn
Apr 28 11:19:10 hostname automount[18959]: key ".svn" not found in map source(s).
Apr 28 11:19:10 hostname automount[18959]: failed to mount /home/.svn
Oh my! Why are there repeated access attempts for “.svn”? What is causing automount to perform map lookups for “.svn” in the automount-controlled directories? Could it be nautilus?
Why yes!
As it turns out, the GNOME SVN integration package “gnubversion” includes a nautilus extension, and this extension was causing nautilus to look for “.svn” directories everywhere. It just so happens that looking for “.svn” in a root-level automount directory causes slow map lookup failures that (presumably) kill the perceptible performance of browsing automounted NFS shares.
I removed gnubversion (as no one was using it) and the user experience for nautilus has normalized. While nautilus still isn’t as speedy as pcmanfm or thunar, it’s no longer a cause of forceful hair removal incidents… and all is well in the world.
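For anyone chasing a similar problem, tallying the failed keys in the automount log makes a pattern like this stand out quickly. Here is a minimal sketch in Python, assuming log lines shaped like the excerpt above. The function name and the regular expression are my own for illustration and are not part of autofs.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... automount[18959]: key ".svn" not found in map source(s).
# The pattern is an assumption based on the log excerpt in this post.
MISSING_KEY = re.compile(r'automount\[\d+\]: key "([^"]+)" not found in map source')

def count_missing_keys(lines):
    """Return a Counter of map keys that automount failed to look up."""
    counts = Counter()
    for line in lines:
        match = MISSING_KEY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The three log lines quoted in this post.
sample = [
    'Apr 28 11:19:10 hostname automount[18959]: attempting to mount entry /home/.svn',
    'Apr 28 11:19:10 hostname automount[18959]: key ".svn" not found in map source(s).',
    'Apr 28 11:19:10 hostname automount[18959]: failed to mount /home/.svn',
]

# Print the most frequently missed keys first.
for key, hits in count_missing_keys(sample).most_common():
    print(key, hits)
```

Fed a real log file instead of the sample list, a key that shows up hundreds of times (like “.svn” did for us) is a good hint that some application is probing for it in every directory.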
Posted: April 15th, 2011 | Author: rthomson | Filed under: Sysadmin, Tips & Tricks | Tags: academia, collection, data, graph, graphing, network, nx, plot, plotting, scientific, vpn | No Comments »
As my entire career as a sysadmin (~7 years) has been within academia, you’d think that by now I’d be a master of collecting, plotting and analyzing data. However, I wasn’t bred in academia and the fact that I work where I do is more of a circumstance than anything else. I was never properly taught very much about data collection, plotting and analysis beyond high school and anything I can practically use today is because I was required to learn it to get the job done or to try and prove a point. I’ve always been able to find a way to whip out xmgrace or generate simple plots with gnuplot but it’s never been something that I’m super confident with, especially being surrounded by people who live and breathe this stuff day in, day out.
So why bother with knowing anything about this whole plotting thing? It’s clear how it can be useful in monitoring-style applications where data points are collected over time and then visualized via a plot or graph. Such plotting exposes trends in our environments and that’s usually a helpful tool to have around. Of course, there are other more specific problems and/or questions where collecting, plotting and analyzing data is very helpful as well. I will do my best to describe one such example.
Over the last few days I’ve been trying to find an answer to the question:
“Does the VPN add latency to our remote NX connections and if so, is it significant?”
This is a question where I believe plotting data will prove useful. There are some other sub-questions I’d like answered as well but that is the overarching issue at hand. I realized that this would be a great opportunity to re-learn some of the basics and maybe try out a few new tools at my disposal so I decided to document my journey through this foreign land for all to criticize and enjoy.
Scientific Method
Of course, I’m not following a strict scientific method with this endeavor. The question simply doesn’t warrant an entire drawn out, highly statistically relevant result despite my best intentions in delivering exactly that. What I’m trying to do is get an accurate sense more than an exact measurement, as flawed as that might be. It’s all I can justify in terms of time and effort for this project. From that strictly academic point of view, I’m sure to fail. My hope is that the results will be pseudo-science’d enough to provide confidence in my answer and that I’ll improve my skills throughout the exercise.
What Tests?
In order to determine if the VPN is affecting our latency I need at least two tests:
- NX connection without VPN
- NX connection with VPN
But while I’m at it, I figured I would gather additional data in order to attempt an answer at other RTT related questions. Adding additional tests based on client system “location” (local LAN, local wireless, various locations on campus wireless, home internet connection, etc.) and NX compression settings (MODEM, ISDN, ADSL, WAN and LAN) greatly increases the amount of testing required but will provide for richer data to visualize.
On top of that, each one of these additional variables I am testing is also to be tested with and without VPN. To add even more tests, each of these combinations of tests needs to be performed multiple times in order to normalize the data and to increase the statistical relevance. More samples = better data = more accurate results (at least this is the hope).
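To get a feel for how quickly the number of test runs grows, here is a rough Python sketch that enumerates the combinations. The location and compression values come from the lists above (with “campus wireless” collapsed into a single entry), while the repeat count of 3 and all of the names are assumptions for illustration only.

```python
from itertools import product

# Values taken from the text above; REPEATS is an assumed number of runs.
LOCATIONS = ["local LAN", "local wireless", "campus wireless", "home internet"]
COMPRESSION = ["MODEM", "ISDN", "ADSL", "WAN", "LAN"]
VPN = [False, True]
REPEATS = 3

def build_matrix(repeats=REPEATS):
    """Yield one (location, compression, vpn, run) tuple per capture session."""
    for location, compression, vpn in product(LOCATIONS, COMPRESSION, VPN):
        for run in range(1, repeats + 1):
            yield location, compression, vpn, run

sessions = list(build_matrix())
print(len(sessions), "capture sessions")
```

With these values that works out to 4 × 5 × 2 × 3 = 120 sessions, which is why a reproducible procedure matters so much.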
Data Collection
In order to start analyzing data, I need data. And that data needs to be of quality. And to have quality data, I need multiple samples. And to make useful comparisons I need multiple variable data sets and at least one control data set. For all that to work, I needed a reproducible set of actions to generate traffic, collect data and extract the relevant parts.
My basic method is as follows:
- Configure wireshark or tcpdump on the remote host to capture packets related to the NX/SSH connection that we are testing. Capture filters are used to prevent capture of any other packets.
- Initiate NX connection to remote host (login)
- Perform predefined action X on remote host via NX
- Log out of NX connection from remote host
- Stop and save packet capture
- Export RTT statistics from capture file with tcptrace
- Extract only the RTT data from `tcptrace` output (discard the TCP sequence # column because the absolute value doesn’t matter, we’ll use the index for the x-axis)
- Label and save extracted RTT data as txt format for input to plotting function
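The last two steps can be sketched in Python. This assumes the RTT data has already been dumped to a plain text file with two whitespace-separated columns (TCP sequence number, then RTT in milliseconds). That layout is my assumption for the example and not a description of what tcptrace itself writes. The function names and sample values are hypothetical.

```python
def extract_rtt(lines):
    """Return RTT values (ms) from 'sequence rtt' lines, dropping the sequence column."""
    rtts = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and anything without exactly two columns.
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        try:
            rtts.append(float(fields[1]))
        except ValueError:
            continue  # non-numeric row, e.g. a header
    return rtts

def save_rtt(rtts, path):
    """Write one RTT value per line; the line index becomes the x-axis later."""
    with open(path, "w") as handle:
        for value in rtts:
            handle.write(f"{value}\n")

# Made-up sample rows in the assumed two-column layout.
raw = [
    "# seq rtt_ms",
    "1000 12.5",
    "2448 13.1",
    "",
    "3896 40.0",
]

print(extract_rtt(raw))
```

Naming each output file after the test it came from (location, compression setting, VPN on or off, run number) takes care of the labelling step.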
Plot Types
There are two primary plot types that are going to help me answer the question at hand: scatter plots and histograms.
Scatter plots are basically used to visualize at least one data set with two display values. In this case, that means plotting the round trip time (RTT) in milliseconds against the corresponding TCP sequence number for various data sets. What’s more interesting though is juxtaposing combinations of data sets against each other in order to quickly visualize and observe qualitative differences.
Histograms are a way of visualizing the distribution of a data set. In this case, a histogram will plot the number of TCP sequences at each millisecond increment in the data set. Visualizing the distribution of our data set will help to clarify what the least to most frequent round trip times are, something which cannot be quickly visualized in a dense scatter plot.
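The binning behind such a histogram is simple enough to sketch in plain Python before reaching for a plotting tool. This is only an illustration with made-up RTT values. The function name and the 1 ms default bin width are my assumptions.

```python
import math
from collections import Counter

def rtt_histogram(rtts_ms, bin_width_ms=1.0):
    """Count samples per bin; each key is the lower edge of a bin in ms."""
    counts = Counter()
    for rtt in rtts_ms:
        bin_start = math.floor(rtt / bin_width_ms) * bin_width_ms
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

sample_rtts = [12.2, 12.9, 13.1, 12.5, 40.0]  # made-up values in ms

# Crude text histogram: one '#' per sample in each bin.
for bin_start, count in rtt_histogram(sample_rtts).items():
    print(f"{bin_start:>6.1f} ms  {'#' * count}")
```

Even this crude text output shows the cluster around 12 ms and the lone outlier at 40 ms, which is exactly the kind of thing that gets lost in a dense scatter plot.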
Looking Forward to Part 2
Now that you’ve made it through the snooze-fest that was part 1, I hope you’re eager for part 2! Oh boy! More blabbering, right? Hopefully not. Part 2 is where I’ll share some scripts, tips, techniques and finally, some finished plots for all to behold. You know, the technical stuff that we all love.
It shall be grand, now I just need to write it…
Comments are highly welcome.
Posted: March 8th, 2011 | Author: rthomson | Filed under: Sysadmin | Tags: crux ppc, cruxppc, debian, distribution, distro, gentoo, ibm, linux, p505, p505 express, ppc, ppc64, pseries, server | 2 Comments »
We (work) have two IBM p505 Express Servers.
Right now one machine is running an old, way-out-of-support RHEL4 installation and the other is on Fedora 12, which is no longer supported by the Fedora Project. Paid support/subscription is not a consideration yet for this project, but I do want to run a modern Linux distribution for the associated modern application software and maintenance.
I basically need to move these servers to something free and supportable. I’m finding out that there aren’t as many options in PPC Linux as when I was last interested in this architecture. It’s pretty much just:
I realize there is RHEL and SuSE Enterprise for PPC64 but those are subscription products without free binaries available. I’m not prepared to build an RPM-based distro from source at this point so I need something with binaries or something where building from source is highly automated and integrated, such as Gentoo. Digression…
The question is which of these distros do I go with? To answer the question I suppose I need to define the roles.
These two pSeries servers are a redundant pair running LDAP/Auth Service, NTP, DNS and DHCP. The load is low but I want a solid modern software platform on both these servers from now until they are replaced in the future (which is likely to be integration into a centralized architecture).
With that said, and given my level of familiarity with these distros, I would first lean towards Debian and then to Gentoo and finally to CRUX PPC.
Debian is a binary distribution, which is nice for maintaining a server. Debian is more familiar to me. What are the arguments for Gentoo or CRUX PPC?
Agree or Disagree?
Posted: February 21st, 2011 | Author: rthomson | Filed under: Sysadmin | Tags: campus, remote access, security, users, vpn | No Comments »
Yes! We’ve implemented what I think is the best compromise for our remote access problem that I outlined earlier.
Three major things happened that made it possible to compromise:
1. I received an OK from the higher-ups that indeed it would be ok to mandate that users who want remote access to our system have all their Internet traffic routed via the VPN.
2. The VPN configuration that was proposed is not that of a private group but the generic campus-wide VPN solution with ACLs to allow VPN clients to access our resources.
3. The generic campus-wide VPN service does *not* implement any outgoing restrictions, unlike the private VPN groups. This makes it possible for VPN users to do whatever they want when connected, with the stipulation that it’s akin to bringing your personal computer to campus in terms of privacy.
This really is the best of both worlds (security, ease of use). The campus VPN service is super-simple to use and we restrict access to our services to VPN users instead of the whole Internet. While we don’t control who gets access to the VPN, the pool of users is MUCH smaller and at least semi-trusted by the organization. Thus the risk is greatly reduced.
I’m happy.
Posted: February 10th, 2011 | Author: rthomson | Filed under: Sysadmin | Tags: linux, nx, remote access, ssh, vpn | No Comments »
NEW: See the Follow Up.
I’m in a bit of a pickle.
Traditionally, we’ve always allowed wide-open SSH access from anywhere to our main terminal server for remote access. Since we use NX (neatx, FreeNX, NXclient, etc.), all we ever needed open was SSH to make it all work nicely. Sure, SSH is a big bruteforce target but with DenyHosts and low thresholds things are pretty well under control. I realize huge distributed bruteforce attacks are still possible against a DenyHosts protected SSH daemon but we have to factor in ease of use when thinking about security and the low risk of massively distributed bruteforce attacks.