NFSv4: Interop, ACLs & Automount

NFSv4 has been around for a long time, but it still seems a bit foreign to me. The following is a quick rundown of things I recently learned about NFSv4 from my limited experience implementing it.

Interoperability

Is it possible to set up NFSv4 alongside NFSv3 on the same server, serving the same volumes? Of course. However, it might not always work exactly as expected with legacy clients.

A normal /etc/exports for NFSv3/v4 interoperability might look like so:

/export                   10.0.0.0/8(rw,no_subtree_check,fsid=0)
/export/namespace         10.0.0.0/8(rw,no_subtree_check)
/export/namespace/share1  10.0.0.0/8(rw,no_subtree_check)
/export/namespace/share2  10.0.0.0/8(rw,no_subtree_check)

With this configuration, we have the “virtual root” export (fsid=0), the namespace export (for mounting the whole namespace with one mount) and the individual “share” exports (for mounting individual shares, most likely with automount). NFSv4 clients mount paths relative to the virtual root, using the servername:/namespace syntax, while NFSv3 clients mount the whole root, the namespace or an individual “share” by its full server-side path: servername:/export, servername:/export/namespace or servername:/export/namespace/share1.
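The difference between the two path styles is easy to get wrong, so here is a minimal Python sketch of the mapping, assuming the fsid=0 export is /export as in the exports file above. The function names nfs4_path and mount_source are made up for illustration; they are not part of any NFS tooling.

```python
import posixpath


def nfs4_path(export_path, pseudo_root="/export"):
    """Return the path an NFSv4 client would mount for a server-side export path.

    NFSv4 paths are relative to the fsid=0 "virtual root", so the
    pseudo-root prefix is stripped from the server-side path.
    """
    export_path = posixpath.normpath(export_path)
    pseudo_root = posixpath.normpath(pseudo_root)
    if export_path == pseudo_root:
        return "/"
    prefix = pseudo_root.rstrip("/") + "/"
    if not export_path.startswith(prefix):
        raise ValueError(f"{export_path} is not under the pseudo-root {pseudo_root}")
    return "/" + export_path[len(prefix):]


def mount_source(server, export_path, version, pseudo_root="/export"):
    """Build the server:path argument for an NFSv3 or NFSv4 mount."""
    if version == 4:
        return f"{server}:{nfs4_path(export_path, pseudo_root)}"
    # NFSv3 clients use the full server-side path unchanged.
    return f"{server}:{posixpath.normpath(export_path)}"


# Show the NFSv3 and NFSv4 mount sources side by side for each export.
for path in ("/export", "/export/namespace", "/export/namespace/share1"):
    print(mount_source("servername", path, 3), "->", mount_source("servername", path, 4))
```

The string on each side of the arrow is what would be handed to mount on an NFSv3 and an NFSv4 client respectively.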

All is well in the NFS world… or so it seems at first. It turns out that an older SunOS does not entirely like how this RHEL 6 NFS server is exporting the file systems:

hostname% cd /namespace
hostname% ls
share1     share2     share3     share4
hostname% pwd
/namespace
hostname% cd share1
hostname% pwd
/share1

Notice the final line. I was just in /namespace, then I changed into /namespace/share1. Now pwd tells me the path is only /share1, where I was expecting /namespace/share1. It looks to me like the SunOS NFS client is not coping well with how the NFS server is exporting the file systems and/or how the bind mounts are set up locally on the server to map the storage into the NFSv4 “virtual root”.

Please leave a comment if you know of a different /etc/exports and/or mount configuration that would alleviate the SunOS NFS client issues noted here!

Access Control Lists

NFSv4 defines a model for Access Control Lists (ACLs) that is similar to that of Microsoft’s NTFS. But don’t worry about interoperability: the NFSv4 server translates your existing “POSIX” ACLs on ext3, ext4, xfs, etc. to NFSv4 ACLs automatically.

The main gotcha with exporting a filesystem with “POSIX” ACLs from the NFSv4 server is that the normal getfacl and setfacl tools don’t seem to work on the NFS client side! Because the NFSv4 server only presents the translated NFSv4 ACLs to the clients, the nfs4-acl-tools package must be installed, and the nfs4_getfacl and nfs4_setfacl commands used instead, to view and manipulate the ACLs on NFSv4 clients.
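Each line that nfs4_getfacl prints is an access control entry (ACE) of the form type:flags:principal:permissions. As a rough sketch of how to read them, the Python below splits ACEs into their fields and checks whether any entry names a specific user or group. The sample lines and the alice@example.com principal are illustrative, not output captured from the server described here, and has_named_entries is only a heuristic of my own naming.

```python
from collections import namedtuple

Ace = namedtuple("Ace", "type flags principal permissions")

# ACE type letters used by the nfs4_acl text format.
ACE_TYPES = {"A": "allow", "D": "deny", "U": "audit", "L": "alarm"}

# Principals that stand in for the plain owner/group/other mode bits.
SPECIAL_PRINCIPALS = {"OWNER@", "GROUP@", "EVERYONE@"}


def parse_ace(line):
    """Split one 'type:flags:principal:permissions' ACE into its fields."""
    ace_type, flags, rest = line.strip().split(":", 2)
    principal, permissions = rest.rsplit(":", 1)
    return Ace(ACE_TYPES.get(ace_type, ace_type), flags, principal, permissions)


def has_named_entries(aces):
    """Heuristic: True if any ACE names a specific user or group."""
    return any(ace.principal not in SPECIAL_PRINCIPALS for ace in aces)


# Illustrative sample input (hypothetical, not captured output).
sample = [
    "A::OWNER@:rwatTnNcCy",
    "A::alice@example.com:rxtncy",
    "A:g:GROUP@:rtncy",
    "A::EVERYONE@:rtncy",
]

aces = [parse_ace(line) for line in sample]
for ace in aces:
    print(ace.type, ace.principal, ace.permissions)
print("named entries present:", has_named_entries(aces))
```

A check like has_named_entries is one way to approximate, on the client, whether a file carries more than plain mode bits.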

Also, the little + at the end of the rwxrwxrwx permissions listing in ls -l output, which normally indicates the presence of an ACL, simply doesn’t appear on a (Linux?) NFSv4 mount where ACLs exist. Sadness.

Automount

Automount on RHEL 6 (and clones) appears to have a bug related to bind mounts. When the NFSv4 server mounts its own exports, they cannot (trivially?) be satisfied with local bind mounts, as is possible with NFSv3 (or lower) exports. I have read that this is due to the “virtual root” abstraction that NFSv4 employs. Instead, automount should perform true NFSv4 mounts when operating locally on the server… but it doesn’t do that on CentOS 6 (or, in my experience, RHEL 6):

See: http://bugs.centos.org/view.php?id=6101

The workaround is to specify port=2049 in the NFS mount options of the automount map in use (where 2049 is the port the NFS server is listening on). This appears to cause automount to immediately attempt an NFS mount, bypassing the (failing) attempt at a bind mount.
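For maps with many entries, adding the option by hand gets tedious. Here is a small Python sketch that appends port=2049 to the options of a simple one-line map entry of the form "key -options location". The map entries shown are hypothetical examples, the function name add_port_option is my own, and multi-line or multi-mount entries are not handled.

```python
def add_port_option(map_line, port=2049):
    """Return an autofs map entry with port=<port> added to its mount options.

    Handles only simple single-line entries: key [-options] location.
    Blank lines and comments are returned unchanged.
    """
    stripped = map_line.strip()
    if not stripped or stripped.startswith("#"):
        return map_line
    parts = stripped.split()
    key, rest = parts[0], parts[1:]
    if rest and rest[0].startswith("-"):
        # Existing options: drop the leading '-' and split on commas.
        options = [opt for opt in rest[0][1:].split(",") if opt]
        locations = rest[1:]
    else:
        options = []
        locations = rest
    # Leave the entry alone if a port is already specified.
    if not any(opt.startswith("port=") for opt in options):
        options.append(f"port={port}")
    return f"{key} -{','.join(options)} {' '.join(locations)}"


# Hypothetical map entries for illustration.
for entry in (
    "share1 -fstype=nfs4,rw servername:/namespace/share1",
    "share2 servername:/namespace/share2",
):
    print(add_port_option(entry))
```

Entries that already carry a port= option are left as they are, so running it twice over the same map is harmless.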

Browsing Automounted NFS with Nautilus

Has browsing automounted NFS shares with nautilus got you pulling out your hair in frustration?

Ever since we transitioned from the RHEL4 environment to Fedora 14, people have been reporting terrible slowness and delays in nautilus when browsing our NFS shares. Reports of waiting over a minute for a root-level NFS automount directory with fewer than 100 subdirectories to display its contents are not good.

This wasn’t a problem on our old RHEL4 terminal server and I couldn’t for the life of me understand how nautilus could have become so slow in the years since RHEL4 was released. It just didn’t make sense. I started to think something had to be wrong and that this wasn’t just the new normal expected behaviour but I had nothing to go on.

I tried the basic recommendations: disable thumbnails, disable previews, disable directory item counts. That didn’t help the user experience in any dramatic way. At this point, I started recommending pcmanfm and thunar as a way to work around nautilus’ terrible performance. I even wrote a fairly concise script for changing the default file manager and desktop-drawing application so that using a different file manager wouldn’t feel so foreign in GNOME.

Then one day I started looking at the verbose level output from automount while browsing the NFS mounts with nautilus and found a substantial amount of this in the logs:

Apr 28 11:19:10 hostname automount[18959]: attempting to mount entry /home/.svn
Apr 28 11:19:10 hostname automount[18959]: key ".svn" not found in map source(s).
Apr 28 11:19:10 hostname automount[18959]: failed to mount /home/.svn

Oh my! Why are there repeated access attempts for “.svn”? What is causing automount to perform map lookups for “.svn” in the automount-controlled directories? Could it be nautilus?

Why yes!

As it turns out, the GNOME SVN integration package “gnubversion” includes a nautilus extension, and this extension was causing nautilus to look for “.svn” directories everywhere. It just so happens that looking for “.svn” in a root-level automount directory triggers a map lookup for a key that doesn’t exist, and these slow, failing lookups (presumably) kill the perceptible performance of browsing automounted NFS shares.

I removed gnubversion (as no one was using it) and the user experience for nautilus has normalized. While nautilus still isn’t as speedy as pcmanfm or thunar, it’s no longer a cause of forceful hair removal incidents… and all is well in the world.
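If you suspect something similar on your own systems, tallying the failed map lookups by key makes the culprit obvious. Here is a minimal Python sketch that counts the “not found in map source(s)” messages per key. The regular expression is based only on the log lines shown above, so treat the exact message format as an assumption for other autofs versions.

```python
import re
from collections import Counter

# Matches the automount "key not found" message seen in the log excerpt above.
MISSING_KEY = re.compile(
    r'automount\[\d+\]: key "(?P<key>[^"]+)" not found in map source'
)


def count_missing_keys(lines):
    """Count failed automount map lookups per key."""
    counts = Counter()
    for line in lines:
        match = MISSING_KEY.search(line)
        if match:
            counts[match.group("key")] += 1
    return counts


# The three log lines from the excerpt above.
sample_log = [
    'Apr 28 11:19:10 hostname automount[18959]: attempting to mount entry /home/.svn',
    'Apr 28 11:19:10 hostname automount[18959]: key ".svn" not found in map source(s).',
    'Apr 28 11:19:10 hostname automount[18959]: failed to mount /home/.svn',
]

for key, count in count_missing_keys(sample_log).most_common():
    print(count, key)
```

Fed a real log file (for example via open() on wherever your syslog writes automount messages), a key that no map could ever contain showing up hundreds of times is a strong hint that some client application is probing for it.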

Is Ubuntu Ready for the Enterprise?

Yep, the title is click bait intended to grab attention… well, as much click bait as anything on techslaves.org can be (which is decidedly not very much), but I’ve just been pretty frustrated with Ubuntu as a client OS recently.

There are two really annoying and critical bugs that have been sitting around unresolved for too long. One revolves around the NFS client. Apparently a regression introduced in mainline kernel 2.6.27 causes NFS lockups and freezes. Both 10.04 and 10.10 are affected, but Ubuntu has yet to release the fix, although it has been available in the mainline kernel since August. The second bug revolves around Network Manager and autofs maps in LDAP. Basically, you have to get Network Manager to run an “autofs reload” every time it brings the network interface up or down. No big deal, as this can be scripted, but I would really expect an official fix for this.
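For what it’s worth, the scripting usually takes the form of a Network Manager dispatcher script: an executable dropped into /etc/NetworkManager/dispatcher.d/ that Network Manager calls with the interface name and the action as arguments. Below is a rough Python sketch of such a script. It is not the fix from the bug reports, and the reload command (service autofs reload) is an assumption about how autofs is managed on your release, so adjust it to whatever your init system expects.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Dispatcher actions that should trigger an autofs reload.
RELOAD_ACTIONS = {"up", "down"}

# Assumption: autofs is reloaded through the service wrapper on this system.
RELOAD_COMMAND = ["service", "autofs", "reload"]


def reload_command(action):
    """Return the command to run for a dispatcher action, or None to do nothing."""
    if action in RELOAD_ACTIONS:
        return list(RELOAD_COMMAND)
    return None


def main(argv):
    """Entry point: argv is [script, interface, action] when run by the dispatcher."""
    if len(argv) < 3:
        return 0  # not called by the dispatcher; nothing to do
    command = reload_command(argv[2])
    if command is None:
        return 0
    return subprocess.call(command)


if __name__ == "__main__":
    main(sys.argv)
```

The script ignores every action other than up and down, so events such as VPN changes don’t trigger needless reloads.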

Ok, so it’s not Ubuntu’s fault there was a mainline kernel regression in the NFS client code, and it’s not Ubuntu’s fault that Network Manager behaves the way it does. However, I do expect a Linux vendor that considers itself ready for the Enterprise to be able to backport critical kernel fixes so that its users don’t have to sit around waiting with their thumbs up their asses until the fix makes its way into an official kernel release and then into an Ubuntu kernel update. As for the autofs maps in LDAP/Network Manager issue, I would expect an enterprise-ready distribution not only to have tested this functionality before release but also, once the problem is reported, to quickly release a real, official fix that everyone can use instead of having to follow bug report comment suggestions to get things working.

I realize Ubuntu is mainly a desktop OS. That’s ok. But all this “Ubuntu is ready for the Enterprise!!! GO CANONICAL!!!!” stuff simply can’t be justified when two official releases in a row ship with show-stopping bugs and there still isn’t a fix or an officially recommended workaround.

</rant>