There have been some interesting new developments lately! Here’s a condensed summary.
At present I’m doing a technology review for implementing a new terminal server. Our existing terminal server is a 4-way AMD Opteron 848 system that’s about 5 years old. It runs CentOS 4 and has been so mega-customized over those 5 years that I’ve never wanted to go through the pain of an in-place upgrade to CentOS 5. We also have a simple IBM 1U server running Windows 2003 Server for Windows purposes. It’s OK but also about 5 years old.
The idea is to roll both these servers into a single large physical server with some kind of virtualization. The large system would also have the resources to run other VMs as necessary: development/test boxes and whatnot.
Hardware and Virtualization
The server is a 48-core AMD Opteron 6172 (2.1GHz) system with 64GB of DDR3 ECC RAM and a bunch of 15K RPM SAS drives. I’m not sure whether AMD is entirely the right choice. Intel does score amazingly well on virtualization workload benchmarks… but our workload is different from the benchmark workloads. What we need is one huge guest with a crap-ton of VCPUs and one guest with 2 VCPUs and 4GB of RAM. The last NUMA node/cell is pinned to the Windows (smaller) VM and the Linux (large) VM is pinned across several NUMA cells. The Linux VM is a multi-user system where many users will want to run intensive computations as well as ordinary desktop sessions. The existing server would sometimes be starved of resources when too many users started intensive programs simultaneously.
What I can’t quite figure out yet about this whole KVM thing is whether the guest is smart enough, and whether KVM allows mapping the VCPUs and the guest memory according to the physical NUMA topology, thus reducing the likelihood of slow inter-cell memory access.
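To make the pinning idea concrete, here is a minimal sketch that generates a libvirt `<cputune>` fragment with one `<vcpupin>` element per VCPU. It rests on assumptions that are not confirmed above: 8 NUMA cells of 6 cores each, and host CPUs numbered contiguously per cell (worth checking with `numactl --hardware` or `virsh capabilities`). The helper names `cell_cpus` and `cputune_xml` are made up for illustration, and whether the libvirt shipped with a given release accepts `<cputune>` is also an assumption. It only covers CPU pinning; guest memory placement is a separate question.

```python
def cell_cpus(cell, cores_per_cell):
    """Host CPU numbers in one NUMA cell, ASSUMING contiguous numbering per cell."""
    start = cell * cores_per_cell
    return list(range(start, start + cores_per_cell))


def cputune_xml(vcpus, cells, cores_per_cell):
    """Build a <cputune> fragment pinning each VCPU 1:1 to a host CPU.

    Host CPUs are taken in order from the given cells, so VCPUs fill one
    cell before spilling into the next.
    """
    host_cpus = [cpu for cell in cells for cpu in cell_cpus(cell, cores_per_cell)]
    if vcpus > len(host_cpus):
        raise ValueError("more VCPUs than host CPUs in the chosen cells")
    lines = ["<cputune>"]
    for vcpu in range(vcpus):
        lines.append(f"  <vcpupin vcpu='{vcpu}' cpuset='{host_cpus[vcpu]}'/>")
    lines.append("</cputune>")
    return "\n".join(lines)


# Hypothetical layout: Windows guest on the last cell, Linux guest on the rest.
windows_tune = cputune_xml(vcpus=2, cells=[7], cores_per_cell=6)
linux_tune = cputune_xml(vcpus=42, cells=range(7), cores_per_cell=6)
```

The resulting fragment would be pasted into the domain XML via `virsh edit`; the same mapping could also be applied to a running guest one VCPU at a time with `virsh vcpupin`.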
Software
Anyways… now that the server has arrived, I’ve started out with RHEL6 beta2 as the “hypervisor”, if you will. I’m obviously using KVM and libvirt as this is what Red Hat is backing. So far, so good. I’ve only used virt-manager and virsh thus far; I’ll explore other tools a little later. Fedora 14 is being used for the Linux VM and Windows 7 Enterprise for the Windows guest… I’m going to try out the Terminal Server multi-user hack and see how that goes. If it won’t go, I’ll recommend actually buying the correct Windows Server license. Or buying into the VDI stuff that’s going on around me. I’ll check out the pricing, I suppose… but I digress.
Linux Terminal Server: LTSP5
Fedora 14 with LTSP5 works pretty well. But there are some caveats.
1. The chroot
The current ltsp* packages in Fedora 14 aren’t able to build Fedora 14 chroots. You can currently only build Fedora 13 or older, because it takes some work to make a complete kickstart LTSP chroot from a new release. Instead of invoking “ltsp-build-client” blindly, you’ll need to do something like:
# ltsp-build-client --release 13
This isn’t a huuuuuuge deal, but in an ideal world I would prefer to use the same release for clients and servers. It’s just cleaner and makes maintaining everything a bit more congruent for me. It also has some troubleshooting benefits. Bah!
2. The Display Manager Situation
LDM, the LTSP5 display manager, is both wonderful and woeful. There are some really nice things that LTSP5 can tout due to LDM, but it’s also a step backwards compared to GDM (yes, even the new all-gtk GDM with reduced XDMCP functionality) or KDM in many ways.
What LDM does so well is proper setup and teardown of LOCALDEV and sound via pulse. It’s actually pretty slick, especially in GNOME where it gives users desktop drive icons for portable USB drives/keys. The automatic unmounting of the drive is nice too, but initially it was highly counterintuitive to someone like me who expects to have to eject/unmount the drive before pulling it out.
What LDM does poorly is… well, frankly, a few things. First off, the feedback from the password prompt is poor. It can’t tell you WHY your password failed, due to the way it interacts with SSH. It’s a hard problem to solve, apparently, but it’s a terrible user experience. Second, when you type your password incorrectly, it pauses for some time, tells you it can’t connect to the server, and X restarts to load LDM again. Again, bad user experience. It’s also not highly customizable. As an example, the login box will span multiple monitors by default, so it’s split across the bezels of your sweet dual-monitor thin client. While you can “hack” it by providing a wide logo to force the login box off to the right, it’s not exactly super slick that way.
The last thing LDM doesn’t seem to do at all is allow for AIGLX/DRI2. If I log in using LDM, glxinfo/glxgears barf on a BadRequest related to DRI2/DRI2Connect. With an XDMCP connection to GDM on the same server, with the same client chroot, lts.conf and xorg configuration, glxinfo runs and reports software rendering, but that’s a different story. Even X11 forwarding to another server gives yet another set of results. Either way, it appears LDM breaks functionality that would otherwise work, albeit fairly slowly. Could be the open source “radeon” driver I’m using on the test box, I suppose. I hear the intel driver works well…
Since the LOCALDEV and sound routing stuff is tightly integrated into LDM, it appears to be some work to get it working through GDM or KDM… then there is the problem that I haven’t had any success getting the new GDM (~ >=2.30, I think) to act as an XDMCP chooser. I do have other XDMCP hosts that I want to connect to…
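For comparing the LDM, XDMCP and X11-forwarding sessions side by side, a small helper along these lines could pull the two most telling lines out of glxinfo’s output. This is only a sketch: the function name `summarize_glxinfo` is made up, and it assumes glxinfo prints its usual `direct rendering:` and `OpenGL renderer string:` lines. In a session where glxinfo dies with the BadRequest error, both values simply stay `None`.

```python
def summarize_glxinfo(text):
    """Extract the rendering-related fields from captured glxinfo output."""
    fields = {"direct rendering": None, "OpenGL renderer string": None}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in fields:
            fields[key] = value.strip()
    return fields


# To capture the text in a live session, something like this should work:
#   import subprocess
#   text = subprocess.run(["glxinfo"], capture_output=True, text=True).stdout
#   print(summarize_glxinfo(text))
```

Running it once per login method and diffing the results makes it easier to see whether a session fell back to software rendering or lost GLX altogether.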
There is some hope to be had from this interesting post on the Ubuntu help/documentation wiki. It details how to install GDM on the LTSP chroot and get LOCALDEV working with it. They claim sound just works on that older version of Ubuntu but I don’t recall my sound device showing up when using XDMCP and remote GDM. Either way, LOCALDEV is more important.
Windows Terminal Server: Windows 7 Enterprise
I can’t start without saying that Windows 7 via RDP feels “slower” than Windows 2003 over RDP. I notice more mouse and menu lag from the same client systems and versions. I disabled Aero and it helped, but only a little. I’m not ready to give up yet; there may be more things I can do here.
As mentioned, I’m trying the termsrv.dll hack that allows multiple concurrent RDP users on non-Server editions of Windows. I wonder if it’s part of the performance issue I’m seeing… I should quickly revert the hack to test that possibility.
On the terminal server and with dual 4:3 monitors, not much of the GUI pizzazz in Windows 7 is all that interesting or useful. Some UI changes compared to XP make things a tad unfamiliar for me, but overall it’s the same experience with a few nice things and seemingly better stability and driver support.