Brooks Davis gave a talk at this year's BSDCan conference on building clusters using FreeBSD. I couldn't attend, but the title caught my eye; he also links back to his 2003 talk.
One thing that stood out was the potential for a cluster's internal traffic to affect the rest of the network it's attached to. Since we're looking at building a small cluster on a Blade Centre, that's something to keep in mind. He also has notes on naming conventions (particularly FQDNs) and the effect they can have on other software. Interestingly, while we're evaluating NAS-type devices that *aren't* NetApp, he's talking about moving *to* a NetApp - granted, that talk is four years old now, so maybe he'd do something else today. (Our Sun clusters use a Sun 5210 NAS, which exports its shares via NFS.)
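For what it's worth, the server side of that kind of NFS setup is pretty simple; a minimal FreeBSD-style /etc/exports entry (the path and subnet here are made up for illustration) restricting a share to the cluster's private network might look like:

```
# /etc/exports - hypothetical export of home directories to the
# cluster's private subnet, read-write, with root mapped to nobody
/export/home -maproot=nobody -network 10.0.0.0 -mask 255.255.255.0
```

After editing, a `kill -HUP` to mountd (or restarting it) makes the new exports take effect.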
We've run into the same PXE-booting issues he describes, though mostly on commodity-class workstation motherboards. Our Sun PC hardware has handled it very well, and we use PXE booting to re-image the workstations we use for our grads - those all run Asus motherboards of one flavour or another, so BIOS support has clearly gotten better since.
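The DHCP side is where most of the PXE fiddling happens, in my experience; an ISC dhcpd fragment along these lines (the addresses and subnet are illustrative, not ours) points booting machines at a TFTP server and a boot loader:

```
# dhcpd.conf fragment - hand out leases on an imaging subnet and
# point PXE clients at the TFTP server and its boot loader
subnet 192.168.100.0 netmask 255.255.255.0 {
    range 192.168.100.50 192.168.100.200;
    next-server 192.168.100.1;   # TFTP server holding the boot files
    filename "pxeboot";          # FreeBSD's PXE loader; pxelinux.0 for Linux images
}
```

Once the client fetches the loader over TFTP, the rest of the boot (kernel, root filesystem) is up to whatever imaging setup you're running.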
I hadn't known that N1GE (Sun's N1 Grid Engine) could run on FreeBSD; that's of potential interest to me.
I feel their pain with regard to remote console access; as they say, it's a stone-cold absolute must when diagnosing issues, particularly on clusters. I think the only real recourse is to bite the bullet and buy hardware designed for it; the SunFire X4200s in our three Sun clusters are very nice this way. I don't think the LOMs on the 2200s are anywhere near as nice, though, and a 4200 is very much a Cadillac solution.
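On hardware with a reachable service processor, IPMI serial-over-LAN gets you most of the way there; an ipmitool invocation like the following (the hostname and credentials are obviously made up) attaches a console session to a node's LOM:

```
# Attach a serial-over-LAN console via the node's BMC/LOM
# (hostname, user, and password here are hypothetical)
ipmitool -I lanplus -H node01-lom.example.org -U admin -P secret sol activate

# The ~. escape sequence detaches; power state can be checked the same way:
ipmitool -I lanplus -H node01-lom.example.org -U admin -P secret chassis power status
```

It's no substitute for a proper LOM with a full remote KVM, but for watching a hung boot or power-cycling a wedged node it does the job.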
All in all, an interesting read for anybody designing their own HPC cluster. There's a lot more work involved in setting up this sort of approach (versus, say, a Rocks cluster), but in the end the system may be more manageable.